Estimating Influenza Incidence from Diagnostic Codes

My work involved:

Analyzing large datasets (~68 million) in MySQL and R, and exploring machine learning methods that combine diagnostic and virologic data from electronic medical records to track influenza incidence in real time.
Comparing machine learning methods (boosting, SVM, random forest, etc.) using cross-validation.
Creating clear and compelling reports, visualizations, and interactive apps for collaborators in R Markdown.
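The cross-validation comparison in the second item can be illustrated with a minimal k-fold sketch. This is not the project's code: the project worked in R with boosting, SVM, and random forest models, while this Python sketch assumes squared-error loss, uses two hypothetical stand-in models (`fit_mean`, a constant baseline, and `fit_linear`, one-predictor least squares), and runs on synthetic data only.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Hypothetical baseline model: always predict the training mean of y."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Hypothetical stand-in model: least squares fit of y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def cross_validate(fit, xs, ys, k=5, seed=0):
    """Average the held-out mean squared error over k folds."""
    folds = k_fold_indices(len(xs), k, seed)
    fold_mse = []
    for held_out in folds:
        test = set(held_out)
        train = [i for i in range(len(xs)) if i not in test]
        # Fit only on the training folds, then score on the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_mse.append(mean((ys[i] - model(xs[i])) ** 2 for i in held_out))
    return mean(fold_mse)


if __name__ == "__main__":
    # Synthetic data only: y depends linearly on x plus Gaussian noise.
    rng = random.Random(1)
    xs = [rng.uniform(0, 10) for _ in range(200)]
    ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
    for name, fit in [("mean baseline", fit_mean), ("linear", fit_linear)]:
        print(name, round(cross_validate(fit, xs, ys), 3))
```

Passing the same `seed` to every call means each model is scored on identical folds, so differences in cross-validated error reflect the models rather than the random split. Any model that exposes a fit function returning a predictor could be dropped into the same loop.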

Final Report: my paper for the Fall 2017 course P9120 Topics in Statistical Learning & Data Mining, taught by Min Qian.
Slides: my presentation slides for the same course.