Estimating Influenza Incidence from Diagnostic Codes

This work was mentored by Sasikiran Kandula and Jeffrey Shaman in the Department of Environmental Health Sciences at Columbia University.

My work involved:

  1. Analyzing large datasets (~68 million) and exploring machine learning methods in MySQL and R to track influenza incidence in real time by combining diagnostic and virologic data from electronic medical records.
  2. Comparing and contrasting machine learning methods (Boosting, SVM, Random Forest, etc.) using cross-validation.
  3. Creating clear and compelling reports, visualizations, and interactive apps in R Markdown for collaborators.
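
The cross-validation comparison in item 2 can be illustrated with a minimal sketch. It is not the project's code: the data are synthetic, the two models (a mean baseline and a one-feature least-squares line) are simple stand-ins for the methods named above, and every function name here is hypothetical. The project itself used R; Python is used here only for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Stand-in model 1: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Stand-in model 2: one-feature ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_rmse(fit, xs, ys, k=5, seed=0):
    """Average held-out RMSE of `fit` over k folds."""
    fold_rmse = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_err = [(model(xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_rmse.append(mean(sq_err) ** 0.5)
    return mean(fold_rmse)


# Synthetic data (an assumption for illustration, not the study's records).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

# Score every candidate on the same folds, then pick the lowest error.
candidates = {"mean_baseline": fit_mean, "linear": fit_line}
scores = {name: cv_rmse(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Each candidate is scored on the same folds (same seed), so the differences in error reflect the models rather than the split.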

Additional Resources

  • Final Report: My paper for the Fall 2017 course P9120 Topics in Statistical Learning & Data Mining, taught by Min Qian.

  • Slides: My slides for the Fall 2017 course P9120 Topics in Statistical Learning & Data Mining, taught by Min Qian.