Machine-Learning

Identifying melanoma prognostic biomarkers by integrating imaging data and genetic data

I am conducting research with Gen Li at the Department of Biostatistics at Columbia University, and with Yvonne Saenger and Robyn Gartrell from Columbia University Medical Center. My work has involved creating interactive heat maps and applying cluster analysis to explore the hierarchical structure of melanoma gene expression data; building a predictive model of recurrence using logistic regression with a lasso penalty, and validating it on two separate cohorts using cross-validation with bootstrapping; and communicating the results of analyses, modeling, and tests through data visualization, interactive apps, and writing in R Markdown.
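
To illustrate what a lasso penalty does in a logistic regression, here is a minimal sketch in plain Python. It is not the code used in the project (which was done in R): the toy data, the function names, the proximal-gradient solver, and the values of `lam` and `step` are all assumptions chosen for illustration. The point is that the soft-thresholding step can set coefficients exactly to zero, which is what makes the lasso useful for selecting a small set of candidate biomarkers.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrink toward zero, clip at zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam=0.05, step=0.1, n_iter=1000):
    """Proximal gradient descent for L1-penalized logistic regression.

    Minimizes mean log-loss + lam * sum(|w_j|). The intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step on the smooth loss, then soft-threshold each weight.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def make_toy_data(n=400, seed=0):
    # Hypothetical data: only the first two of five features carry signal.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(5)]
        prob = sigmoid(2.0 * row[0] - 1.5 * row[1])
        X.append(row)
        y.append(1 if rng.random() < prob else 0)
    return X, y


X, y = make_toy_data()
weights, intercept = fit_lasso_logistic(X, y)
print("coefficients:", [round(wj, 3) for wj in weights])
```

Raising `lam` shrinks more coefficients to exactly zero; in practice the penalty strength would be chosen by cross-validation rather than fixed by hand as it is here.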

ZooRisk

In the summer of 2017, I interned in the Data Sciences & Analytics Group at Pacific Northwest National Laboratory (PNNL). I collaborated with designers and programmers to produce a bio-surveillance web application that provides a real-time risk assessment for each news item related to zoonotic diseases. My mentor was Lauren Charles.

Additional Resources

The following video gives a short introduction to our final web application.

Slide: I was invited by the Career China Club at Columbia University Mailman School of Public Health to share my experience at PNNL.

Estimating Influenza Incidence from Diagnostic Codes

This work is mentored by Sasikiran Kandula and Jeffrey Shaman in the Department of Environmental Health Sciences at Columbia University. My work has involved analyzing large datasets (~68 million) and exploring machine learning methods in MySQL and R to track influenza incidence in real time by combining diagnostic and virologic data from electronic medical records; comparing machine learning methods (boosting, SVM, random forest, etc.) using cross-validation; and creating clear and compelling reports, visualizations, and interactive apps for collaborators in R Markdown.
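
As a rough illustration of comparing methods with cross-validation, the sketch below scores two models on the same k folds and reports their mean held-out accuracy. It is written in plain Python rather than the R and MySQL used in the project, and everything in it is an assumption made for illustration: the toy data, the helper names, and especially the two models, which are deliberately simple stand-ins (a majority-class baseline and a nearest-centroid classifier), not the boosting, SVM, or random forest models named above.

```python
import random


def k_fold_indices(n, k, seed=0):
    # Shuffle the row indices once, then deal them into k disjoint folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class MajorityClass:
    """Baseline: always predict the most common training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Predict the class whose training mean is closest (Euclidean)."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, yi in zip(X, y) if yi == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return [min(self.centroids, key=lambda c: dist2(x, self.centroids[c]))
                for x in X]


def cross_val_accuracy(make_model, X, y, k=5, seed=0):
    # Every model sees the same folds, so the comparison is like for like.
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        test = set(held_out)
        train = [i for i in range(len(X)) if i not in test]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in held_out])
        correct = sum(p == y[i] for p, i in zip(preds, held_out))
        scores.append(correct / len(held_out))
    return sum(scores) / k


# Hypothetical two-class data: Gaussian clusters centred at (0, 0) and (3, 3).
rng = random.Random(1)
X = [[rng.gauss(c, 1), rng.gauss(c, 1)] for c in (0, 3) for _ in range(100)]
y = [label for label in (0, 1) for _ in range(100)]

results = {
    "majority_class": cross_val_accuracy(MajorityClass, X, y),
    "nearest_centroid": cross_val_accuracy(NearestCentroid, X, y),
}
for name, acc in results.items():
    print(name, round(acc, 3))
```

Any model exposing `fit` and `predict` can be dropped into `results` in the same way, which is the main reason for passing a constructor rather than a fitted model.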