Research

In 2014, I was elected to the National Academy of Sciences based on my statistical and scientific contributions, as well as my broad vision of data science, best described in my article Veridical Data Science, written together with my former student Karl Kumbier. In this work, I introduced a framework based on three principles: predictability, computability, and stability (PCS). This framework guides practitioners who solve domain data problems with data science tools to be creative in their analysis and to validate their findings properly. I have written a book on the Veridical Data Science framework together with my former student Rebecca Barter. The book is being published by the MIT Press in 2024, with a free online copy available soon.

In my research group, I cultivate a strongly interdisciplinary and collaborative culture, solving data problems across fields such as neuroscience, genomics, remote sensing, and precision medicine. Through these projects, we have mapped a cell's destiny using spatial gene expression images of Drosophila embryos and characterized V4 neurons through DeepTune images, and we are currently seeking genomic markers of heart disease using UK Biobank data. Recently, my group and I have developed approaches for predicting county-level COVID-19 death counts to support the nonprofit Response4Life, which is working to distribute PPE across the country to those who need it most.

The Yu Group and I have also developed an array of statistical and machine learning methods inspired by our interdisciplinary projects, including stability-driven nonnegative matrix factorization (staNMF) for unsupervised learning; iterative Random Forests (iRF) and signed iRF (s-iRF) for discovering predictive and stable high-order (Boolean) interactions in supervised learning; and contextual decomposition (CD) and aggregated contextual decomposition (ACD) for extracting phrase or patch importance from Deep Neural Networks (DNNs).

Research is supported in part by grants from NSF, NIH, the Weill Neurohub, and the Simons Foundation.