Veridical Data Science (PCS)
Complete Paper List in Reverse Chronological Order
Selected recent papers on veridical data science (PCS)
B. Yu (2024). After Computational Reproducibility: Scientific Reproducibility and Trustworthy AI (discussion of Donoho's paper "Data Science at the Singularity"). Harvard Data Science Review (HDSR).
B. Yu (2023). What is uncertainty in today's practice of data science? J. Econometrics, 237, 105519.
Q. Wang, T. M. Tang, N. Youlton, C. S. Weldy, A. M. Kenney, O. Ronen, J. W. Hughes, E. T. Chin, S. C. Sutton, A. Agarwal, X. Li, M. Behr, K. Kumbier, C. S. Moravec, W. H. W. Tang, K. B. Margulies, T. P. Cappola, A. J. Butte, R. Arnaout, J. B. Brown, J. R. Priest, V. N. Parikh, B. Yu, E. Ashley (2023). Epistasis regulates genetic control of cardiac hypertrophy. https://www.medrxiv.org/content/10.1101/2023.11.06.23297858v1 (Code) (PCS documentation)
R. Cahill, Y. Wang, R. P. Xian, A. J. Lee, H. Zeng, B. Yu, B. Tasic, R. Abbasi-Asl (2023). Unsupervised pattern discovery in spatial gene expression atlas reveals mouse brain regions beyond established ontology. https://www.biorxiv.org/content/10.1101/2023.03.10.531984v2 (Code)
A. Agarwal, A. M. Kenney, Y. S. Tan, T. M. Tang, B. Yu (2023). MDI+: a flexible random forest-based feature importance framework. https://arxiv.org/abs/2307.01932 (PCS related)
M. Behr*, K. Kumbier*, A. Cordova-Palomera, M. Aguirre, E. Ashley, A. Butte, R. Arnaout, J. B. Brown, J. Priest*, B. Yu* (2020). Learning epistatic polygenic phenotypes with Boolean interactions. https://www.biorxiv.org/content/10.1101/2020.11.24.396846v1 (code) (PCS inference case study)
B. Norgeot*, G. Quer, B. K. Beaulieu-Jones, A. Torkamani, R. Dias, M. Gianfrancesco, R. Arnaout, I. S. Kohane, S. Saria, E. Topol, Z. Obermeyer, B. Yu, and A. Butte* (2020). Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nature Medicine, 26, 1320–1324.
B. Yu (2020). Stability expanded, in reality. Harvard Data Science Review (HDSR). (PCS related)
B. Yu and R. Barter (2020). Data science process: one culture. JASA. (PCS related)
R. Dwivedi*, Y. Tan*, B. Park, M. Wei, K. Horgan, D. Madigan*, B. Yu* (2020). Stable discovery of interpretable subgroups via calibration in causal studies (staDISC). International Statistical Review; also at https://arxiv.org/abs/2008.10109 (code) (PCS case study for causal inference)
X. Li, T. M. Tang, X. Wang, J. A. Kocher, B. Yu (2020). A stability-driven protocol for drug response interpretable prediction (staDRIP). NeurIPS Workshop on ML4H (Machine Learning for Health), Extended Abstract. https://arxiv.org/abs/2011.06593 (code) (PCS case study for drug discovery)
B. Yu and K. Kumbier (2020). Veridical data science (PCS framework). PNAS, 117 (8), 3920–3929. QnAs with Bin Yu.
Y. Chen, R. Abbasi-Asl, A. Bloniarz, M. Oliver, B. Willmore, J. Gallant*, and B. Yu* (2018). The DeepTune framework for modeling and characterizing neurons in visual cortex area V4. https://www.biorxiv.org/content/10.1101/465534v1
K. Kumbier, S. Basu, J. B. Brown, S. Celniker, and B. Yu* (2018). Refining interaction search through signed iterative Random Forests. https://arxiv.org/abs/1810.0728 (an enhanced version of iRF, PCS related)
S. Basu, K. Kumbier, J. B. Brown*, and B. Yu* (2018). Iterative Random Forests to discover predictive and stable high-order interactions. PNAS, 115 (8), 1943–1948. (code) (PCS related)
S. Wu, A. Joseph, A. S. Hammonds, S. E. Celniker, B. Yu*, and E. Frise* (2016). Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks (with supporting information). PNAS, pp. 4290–4295. (code) (PCS related)