Veridical Data Science (PCS)
Complete Paper List in Reverse Chronological Order
Selected Recent Papers on Veridical Data Science (PCS)
- B. Yu (2024). After Computational Reproducibility: Scientific Reproducibility and Trustworthy AI (discussion of Donoho’s paper “Data Science at the Singularity”). Harvard Data Science Review (HDSR).
- B. Yu (2023). What is uncertainty in today’s practice of data science? J. Econometrics. 237, 105519.
- Q. Wang, T. M. Tang, N. Youlton, C. S. Weldy, A. M. Kenney, O. Ronen, J. W. Hughes, E. T. Chin, S. C. Sutton, A. Agarwal, X. Li, M. Behr, K. Kumbier, C. S. Moravec, W. H. W. Tang, K. B. Margulies, T. P. Cappola, A. J. Butte, R. Arnaout, J. B. Brown, J. R. Priest, V. N. Parikh, B. Yu, E. Ashley (2023). Epistasis regulates genetic control of cardiac hypertrophy. https://www.medrxiv.org/content/10.1101/2023.11.06.23297858v1 (Code) (PCS documentation)
- R. Cahill, Y. Wang, R. P. Xian, A. J. Lee, H. Zeng, B. Yu, B. Tasic, R. Abbasi-Asl (2023). Unsupervised pattern discovery in spatial gene expression atlas reveals mouse brain regions beyond established ontology. https://www.biorxiv.org/content/10.1101/2023.03.10.531984v2 (Code)
- A. Agarwal, A. M. Kenney, Y. S. Tan, T. M. Tang, B. Yu (2023). MDI+: a flexible random forest-based feature importance framework. https://arxiv.org/abs/2307.01932 (PCS related)
- M. Behr, K. Kumbier, A. Cordova-Palomera, M. Aguirre, E. Ashley, A. Butte, R. Arnaout, J. B. Brown, J. Priest, B. Yu (2020). Learning epistatic polygenic phenotypes with Boolean interactions. https://www.biorxiv.org/content/10.1101/2020.11.24.396846v1 (Code) (PCS inference case study)
- B. Norgeot, G. Quer, B. K. Beaulieu-Jones, A. Torkamani, R. Dias, M. Gianfrancesco, R. Arnaout, I. S. Kohane, S. Saria, E. Topol, Z. Obermeyer, B. Yu & A. Butte (2020). Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist, Nature Medicine, 26, 1320–1324.
- B. Yu (2020). Stability expanded, in reality. Harvard Data Science Review (HDSR). (PCS related)
- B. Yu and R. Barter (2020). Data science process: one culture. JASA. (PCS related)
- R. Dwivedi, Y. Tan, B. Park, M. Wei, K. Horgan, D. Madigan, B. Yu (2020). Stable discovery of interpretable subgroups via calibration in causal studies (staDISC). International Statistical Review and also at arxiv.org/abs/2008.10109 (code) (PCS case study for causal inference)
- X. Li, T. M. Tang, X. Wang, J. A. Kocher, B. Yu (2020). A stability-driven protocol for drug response interpretable prediction (staDRIP). NeurIPS workshop on ML4H (Machine Learning for Health), Extended Abstract. drive link
- B. Yu and K. Kumbier (2020). Veridical data science (PCS framework). PNAS, 117 (8), 3920-3929. QnAs with Bin Yu.
- Y. Chen, R. Abbasi-Asl, A. Bloniarz, M. Oliver, B. Willmore, J. Gallant, and B. Yu (2018). The DeepTune framework for modeling and characterizing neurons in visual cortex area V4. https://www.biorxiv.org/content/10.1101/465534v1
- K. Kumbier, S. Basu, J. B. Brown, S. Celniker, and B. Yu* (2018). Refining interaction search through signed iterative Random Forests. https://arxiv.org/abs/1810.0728 (an enhanced version of iRF, PCS related)
- S. Basu, K. Kumbier, J. B. Brown, and B. Yu (2018). Iterative Random Forests to discover predictive and stable high-order interactions. PNAS, 115 (8), 1943-1948. (Code) (PCS related)
- Siqi Wu, Antony Joseph, Ann S. Hammonds, Susan E. Celniker, Bin Yu, and Erwin Frise (2016). Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks (with supporting information). PNAS, pp. 4290-4295. (Code) (PCS related)