About

About Bin Yu

Bin Yu is the Chancellor’s Distinguished Professor in the UC Berkeley Departments of Statistics and EECS. She was Chair of the Department of Statistics at UC Berkeley from 2009 to 2012. She is a member of the National Academy of Sciences and currently serves on the editorial board of Proceedings of the National Academy of Sciences (PNAS).

Professor Bin Yu received a BS degree in Mathematics from Peking University, and an MS and a Ph.D. in Statistics from UC Berkeley. She was a Member of Technical Staff at Lucent Bell-Labs, Distinguished Researcher in the Deep Learning Group of Microsoft Research, Assistant Professor at UW-Madison, and Miller Research Professor at Berkeley. She has been a visiting faculty member at MIT, Peking University, Newton Institute at Cambridge University, ETH, Yale University, Flatiron Institute, Poincare Institute, INRIA-Paris, and Fields Institute at University of Toronto.

Professor Bin Yu has published many research papers and one book, Veridical Data Science (MIT Press, 2024). Her research focuses on developing trustworthy and interpretable machine learning methods, with particular emphasis on the Predictability-Computability-Stability (PCS) framework for veridical data science. She has made fundamental contributions to statistical theory, including pioneering work on Vapnik-Chervonenkis (VC) theory for time series analysis, minimum description length (MDL) and entropy estimation, sparse modeling, boosting, spectral clustering, and MCMC convergence analysis. Her applied work spans neuroscience, genomics, remote sensing, and precision medicine, always emphasizing interdisciplinary collaboration with domain experts. Currently, she leads research in interpretable machine learning (including tree-based methods and deep learning interpretability), causal inference, and the development of stable, reproducible methods for scientific discovery. Her group has developed influential algorithms such as iterative random forests (iRF), contextual decomposition for transformers, and adaptive wavelet distillation for interpreting neural networks.

Professor Bin Yu has received many awards and honors throughout her career. She has been elected to the National Academy of Sciences and the American Academy of Arts and Sciences. Her major awards include the Guggenheim Fellowship, COPSS E. L. Scott Prize, and most recently, the COPSS Distinguished Achievement Award and Lecture (DAAL) (formerly Fisher Award and Lecture) at JSM in 2023. She has delivered several distinguished lectures, including the Wald Memorial Lectures of the Institute of Mathematical Statistics (IMS), the Tukey Memorial Lecture of the Bernoulli Society, and the Rietz Lecture of IMS. She holds an Honorary Doctorate from the University of Lausanne in Switzerland.

Professor Bin Yu has held many leadership positions in the statistical and data science communities. She served as President of the Institute of Mathematical Statistics (IMS) and was Chair of the Department of Statistics at UC Berkeley from 2009 to 2012. She served on the Inaugural Scientific Committee of the UK Turing Institute for Data Science and AI. Her editorial leadership includes current service on the Editorial Board of Proceedings of the National Academy of Sciences (PNAS), and previous service on the editorial boards of Annals of Statistics, Journal of the American Statistical Association, and Journal of Machine Learning Research. Her committee and advisory work includes co-chairing the National Scientific Committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), serving on the scientific advisory committees of SAMSI and IPAM, and serving on the board of trustees of ICERM and the Board of Governors of IEEE-IT Society. She recently served on the scientific advisory committee for the IAS Special Year on optimization, statistics and theoretical machine learning, and on the Scientific Advisory Board of the Canadian Statistical Sciences Institute (CANSSI). Currently, she serves on the advisory board of the AI Policy Hub at UC Berkeley, the Scientific Advisory Committee of the Department of Quantitative and Computational Biology at USC, and the External Advisory Committee for Learning the Earth with Artificial Intelligence and Physics (LEAP), an NSF Science and Technology Center (STC), at Columbia University. She is a Chan-Zuckerberg Biohub Investigator and Weill Neurohub Investigator. She is a member of the UC Berkeley Center for Computational Biology and serves as a scientific advisor at the Simons Institute for the Theory of Computing.

Research Themes

Deep Learning and Machine Learning

A. R. Hsu, Y. Cherapanamjeri, A. Y. Odisho, P. R. Carroll, B. Yu (2024). Mechanistic Interpretation through Contextual Decomposition in Transformers.
S. Hayou, N. Ghosh, B. Yu (2024). LoRA+: Efficient Low Rank Adaptation of Large Models. Proc. ICML.
A. R. Hsu, Y. Cherapanamjeri, B. Park, T. Naumann, A. Odisho, and B. Yu (2023). Diagnosing transformers: illuminating feature space for clinical decision-making. ICLR 2023.
J. Murdoch, P. Liu, and B. Yu (2018). Beyond word importance: contextual decomposition to extract interactions from LSTMs. Proc. ICLR 2018.

Interdisciplinary Research in Biomedicine

Q. Wang,* T. M. Tang, N. Youlton, C. S. Weldy, A. M. Kenney, O. Ronen, J. W. Hughes, E. T. Chin, S. C. Sutton, A. Agarwal, X. Li, M. Behr, K. Kumbier, C. S. Moravec, W. H. W. Tang, K. B. Margulies, T. P. Cappola, A. J. Butte, R. Arnaout, J. B. Brown, J. R. Priest, V. N. Parikh, B. Yu, E. Ashley* (2023). Epistasis regulates genetic control of cardiac hypertrophy.
E. Irajizad, A. Kenney, T. Tang, J. Vykoukal, R. Wu, E. Murage, J. B. Dennison, M. Sans, J. P. Long, M. Loftus, J. A. Chabot, M. D. Kluger, F. Kastrinos, L. Brais, A. Babic, K. Jajoo, L. S. Lee, T. E. Clancy, K. Ng, A. Bullock, J. M. Genkinger, A. Maitra, K. A. Do, B. Yu, B. W. Wolpin, S. Hanash, J. F. Fahrmann (2023). A blood-based metabolomic signature predictive of risk for pancreatic cancer. Cell Reports Medicine 4(9): 101194.
B. Norgeot, G. Quer, B. K. Beaulieu-Jones, A. Torkamani, R. Dias, M. Gianfrancesco, R. Arnaout, I. S. Kohane, S. Saria, E. Topol, Z. Obermeyer, B. Yu & A. Butte (2020). Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist. Nature Medicine, 26, 1320–1324.

Interpretable Machine Learning

Q. Zhang, C. Singh, L. Liu, X. Liu, B. Yu, J. Gao, T. Zhao (2023). Tell your model where to attend: post-hoc attention steering for LLMs. ICLR 2024.
A. Agarwal, A. M. Kenney, Y. S. Tan, T. M. Tang, B. Yu (2023). MDI+: a flexible random forest-based feature importance framework.
C. Singh, A. R. Hsu, R. Antonello, S. Jain, A. G. Huth, B. Yu and J. Gao (2023). Explaining black box text modules in natural language with language models.
W. J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asl, and B. Yu* (2019). Definitions, methods, and applications in interpretable machine learning. PNAS, 116 (44), 22071-22080.

Tree-based Methods

Y. S. Tan, C. Singh, K. Nasseri, A. Agarwal, J. Duncan, O. Ronen, M. Epland, A. Kornblith, B. Yu (2022). Fast interpretable greedy-tree sums (FIGS).
A. Agarwal, Y. S. Tan, O. Ronen, C. Singh, B. Yu (2022). Hierarchical shrinkage: improving accuracy and interpretability of tree-based methods. Proc. ICML.
M. Behr, Y. Wang, X. Li, B. Yu (2022). Provable Boolean Interaction Recovery from Tree Ensemble obtained via Random Forests. PNAS.
S. Basu, K. Kumbier, J. B. Brown, and B. Yu (2018). Iterative random forests to discover predictive and stable high-order interactions. PNAS, 115 (8), 1943-1948.

Veridical Data Science (PCS)

B. Yu (2024). After Computational Reproducibility: Scientific Reproducibility and Trustworthy AI. Harvard Data Science Review (HDSR).
B. Yu and K. Kumbier (2020). Veridical data science. PNAS, 117 (8), 3920-3929.
B. Yu (2023). What is uncertainty in today’s practice of data science? J. Econometrics. 237, 105519.