
Bin Yu

CDSS Chancellor’s Distinguished Professor

Departments of Statistics and Electrical Engineering and Computer Sciences, Center for Computational Biology

Chan-Zuckerberg Biohub Investigator Alumnus • Weill Neurohub Investigator

mail: 367 Evans Hall #3860 • Berkeley, CA 94720

phone: 510-642-2781 • fax: 510-642-7892 • binyu@berkeley.edu

Welcome

Professor Bin Yu is the head of the Yu Group at Berkeley, which consists of 12-15 students and postdocs from Statistics and EECS. She was formally trained as a statistician, but her research interests and achievements extend beyond the realm of statistics. Together with her group, Bin Yu has leveraged new computational developments to solve important scientific problems by combining novel and often interpretable statistical machine learning approaches with the domain expertise of her many collaborators in neuroscience, genomics and precision medicine. She also develops relevant theory to understand random forests and deep learning, both to gain insight into practice and to guide it. Her work has been recognized with many awards. In particular, she was elected to the American Academy of Arts and Sciences in 2013 and to the National Academy of Sciences in 2014. She delivered the IMS Wald Lectures and the COPSS Distinguished Achievement Award and Lectureship (DAAL, formerly Fisher Award and Lecture) in 2023.

She and her team have developed the PCS framework for veridical data science (or responsible, reliable, and transparent data analysis and decision-making). PCS stands for predictability, computability and stability, and it unifies, streamlines, and expands on ideas and best practices of machine learning and statistics.

To augment empirical evidence for decision-making, they are investigating statistical machine learning methods and algorithms (and associated statistical inference problems) such as dictionary learning, non-negative matrix factorization (NMF), EM, deep learning (CNNs and LSTMs), and heterogeneous effect estimation in randomized experiments (X-learner). Their recent algorithms include staNMF for unsupervised learning; iterative Random Forests (iRF) and signed iRF (s-iRF) for discovering predictive and stable high-order interactions in supervised learning; next-generation tree-based methods (e.g., fast and interpretable greedy-tree sums (FIGS), hierarchical shrinkage (HS) trees, and RF+); and contextual decomposition (CD), aggregated contextual decomposition (ACD), and adaptive wavelet distillation (AWD) for interpretation of Deep Neural Networks (DNNs).

My vision for data science - papers & talks

Veridical Data Science in Biology on July 11, 2024 at UC Berkeley (Submission deadline July 4)
Rome Workshop on Veridical Data Science, June 20, 2025 (New)
Veridical data science and medical foundation models on arXiv by Alaa and Yu.
VDS book review by Yuval and Yoav Benjamini in Harvard Data Science Review (HDSR): “pedagogical excellence, diverse examples, and projects make Veridical Data Science a suitable textbook for students of all levels, in addition to being a valuable resource for data scientists in general” (review summary in HDSR Editor-in-Chief’s note), June 2024.
Inaugural Berkeley-Stanford Workshop on Veridical Data Science at UC Berkeley (May 31, 2024) (talk videos available)
My co-author Rebecca Barter and I are thrilled to announce the online release of our MIT Press book “Veridical Data Science: The Practice of Responsible Data Analysis and Decision Making”, an essential source for producing trustworthy data-driven results (Feb. 28, 2024).
“Veridical data science towards trustworthy AI” (talk video of COPSS DAAL (formerly Fisher Award and Lecture), scroll down), JSM in Toronto, Aug. 2023.
What is uncertainty in today’s practice of data science? (Yu, 2023). Journal of Econometrics.
Building trust in medical AI algorithms with veridical data science (interview of Bin Yu by Dr. Merle Behr for the German scientific journal KI - Künstliche Intelligenz), 2023
Veridical data science (PCS framework: v-flow code and documentation template), PNAS, 2020 (QnAs with Bin Yu)
Breiman Lecture (video) at NeurIPS “Veridical Data Science” (PCS framework and iterative random forests (iRF)), 2019; updated slides, 2020
Stability, Bernoulli, 2013
Stability expanded, in reality, Harvard Data Science Review (HDSR), 2020.
Data science process: one culture. JASA, 2020.
Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist, Nature Medicine, 2020.
Definitions, methods and applications in interpretable machine learning, PNAS, 2019
Data wisdom for data science (blog), 2015
IMS Presidential Address “Let us own data science”, IMS Bulletin, 2014
Embracing statistical challenges in the IT age, Technometrics, 2007

In the news

IMS Wald Lecture I and Lecture II, and COPSS Distinguished Achievement Award and Lectureship (DAAL), delivered at the Joint Statistical Meetings (JSM) in Toronto, August 2023
UC Berkeley team is part of the new NSF AI Center for Cybersecurity ACTION led by UCSB.
CDSS news: Statistics-Computer Science team reflects on tackling covid outbreaks, May, 2022
Honorary Doctorate, University of Lausanne (UNIL) (Faculty of Business and Economics), June 4, 2021 (Interview of Bin Yu by journalist Nathalie Randin, with an introduction by Dean Jean-Philippe Bonardi of UNIL in French (English translation))
CDSS news on our PCS framework: “A better framework for more robust, trustworthy data science”, Oct. 2020
UC Berkeley to lead $10M NSF/Simons Foundation program to investigate theoretical underpinnings of deep learning, Aug. 25, 2020
Curating COVID-19 data repository and forecasting county-level death counts in the US, 2020
Interviewed by PBS Nova about AlphaZero, 2018
Mapping a cell’s destiny, 2016
Seeking Data Wisdom, 2015
Member, National Academy of Sciences, 2014
Fellow, American Academy of Arts and Sciences, 2013
One of the 50 best inventions of 2011 by Time Magazine, 2011
The Economist Article, 2011
ScienceMatters @ Berkeley. Dealing with Cloudy Data, 2004