Bin Yu

CDSS Chancellor's Distinguished Professor 

Departments of Statistics and Electrical Engineering and Computer Sciences, Center for Computational Biology

 UC Berkeley 

Chan Zuckerberg Biohub Investigator Alumna • Weill Neurohub Investigator

mail: 367 Evans Hall #3860 • Berkeley, CA 94720

phone: 510-642-2781 • fax: 510-642-7892 • email: binyu@berkeley.edu

Welcome

Professor Bin Yu heads the Yu Group at Berkeley, which consists of 12-15 students and postdocs from Statistics and EECS. She was formally trained as a statistician, but her research interests and achievements extend beyond the realm of statistics. Together with her group, Bin Yu has leveraged new computational developments to solve important scientific problems by combining novel and often interpretable statistical machine learning approaches with the domain expertise of her many collaborators in neuroscience, genomics, and precision medicine. She also develops theory to understand random forests and deep learning, providing insight into and guidance for practice. Her work has been recognized by many awards. In particular, she was elected to the National Academy of Sciences in 2014 and to the American Academy of Arts and Sciences in 2013. She delivered the IMS Wald Lectures and the COPSS Distinguished Achievement Award and Lectureship (DAAL, formerly the Fisher Award and Lectureship) in 2023.

She and her team have developed the PCS framework for veridical data science (or responsible, reliable, and transparent data analysis and decision-making). PCS stands for predictability, computability, and stability, and it unifies, streamlines, and expands on ideas and best practices from machine learning and statistics.

To augment empirical evidence for decision-making, they are investigating statistical machine learning methods and algorithms (and the associated statistical inference problems) such as dictionary learning, non-negative matrix factorization (NMF), EM, deep learning (CNNs and LSTMs), and heterogeneous treatment effect estimation in randomized experiments (the X-learner). Their recent algorithms include staNMF for unsupervised learning; iterative random forests (iRF) and signed iRF (s-iRF) for discovering predictive and stable high-order interactions in supervised learning; next-generation tree-based methods such as fast and interpretable greedy-tree sums (FIGS), hierarchical shrinkage (HS) for trees, and RF+; and contextual decomposition (CD), aggregated contextual decomposition (ACD), and adaptive wavelet distillation (AWD) for interpreting deep neural networks (DNNs).
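For readers who want to try one of these tree-based methods, the sketch below shows how a FIGS model might be fit and inspected. It is a minimal illustration, not the group's reference implementation: it assumes the open-source imodels package provides a FIGSClassifier with a scikit-learn-style interface and a max_rules cap on model size, and it uses a standard scikit-learn dataset purely for demonstration.

```python
# Minimal sketch: fitting a small FIGS (fast and interpretable greedy-tree sums)
# model. Assumes the open-source `imodels` package exposes `FIGSClassifier`
# with standard scikit-learn fit/predict methods and a `max_rules` parameter.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

from imodels import FIGSClassifier  # assumed import path

# Illustrative data only; in practice X, y come from the scientific problem.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# FIGS greedily grows a *sum* of shallow trees; capping the total number of
# splits keeps the fitted model small enough to read and audit.
model = FIGSClassifier(max_rules=10)  # max_rules is an assumed parameter name
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)[:, 1]
print("test AUC:", round(roc_auc_score(y_test, probs), 3))
print(model)  # printing is assumed to display the learned tree-sum structure
```

The design point this sketch is meant to convey is that interpretability comes from an explicit budget on model complexity: the fitted object is a short sum of trees that a domain collaborator can inspect directly, rather than a large black-box ensemble.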

My vision for data science - papers & talks

In the news