1. Y. Tan, C. Singh, K. Nasseri, A. Agarwal, B. Yu (2022). Fast interpretable greedy-tree sums (FIGS). (imodels, a Python package for fitting interpretable models, contains code for FIGS.)

  2. A. Agarwal, Y. Tan, O. Ronen, C. Singh, B. Yu (2022). Hierarchical shrinkage: improving accuracy and interpretability of tree-based methods. Proc. ICML. (imodels, a Python package for fitting interpretable models, contains code for hierarchical shrinkage (HS).)

  3. N. Ghosh, S. Mei, and B. Yu (2022). The three stages of dynamics in high-dimensional kernel methods. Proc. ICLR, 2022.

  4. Y. Tan, A. Agarwal, and B. Yu (2021). A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds. Proc. AISTATS.

  5. N. Altieri, B. Park, J. DeNero, A. Odisho, B. Yu (2021). Improving natural language information extraction from cancer pathology reports using transfer learning and zero-shot string similarity. JAMIA Open, 4(3), September 2021.

  6. C. Singh, W. Ha and B. Yu (2021). Interpreting and Improving Deep-Learning Models with Reality Checks. To appear in the book "xxAI - Beyond Explainable AI" (eds. Andreas Holzinger, Randy Goebel, Ruth Fong, Taesup Moon, Klaus-Robert Müller, and Wojciech Samek).

  7. B. Yu and C. Singh (2021). Seven Principles for Rapid-Response Data Science: Lessons from Covid-19 Forecasting.

  8. W. Ha, C. Singh, F. Lanusse, S. Upadhyayula, and B. Yu (2021). Adaptive Wavelet Distillation from Neural Networks through Interpretation. Proc. NeurIPS 2021. (code)

  9. M. Behr, Y. Wang, X. Li, B. Yu (2022). Provable Boolean Interaction Recovery from Tree Ensemble obtained via Random Forests. PNAS.

  10. N. Altieri, B. Park, M. Olson, J. DeNero, A. Odisho, B. Yu (2021). Supervised line attention for tumor attribute classification from pathology reports: Higher performance with less data. Journal of Biomedical Informatics, 122 (2021), 103872. A previous version appeared under the title "Enriched Annotations for Tumor Attribute Classification from Pathology Reports with Limited Labeled Data".

  11. M. Behr*, K. Kumbier*, A. Cordova-Palomera, M. Aguirre, E. Ashley, A. Butte, R. Arnaout, J. B. Brown, J. Priest*, B. Yu* (2020). Learning epistatic polygenic phenotypes with Boolean interactions. (code)

  12. B. Norgeot*, G. Quer, B. K. Beaulieu-Jones, A. Torkamani, R. Dias, M. Gianfrancesco, R. Arnaout, I. S. Kohane, S. Saria, E. Topol, Z. Obermeyer, B. Yu & A. Butte* (2020). Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist, Nature Medicine, 26, 1320–1324.

  13. B. Yu (2020). Stability expanded, in reality. Harvard Data Science Review (HDSR).

  14. B. Yu and R. Barter (2020). Data science process: one culture. JASA.

  15. R. Dwivedi*, Y. Tan*, B. Park, M. Wei, K. Horgan, D. Madigan*, B. Yu* (2020). Stable discovery of interpretable subgroups via calibration in causal studies (staDISC). International Statistical Review. (code)

  16. X. Li, T. M. Tang, X. Wang, J. A. Kocher, B. Yu (2020). A stability-driven protocol for drug response interpretable prediction (staDRIP). NeurIPS workshop on ML4H (Machine Learning for Health), Extended Abstract. (code)

  17. A. Y. Odisho, B. Park, N. Altieri, J. DeNero, M. R. Cooperberg, P. R. Carroll, B. Yu (2020). Natural language processing systems for pathology parsing in limited data environments with uncertainty estimation. Journal of the American Medical Informatics Association (JAMIA) Open.

  18. L. Rieger, W. J. Murdoch, C. Singh, B. Yu (2020). Interpretations are Useful: Penalizing Explanations to Align Neural Networks with Prior Knowledge. Proc. ICML. (code)

  19. C. Singh, W. Ha, F. Lanusse, V. Boehm, J. Liu, B. Yu (2020). Transformation Importance with Applications to Cosmology. ICLR Workshop paper. (code)

  20. R. Dwivedi, C. Singh, B. Yu, M. J. Wainwright (2020) Revisiting minimum description length complexity in over-parametrized models.

  21. N. Altieri, R. Barter, J. Duncan, R. Dwivedi, K. Kumbier, X. Li, R. Netzorg, B. Park, C. Singh*, Y. Tan, T. Tang, Y. Wang, C. Zhang, B. Yu* (2020). Curating a COVID-19 data repository and forecasting county-level death counts in the United States. Harvard Data Science Review (HDSR). (code) 7-day prediction results through visualizations and maps. Short talk video at Responsible Data Science Summit, July 28, 2020.

  22. B. Yu and K. Kumbier (2020) Veridical data science (PCS framework), PNAS. 117 (8), 3920-3929. QnAs with Bin Yu.

  23. R. Dwivedi, N. Ho, K. Khamaru, M. J. Wainwright, M. I. Jordan and B. Yu (2020). Sharp Analysis of Expectation-Maximization for Weakly Identifiable Mixture Models. AISTATS.

  24. R. Dwivedi, N. Ho, K. Khamaru, M. J. Wainwright, M. I. Jordan and B. Yu (2020). Singularity, Misspecification and the Convergence Rate of EM. Annals of Statistics.

  25. Y. Chen, R. Dwivedi, M. J. Wainwright and B. Yu (2020). Fast Mixing of Metropolized Hamiltonian Monte Carlo: Benefits of Multi-Step Gradients. JMLR.

  26. R. Dwivedi, Y. Chen, M. J. Wainwright and B. Yu (2019). Log-concave Sampling: Metropolis-Hastings Algorithms are Fast. JMLR.

  27. D. Rothenhäusler and B. Yu (2019). Incremental causal effects.

  28. Y. Chen, R. Dwivedi, M. J. Wainwright and B. Yu (2018). Fast MCMC Algorithms on Polytopes. JMLR.

  29. Y. Chen, R. Dwivedi, M. J. Wainwright and B. Yu (2020). Vaidya Walk: A Sampling Algorithm Based on Volumetric-Logarithmic Barrier. Allerton Conference 2017.

  30. W. J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asl, and B. Yu* (2019) Definitions, methods, and applications in interpretable machine learning. PNAS, 116 (44) 22071-22080.

  31. W. J. Murdoch, C. Singh, and B. Yu (2019). Hierarchical interpretations for neural network predictions. ICLR. (code)

  32. Y. Wang, S. Wu and B. Yu (2020). Unique Sharp Local Minimum in l1-minimization Complete Dictionary Learning. JMLR, 21(63), pp. 1-52.

  33. Y. Chen, R. Abbasi-Asl, A. Bloniarz, M. Oliver, B. Willmore, J. Gallant*, and B. Yu* (2018). The DeepTune framework for modeling and characterizing neurons in visual cortex area V4.

  34. K. Kumbier, S. Basu, J. B. Brown, S. Celniker, and B. Yu* (2018). Refining interaction search through signed iterative Random Forests.

  35. Y. Chen, C. Jin, and B. Yu (2018). Stability and Convergence Trade-off of Iterative Optimization Algorithms.

  36. J. Murdoch, P. Liu, and B. Yu (2018) Beyond word importance: contextual decomposition to extract interactions from LSTMs. Proc. ICLR 2018. (code)

  37. R. Dwivedi, Y. Chen, M. J. Wainwright, and B. Yu (2018). Log-concave sampling: Metropolis-Hastings algorithms are fast!

  38. Y. Chen, R. Dwivedi, M. J. Wainwright, and B. Yu (2017). Fast MCMC sampling algorithms on polytopes.

  39. B. Yu and K. Kumbier (2018). Artificial Intelligence and Statistics. Frontiers of Information Technology and Electronic Engineering, 19(1), 6-9.

  40. R. Abbasi-Asl and B. Yu (2017). Structural Compression of Convolutional Neural Networks Based on Greedy Filter Pruning.

  41. R. Abbasi-Asl and B. Yu (2017). Interpreting Convolutional Neural Networks Through Compression. NIPS 2017 Symposium on Interpretable Machine Learning.

  42. S. Kunzel, J. Sekhon, P. Bickel, and B. Yu* (2019) Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning, PNAS. 116 (10) 4156-4165. (code)

  43. S. Basu, K. Kumbier, J. B. Brown*, and B. Yu* (2018). iterative Random Forests to discover predictive and stable high-order interactions. PNAS, 115 (8), 1943-1948. (code)

  44. S. Balakrishnan, M. Wainwright, B. Yu (2017) Statistical Guarantees for the EM algorithm: from population to sample-based analysis. Annals of Statistics, 45(1), 77 - 120.

  45. R. Barter and B. Yu (2017). Superheat: An R package for creating beautiful and extendable heatmaps for visualizing complex data. JCGS (revised).

  46. H. Liu and B. Yu (2017). Comments on: High-dimensional simultaneous inference with the bootstrap, by Dezeure et al. Test, 26, 740-750.

  47. C. Carson et al (2016). UC Berkeley Data Science Planning Initiative Faculty Advisory Board (FAB) Report and Executive Summary.

  48. S. Wu and B. Yu (2018). Local identifiability of l1-minimization dictionary learning: a sufficient and almost necessary condition. JMLR. 18, 1 - 56.

  49. K. Rohe, T. Qin and B. Yu* (2016). Co-clustering directed graphs to discover asymmetries and directional communities. Proc. National Academy of Sciences (PNAS), 113(45), 12679 - 12684.

  50. R. E. Kass, B. S. Caffo, M. Davidian, X. Meng, B. Yu, Nancy Reid* (2016). Ten simple rules for effective statistical practice. PLoS Comput. Biol., 12(6): e1004961. doi:10.1371/journal.pcbi.1004961

  51. Siqi Wu, Antony Joseph, Ann S. Hammonds, Susan E. Celniker, Bin Yu*, and Erwin Frise* (2016). Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks (with supporting information). PNAS, pp. 4290 - 4295. (code)

  52. A. Bloniarz, C. Wu, B. Yu, A. Talwalkar (2016). Supervised neighborhoods for distributed nonparametric regression. Proc. of AISTATS, Barcelona, Spain.

  53. B. Yu (2015). Data wisdom for data science. Operational Database Management Systems (ODBMS.ORG).

  54. A. Bloniarz, H. Liu, C. Zhang, J. Sekhon, and B. Yu* (2015). Lasso adjustments of treatment effect estimates in randomized experiments. PNAS. 113, 7383 - 7390.

  55. P. Ma, M. W. Mahoney and B. Yu (2015). A Statistical Perspective on Algorithmic Leveraging. Journal of Machine Learning Research, 16, (2015), 861-911.

  56. T. Moon, Y. Wang, Y. Liu, and B. Yu (2015). Evaluation of a MISR-based high-resolution aerosol retrieval method using AERONET DRAGON campaign data. IEEE Transactions on Geoscience and Remote Sensing, 53, 4328-4339.

  57. B. Yu (2014). Let us own data science. (video) IMS Bulletin. Institute of Mathematical Statistics (IMS) Presidential Address, ASC-IMS Joint Conference, Sydney, July 2014.

  58. G. Schiebinger, M. J. Wainwright and B. Yu (2014). The geometry of kernelized spectral clustering. Annals of Statistics, 43, 819-846.

  59. L. Miratrix, J. Jia, B. Yu, B. Gawalt, L. El Ghaoui, L. Barnesmoore, S. Clavier (2014). Concise comparative summaries (CCS) of large text corpora with a human experiment. Ann. Applied Statist., 8, 499-529.

  60. Y. Benjamini and B. Yu (2014). The shuttle estimator for explainable variance in fMRI experiments. Annals of Applied Statistics, 7, 2007-2033.

  61. D. Bean, P. Bickel, N. El Karoui and B. Yu (2014). Optimal M-estimation in high-dimensional regression. Proceedings of National Academy of Sciences, 110, 14563-14568.

  62. N. El Karoui, D. Bean, P. Bickel, C. Lim, and B. Yu (2014). On robust regression with high-dimensional predictors. Proceedings of National Academy of Sciences, 110, 14557-14562.

  63. P. Ma, M. W. Mahoney, B. Yu (2014). A Statistical Perspective on Algorithmic Leveraging. Proc. of International Conference on Machine Learning (ICML). (This conference paper contains some of the preliminary results of the journal paper Ma et al. (2015).)

  64. A. Bloniarz, A. Talwalkar, J. Terhorst, M. Jordan, D. Patterson, B. Yu and Y. Song (2014). Changepoint Analysis for Efficient Variant Calling. Proc. of RECOMB 2014 (to appear).

  65. Tao Shi (2013). A conversation with Professor Bin Yu. International Chinese Statistical Association (ICSA) Bulletin, Vol 25, Issue 2, pp 85-98. (Selected parts in Statblogs)

  66. A. Joseph and B. Yu (2016). The impact of regularization on spectral clustering. Annals of Statistics, 44, 1765 - 1791.

  67. C. Lim and B. Yu (2016). Estimation Stability with Cross Validation (ESCV). Journal of Computational and Graphical Statistics, 25, 464 - 492.

  68. A. S. Hammonds, C. A. Bristow, W. W. Fisher, R. Weiszmann, S. Wu, V. Hartenstein, M. Kellis, B. Yu, E. Frise, and S. E. Celniker (2013). Spatial expression of transcription factors in Drosophila embryonic organ development. Genome Biology, 14(12), R140.

  69. H. Liu and B. Yu (2013). Asymptotic properties of Lasso+mLS and Lasso+Ridge in sparse high-dimensional linear regression. Electron. J. Statist., 7, 3124-3169.

  70. J. Mairal and B. Yu (2013). Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows. Journal of Machine Learning Research, 14, 2449-2485.

  71. Y. Wang, X. Jiang, B. Yu, M. Jiang (2013). A Hierarchical Bayesian Approach for Aerosol Retrieval Using MISR Data. J. American Statistical Association, 108, 483-493.

  72. Y. He, J. Jia and B. Yu (2013). Reversible MCMC on Markov equivalence classes of sparse directed acyclic graphs. Annals of Statistics, 41(4), 1742-1779.

  73. B. Yu (2013). Stability. Bernoulli, 19 (4), 1484-1500. (Invited paper for the Special Issue commemorating the 300th anniversary of the publication of Jakob Bernoulli's Ars Conjectandi in 1713)

  74. J. Mairal and B. Yu (2013). Discussion on "Grouping Strategies and Thresholding for High Dimensional Linear Models". Journal of Statistical Planning and Inference, 143, 1451-1453.

  75. C. Uhler, G. Raskutti, P. Buhlmann, and B. Yu (2013). Geometry of the faithfulness assumption in causal inference. Annals of Statistics, 41, 436-463.

  76. L. Miratrix, J. Sekhon, and B. Yu (2013). Adjusting Treatment Effect Estimates by Post-Stratification in Randomized Experiments. Journal of Royal Statistical Society, Series B, 75 (part 2), 369-396.

  77. J. Jia, K. Rohe and B. Yu (2013). The Lasso under Poisson-like Heteroscedasticity. Statistica Sinica, 23, 99-118.

  78. S. Negahban, P. Ravikumar, M. Wainwright, and B. Yu (2012). A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, 27, 538-557.

  79. G. Raskutti, M. Wainwright, and B. Yu (2012). Minimax-optimal rates for sparse additive models over kernel classes via convex programming. J. Machine Learning Research, 13, 389-427.

  80. J. Mairal and B. Yu (2012). Complexity analysis of the Lasso regularization path. Proc. of International Conference on Machine Learning (ICML).

  81. Yanfeng Gu, Shizhe Wang, Tao Shi, Yinghui Lu, Eugene E. Clothiaux, and Bin Yu (2012). Multiple-kernel learning-based unmixing algorithm for estimation of cloud fractions with MODIS and CLOUDSAT data. Proc. of IEEE International Geoscience and Remote Sensing Symposium (IGARSS).

  82. S. Nishimoto, A. T. Vu, T. Naselaris, Y. Benjamini, B. Yu, J. L. Gallant (2011). Reconstructing visual experiences from brain activity evoked by natural movies. Current Biology, 21(19), 1641-1646. related videos

  83. P. Ravikumar, M. Wainwright, G. Raskutti, B. Yu (2011). High-dimensional covariance estimation by minimizing l1-penalized log-determinant divergence. Electronic Journal of Statistics, 5, 935-980.

  84. G. Raskutti, M. Wainwright, B. Yu (2011). Minimax rates of estimation for high-dimensional linear regression over lq-balls. IEEE Trans. Inform. Th., 57(10), 6976-6994.

  85. K. Rohe, S. Chatterjee, and B. Yu (2011). Spectral clustering and the high-dimensional Stochastic Block Model. Annals of Statistics, 39 (4), 1878-1915.

  86. V. Q. Vu, P. Ravikumar, T. Naselaris, K. N. Kay, J. L. Gallant, B. Yu* (2011). Encoding and decoding V1 fMRI responses to natural images with sparse nonparametric models. Annals of Applied Statistics, 5, 1150-1182. (*First senior author as last author in biology tradition)

  87. S. N. Pakzad, G. Rocha, and B. Yu (2011). Distributed modal identification by regularized auto regressive models. International Journal of Systems Science, 42, 1473-1489.

  88. J. Yousafzai, P. Sollich, Z. Cvetkovic, and B. Yu (2011). Combined Features and Kernel Design for Robust Phoneme Classification Using Support Vector Machines. IEEE Trans. Audio, Speech and Language Processing (to appear).

  89. X. Dai, J. Jia, B. Yu, L. El Ghaoui (2011). SBA-term: Sparse Bilingual Association for terms. Proc. International Conference on Semantic Computing.

  90. B. Yu (2011). Asymptotics and Coding Theory: One of the n - 1 Dimensions of Terry. In Selected Works of Terry Speed (ed. S. Dudoit), pp. 33-36, Springer.

  91. B. Yu (2010). Remembering Leo. Annals of Applied Statistics, 4(4), 1657-1659.

  92. J. Jia, Y. Benjamini, C. Lim, G. Raskutti, B. Yu (2010). Comment on "Envelope models for parsimonious and efficient multivariate linear regression" by R. D. Cook, B. Li, and F. Chiaromonte. Statistica Sinica, 20, 961-967.

  93. G. Raskutti, M. Wainwright, and B. Yu (2010). Restricted Eigenvalue Properties for Correlated Gaussian Designs. Journal of Machine Learning Research, 11, 2241-2259.

  94. J. Jia and B. Yu (2010). On model selection consistency of the elastic net when p >> n. Statistica Sinica, 20, 595-611.

  95. P. Buhlmann and B. Yu (2010). Boosting. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 69-74.

  96. L. Huang, J. Jia, B. Yu, B. Chun, P. Maniatis, M. Naik (2010). Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression. Proc. NIPS 2010.

  97. Y. Han, F. Wu, J. Jia, Y. Zhuang and B. Yu (2010). Multi-task Sparse Discriminant Analysis (MtSDA) with Overlapping Categories. Proc. of The 24th AAAI Conference on Artificial Intelligence, July 11-15, Atlanta, GA.

  98. B. Gawalt, J. Jia, L. Miratrix, L. El Ghaoui, B. Yu, and S. Clavier (2010). Discovering Word Associations in News Media via Feature Selection and Sparse Classification. Proc. 11th ACM SIGMM International Conference on Multimedia Information Retrieval (MIR).

  99. E. Anderes, B. Yu, V. Jovanovic, C. Moroney, M. Garay, A. Braverman, E. Clothiaux (2009). Maximum Likelihood Estimation of Cloud Height from Multi-Angle Satellite Imagery. Annals of Applied Statistics, 3, 902-921.

  100. T. Shi, M. Belkin, and B. Yu (2009). Data Spectroscopy: Eigenspaces of Convolution Operators and Clustering. Annals of Statistics, 37 (6B), 3960-3984.

  101. Vincent Q. Vu, Bin Yu, Robert E. Kass (2009). Information in the Non-Stationary Case. Neural Computation, 21, 688-703.

  102. N. Meinshausen and B. Yu (2009). Lasso-type recovery of sparse representations for high-dimensional data. Annals of Statistics 37, 246-270.

  103. P. Zhao, G. Rocha, and B. Yu (2009). The composite absolute penalties family for grouped and hierarchical variable selection. Annals of Statistics, 37, 3468-3497. (An earlier version appeared as "Grouped and hierarchical model selection through composite absolute penalties" by P. Zhao, G. Rocha and B. Yu, Department of Statistics, UC Berkeley, Tech. Rep. 703.)

  104. S. Negahban, P. Ravikumar, M. Wainwright, and B. Yu (2009). A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Proc. NIPS, 2009. (This conference paper contains preliminary results of the journal submission Negahban et al. 2012.)

  105. G. Raskutti, M. Wainwright, B. Yu (2009). High-dimensional regression under lq-ball sparsity: Optimal rates of convergence. Proc. of Allerton Conference on Communication, Control, and Computing. (This conference paper contains some of the preliminary results of the journal submission Raskutti et al. 2011.)

  106. G. Raskutti, M. Wainwright, and B. Yu (2009). Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness. Proc. NIPS, 2009. (This conference paper contains some of the preliminary results of the journal submission Raskutti et al. 2012.)

  107. T. Shi, B. Yu, E. Clothiaux, and A. Braverman (2008). Daytime Arctic Cloud Detection based on Multi-angle Satellite Data with Case Studies. Journal of the American Statistical Association, 103(482), 584-593.

  108. Peter Buhlmann and Bin Yu (2008). Invited discussion on "Evidence contrary to the statistical view of boosting" (D. Mease and A. Wyner) (paper with discussion). Journal of Machine Learning Research, 9, 187-194.

  109. P. Ravikumar, V. Vu, B. Yu, T. Naselaris, K. Kay, J. Gallant (2008). Nonparametric sparse hierarchical models describe V1 fMRI responses to natural images. In Advances in Neural Information Processing Systems (NIPS) 21, (2008). (This conference paper contains some preliminary results of the journal paper Vu et al. (2011) on encoding models, but also contains an encoding model that is not in Vu et al. (2011). It does not contain decoding results.)

  110. P. Ravikumar, G. Raskutti, M. Wainwright, B. Yu (2008). Model selection in Gaussian graphical models: high-dimensional consistency of l1-regularized MLE. In Advances in Neural Information Processing Systems (NIPS) 21, (2008).

  111. T. Shi, M. Belkin, and B. Yu (2008). Data spectroscopy: learning mixture models using eigenspaces of convolution operators. Proc. of ICML 2008.

  112. M. Ager, Z. Cvetkovic, P. Sollich, and B. Yu (2008). Towards Robust Phoneme Classification: Augmentation of PLP Models with Acoustic Waveforms. Proceedings of EUSIPCO.

  113. J. Yousafzai, Z. Cvetković, P. Sollich, and B. Yu (2008). Combined PLP-Acoustic Waveform Classification for Robust Phoneme Recognition using Support Vector Machines. Proceedings of EUSIPCO.

  114. N. Meinshausen, G. Rocha, and B. Yu (2007). A tale of three cousins: Lasso, L2Boosting, and Dantzig. Annals of Statistics (invited discussion on Candes and Tao's Dantzig Selector paper).

  115. V. Vu, B. Yu, and R. Kass (2007). Coverage Adjusted Entropy Estimation. Statistics in Medicine, 26(21), 4039-4060.

  116. B. Yu (2007). Embracing Statistical Challenges in the Information Technology Age. Technometrics (special issue on statistics and information technologies), vol. 49 (3), 237-248.

  117. X. Jiang, Y. Liu, B. Yu and M. Jiang (2007). Comparison of MISR aerosol optical thickness with AERONET measurements in Beijing metropolitan area. Remote Sensing of Environment (Special Issue on Multi-angle Imaging SpectroRadiometer), vol. 107, pp. 45-53.

  118. T. Shi, E. E. Clothiaux, B. Yu, A. J. Braverman, and G. N. Groff (2007). Detection of Daytime Arctic Clouds using MISR and MODIS Data. Remote Sensing of Environment (Special Issue on Multi-angle Imaging SpectroRadiometer), vol. 107, pp. 172-184.

  119. Peng Zhao and Bin Yu (2006). On Model Selection Consistency of Lasso. J. Machine Learning Research, 7 (nov), 2541-2567.

  120. B. Yu (2006). Comments on: Monitoring networked applications with incremental quantile estimation by Chambers et al. Statist. Sci., 21, 483-485.

  121. B. Yu (2006). Comments on: Regularization in Statistics, by P. J. Bickel and B. Li. Test, vol. 15 (2), pages 314-316.

  122. P. Buhlmann and B. Yu (2006). Sparse Boosting. Journal of Machine Learning Research, 7 (June), 1001-1024. This is a shortened and more focused version of Buhlmann and Yu, "Boosting, Model Selection, Lasso and Nonnegative Garrote," given below.

  123. J. Gao, H. Suzuki, and B. Yu (2006). Approximation Lasso Methods for Language Modeling. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, pp. 225-232, Sydney.

  124. T. Shi and B. Yu (2005). Binning in Gaussian Kernel Regularization. Statistica Sinica (special issue on machine learning), 16, 541-567.

  125. G. Liang, N. Taft, and B. Yu (2005). A fast lightweight approach to origin-destination IP traffic estimation using partial measurements. Tech Report 687, Statistics Department, UCB (accepted for Special Issue of IEEE-IT and ACM Networks on data networks, Jan. 2006)

  126. Tong Zhang and B. Yu (2005). Boosting with early stopping: convergence and consistency. The Annals of Statistics. Vol. 33, 1538-1579.

  127. R. Castro, M. Coates, G. Liang, R. Nowak, and B. Yu (2005). Network tomography: recent developments. Statistical Science, 19, 499-517.

  128. C. D. Giurcaneanu and B. Yu (2005). Efficient algorithms for discrete universal denoising for channels with memory. Proceedings of International Symposium on Information Theory, Australia. (Also as Tech. Report 686, Statistics Department, UCB (Proc. ISIT, Sept. 2005))

  129. P. Zhao and B. Yu (2004). Stagewise Lasso (old title: Boosted Lasso). Journal of Machine Learning Research, 8, 2701-2726. (An earlier version appeared as Tech. Report #678, Statistics Department, UC Berkeley, December 2004; revised in April 2005.)

  130. D. J. Diner et al (2004). PARAGON: A Systematic, Integrated Approach to Aerosol Observation and Modeling. American Meteorological Society, Oct., 1491-1501.

  131. P. Buhlmann and B. Yu (2004). Discussion on three boosting papers by Jiang, Lugosi and Vayatis, and Zhang. Annals of Statistics, 32 (1): 96-101.

  132. R. Jornsten and B. Yu (2004). Compressing genomic and proteomic array images for statistical analyses. Invited chapter in a book on Genomic signal processing and statistics, edited by E. R. Dougherty, I. Shmulevich, J. Chen, and Z. J. Wang, pp. 341 - 366.

  133. G. Liang, B. Yu, and N. Taft (2004). Maximum entropy models: convergence rates and application in dynamic system monitoring. International Symposium on Information Theory, Chicago.

  134. R. Castro, M. Coates, G. Liang, R. Nowak, and B. Yu (2003). Internet Tomography: Recent Developments. Statistical Science, Vol. 19(3), 499-517.

  135. G. Liang and B. Yu (2003). Maximum Pseudo Likelihood Estimation in Network Tomography. IEEE Trans. on Signal Processing (Special Issue on Data Networks), 51(8), 2043-2053.

  136. Rebecka Jornsten and Bin Yu (2003). Simultaneous Gene Clustering and Subset Selection for Classification via MDL. Bioinformatics. 19(9): 1100-1109.

  137. Peter Buhlmann and Bin Yu (2003). Boosting with the L2 Loss: Regression and Classification. J. Amer. Statist. Assoc. 98, 324-340.

  138. R. Jornsten, W. Wang, B. Yu, and K. Ramchandran (2003). Microarray image compression: SLOCO and the effects of information loss. Signal Processing Journal (Special Issue on Genomic Signal Processing). 83, 859-869.

  139. G. Liang and B. Yu (2003). Pseudo Likelihood Estimation in Network Tomography. Proceedings of Infocom, San Francisco.

  140. Peter Buhlmann and Bin Yu (2002). Analyzing Bagging. Annals of Statistics vol. 30, 927-961.

  141. R. Jornsten, M. Hansen, and B. Yu (2002). Adaptive Minimum Description Length (MDL) criteria with applications to microarray data. In Advances in Minimum Description Length: Theory and Applications, edited by P. Grunwald, I.J. Myung and M.A. Pitt. The MIT Press, pp. 295-321.

  142. Mark Hansen and Bin Yu (2002). Minimum Description Length Model Selection Criteria for Generalized Linear Models. Science and Statistics: Festschrift for Terry Speed, IMS Lecture Notes -- Monograph Series, Vol. 40.

  143. Rebecka Jornsten, and Bin Yu (2002). Multiterminal Estimation: Extensions and a Geometric interpretation. Proceedings of International Symposium on Information Theory (ISIT), June, 2002.

  144. Gerald Schuller, Bin Yu, Dawei Huang, and Bernd Edler (2002). Perceptual Audio Coding using Pre- and Post-Filters and Lossless Compression. IEEE Trans. Speech and Audio Processing, Vol. 10 (6), 379-390.

  145. Mark Coates, Alfred Hero, Robert Nowak, and Bin Yu (2002). Internet Tomography. Signal Processing Magazine. vol. 19, No. 3 (May issue), 47-65.

  146. M. Hansen and B. Yu (2001). Model selection and the principle of Minimum Description Length. Journal of the American Statistical Association, 96, 746-774.

  147. Jin Cao, Drew Davis, Scott Vander Wiel and Bin Yu (2000). Time-varying network tomography: router link data. J. Amer. Statist. Assoc., vol. 95, 1063-1075.

  148. Peter Buhlmann and Bin Yu (2000a). Discussion of "Additive logistic regression: a statistical view of boosting," by Friedman, J., Hastie, T. and Tibshirani, R. Annals of Statistics, Vol. 28, 377-386.

  149. Mark Hansen and Bin Yu (2000). Wavelet thresholding via MDL for natural images. IEEE Trans. Inform. Theory (Special Issue on Information Theoretic Imaging). vol. 46, 1778-1788.

  150. Jorma Rissanen and Bin Yu (2000). Coding and compression: a happy union of theory and practice. J. Amer. Statist. Assoc. (Year 2000 Commemorative Vignette on Engineering and Physical Sciences). vol. 95, 986-988.

  151. Lei Li and Bin Yu (2000). Iterated logarithm expansions of the pathwise code lengths for exponential families. IEEE Trans. Inform. Theory. vol. 46, 2683-2689.

  152. G. Chang, B. Yu and M. Vetterli (2000). Adaptive wavelet thresholding for image denoising and compression. IEEE Trans. Image Processing, vol. 9, 1532-1546.

  153. G. Chang, B. Yu and M. Vetterli (2000). Spatially adaptive wavelet thresholding based on context modeling for image denoising. IEEE Trans. Image Processing, vol. 9, 1522-1531.

  154. G. Chang, B. Yu and M. Vetterli (2000). Wavelet thresholding for multiple noisy image copies. IEEE Trans. Image Processing, vol. 9, 1631-1635.

  155. Y. Yoo, A. Ortega, and B. Yu (1999). Image subband coding using context-based classification and adaptive quantization. IEEE Trans. Image Processing, vol. 8, 1702-1715.

  156. B. Yu, M. Ostland, P. Gong and R. Pu (1999). Penalized discriminant analysis of in situ hyperspectral data for conifer species recognition. IEEE Trans. Geoscience and Remote Sensing, in press.

  157. A. Barron, J. Rissanen, and B. Yu (1998). The Minimum Description Length principle in coding and modeling. (Special Commemorative Issue: Information Theory: 1948-1998) IEEE. Trans. Inform. Th., 44, 2743-2760. Reprinted in Information Theory: 50 Years of Discovery, S. Verdú and S. McLaughlin (eds), IEEE Press, 1999.

  158. B. Yu and P. Mykland (1998). Looking at Markov samplers through cusum path plots: a simple diagnostic idea. Statistics and Computing, 8, 275-286.

  159. P. Gong, R. Pu and B. Yu (1998) Conifer species recognition: effects of data transformation and band width (in Chinese) Journal of Remote Sensing, 2(3), 211-217.

  160. G. Chang, B. Yu and M. Vetterli (1998). Spatially adaptive wavelet thresholding for image denoising. Proceedings of IEEE International Conference on Image Processing, October, Chicago.

  161. S. G. Chang, B. Yu, and M. Vetterli (1998). Image denoising via lossy compression and wavelet thresholding. Proceedings of International Conference on Image Processing. Santa Barbara, California, vol. 1, pp. 604-607.

  162. M. Ostland and B. Yu (1997). Exploring quasi Monte Carlo for marginal density approximation. Statistics and Computing, 7, 217-228.

  163. P. Gong, R. Pu, and B. Yu (1997). Conifer species recognition with in situ hyperspectral data. Remote Sensing of Environment, 62, 189-200.

  164. B. Yu and T. P. Speed (1997). Information and the clone mapping of chromosomes. Ann. Statist. 25, 169-185.

  165. D. Nelson, T. Speed, and B. Yu (1997). The limits of random fingerprinting. Genomics, 40, 1-12.

  166. B. Yu (1997). Assouad, Fano, and Le Cam. Festschrift for Lucien Le Cam. D. Pollard, E. Torgersen, and G. Yang (eds), pp. 423-435, Springer-Verlag.

  167. B. Yu (1996). Lower bounds on expected redundancy for nonparametric classes. IEEE Trans. on Information Theory, 42, 272-275.

  168. Y. Yoo, A. Ortega, and B. Yu (1996). Adaptive quantization of image subbands with efficient overhead rate selection. In Proceedings of IEEE International Conference on Image Processing, Lausanne, Switzerland.

  169. B. Yu (1996). A Statistical analysis of adaptive scalar quantization based on quantized past data. In Proceedings of International Symposium on Information Theory and its Applications (ISITA96), Victoria, Canada.

  170. B. Yu (1995). Comment: Extracting more diagnostic information from a single run using cusum path plot. Statist. Sci., 10, 54-58.

  171. J. Rissanen and B. Yu (1995). MDL learning. In Learning and Geometry: Computational Approaches, Progress in Computer Science and Applied Logic, 14, David Kueker and Carl Smith (eds), Birkhäuser, Boston, pp. 3-19.

  172. P. Mykland, L. Tierney, and B. Yu (1995). Regeneration in Markov Chain samplers. J. Amer. Statist. Assoc., 90, 233-241.

  173. B. Yu (1994a). Rates of convergence for empirical processes of stationary mixing sequences. Ann. Probab. 22, 94-116.

  174. M. Arcones and B. Yu (1994). Central limit theorems for empirical and U-processes of stationary mixing sequences. J. Theor. Probab. 7, 47-71.

  175. B. Yu (1994). Lower bound on the expected redundancy for classes of continuous Markov sources. In Statistical Decision Theory and Related Topics V, S. S. Gupta and J. O. Berger (eds), 453-466.

  176. M. Arcones and B. Yu (1994). Limit theorems for empirical processes under dependence. In Chaos Expansions, Multiple Wiener Integrals and Their Applications, pp. 205-221.

  177. A. R. Barron, Y. Yang and B. Yu (1994). Asymptotically optimal function estimation by minimum complexity criteria. In Proceedings of 1994 International Symposium on Information Theory, p. 38, Trondheim, Norway.

  178. B. Yu and T. Speed (1993). A rate of convergence result for a universal D-semifaithful code. IEEE Trans. on Information Theory, 39, 813-820.

  179. B. Yu (1993). Density estimation in the L∞ norm for dependent data with applications to the Gibbs sampler. Ann. Statist. 21, 711-735.

  180. T. Speed and B. Yu (1993). Model selection and prediction: normal regression. J. Inst. Statist. Math. 45, 35-54.

  181. J. Rissanen, T. Speed and B. Yu (1992). Density estimation by stochastic complexity. IEEE Trans. on Information Theory, 38, 315-323.

  182. B. Yu and T. Speed (1992) Data compression and histograms. Probability Theory and Related Fields, 92, 195-229.