Project concluded in 2021
The Biomedical Big Data Training Program (BBD) at UC Berkeley officially concluded in 2021.
"The ability to harvest the wealth of information contained in biomedical Big Data will advance our understanding of human health and disease; however, lack of appropriate tools, poor data accessibility, and insufficient training, are major impediments to rapid translational impact. To meet this challenge, the National Institutes of Health (NIH) launched the Big Data to Knowledge (BD2K) initiative in 2012.
BD2K is a trans-NIH initiative established to enable biomedical research as a digital research enterprise, to facilitate discovery and support new knowledge, and to maximize community engagement." -https://datascience.nih.gov/bd2k/about
Beginning in the Fall of 2016, with proposed funding for five years, this training grant will support 6 trainees per program year. We anticipate further extending the reach of our program by admitting up to 2 additional students on alternative support, thus benefitting 8 students per year. The 25 participating faculty have extensive experience with biomedical applications and expertise in biostatistics, causal inference, machine learning, the development of big data tools, and scalable computing. Together, they span 8 departments/programs:
- Biostatistics
- Computational Biology
- Computer Science
- Epidemiology
- Integrative Biology
- Molecular & Cell Biology
- Neuroscience
- Statistics
We will recruit participants from Ph.D. students in their first or second year of study in any of these departments. Those accepted into the program will participate in an intensive year of training courses, seminars, and workshops, beginning with introductory seminars in late summer and ending with a capstone project by each participant in the spring. Specialized training will focus on three pillars:
- Translation of biomedical and experimental knowledge and scientific questions of interest into formal, realistic problems of causal and statistical estimation
- Scalable big data computing
- Targeted machine learning with causal and statistical inference
Activities will include courses in machine learning, targeted learning, statistical programming, and big data computing, as well as workshops led by the Berkeley Data Science Institute, Statistical Computing Facility, and Berkeley Research Computing. The capstone course will center on a collaborative project in biomedical science that integrates the skills trainees have acquired in the three foundational areas. Trainees will also benefit from group seminars, retreats, and interdisciplinary meetings that build a shared identity within the cadre and the program. This program dovetails with several data science and precision medicine initiatives at UC Berkeley and comes at an ideal time to influence how data science is taught to all graduate students focusing on biomedical research across campus.
Lead Principal Investigator and Director
Mark van der Laan, Ph.D., Professor of Biostatistics and Statistics