SOCR News MIDAS Biomedical Bootcamp 2021
Revision as of 09:05, 20 April 2021 by Dinov
SOCR News & Events: 2021 MIDAS Data Science for Biomedical Scientists Bootcamp
The Michigan Institute for Data Science (MIDAS) is organizing a week-long Data Science for Biomedical Scientists Bootcamp. This workshop will introduce data science from a biomedical perspective. Bootcamp participants will learn about practical data science applications through biomedical and health case studies. Modern data science, machine learning, artificial intelligence, and biostatistical methods will be integrated into the training curriculum.
Instructors
- Kayvan Najarian
- Nambi Nallasamy
- Ivo Dinov, University of Michigan, SOCR, MIDAS.
- Michael Mathis
- Ryan Stidham
- Jonathan Gryak
- Michael Sjoding
Workshop Logistics
- Dates/Times: Monday through Friday, July 26-30, 2021, 7:00-16:00 US-EDT (daily).
- Registration: Registration Link.
- URL: MIDAS Bootcamp Website.
- Session Format: Two daily sessions (3 hours each).
- Session URL.
Overview
- Target Audience: This workshop is open to all biomedical scientists. The curriculum is geared towards junior faculty members who plan to incorporate data science in their scholarly work.
- Prerequisite: College-level math and statistics.
- Main components:
- Math and algorithmic foundations for data science
- Key concepts of data science
- Introduction to Python programming
- Machine learning, support vector machines, artificial neural networks, deep learning
- Examples of biomedical research projects with data science
- Incorporating data science in biomedical grant proposals
Program Schedule
Areas | Competency | Expectation | Notes |
---|---|---|---|
Algorithms and Applications | Tools | Working knowledge of basic software tools (command-line, GUI based, or web-services) | Familiarity with statistical programming languages, e.g., R or SciKit/Python, and database querying languages, e.g., SQL or NoSQL |
Algorithms and Applications | Algorithms | Knowledge of core principles of scientific computing, applications programming, APIs, algorithm complexity, and data structures | Best practices for scientific and application programming, efficient implementation of matrix linear algebra and graphics, elementary notions of computational complexity, user-friendly interfaces, string matching |
Algorithms and Applications | Application Domain | Data analysis experience from at least one application area, either through coursework, internship, research project, etc. | Applied domain examples include: computational social sciences, health sciences, business and marketing, learning sciences, transportation sciences, engineering and physical sciences |
Data Management | Data validation & visualization | Curation, Exploratory Data Analysis (EDA) and visualization | Data provenance, validation, visualization via histograms, Q-Q plots, scatterplots (ggplot, Dashboard, D3.js) |
Data Management | Data wrangling | Skills for data normalization, data cleaning, data aggregation, and data harmonization/registration | Data imperfections include missing values, inconsistent string formatting (‘2016-01-01’ vs. ‘01/01/2016’), PC/Mac/Linux time vs. timestamps, structured vs. unstructured data |
Data Management | Data infrastructure | Handling databases, web-services, Hadoop, multi-source data | Data structures, SOAP protocols, ontologies, XML, JSON, streaming |
Analysis Methods | Statistical inference | Basic understanding of bias and variance, principles of (non)parametric statistical inference, and (linear) modeling | Biological variability vs. technological noise, parametric (likelihood) vs. non-parametric (rank order statistics) procedures, point vs. interval estimation, hypothesis testing, regression |
Analysis Methods | Study design and diagnostics | Design of experiments, power calculations and sample sizing, strength of evidence, p-values, False Discovery Rates | Multistage testing, variance normalizing transforms, histogram equalization, goodness-of-fit tests, model overfitting, model reduction |
Analysis Methods | Machine Learning | Dimensionality reduction, k-nearest neighbors, random forests, AdaBoost, kernelization, SVM, ensemble methods, CNN | Empirical risk minimization. Supervised, semi-supervised, and unsupervised learning. Transfer learning, active learning, reinforcement learning, multiview learning, instance learning |
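
As an illustration of the data wrangling competency listed in the table, the following minimal Python sketch harmonizes the two date formats mentioned there (‘2016-01-01’ and ‘01/01/2016’) into a single ISO 8601 representation. It is not part of the bootcamp materials. The function name `normalize_date`, the list of candidate formats, and the sample values are assumptions chosen for illustration, and real datasets may need additional formats.

```python
from datetime import datetime

# Candidate formats are an assumption for illustration; extend as needed.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(text):
    """Return the date as an ISO 8601 string, or None if no format matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None  # unparseable values are flagged as missing, not guessed


# Hypothetical raw values mixing both formats, stray whitespace, and bad input.
raw = ["2016-01-01", "01/01/2016", " 2016-03-15 ", "not a date"]
cleaned = [normalize_date(value) for value in raw]
print(cleaned)
```

Returning `None` for unparseable strings keeps missing values explicit, so they can be counted and handled during validation rather than silently coerced.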
Additional Resources
- Course Flyer
- DSPA Wikipedia.
- DSPA Springer Page & SpringerLink (PDF Download).
- dspa.predictive.space & DSPA MOOC Canvas Site.