# SOCR News MIDAS Biomedical Bootcamp 2021

Revision as of 10:05, 20 April 2021 by Dinov.

## SOCR News & Events: 2021 MIDAS Data Science for Biomedical Scientists Bootcamp

The Michigan Institute for Data Science (MIDAS) is organizing a week-long Data Science for Biomedical Scientists Bootcamp. This workshop will introduce data science from a biomedical perspective. Participants will learn about practical data science applications through biomedical and health case studies. Modern data science, machine learning, artificial intelligence, and biostatistical methods will be integrated into the training curriculum.

## Instructors

- Kayvan Najarian
- Nambi Nallasamy
- Ivo Dinov (University of Michigan, SOCR, MIDAS)
- Michael Mathis
- Ryan Stidham
- Jonathan Gryak
- Michael Sjoding

## Workshop Logistics

- **Dates/Times**: Monday through Friday, July 26-30, 2021, 7:00-16:00 US EDT (daily).
- **Registration**: Registration Link.
- **URL**: MIDAS Bootcamp Website.
- **Session Format**: Two daily sessions (3 hours each).
  - Session URL.

## Overview

*Target Audience*: This workshop is open to all biomedical scientists. The curriculum is geared towards junior faculty members who plan to incorporate data science in their scholarly work.

*Prerequisite*: College-level math and statistics.

*Main components*:

- Math and algorithmic foundations for data science
- Key concepts of data science
- Introduction to Python programming
- Machine learning: support vector machines, artificial neural networks, and deep learning
- Examples of biomedical research projects using data science
- Incorporating data science in biomedical grant proposals

### Program Schedule

| Areas | Competency | Expectation | Notes |
|---|---|---|---|
| Algorithms and Applications | Tools | Working knowledge of basic software tools (command-line, GUI-based, or web services) | Familiarity with statistical programming languages, e.g., R or SciKit/Python, and database querying languages, e.g., SQL or NoSQL |
| | Algorithms | Knowledge of core principles of scientific computing, applications programming, APIs, algorithm complexity, and data structures | Best practices for scientific and application programming, efficient implementation of matrix linear algebra and graphics, elementary notions of computational complexity, user-friendly interfaces, string matching |
| | Application Domain | Data analysis experience from at least one application area, either through coursework, internship, research project, etc. | Applied domain examples include computational social sciences, health sciences, business and marketing, learning sciences, transportation sciences, and engineering and physical sciences |
| Data Management | Data validation & visualization | Curation, Exploratory Data Analysis (EDA), and visualization | Data provenance, validation, visualization via histograms, Q-Q plots, scatterplots (ggplot, Dashboard, D3.js) |
| | Data wrangling | Skills for data normalization, data cleaning, data aggregation, and data harmonization/registration | Data imperfections include missing values, inconsistent string formatting (‘2016-01-01’ vs. ‘01/01/2016’), PC/Mac/Linux time vs. timestamps, and structured vs. unstructured data |
| | Data infrastructure | Handling databases, web services, Hadoop, multi-source data | Data structures, SOAP protocols, ontologies, XML, JSON, streaming |
| Analysis Methods | Statistical inference | Basic understanding of bias and variance, principles of (non)parametric statistical inference, and (linear) modeling | Biological variability vs. technological noise, parametric (likelihood) vs. non-parametric (rank order statistics) procedures, point vs. interval estimation, hypothesis testing, regression |
| | Study design and diagnostics | Design of experiments, power calculations and sample sizing, strength of evidence, p-values, false discovery rates | Multistage testing, variance normalizing transforms, histogram equalization, goodness-of-fit tests, model overfitting, model reduction |
| | Machine Learning | Dimensionality reduction, k-nearest neighbors, random forests, AdaBoost, kernelization, SVM, ensemble methods, CNN | Empirical risk minimization; supervised, semi-supervised, and unsupervised learning; transfer learning, active learning, reinforcement learning, multiview learning, instance learning |

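The Data wrangling row in the table above cites inconsistent date strings such as ‘2016-01-01’ vs. ‘01/01/2016’. As a hedged illustration (not taken from the bootcamp materials), the Python sketch below harmonizes those two formats, plus Unix epoch seconds, into ISO 8601 dates. It assumes that slash-separated dates are US-style MM/DD/YYYY and that all-digit strings are POSIX timestamps in UTC; the helper name `normalize_date` is hypothetical.

```python
from datetime import datetime, timezone

# Assumed input formats: ISO 8601 and US-style MM/DD/YYYY.
# Under this assumption "03/04/2016" is read as March 4, not April 3.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def normalize_date(raw: str) -> str:
    """Hypothetical helper: return `raw` as an ISO 8601 date (YYYY-MM-DD).

    Raises ValueError if the value matches none of the assumed formats.
    """
    text = raw.strip()  # drop stray whitespace before parsing

    # Assumption: an all-digit string is a Unix (POSIX) timestamp in seconds, UTC.
    if text.isdigit():
        return datetime.fromtimestamp(int(text), tz=timezone.utc).date().isoformat()

    # Otherwise try each known string format in turn.
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"Unrecognized date format: {raw!r}")


if __name__ == "__main__":
    for value in ["2016-01-01", "01/01/2016", "1451606400"]:
        print(f"{value!r} -> {normalize_date(value)}")
```

Run as a script, all three sample values normalize to `2016-01-01` (the timestamp 1451606400 is midnight UTC on that date). Day/month order cannot be recovered from a string like ‘01/02/2016’ alone, which is why the accepted formats are listed explicitly as an assumption rather than guessed per value.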
## Additional Resources

- Course Flyer
- DSPA Wikipedia
- DSPA Springer Page & SpringerLink (PDF Download)
- dspa.predictive.space & DSPA MOOC Canvas Site
