Difference between revisions of "SOCR News ISI WSC DSPA Training 2021"

Revision as of 12:58, 11 March 2021

SOCR News & Events: 2021 ISI/WSC Training and Education Bootcamp on Data Science and Predictive Analytics (DSPA)

2021 ISI World Statistical Congress

Overview

....

Organizer

Ivo Dinov, University of Michigan, SOCR, MIDAS.

Session Logistics

Date/Time: Wednesday & Thursday, June 16-17, 2021, 14:00-17:00 Central European Summer Time (CEST, UTC+2)
Registration: TBD.
URL: TBD.
Conference: 2021 ISI World Statistical Congress.
Session Format: Daily 3-hour sessions.
Session URL: https://myumi.ch/erXm2.

Overview

This course will be based on a Data Science and Predictive Analytics (DSPA) course I teach at the University of Michigan. The training will provide intermediate to advanced learners with a solid data science foundation to address challenges related to collecting, managing, processing, interrogating, analyzing and interpreting complex health and biomedical datasets using R. Participants will gain skills and acquire a tool-chest of methods, software tools, and protocols that can be applied to a broad spectrum of Big Data problems.

Before diving into the mathematical algorithms, statistical computing methods, software tools, and health analytics, we will discuss a number of driving motivational problems. These will ground all the subsequent scientific discussions, data modeling, and computational approaches.

Vision

Enable active learning by integrating driving motivational challenges with mathematical foundations, computational statistics, and modern scientific inference.

Values

Effective, reliable, reproducible, and transformative data-driven discovery supporting open science.

Strategic priorities

Trainees will develop scientific intuition, computational skills, and data-wrangling abilities to tackle big biomedical and health data problems. Instructors will provide well-documented R scripts and software recipes implementing atomic data filters as well as complex end-to-end predictive big data analytics solutions.

Outcomes

Upon successful completion of this course, participants are expected to have moderate competency in at least two of the three competencies within each of the three areas: Algorithms and Applications, Data Management, and Analysis Methods. Specifically, participants will get end-to-end R protocols, gain ML/AI algorithm knowledge, explore data validation, wrangling, and visualization, and experiment with statistical inference and model-free machine learning tools.

Areas	Competency	Expectation	Notes
Algorithms and Applications	Tools	Working knowledge of basic software tools (command-line, GUI based, or web-services)	Familiarity with statistical programming languages, e.g., R or SciKit/Python, and database querying languages, e.g., SQL or NoSQL
	Algorithms	Knowledge of core principles of scientific computing, applications programming, APIs, algorithm complexity, and data structures	Best practices for scientific and application programming, efficient implementation of matrix linear algebra and graphics, elementary notions of computational complexity, user-friendly interfaces, string matching
	Application Domain	Data analysis experience from at least one application area, either through coursework, internship, research project, etc.	Applied domain examples include: computational social sciences, health sciences, business and marketing, learning sciences, transportation sciences, engineering and physical sciences
Data Management	Data validation & visualization	Curation, Exploratory Data Analysis (EDA) and visualization	Data provenance, validation, visualization via histograms, Q-Q plots, scatterplots (ggplot, Dashboard, D3.js)
	Data wrangling	Skills for data normalization, data cleaning, data aggregation, and data harmonization/registration	Data imperfections include missing values, inconsistent string formatting (‘2016-01-01’ vs. ‘01/01/2016’), PC/Mac/Linux time vs. timestamps, structured vs. unstructured data
	Data infrastructure	Handling databases, web-services, Hadoop, multi-source data	Data structures, SOAP protocols, ontologies, XML, JSON, streaming
Analysis Methods	Statistical inference	Basic understanding of bias and variance, principles of (non)parametric statistical inference, and (linear) modeling	Biological variability vs. technological noise, parametric (likelihood) vs. non-parametric (rank order statistics) procedures, point vs. interval estimation, hypothesis testing, regression
	Study design and diagnostics	Design of experiments, power calculations and sample sizing, strength of evidence, p-values, False Discovery Rates	Multistage testing, variance normalizing transforms, histogram equalization, goodness-of-fit tests, model overfitting, model reduction
	Machine Learning	Dimensionality reduction, k-nearest neighbors, random forests, AdaBoost, kernelization, SVM, ensemble methods, CNN	Empirical risk minimization. Supervised, semi-supervised, and unsupervised learning. Transfer learning, active learning, reinforcement learning, multiview learning, instance learning
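
The data wrangling row in the table above mentions missing values and inconsistent date strings such as ‘2016-01-01’ vs. ‘01/01/2016’. As a rough illustration of what harmonizing such values can look like, here is a minimal sketch. It is written in Python purely for illustration (the course itself uses R), and the function name `harmonize_date`, the list `KNOWN_FORMATS`, and the assumption that slash-separated dates are month/day/year are all hypothetical choices, not part of the course materials.

```python
from datetime import datetime
from typing import Optional

# Candidate input formats, tried in order.
# Assumption for this sketch: slash-separated dates are month/day/year.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def harmonize_date(raw: Optional[str]) -> Optional[str]:
    """Return the date as an ISO 8601 string, or None if missing or unparseable."""
    if raw is None or not raw.strip():
        return None  # treat empty or absent values as missing
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # this format did not match; try the next one
    return None  # no known format matched


records = ["2016-01-01", "01/01/2016", "", None, "not a date"]
cleaned = [harmonize_date(r) for r in records]
print(cleaned)
```

Mapping every unparseable value to `None` keeps missing and malformed entries in a single category; a real pipeline might prefer to log or flag malformed values separately so they can be audited.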

Program

...

Resources

...


@@ Line 17: / Line 17: @@
 * '''Conference''': [https://www.isi2021.org/ 2021 ISI World Statistical Congress].
 * '''Session Format''':  Daily 3-hour sessions.
-* [https://myumi.ch/qgRl1 Session URL]: https://myumi.ch/qgRl1.
+* [https://wiki.socr.umich.edu/index.php/SOCR_News_ISI_WSC_DSPA_Training_2021 Session URL]: https://myumi.ch/erXm2.
+== Overview==
+This course will be based on a [https://www.socr.umich.edu/people/dinov/DSPA_Courses.html Data Science and Predictive Analytics (DSPA) course] I teach at the University of Michigan. The training will provide intermediate to advanced learners with a solid data science foundation to address challenges related to collecting, managing, processing, interrogating, analyzing and interpreting complex health and biomedical datasets using R. Participants will gain skills and acquire a tool-chest of methods, software tools, and protocols that can be applied to a broad spectrum of Big Data problems.
+Before diving into the mathematical algorithms, statistical computing methods, software tools, and health analytics, we will discuss a number of driving motivational problems. These will ground all the subsequent scientific discussions, data modeling, and computational approaches.
+===Vision===
+Enable active-learning by integrating driving motivational challenges with mathematical foundations, computational statistics, and modern scientific inference
+===Values===
+Effective, reliable, reproducible, and transformative data-driven discovery supporting open-science
+===Strategic priorities===
+Trainees will develop scientific intuition, computational skills, and data-wrangling abilities to tackle Big biomedical and health data problems. Instructors will provide well-documented R-scripts and software recipes implementing atomic data-filters as well as complex end-to-end predictive big data analytics solutions.
+===Outcomes===
+Upon successful completion of this course, participants are expected to have moderate competency in at least two of each of the three competency areas: Algorithms and Applications, Data Management, and Analysis Methods. Specifically, participants will get end-to-end R-protocols, gain ML/AI algorithm knowledge, explore data validation, wrangling, and visualization, experiment with statistical inference and model-free Machine Learning tools.
+{| class="wikitable"
+! Areas !! Competency !! Expectation !!  !! Notes
+|-
+| Algorithms and Applications || Tools || Working knowledge of basic software tools (command-line, GUI based, or web-services) || Familiarity with statistical programming languages, e.g., R or SciKit/Python, and database querying languages, e.g., SQL or NoSQL ||
+|-
+|  || Algorithms || Knowledge of core principles of scientific computing, applications programming, API’s, algorithm complexity, and data structures || Best practices for scientific and application programming, efficient implementation of matrix linear algebra and graphics, elementary notions of computational complexity, user-friendly interfaces, string matching ||
+|-
+|  || Application Domain || Data analysis experience from at least one application area, either through coursework, internship, research project, etc. || Applied domain examples include: computational social sciences, health sciences, business and marketing, learning sciences, transportation sciences, engineering and physical sciences ||
+|-
+| Data Management || Data validation & visualization || Curation, Exploratory Data Analysis (EDA) and visualization || Data provenance, validation, visualization via histograms, Q-Q plots, scatterplots (ggplot, Dashboard, D3.js) ||
+|-
+|  || Data wrangling || Skills for data normalization, data cleaning, data aggregation, and data harmonization/registration  || Data imperfections include missing values, inconsistent string formatting (‘2016-01-01’ vs. ‘01/01/2016’, PC/Mac/Linux time vs. timestamps, structured vs. unstructured data ||
+|-
+|  ||  ||  ||  ||
+|-
+|  || Data infrastructure || Handling databases, web-services, Hadoop, multi-source data || Data structures, SOAP protocols, ontologies, XML, JSON, streaming ||
+|-
+| Analysis Methods || Statistical inference || Basic understanding of bias and variance, principles of (non)parametric statistical inference, and (linear) modeling || Biological variability vs. technological noise, parametric (likelihood) vs non-parametric (rank order statistics) procedures, point vs. interval estimation, hypothesis testing, regression ||
+|-
+|  || Study design and diagnostics || Design of experiments, power calculations and sample sizing, strength of evidence, p-values, False Discovery Rates || Multistage testing, variance normalizing transforms, histogram equalization, goodness-of-fit tests, model overfitting, model reduction ||
+|-
+|  || Machine  || Dimensionality reduction, k-nearest neighbors, random forests, AdaBoost, kernelization, SVM, ensemble methods, CNN || Empirical risk minimization. Supervised, semi-supervised, and unsupervised learning. Transfer learning, active learning, reinforcement learning, multiview learning, instance learning ||
+|-
+|  || Learning ||  ||  ||
+|}
 == Program==
