Difference between revisions of "SOCR News ISI WSC DSPA Training 2021"

Latest revision as of 13:27, 3 March 2022

SOCR News & Events: 2021 ISI/WSC Training and Education Bootcamp on Data Science and Predictive Analytics (DSPA)

Instructor

Ivo Dinov, University of Michigan, SOCR, MIDAS.

Dr. Dinov is a professor of Health Behavior and Biological Sciences and Computational Medicine and Bioinformatics at the University of Michigan. He is a member of the Michigan Center for Applied and Interdisciplinary Mathematics (MCAIM) and a core member of the University of Michigan Comprehensive Cancer Center. Dr. Dinov serves as Director of the Statistics Online Computational Resource, Co-Director of the Center for Complexity and Self-management of Chronic Disease (CSCD Center), Co-Director of the multi-institutional Probability Distributome Project, Associate Director of the Michigan Institute for Data Science (MIDAS), and Associate Director of the Michigan Neuroscience Graduate Program (NGP). He is a member of the American Statistical Association (ASA), International Association for Statistical Education (IASE), American Mathematical Society (AMS), American Association for the Advancement of Science (AAAS), and an Elected Member of the International Statistical Institute (ISI).

Session Logistics

Date/Time: Wednesday & Thursday, June 16-17, 2021, 14.00-17.00, Central European Summer Time, CEST (UTC+2), 8:00-11:00 AM US-EDT.
Registration: Registration Link, moderate registration fees apply.
GoToMeeting: Webinar link.
URL: Official ISI/WSC Course Website.
Conference: 2021 ISI World Statistical Congress and WSC 2021 short courses.
Session Format: Two sessions, one per day (3 hours each).
Session URL: https://myumi.ch/erXm2.

Overview

This course will be based on a Data Science and Predictive Analytics (DSPA) course I teach at the University of Michigan. The training will provide intermediate to advanced learners with a solid data science foundation to address challenges related to collecting, managing, processing, interrogating, analyzing and interpreting complex health and biomedical datasets using R. Participants will gain skills and acquire a tool-chest of methods, software tools, and protocols that can be applied to a broad spectrum of Big Data problems.

Before diving into the mathematical algorithms, statistical computing methods, software tools, and health analytics, we will discuss a number of driving motivational problems. These will ground all the subsequent scientific discussions, data modeling, and computational approaches.

Prerequisites

Assumed prior knowledge includes completed undergraduate study with quantitative STEM exposure, some quantitative training, programming experience, and a high level of energy and motivation to learn. Participants should have R and RStudio preinstalled on their local computer.

Vision

This course is based on active learning and integrates driving motivational challenges with mathematical foundations, computational statistics, and modern scientific inference.

Values

The training aims to provide effective, reliable, reproducible, and transformative data-driven discovery supporting open science.

Strategic priorities

Trainees will develop scientific intuition, computational skills, and data-wrangling abilities to tackle Big biomedical and health data problems. Instructors will provide well-documented R-scripts and software recipes implementing atomic data-filters as well as complex end-to-end predictive big data analytics solutions.

Outcomes

Upon successful completion of this course, participants are expected to have moderate competency in at least two of the three competencies within each of the three areas: Algorithms and Applications, Data Management, and Analysis Methods. Specifically, participants will get end-to-end R-protocols, gain ML/AI algorithm knowledge, explore data validation, wrangling, and visualization, and experiment with statistical inference and model-free Machine Learning tools.

Areas	Competency	Expectation	Notes
Algorithms and Applications	Tools	Working knowledge of basic software tools (command-line, GUI based, or web-services)	Familiarity with statistical programming languages, e.g., R or SciKit/Python, and database querying languages, e.g., SQL or NoSQL
	Algorithms	Knowledge of core principles of scientific computing, applications programming, APIs, algorithm complexity, and data structures	Best practices for scientific and application programming, efficient implementation of matrix linear algebra and graphics, elementary notions of computational complexity, user-friendly interfaces, string matching
	Application Domain	Data analysis experience from at least one application area, either through coursework, internship, research project, etc.	Applied domain examples include: computational social sciences, health sciences, business and marketing, learning sciences, transportation sciences, engineering and physical sciences
Data Management	Data validation & visualization	Curation, Exploratory Data Analysis (EDA) and visualization	Data provenance, validation, visualization via histograms, Q-Q plots, scatterplots (ggplot, Dashboard, D3.js)
	Data wrangling	Skills for data normalization, data cleaning, data aggregation, and data harmonization/registration	Data imperfections include missing values, inconsistent string formatting (‘2016-01-01’ vs. ‘01/01/2016’), PC/Mac/Linux time vs. timestamps, structured vs. unstructured data
	Data infrastructure	Handling databases, web-services, Hadoop, multi-source data	Data structures, SOAP protocols, ontologies, XML, JSON, streaming
Analysis Methods	Statistical inference	Basic understanding of bias and variance, principles of (non)parametric statistical inference, and (linear) modeling	Biological variability vs. technological noise, parametric (likelihood) vs non-parametric (rank order statistics) procedures, point vs. interval estimation, hypothesis testing, regression
	Study design and diagnostics	Design of experiments, power calculations and sample sizing, strength of evidence, p-values, False Discovery Rates	Multistage testing, variance normalizing transforms, histogram equalization, goodness-of-fit tests, model overfitting, model reduction
	Machine Learning	Dimensionality reduction, k-nearest neighbors, random forests, AdaBoost, kernelization, SVM, ensemble methods, CNN	Empirical risk minimization. Supervised, semi-supervised, and unsupervised learning. Transfer learning, active learning, reinforcement learning, multiview learning, instance learning
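
The data-wrangling row above mentions inconsistent date strings such as ‘2016-01-01’ vs. ‘01/01/2016’. As a minimal sketch (written in Python rather than the course's R, and not taken from the DSPA materials), such strings could be harmonized to ISO 8601 by trying a list of candidate formats in order. The function name `harmonize_date`, the format list, and the reading of slash-separated dates as month/day/year are assumptions made only for illustration.

```python
from datetime import datetime

# Candidate input formats, tried in order. This list is an assumption for the
# sketch; a real dataset needs its own list (for example, day-first formats).
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def harmonize_date(text):
    """Return the date as an ISO 8601 string (YYYY-MM-DD), or None if no format matches."""
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # this format did not match; try the next one
    return None  # unparseable values are treated as missing


raw = ["2016-01-01", "01/01/2016", " 12/31/2016 ", "not a date"]
harmonized = [harmonize_date(value) for value in raw]
print(harmonized)
```

Returning `None` for unparseable strings keeps them visible as missing values, so they can be handled later by whatever imputation or filtering step the analysis uses.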

Topics

The Data Science and Predictive Analytics textbook is divided into the following 23 chapters, each progressively building on the previous content.

Motivation
Foundations of R
Managing Data in R
Data Visualization
Linear Algebra & Matrix Computing
Dimensionality Reduction
Lazy Learning: Classification Using Nearest Neighbors
Probabilistic Learning: Classification Using Naive Bayes
Decision Tree Divide and Conquer Classification
Forecasting Numeric Data Using Regression Models
Black Box Machine-Learning Methods: Neural Networks and Support Vector Machines
Apriori Association Rules Learning
k-Means Clustering
Model Performance Assessment
Improving Model Performance
Specialized Machine Learning Topics
Variable/Feature Selection
Regularized Linear Modeling and Controlled Variable Selection
Big Longitudinal Data Analysis
Natural Language Processing/Text Mining
Prediction and Internal Statistical Cross Validation
Function Optimization
Deep Learning, Neural Networks

Program Outline

Welcome and introductions
Course logistics (please come prepared with access to Internet-connected computers having local versions of R (statistical computing environment) and RStudio (graphical user interface and integrated development environment))
Data manipulation and visualization
Non-linear dimensionality reduction (UMAP & t-SNE)
Supervised and Unsupervised, model-based and model-free prediction, regression, classification, and clustering
Reticulation (Interoperability between R, Python, C/C++ and other languages)
Role of optimization in AI/ML
Activities and HTML5 demos.

Program Details

Wednesday, June 16, 2021, 8:00-11:00 AM US-EDT	Thursday, June 17, 2021, 8:00-11:00 AM US-EDT
Welcome	Review of Day 1
DSPA Summer Course Overview (ISI/WSC, prereqs, vision, objectives, outcomes, Website)	Questions, comments, issues?
Introductions (Instructor: Ivo Dinov; Attendees: please post in Chat/Discussion-Forum: Participant's Name, Affiliation, Title, interests, and one fun fact about you)	Supervised AI
Course Coverage	Model-based
Expectations and optional capstone project (below)	Baseball players physique modeling
SOCR Resources: Datasets & Case-studies, Webapps, DSPA, Spacekime/TCIU, GitHub, Prob & Stats EBook, SMHS EBook, Current SOCR Users	k-NN prediction of galaxy spin
Open Science: It’s online, therefore it exists!	Model-free
Download DSPA Textbook (free)	Estimate the square root function using NN
Resource Search & Navigation, Language Translations
	NN Google Trends and the Stock Market
Motivation - and 7D of Big Data	Unsupervised AI
Digitalization of all human experiences	Classification and clustering (k-Means, spectral, hierarchical)
Responsible Data Science/Ethical Predictive Analytics	Hot-dogs example
R vs. Python vs. SAS vs. SPSS vs. other SW	Silhouette plots
Confirm local installations of R & RStudio	Pediatric trauma clustering study
RStudio GUI
Rmarkdown Notebook (IDE) End-to-end Pipeline Workflow from raw data … models … visualization … analytics … reporting/pubs
Example Demo (requires knitr package)
Chapter 4 RMD Source, HTML output, SOCR_Header
Math Foundations
5-min Break	5-min Break
Data types: categorical & numeric, structured and unstructured, scalar, vector, matrix, data-frame, tensor, list, object	Reticulation (interoperability between R, Python, C/C++ and other languages)
Data manipulation import/export, EM imputation, webpage scraping, sample statistics (moments)	Text modeling & NLP (sentiment analysis example)
EDA (visualization)
Compare R EDA vs. HTML/JS: SOCRAT (NI data of AD/MCI/NC), Motion Charts (Housing Prices), BrainViewer (raw MRI, DTI tracks, Brain Atlas)
Probability Distributions: Distributome, TVN Webapp	Longitudinal data analysis (Google trends analytics)
Dimensionality reduction
Linear PCA: 2D --> 1D example, PPMI (Parkinson's disease) example
5-min Break	5-min Break
Non-linear: MNIST data OCR: UMAP OCR, t-SNE OCR	Role of optimization in AI/ML (Healthcare manufacturer product optimization example)
SOCR/Tensorboard/Projector UKBB Brain Study	Deep neural networks (image-classification example)
Capstone project: interactive-learning using monthly US macro-economic data. Use the RMD source, the example HTML output, and the provided data to experiment with some of the DSPA techniques. Think of ways to augment these data (expand the time range and increase the feature richness)	DSPA Appendices: Bayesian Simulation, Modeling and Inference; Information-Theoretic Foundation of Statistical Learning; Surface, Shape, and Manifold Representation and Visualization; Power Analysis in Experimental Design; Database SQL/NoSQL Queries & Google BigQuery; Image Convolution, Filtering, & Fourier Transform; Causality, Transfer Entropy, & Mechanistic Effects; Agent-based Reinforcement Learning
	Demonstrations of interesting Capstone project results
Open discussion	Open discussion
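
The Day 1 agenda lists a "Linear PCA: 2D --> 1D example". The following is a rough, self-contained sketch of that idea in plain Python, not the course's R demo: it centers the points, forms the 2x2 sample covariance matrix, takes its leading eigenvector in closed form, and projects each point onto that direction. The function name `pca_2d_to_1d` and the toy data are hypothetical, and the sketch assumes at least two points that are not all identical.

```python
import math


def pca_2d_to_1d(points):
    """Project 2D points onto their first principal component.

    Returns (unit direction, 1D scores, fraction of variance explained).
    Assumes at least two points with non-zero total variance.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the symmetric sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(cx * cx for cx, _ in centered) / (n - 1)
    syy = sum(cy * cy for _, cy in centered) / (n - 1)
    sxy = sum(cx * cy for cx, cy in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    top_eigenvalue = half_trace + root

    # Matching eigenvector; the axis-aligned cases cover sxy == 0.
    if sxy != 0.0:
        vx, vy = top_eigenvalue - syy, sxy
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # The 1D representation: coordinates of each centered point along the direction.
    scores = [cx * direction[0] + cy * direction[1] for cx, cy in centered]
    explained = top_eigenvalue / (sxx + syy)
    return direction, scores, explained


# Toy data scattered around the line y = 2x (illustrative values only).
toy_points = [(0.0, 0.1), (1.0, 1.9), (2.0, 4.2), (3.0, 5.8), (4.0, 8.1)]
direction, scores, explained = pca_2d_to_1d(toy_points)
print("direction:", direction)
print("scores:", scores)
print("variance explained:", explained)
```

For higher-dimensional data the closed-form 2x2 step no longer applies, and a general eigendecomposition or SVD routine would be used instead.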

Resources

Video Recordings

Participants

Partial list of participants:

Jennifer Daniels: Adjunct Math and Statistics Instructor at Mid Michigan College, Davenport University, and Alma College. Graduate student at Central Michigan University.
Jo Edwards: Australian Bureau of Statistics, Project Manager/Data Scientist.
Jannik Schaller: Federal Statistical Office of Germany (DESTATIS), Interest: Data Fusion/ Statistical and Machine Learning.
Edviges Coelho: Statistics Portugal and Universidade Lusófona.
Kadri Rootalu: Data scientist at Statistics Estonia, with an education in Sociology.
Jared Mendoza: University of the Philippines Los Banos, Assistant Professor of Statistics
Lynda Aouar: UNCO, PhD student in Applied Statistics, interested in nonparametric statistics.
Ananda Manage: Dept of Math & Stat, Sam Houston State University, Texas, USA
Michal Ciszewski, PhD student in Statistics at TU Delft, interests: activity recognition and anomaly detection
Joyce Chang: Data scientist at the U of Pittsburgh School of Medicine; interested in risk prediction modeling and identifying heterogeneous treatment effects.
Katherine Zavez: PhD student in the Department of Statistics at the University of Connecticut
Ewilly Liew: Lecturer in Econometrics and Business Statistics, Monash University Malaysia. Interest: behavioral research in higher education and healthcare.
Jennifer Daniels: Interested in Applied Statistics. Particularly, Data/Text Mining. I have lived and taught in Japan many years ago.
Elizabeth Gonzalez: Statistics Department, Colegio de Postgraduados, Mexico, interested in statistical inference in general.
Delia Ortega: PhD student in Statistics. Universidad Nacional, Colombia.
Li Zhou: PhD student in stat at Auburn University
Ilich Lama: Principal Research Scientist - Environmental Data Science (NCASI), Montreal, Canada - Interested among other things in statistical analysis of industrial emissions/releases.
Brocha Stern, postdoctoral fellow at Northwestern University, orthopedic health services and outcomes research
Annette Kifley, biostatistician in rehabilitation studies, University of Sydney
Jo Edwards: I am interested in Coding and Classification techniques as well as Entity extraction
Nur Aziha Mansor: Statistician in Department of Statistics Malaysia, Interest in data management
Martina Ozoglu: Statistical Office of the Slovak Republic, tourism analyst. I am interested in new forms of Tourism and its data interpretation.
Jason Ng, Monash University, Dept of Econometrics and Business Statistics
Quratulain Khaliq: PhD Statistics candidate from Pakistan
Malcolm Cai: Working in the public service of Singapore. Keen on data science, and sports.

Nurhazwani Abdul Halim, an Executive from Data Management and Statistics Department, from Central Bank of Malaysia. I am interested in Data Science and Machine Learning
Zsófia Szente: Hungarian Central Statistical Office, statistician. I am interested in data visualization and data science.
Luigi Arzedi, PhD student in Statistics at University of Cagliari (Italy)
Miguel David Alvarez, PhD student in Economics and I work as a Data Scientist in the National Electoral Institute (Mexico).
Felibel Zabala: methodologist from Stats NZ. I am interested in data science & machine learning in official statistics
Quratulain Khaliq: PhD Candidate, Allama Iqbal Open University; statistical process control, robustness techniques, nonparametric statistics. I am interested in linking SPC techniques to data science.


@@ Line 1: / Line 1: @@
 == [[SOCR_News | SOCR News & Events]]:  2021 ISI/WSC Training and Education Bootcamp on Data Science and Predictive Analytics (DSPA) ==
-[[Image:DSPA_2021.gif|250px|thumbnail|right| [https://www.isi2021.org/ 2021 ISI World Statistical Congress] ]]
+[[Image:DSPA_ISI_WSC_anime.gif|right| [https://www.isi2021.org/ 2021 ISI World Statistical Congress] ]]
-==Overview==
+== Instructor ==
-....
+: [https://umich.edu/~dinov Ivo Dinov], [https://www.umich.edu University of Michigan], [https://www.socr.umich.edu SOCR], [https://midas.umich.edu MIDAS].
+:: Dr. Dinov is a professor of Health Behavior and Biological Sciences and Computational Medicine and Bioinformatics at the University of Michigan. He is a member of the Michigan Center for Applied and Interdisciplinary Mathematics (MCAIM) and a core member of the University of Michigan Comprehensive Cancer Center. Dr. Dinov serves as Director of the Statistics Online Computational Resource, Co-Director of the Center for Complexity and Self-management of Chronic Disease (CSCD Center), Co-Director of the multi-institutional Probability Distributome Project, Associate Director of the Michigan Institute for Data Science (MIDAS), and Associate Director of the Michigan Neuroscience Graduate Program (NGP). He is a member of the American Statistical Association (ASA), International Association for Statistical Education (IASE), American Mathematical Society (AMS), American Association for the Advancement of Science (AAAS), and an Elected Member of the International Statistical Institute (ISI).
-== Organizer==
-* [http://umich.edu/~dinov Ivo Dinov], [https://www.umich.edu University of Michigan], [https://www.socr.umich.edu SOCR], [https://midas.umich.edu MIDAS].
 ==Session Logistics==
 <!-- [[Image:JMM_2021_SS9_FoundationsOf_DS_Background.png|300px|thumbnail|right| [https://jointmathematicsmeetings.org/meetings/national/jmm2021/2247_program_ss9.html 2021 JMM/AMS Foundations of Data Science Session (SS9A)] ]] -->
-* '''Date/Time''': Wednesday & Thursday, June 16-17, 2021, 14.00-17.00, Central European Summer Time, [https://www.timeanddate.com/time/zones/cest CEST (UTC+2)]
+* '''Date/Time''': Wednesday & Thursday, June 16-17, 2021, 14.00-17.00, Central European Summer Time, [https://www.timeanddate.com/time/zones/cest CEST (UTC+2)], 8:00-11:00 AM [https://www.timeanddate.com/time/zones/et US-EDT].
-* '''Registration''': TBD.
+* '''Registration''': [https://www.isi-web.org/events/courses/short-2021/data-science-and-predictive-analytics-dspa Registration Link], [https://www.isi-web.org/events/courses/short-2021 moderate registration fees apply].
-* '''URL''': TBD.
+* '''GoToMeeting''': [https://global.gotowebinar.com/pjoin/1464080410602344976/5342046927112771343 Webinar link].
-* '''Conference''': [https://www.isi2021.org/ 2021 ISI World Statistical Congress].
+* '''URL''': [https://www.isi-web.org/events/courses/short-2021/data-science-and-predictive-analytics-dspa Official ISI/WSC Course Website].
-* '''Session Format''':  Daily 3-hour sessions.
+* '''Conference''': [https://www.isi2021.org/ 2021 ISI World Statistical Congress] and [https://www.isi-web.org/events/courses/short-2021 WSC 2021 short courses].
-* [https://myumi.ch/qgRl1 Session URL]: https://myumi.ch/qgRl1.
+* '''Session Format''':  Two daily sessions (3-hours each).
+* [https://wiki.socr.umich.edu/index.php/SOCR_News_ISI_WSC_DSPA_Training_2021 Session URL]: https://myumi.ch/erXm2.
+== Overview==
+This course will be based on a [https://www.socr.umich.edu/people/dinov/DSPA_Courses.html Data Science and Predictive Analytics (DSPA) course] I teach at the University of Michigan. The training will provide intermediate to advanced learners with a solid data science foundation to address challenges related to collecting, managing, processing, interrogating, analyzing and interpreting complex health and biomedical datasets using R. Participants will gain skills and acquire a tool-chest of methods, software tools, and protocols that can be applied to a broad spectrum of Big Data problems.
+Before diving into the mathematical algorithms, statistical computing methods, software tools, and health analytics, we will discuss a number of driving motivational problems. These will ground all the subsequent scientific discussions, data modeling, and computational approaches.
+===Prerequisites===
+Assumed [https://www.socr.umich.edu/people/dinov/courses/DSPA_Prereqs.html prior knowledge includes]: Completed undergraduate study with quantitative STEM exposure, some quantitative training, programming experience, and high-level of energy and motivation to learn. Preinstalled [http://www.socr.umich.edu/people/dinov/courses/DSPA_notes/01_Foundation.html#21_Install_Basic_Shell-based_R R] and [http://www.socr.umich.edu/people/dinov/courses/DSPA_notes/01_Foundation.html#22_GUI_based_R_Invocation_(RStudio) RStudio] on user local client computer.
+===Vision===
+This course is based on active-learning and integrates driving motivational challenges with mathematical foundations, computational statistics, and modern scientific inference.
+===Values===
+The training aims to provide effective, reliable, reproducible, and transformative data-driven discovery supporting open-science.
+===Strategic priorities===
+Trainees will develop scientific intuition, computational skills, and data-wrangling abilities to tackle Big biomedical and health data problems. Instructors will provide well-documented R-scripts and software recipes implementing atomic data-filters as well as complex end-to-end predictive big data analytics solutions.
-== Program==
+===Outcomes===
+Upon successful completion of this course, participants are expected to have moderate competency in at least two of each of the three competency areas: Algorithms and Applications, Data Management, and Analysis Methods. Specifically, participants will get end-to-end R-protocols, gain ML/AI algorithm knowledge, explore data validation, wrangling, and visualization, experiment with statistical inference and model-free Machine Learning tools.
-<center>
 {| class="wikitable"
+! Areas !! Competency !! Expectation !! Notes
 |-
-! Time [https://www.timeanddate.com/time/zones/mt US MT timezone (GMT-7)] || Presenter/Affiliation || Title || Abstract ID
+| rowspan="3"|Algorithms and Applications || Tools || Working knowledge of basic software tools (command-line, GUI based, or web-services) || Familiarity with statistical programming languages, e.g., R or SciKit/Python, and database querying languages, e.g., SQL or NoSQL
 |-
-| 8:00AM || [https://www.carolineuhler.com/ Caroline Uhler (MIT)] || ''Multi-Domain Data Integration: From Observations to Mechanistic Insights'' || [http://www.ams.org/amsmtgs/2247_abstracts/1163-62-32.pdf Abstract 1163-62-32]
+| Algorithms || Knowledge of core principles of scientific computing, applications programming, API’s, algorithm complexity, and data structures || Best practices for scientific and application programming, efficient implementation of matrix linear algebra and graphics, elementary notions of computational complexity, user-friendly interfaces, string matching
 |-
-| 8:30AM || [https://luddy.indiana.edu/contact/profile/?profile_id=187 Mehmet (Memo) Dalkilic (Indiana University)] || ''Teaching an Old Dog New Tricks: Making EM work with Big Data using Heaps'' || [http://www.ams.org/amsmtgs/2247_abstracts/1163-03-86.pdf Abstract 1163-03-86]
+|  Application Domain || Data analysis experience from at least one application area, either through coursework, internship, research project, etc. || Applied domain examples include: computational social sciences, health sciences, business and marketing, learning sciences, transportation sciences, engineering and physical sciences
 |-
-| 9:00AM || [https://www.math.fsu.edu/People/faculty.php?id=1783 Tom Needham (Florida State University)] || ''Applications of Gromov-Wasserstein distance to network science'' || [http://www.ams.org/amsmtgs/2247_abstracts/1163-52-68.pdf Abstract 1163-52-68]
+| rowspan="3"|Data Management || Data validation & visualization || Curation, Exploratory Data Analysis (EDA) and visualization || Data provenance, validation, visualization via histograms, Q-Q plots, scatterplots (ggplot, Dashboard, D3.js)
 |-
-| 9:30AM || [http://www.cs.utah.edu/~jeffp/ Jeff M. Phillips (Utah)] || ''A Primer on the Geometry in Machine Learning'' || [http://www.ams.org/amsmtgs/2247_abstracts/1163-52-52.pdf Abstract 1163-52-52]
+| Data wrangling || Skills for data normalization, data cleaning, data aggregation, and data harmonization/registration  || Data imperfections include missing values, inconsistent string formatting (‘2016-01-01’ vs. ‘01/01/2016’, PC/Mac/Linux time vs. timestamps, structured vs. unstructured data
 |-
-| 10:00AM || [https://www.jonathannilesweed.com/ Jonathan Niles-Weed, NYU/Courant/Center for Data Science] || ''Statistical estimation under group actions'' || [http://www.ams.org/amsmtgs/2247_abstracts/1163-62-41.pdf Abstract 1163-62-41]
+| Data infrastructure || Handling databases, web-services, Hadoop, multi-source data || Data structures, SOAP protocols, ontologies, XML, JSON, streaming
 |-
-| 10:30 AM || [https://ani.stat.fsu.edu/~abarbu/ Adrian Barbu (Florida State University)] || ''A Novel Framework for Online Supervised Learning with Feature Selection'' || [http://www.ams.org/amsmtgs/2247_abstracts/1163-62-50.pdf Abstract 1163-62-50]
+| rowspan="3"|Analysis Methods || Statistical inference || Basic understanding of bias and variance, principles of (non)parametric statistical inference, and (linear) modeling || Biological variability vs. technological noise, parametric (likelihood) vs non-parametric (rank order statistics) procedures, point vs. interval estimation, hypothesis testing, regression
 |-
-| 11:00 AM || [https://arsuaga-vazquez-lab.faculty.ucdavis.edu/team-details/maxime-pouokam/ Maxime G Pouokam (UC Davis)] || ''Statistical Topology of Genome Analysis in Three Dimensions'' || [http://www.ams.org/amsmtgs/2247_abstracts/1163-62-338.pdf Abstract 1163-62-338]
+| Study design and diagnostics || Design of experiments, power calculations and sample sizing, strength of evidence, p-values, False Discovery Rates || Multistage testing, variance normalizing transforms, histogram equalization, goodness-of-fit tests, model overfitting, model reduction
 |-
-| 11:30 AM || [https://www.umich.edu/~dinov/ Ivo D. Dinov (University of Michigan)] || ''Data Science, Time Complexity, and Spacekime Analytics'' || [http://www.ams.org/amsmtgs/2247_abstracts/1163-62-33.pdf Abstract 1163-62-33]
+| Machine Learning || Dimensionality reduction, k-nearest neighbors, random forests, AdaBoost, kernelization, SVM, ensemble methods, CNN || Empirical risk minimization. Supervised, semi-supervised, and unsupervised learning. Transfer learning, active learning, reinforcement learning, multiview learning, instance learning
 |}
-</center>
-==Speakers, Titles, and Abstracts==
+== Topics ==
+The [https://socr.umich.edu/people/dinov/DSPA_Courses.html Data Science and Predictive Analytics] textbook is divided into the [https://en.wikipedia.org/wiki/Data_Science_and_Predictive_Analytics following 23 chapters], each progressively building on the previous content.
+<div style="column-count:3;-moz-column-count:3;-webkit-column-count:3">
+# Motivation
+# Foundations of R
+# Managing Data in R
+# Data Visualization
+# Linear Algebra & Matrix Computing
+# Dimensionality Reduction
+# Lazy Learning: Classification Using Nearest Neighbors
+# Probabilistic Learning: Classification Using Naive Bayes
+# Decision Tree Divide and Conquer Classification
+# Forecasting Numeric Data Using Regression Models
+# Black Box Machine-Learning Methods: Neural Networks and Support Vector Machines
+# Apriori Association Rules Learning
+# k-Means Clustering
+# Model Performance Assessment
+# Improving Model Performance
+# Specialized Machine Learning Topics
+# Variable/Feature Selection
+# Regularized Linear Modeling and Controlled Variable Selection
+# Big Longitudinal Data Analysis
+# Natural Language Processing/Text Mining
+# Prediction and Internal Statistical Cross Validation
+# Function Optimization
+# Deep Learning, Neural Networks
+</div>
-* [https://www.carolineuhler.com/ Caroline Uhler (MIT)]: ''Multi-Domain Data Integration: From Observations to Mechanistic Insights'' ([http://www.ams.org/amsmtgs/2247_abstracts/1163-62-32.pdf Abstract 1163-62-32])
+== Program Outline==
-: Massive data collection holds the promise of a better understanding of complex phenomena and ultimately, of better decisions. An exciting opportunity in this regard stems from the growing availability of perturbation / intervention data (manufacturing, advertisement, education, genomics, etc.). In order to obtain mechanistic insights from such data, a major challenge is the integration of different data modalities (video, audio, interventional, observational, etc.). Using genomics and in particular the problem of identifying drugs for the repurposing against COVID-19 as an example, I will first discuss our recent work on coupling autoencoders in the latent space to integrate and translate between data of very different modalities such as sequencing and imaging. I will then present a framework for integrating observational and interventional data for causal structure discovery and characterize the causal relationships that are identifiable from such data. We end by a theoretical analysis of autoencoders linking overparameterization to memorization. In particular, I will characterize the implicit bias of overparameterized autoencoders and show that such networks trained using standard optimization methods implement associative memory. Collectively, our results have major implications for planning and learning from interventions in various application domains.
-* [https://luddy.indiana.edu/contact/profile/?profile_id=187 Mehmet (Memo) Dalkilic (Indiana University)]: ''Teaching an Old Dog New Tricks: Making EM work with Big Data using Heaps'' ([http://www.ams.org/amsmtgs/2247_abstracts/1163-03-86.pdf Abstract 1163-03-86])
+* Welcome and introductions
-: Contemporary data mining algorithms are easily overwhelmed with truly big data.  While parallelism, improved initialization, and ad hoc data reduction are commonly used and necessary strategies, we note that (1) continually revisiting data and (2) visiting all data are two of the most prominent problems–especially for iterative learning techniques like expectation-maximization  algorithm  for  clustering  (EM-T).  To  the  best  of  our  knowledge,  there  is  no  freely  available software that specifically focuses on improving the original EM-T algorithm in the context of big data.  We demonstrate the  utility  of  CRAN  package  ''DCEM''  that  implements  an  improved  version  of  EM-T, which  we  call  EM*  (EM  star). DCEM provides an integrated and minimalistic interface to EM-T and EM* algorithms, and can be used as either (1) a stand-alone program or (2) a pluggable component in existing software.  We show that EM* can both effectively and efficiently cluster data as we vary size, dimensions, and separability.
+* Course logistics (please come prepared with access to Internet connected computers having local versions of [http://www.socr.umich.edu/people/dinov/courses/DSPA_notes/01_Foundation.html#21_Install_Basic_Shell-based_R R (statistical computing environment)] and [http://www.socr.umich.edu/people/dinov/courses/DSPA_notes/01_Foundation.html#22_GUI_based_R_Invocation_(RStudio) RStudio (graphical user interface and integrated development environment)]
+* Data manipulation and visualization
+* Non-linear dimensionality reduction (UMAP & t-SNE)
+* Supervised and Unsupervised, model-based and model-free prediction, regression, classification, and clustering
+* Reticulation (Interoperability between R, Python, C/C++ and other languages)
+* Role of optimization in AI/ML
+* Activities and [https://socr.umich.edu/HTML5/ HTML5 demos].
-* [https://www.math.fsu.edu/People/faculty.php?id=1783 Tom Needham (Florida State University)]: ''Applications of Gromov-Wasserstein distance to network science'' ([http://www.ams.org/amsmtgs/2247_abstracts/1163-52-68.pdf Abstract 1163-52-68])
+== Program Details ==
-: Recent  years  have  seen  a  surge  of  research  activity  in  network  analysis  through  the  lens  of  optimal  transport.   This perspective boils down to the following simple idea:  when comparing two networks, instead of considering a traditional registration between their nodes, one instead searches for an optimal ‘soft’ or probabilistic correspondence.  This perspective has led to state-of-the-art algorithms for robust large-scale network alignment and network partitioning tasks.  A rich mathematical theory underpins this work:  optimal node correspondences realize the Gromov-Wasserstein (GW) distance between networks.  GW distance was originally introduced, independently by K. T. Sturm and Facundo M ́emoli, as a tool for studying abstract convergence properties of sequences of metric measure spaces.  In particular, Sturm showed that GW distance can be understood as a geodesic distance with respect to a Riemannian structure on the space of isomorphism classes of metric measure spaces (the ‘Space of Spaces’).  In this talk, I will describe joint work with Samir Chowdhury,in which we develop computationally efficient implementations of Sturm’s ideas for network science applications.  We also derive theoretical results which link this framework to classical notions from spectral network analysis.
+{| class="wikitable"
+|-
+! Wednesday, June 16, 2021, 8:00-11:00 AM US-EDT
+! Thursday, June 17, 2021, 8:00-11:00 AM US-EDT
+|-
+|  Welcome
+|  Review of Day 1
+|-
+|  [https://wiki.socr.umich.edu/index.php/SOCR_News_ISI_WSC_DSPA_Training_2021 DSPA Summer Course] Overview ([https://www.isi-web.org/events/courses/short-2021/data-science-and-predictive-analytics-dspa ISI]/[https://www.isi2021.org/ WSC], [https://www.socr.umich.edu/people/dinov/courses/DSPA_Prereqs.html prereqs], vision, objectives, outcomes, Website)
+|  Questions, comments, issues?
+|-
+|  Introductions ([https://umich.edu/~dinov Instructor: Ivo Dinov]; Attendees: please post in Chat/Discussion-Forum: Participant's Name, Affiliation, Title, interests, and ''one fun fact about you'')
+|  [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/06_LazyLearning_kNN.html Supervised AI]
+|-
+|  [https://wiki.socr.umich.edu/index.php/SOCR_News_ISI_WSC_DSPA_Training_2021#Topics Course Coverage]
+| Model-based
+|-
+|  [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/00_Motivation.html#13_DSPA_Expectations Expectations] and optional [https://umich.instructure.com/courses/38100/files/folder/Case_Studies/34_US_MacroEconMarketData_CompleteMonthly_1979_2020 capstone project] (below)
+| [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/09_RegressionForecasting.html#3_Case_Study_1:_Baseball_Players Baseball players physique modeling]
+|-
+|  [https://www.socr.umich.edu/ SOCR Resources]: [https://wiki.socr.umich.edu/index.php/SOCR_Data Datasets] & [https://umich.instructure.com/courses/38100/files/folder/Case_Studies Case-studies], [https://socr.umich.edu/HTML5/ Webapps], [https://dspa.predictive.space/ DSPA], [https://spacekime.org/ Spacekime/TCIU], [https://github.com/SOCR GitHub], [https://wiki.socr.umich.edu/index.php/EBook Prob & Stats EBook], [https://wiki.socr.umich.edu/index.php/SMHS SMHS EBook], [https://www.socr.umich.edu/html/SOCR_UserGoogleMap.html Current SOCR Users]
+| [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/06_LazyLearning_kNN.html#4_Case_Study:_Predicting_Galaxy_Spins k-NN prediction of galaxy spin]
+|-
+|  Open Science: It’s online, therefore it exists!
+| Model-free
+|-
+|  [https://link.springer.com/book/10.1007%2F978-3-319-72347-1 Download DSPA Textbook] (free)
+| [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/10_ML_NN_SVM_Class.html#3_Simple_NN_demo_-_learning_to_compute_(sqrt_{___}) Estimate the square root function using NN]
+|-
+|  Resource [https://www.socr.umich.edu/people/dinov/courses/DSPA_Topics.html Search] & [https://www.socr.umich.edu/html/Navigators.html Navigation], [https://translate.google.com Language Translations]
+|
+|-
+|
+| [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/10_ML_NN_SVM_Class.html#4_Case_Study_2:_Google_Trends_and_the_Stock_Market_-_Classification NN Google Trends and the Stock Market]
+|-
+|  [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/00_Motivation.html Motivation] - and [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/00_Motivation.html#7_Common_Characteristics_of_Big_(Biomedical_and_Health)_Data 7D of Big Data]
+|  [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/06_LazyLearning_kNN.html Unsupervised AI]
+|-
+| Digitalization of all human experiences
+|  Classification and clustering (k-Means, spectral, hierarchical)
+|-
+| [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/00_Motivation.html#12_Responsible_Data_Science_and_Ethical_Predictive_Analytics Responsible Data Science/Ethical Predictive Analytics]
+| [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/12_kMeans_Clustering.html#1_Clustering_as_a_machine_learning_task Hot-dogs example]
+|-
+|  [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/01_Foundation.html#1_Why_use_R R vs. Python vs. SAS vs. SPSS vs. other SW]
+| [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/12_kMeans_Clustering.html#2_Silhouette_plots Silhouette plots]
+|-
+|  Confirm local installations of R & RStudio
+| [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/12_kMeans_Clustering.html#6_Case_study_2:_Pediatric_Trauma Pediatric trauma clustering study]
+|-
+|  [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/01_Foundation.html#23_RStudio_GUI_Layout RStudio GUI]
+|
+|-
+|  Rmarkdown Notebook (IDE): end-to-end pipeline workflow from raw data … models … visualization … analytics … reporting/pubs
+|
+|-
+|  Example Demo (requires knitr package)
+|
+|-
+| Chapter 4 [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/04_LinearAlgebraMatrixComputing.Rmd RMD Source], [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/04_LinearAlgebraMatrixComputing.html HTML output], [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/SOCR_header.html SOCR_Header]
+|
+|-
+|  [https://socr.umich.edu/BPAD/BPAD_notes/Biophysics430_Chap01_MathFoundations.html Math Foundations]
+|
+|-
+|  5-min Break
+|  5-min Break
+|-
+|  [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/02_ManagingData.html#1_Saving_and_Loading_R_Data_Structures Data types]: categorical & numeric, structured and unstructured, scalar, vector, matrix, data-frame, tensor, list, object
+|  Reticulation ([https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/15_SpecializedML_FormatsOptimization.html#7_R_Notebook_support_for_other_programming_languages interoperability between R, Python, C/C++ and other languages])
+|-
+|  Data manipulation: import/export, [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/02_ManagingData.html#143_Imputation_via_Expectation-Maximization EM imputation], [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/02_ManagingData.html#15_Parsing_webpages_and_visualizing_tabular_HTML_data webpage scraping], [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/02_ManagingData.html#6_Measuring_the_Central_Tendency_-_mean,_median,_mode sample statistics (moments)]
+|  Text modeling & NLP ([https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/19_NLP_TextMining.html#5_Sentiment_analysis sentiment analysis example])
+|-
+|  [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/03_DataVisualization.html EDA (visualization)]
+|
+|-
+|  Compare R EDA vs. HTML/JS: [https://socr.umich.edu/HTML5/SOCRAT/ SOCRAT (NI data of AD/MCI/NC)], [https://socr.umich.edu/HTML5/MotionChart/ Motion Charts (Housing Prices)], [https://socr.umich.edu/HTML5/BrainViewer/ BrainViewer (raw MRI, DTI tracks, Brain Atlas)]
+|
+|-
+|  Probability Distributions: [http://distributome.org/V3/ Distributome], [https://socr.umich.edu/HTML5/BivariateNormal/TVN/ TVN Webapp]
+|  Longitudinal data analysis (Google trends analytics)
+|-
+|  Dimensionality reduction
+|
+|-
+| Linear PCA: [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/05_DimensionalityReduction.html#1_Example:_Reducing_2D_to_1D 2D --> 1D example], [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/05_DimensionalityReduction.html#5_Principal_Component_Analysis_(PCA) PPMI (Parkinson's disease) example]
+|
+|-
+|  5-min Break
+|  5-min Break
+|-
+|  Non-linear: MNIST data OCR: [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/05_DimensionalityReduction.html#103_Hand-Written_Digits_Recognition UMAP OCR],  [https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/05_DimensionalityReduction.html#92_t-SNE_Example:_Hand-written_Digit_Recognition t-SNE OCR]
+|  Role of optimization in AI/ML ([https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/21_FunctionOptimization.html#101_Application:_Healthcare_Manufacturer_Product_Optimization Healthcare manufacturer product optimization example])
+|-
+|  [https://socr.umich.edu/HTML5/SOCR_TensorBoard_UKBB/ SOCR/Tensorboard/Projector UKBB Brain Study]
+|  Deep neural networks ([https://www.socr.umich.edu/people/dinov/courses/DSPA_notes/22_DeepLearning.html image-classification example])
+|-
+|  [https://umich.instructure.com/courses/38100/files/folder/Case_Studies/34_US_MacroEconMarketData_CompleteMonthly_1979_2020 Capstone project]: interactive-learning using monthly US macro-economic data. Use the [https://umich.instructure.com/files/20798411/download?download_frd=1 RMD source], the [https://umich.instructure.com/files/20798410/download?download_frd=1 example HTML output], and the [https://umich.instructure.com/files/20026184/download?download_frd=1 provided data] to experiment with some of the DSPA techniques. Think of ways to augment these data (expand the time range and increase the feature richness)
+|  [https://www.socr.umich.edu/people/dinov/courses/DSPA_Topics.html#Appendix DSPA Appendices]: Bayesian Simulation, Modeling and Inference; Information-Theoretic Foundation of Statistical Learning; Surface, Shape, and Manifold Representation and Visualization; Power Analysis in Experimental Design; Database SQL/NoSQL Queries & Google BigQuery; Image Convolution, Filtering, & Fourier Transform; Causality, Transfer Entropy, & Mechanistic Effects; Agent-based Reinforcement Learning
+|-
+|
+|  Demonstrations of interesting [https://umich.instructure.com/courses/38100/files/folder/Case_Studies/34_US_MacroEconMarketData_CompleteMonthly_1979_2020 Capstone project] results
+|-
+|  Open discussion
+|  Open discussion
+|}
-* [http://www.cs.utah.edu/~jeffp/ Jeff M. Phillips (Utah)]: ''A Primer on the Geometry in Machine Learning'' ([http://www.ams.org/amsmtgs/2247_abstracts/1163-52-52.pdf Abstract 1163-52-52])
+==Resources==
-: Machine Learning is a discipline filled with many simple geometric algorithms, the central task of which is usually classification. These varied approaches all take as input a set of n points in d dimensions, each with a label. In learning, the goal is to use this input data to build a function which predicts a label accurately on new data drawn from the same unknown distribution as the input data. The main difference in the many algorithms is largely a result of the chosen class of functions considered. This talk will take a quick tour through many approaches from simple to complex and modern, and show the geometry inherent at each step. Pit stops will include connections to geometric data structures, duality, random projections, range spaces, and core sets.
+* [https://socr.umich.edu/docs/uploads/2021/DSPA_ISI_WSC_Flyer_2021.pdf Course Flyer].
+* [https://wiki.socr.umich.edu/images/5/5c/ISI_WSC_2021_DSPA_Course_June_2021_Notes.pdf 1-page Course Coverage with dynamic links to content].
+* [https://en.wikipedia.org/wiki/Data_Science_and_Predictive_Analytics DSPA Wikipedia].
+* [https://www.springer.com/us/book/9783319723464 DSPA Springer Page] & [http://link.springer.com/978-3-319-72347-1 SpringerLink (PDF Download)].
+* [https://dspa.predictive.space/ dspa.predictive.space] & [https://umich.instructure.com/courses/143011/ DSPA MOOC Canvas Site].
+* [https://wiki.socr.umich.edu/index.php/SOCR_News_ISI_DSPA_Training_2022 Three-day follow up 2022 ISI DSPA Course].
-* [https://www.jonathannilesweed.com/ Jonathan Niles-Weed, NYU/Courant/Center for Data Science]: ''Statistical estimation under group actions'' ([http://www.ams.org/amsmtgs/2247_abstracts/1163-62-41.pdf Abstract 1163-62-41])
+==Video Recordings==
-: A common challenge in the sciences is the presence of heterogeneity in data. Motivated by problems in signal processing and computational biology, we consider a particular form of heterogeneity where observations are corrupted by random transformations from a group (such as the group of permutations or rotations) before they can be collected and analyzed. We establish the fundamental limits of statistical estimation in such settings and show that the optimal rates of recovery are precisely governed by the invariant theory of the group. As a corollary, we establish rigorously the number of samples necessary to reconstruct the structure of molecules in cryo-electron microscopy. We also give a computationally efficient algorithm for a special case of this problem, and discuss conjectured statistical-computational gaps for the general case.
+* [https://attendee.gotowebinar.com/recording/5279492413447473676 Day 1 Video podcast].
-: Based on joint work with Afonso Bandeira, Ben Blum-Smith, Joe Kileel, Amelia Perry, Philippe Rigollet, Amit Singer, and Alex Wein.
+* [https://attendee.gotowebinar.com/recording/4162355614300329995 Day 2 Video podcast].
-* [https://ani.stat.fsu.edu/~abarbu/ Adrian Barbu (Florida State University)]: ''A Novel Framework for Online Supervised Learning with Feature Selection'' ([http://www.ams.org/amsmtgs/2247_abstracts/1163-62-50.pdf Abstract 1163-62-50])
+==Participants==
-: Current online learning methods suffer from lower convergence rates and limited capability to recover the support of the true features compared to their offline counterparts. In this work, we present a novel online learning framework based on running averages and introduce online versions of some popular existing offline methods such as Elastic Net, Minimax Concave Penalty and Feature Selection with Annealing. The framework can handle an arbitrarily large number of observations as long as the data dimension is not too large, e.g. p<50,000. We prove the equivalence between our online methods and their offline counterparts and give theoretical true feature recovery and convergence guarantees for some of them. In contrast to the existing online methods, the proposed methods can extract models of any sparsity level at any time. Numerical experiments indicate that our new methods enjoy high accuracy of true feature recovery and a fast convergence rate, compared with standard online and offline algorithms. We also show how the running averages framework can be used for model adaptation in the presence of model drift. Finally, we present applications to large datasets where again the proposed framework shows competitive results compared to popular online and offline algorithms.
+''Partial list of participants:''
-* [https://arsuaga-vazquez-lab.faculty.ucdavis.edu/team-details/maxime-pouokam/ Maxime G Pouokam (UC Davis)]: ''Statistical Topology of Genome Analysis in Three Dimension'' ([http://www.ams.org/amsmtgs/2247_abstracts/1163-62-338.pdf Abstract 1163-62-338])
+* Jennifer Daniels: Adjunct Math and Statistics Instructor at Mid Michigan College, Davenport University, and Alma College. Graduate student at Central Michigan University.
-: The three-dimensional (3D) configuration of chromosomes within the eukaryote nucleus is an important factor for several cellular functions, including gene expression regulation, and has also been linked with many diseases such as cancer-causing translocation events. Recent adaptations of high-throughput sequencing to chromosome conformation capture (3C) techniques allow for genome-wide structural characterization for the first time with a goal of getting a 3D structure of the genome. In this study, we present a novel approach to compute entanglement in open chains in general and apply it to chromosomes. Our metric is termed the linking proportion (Lp). We use the Lp in two different settings. We use the Lp to show that the Rabl configuration, an evolutionary conserved feature of the 3D nuclear organization, is an essential player in the simplification of the entanglement of chromatin fibers. We show how the Lp incorporates statistical models of inference that can be used to determine the agreement between candidate 3D configuration reconstructions. In the last part of our work, we present Smooth3D, a novel 3D genome reconstruction method via cubic spline approximation.
+* Jo Edwards: Australian Bureau of Statistics, Project Manager/Data Scientist.
+* Jannik Schaller: Federal Statistical Office of Germany (DESTATIS), Interest: Data Fusion/ Statistical and Machine Learning.
-* [https://www.umich.edu/~dinov/ Ivo D. Dinov (University of Michigan)]: ''Data Science, Time Complexity, and Spacekime Analytics'' ([http://www.ams.org/amsmtgs/2247_abstracts/1163-62-33.pdf Abstract 1163-62-33])
+* Edviges Coelho: Statistics Portugal and Universidade Lusófona.
-: Human behavior, communication, and social interactions are profoundly augmented by the rapid immersion of digitalization and virtualization of all life experiences. This process presents important challenges of managing, harmonizing, modeling, analyzing, interpreting, and visualizing complex information. There is a substantial need to develop, validate, productize, and support novel mathematical techniques, advanced statistical computing algorithms, transdisciplinary tools, and effective artificial intelligence applications. ''Spacekime analytics'' is a new technique for modeling high-dimensional longitudinal data. This approach relies on extending the notions of time, events, particles, and wavefunctions to complex-time (''kime''), complex-events (''kevents''), data, and inference-functions. We will illustrate how the kime-magnitude (longitudinal time order) and kime-direction (phase) affect the subsequent predictive analytics and the induced scientific inference. The mathematical foundations of spacekime calculus reveal various statistical implications including inferential uncertainty and a Bayesian formulation of spacekime analytics. Complexifying time allows the lifting of all commonly observed processes from the classical 4D Minkowski spacetime to a 5D spacekime manifold, where a number of interesting mathematical problems arise. Direct data science applications of spacekime analytics will be demonstrated using simulated data and clinical observations (e.g., sMRI, fMRI data).
+* Kadri Rootalu: Data scientist at Statistics Estonia, with an education in Sociology.
+* Jared Mendoza: University of the Philippines Los Banos, Assistant Professor of Statistics
-==Resources==
+* Lynda Aouar: UNCO, PhD student in Applied Statistics. I am interested in nonparametric statistics.
-Slides/papers
+* Ananda Manage: Dept of Math & Stat, Sam Houston State University, Texas, USA
+* Michal Ciszewski, PhD student in Statistics at TU Delft, interests: activity recognition and anomaly detection
+* Joyce Chang: Data scientist at the U of Pittsburgh School of Medicine; interested in risk prediction modeling and identifying heterogeneous treatment effects.
+* Katherine Zavez: PhD student in the Department of Statistics at the University of Connecticut
+* Ewilly Liew: Lecturer in Econometrics and Business Statistics, Monash University Malaysia. Interest: behavioral research in higher education and healthcare.
+* Jennifer Daniels: Interested in Applied Statistics, particularly Data/Text Mining. I lived and taught in Japan many years ago.
+* Elizabeth Gonzalez: Statistics Department, Colegio de Postgraduados, Mexico, interested in statistical inference in general.
+* Delia Ortega: PhD student in Statistics. Universidad Nacional, Colombia.
+* Li Zhou: PhD student in stat at Auburn University
+* Ilich Lama: Principal Research Scientist - Environmental Data Science (NCASI), Montreal, Canada - Interested among other things in statistical analysis of industrial emissions/releases.
+* Brocha Stern, postdoctoral fellow at Northwestern University, orthopedic health services and outcomes research
+* Annette Kifley, biostatistician in rehabilitation studies, University of Sydney
+* Jo Edwards: I am interested in Coding and Classification techniques as well as Entity extraction
+* Nur Aziha Mansor: Statistician in Department of Statistics Malaysia, Interest in data management
+* Martina Ozoglu: Statistical Office of the Slovak Republic, tourism analyst. I am interested in new forms of Tourism and its data interpretation.
+* Jason Ng, Monash University, Dept of Econometrics and Business Statistics
+* Quratulain Khaliq: PhD Statistics candidate from Pakistan
+* Malcolm Cai: Working in the public service of Singapore. Keen on data science, and sports.
-: [https://wiki.socr.umich.edu/images/4/42/AdrianBarbu_Slides-2021-01-09-JMM.pdf (Adrian Barbu) ''A Novel Framework for Online Supervised Learning with Feature Selection''].
+* Nurhazwani Abdul Halim, an Executive from Data Management and Statistics Department, from Central Bank of Malaysia. I am interested in Data Science and Machine Learning
-: [https://wiki.socr.umich.edu/images/f/fb/JMM_MaximePouokam_UCD_2021.pdf (Maxime Pouokam) ''Statistical Topology of Genome Analysis in Three Dimension''].
+* Zsófia Szente: Hungarian Central Statistical Office, statistician. I am interested in data visualization and data science.
-: [https://socr.umich.edu/docs/uploads/2021/Dinov_Spacekime_JMM_AMS_2021.pdf (Ivo Dinov) ''Data Science, Time Complexity, and Spacekime Analytics'' (Presentation Slides)].
+* Luigi Arzedi, PhD student in Statistics at University of Cagliari (Italy)
+* Miguel David Alvarez: PhD student in Economics; I work as a Data Scientist at the National Electoral Institute (Mexico).
+* Felibel Zabala: methodologist from Stats NZ. I am interested in data science & machine learning in official statistics
+* Quratulain Khaliq: PhD Candidate, Allama Iqbal Open University; statistical process control (SPC), robustness techniques, nonparametric statistics. I am interested in linking SPC techniques to data science.
 <hr>
-{{translate|pageName=http://wiki.stat.ucla.edu/socr/index.php?title=SOCR_News_ISI_WSC_DSPA_Training_2021}}
+{{translate|pageName=https://wiki.socr.umich.edu/index.php/SOCR_News_ISI_WSC_DSPA_Training_2021}}