SOCR News APS GDS ShortCourse March 2022
SOCR News & Events: APS GDS Short Course – March Meeting 2022
Logistics
- Conference: 2022 American Physical Society (APS) March Meeting | Group on Data Science (GDS) | APS GDS Course page
- Contacts
- GDS Program Chair: Maria Longobardi, University of Basel, Switzerland
- Organizer: Ivo D. Dinov, University of Michigan
- APS Coordinators: Vinaya Sathyasheelappa; Cynthia Smith
- Short-Course Title: Longitudinal Data Tensor-Linear Modeling and Space-kime Analytics
- Event Date/Time: March 13, 2022, 9:00 - 17:00 (US Central Time)
- Format: Online (distance-based, virtual). Instruction will involve a blend of theoretical foundations, computational implementations, and data-driven applications
- Audience/Prerequisites: prior knowledge of college-level math, physics, and statistics
- Registration: preregistration is required for participation (enrollment is capped at 30 participants)
- Registration Fees: Students $80, post-docs/fellows $120, regular $150
- Need-based Fee Waivers: The APS Group on Data Science (GDS) will select up to 5 students and trainees and cover their registration fees. Interested trainees need to complete this additional fee-waiver request form justifying their need for a waiver. Waivers are not guaranteed, but the organizers are committed to increasing trainee participation from communities traditionally underrepresented in STEM.
- Short URL: https://myumi.ch/G1411.
- Joining the Event on March 13: To join, log in to the meeting site and click the My Registrations link in the left menu. There, you should see Zoom Registration Link 1; the links will not go live until the date of the event (March 13); ZM Link.
Course Summary
In many scientific domains, the volume, sampling rate, and heterogeneity of acquired information are increasing rapidly. This amplifies the role of higher-order tensors for modeling, processing, analysis, and data-driven inference. The blend of repeated experiments and time dynamics of some data elements necessitates the development of novel data science methods, powerful machine learning techniques, and automated artificial intelligence tools. This short course will cover current state-of-the-art approaches for tensor-based linear modeling and space-kime analytics. We will present a generalized framework for modeling and predicting scalar, matrix, or tensor outcomes from observed tensor inputs. In addition, we will demonstrate the complex-time (kime) representation of longitudinal data, where the temporal event order is generalized to the (unordered) complex plane. This generalization transforms classical time series into 2D kime-surfaces. Various biomedical and health applications will be showcased.
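To make the idea of predicting a scalar outcome from tensor inputs concrete, here is a minimal sketch of the simplest such model, y_i = <B, X_i> + noise, fitted by ordinary least squares after vectorizing each tensor predictor. This is an illustration only and not the framework presented in the course; the function name `fit_scalar_on_tensor`, the tensor sizes, and the noise-free synthetic data are all assumptions made for the example.

```python
import numpy as np

def fit_scalar_on_tensor(X, y):
    """Least-squares estimate of the coefficient tensor B in y_i = <B, X_i>.

    X has shape (n, d1, ..., dk): n tensor-valued predictors.
    y has shape (n,): one scalar outcome per predictor.
    """
    n = X.shape[0]
    design = X.reshape(n, -1)  # vectorize each tensor predictor into a row
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef.reshape(X.shape[1:])  # fold the coefficients back into a tensor

# Synthetic, noise-free data (hypothetical sizes chosen for illustration).
rng = np.random.default_rng(0)
B_true = rng.normal(size=(3, 4, 2))
X = rng.normal(size=(200, 3, 4, 2))
# Inner product <B_true, X_i> for every sample i.
y = np.tensordot(X, B_true, axes=([1, 2, 3], [0, 1, 2]))

B_hat = fit_scalar_on_tensor(X, y)
```

Vectorizing ignores the tensor structure of B, so the number of free parameters grows with the product of the tensor dimensions. Structured approaches, such as low-rank constraints on B, are typically motivated by reducing that parameter count.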
Program
Morning Session (9:00-12:00 US Central Time, GMT-6) | Afternoon Session (13:00-17:00 US Central Time, GMT-6)
| Time | Presenter | Topic | Time | Presenter | Topic |
| --- | --- | --- | --- | --- | --- |
| 9:00-9:15 | Ivo Dinov | Welcome & Overview | 13:00-13:45 | Presenter 3: Raj Guhaniyogi (Talk) | Topic 3 |
| 9:15-10:00 | Presenter 1: Maryam Bagherian (Talk) | Topic 1 | 13:45-14:15 | Presenter 3: Raj Guhaniyogi (Demo) | Topic 3 |
| 10:00-10:30 | Presenter 1: Maryam Bagherian (Demo) | Topic 1 | 14:15-14:20 | Break | |
| 10:30-10:45 | Break | | 14:25-15:10 | Presenter 4: Anru Zhang (Talk) | Topic 4 |
| 10:45-11:30 | Presenter 2: Miaoyan Wang (Talk) | Topic 2 | 15:10-15:40 | Presenter 4: Anru Zhang (Demo) | Topic 4 |
| 11:30-12:00 | Presenter 2: Miaoyan Wang and Chanwoo Lee (Demo) | Topic 2 | 15:40-15:45 | Break | |
| 12:00-13:00 | Break (lunch recess) | | 15:45-16:30 | Presenter 5: Ivo Dinov (Talk) | Topic 5 |
| | | | 16:30-17:00 | Presenter 5: Ivo Dinov (Demo) | Topic 5 |
| | | | 17:00-17:05 | Conclusions/Adjourn | |
Presentations
Topic1
- Presenter: Maryam Bagherian
- Title: Tensor Methods under Distance Metric Learning Constraints: Completion, Decomposition, Recovery and Reconstruction
- Abstract: Rapid growth in the quantity and variety of data presents enormous opportunities in AI data-driven inference and decision support. In practice, data complexity often exceeds the capacity of current matrix-based data representations, which limits the applicability of many current analysis algorithms. Tensors represent a powerful framework for modeling, analysis, and visualization of high-dimensional data and are capable of storing and tracking structural information in heterogeneous datasets. Tensor recovery and reconstruction have emerged as key tools for investigating high-dimensional, multi-modal, noisy, and partially observed datasets. To capture such complex data structures, we introduce a new approach to tensor methods using distance metric learning (DML). Metric learning constraints, implemented as bilevel optimization methods, ensure that this technique accurately estimates the tensor model factors, even in the presence of missing entries and noise.
- Slides
Topic2
- Presenter: Miaoyan Wang
- Title: Beyond matrices: higher-order tensor methods meet computational biology
- Abstract: Higher-order tensors arise frequently in applications such as neuroimaging, recommendation systems, social network analysis, and psychological studies. Rapid developments in high-throughput technologies have made multiway data readily available in daily life. Tensors provide a generalized data structure for many learning procedures. Methods built on tensors provide powerful tools to capture complex structures that lower-order methods fail to exploit. However, this empirical success has uncovered a myriad of new and pressing challenges. In this talk, I will discuss some recent advances and challenges in high-dimensional tensor data algorithms. The potential of these methods is illustrated through applications to the Human Connectome Project (HCP) and Genotype-Tissue Expression (GTEx) datasets.
Topic3
- Presenter: Raj Guhaniyogi
- Title: Bayesian High-dimensional Regressions with Tensors and Distributed Computation with Space-time Data
- Abstract: Of late, neuroscience, environmental science or related applications routinely encounter regression scenarios involving multidimensional array or tensor structured responses or predictors. In the first half of this course, we will discuss how to perform Bayesian regression with tensor response and/or predictors, the construction of prior distributions on tensor-valued parameters and posterior inference. We will present applications of the proposed methodology in brain activation and connectivity studies. The second half of this course will be devoted to the study of recently emerging literature on divide-and-conquer Bayesian inference in massive spatio-temporal data. We will discuss how to draw distributed Bayesian inference in space-time data with parallel computing architecture, theoretical studies in distributed approaches and applications in large scale environmental datasets.
Topic4
- Presenter: Anru Zhang
- Title: High-dimensional Tensor Learning: Methodology, Theory, and Applications
- Abstract: The analysis of tensor data, i.e., arrays with multiple directions, has become an active research topic in the era of big data. Datasets in the form of tensors arise from a wide range of applications, such as neuroimaging, genomics, and computational imaging. Tensor methods also provide unique perspectives to many high-dimensional problems, where the observations are not necessarily tensors. Problems with high-dimensional tensors generally possess distinct characteristics that pose unprecedented challenges; there are strong demands to develop new methods for them.
- In this lecture, we specifically focus on how to perform singular value decomposition (SVD), a fundamental task of unsupervised learning, on general tensors or tensors with structural assumptions, e.g., sparsity, smoothness, and longitudinality. Through the developed frameworks, we can achieve accurate denoising for 4D scanning transmission electron microscopy images; in longitudinal microbiome studies, we can extract key components in the trajectories of bacterial abundance, identify representative bacterial taxa for these key trajectories, and group subjects based on the change of bacteria abundance over time. We also illustrate how we develop new methods that exploit useful information from high-dimensional tensor data based on the modern theories of computation and non-convex optimization.
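As a point of reference for what an SVD on a tensor can look like, the following is a minimal sketch of the classical truncated higher-order SVD (HOSVD): each mode of the tensor is unfolded into a matrix, the leading left singular vectors are kept, and the tensor is projected onto them to obtain a small core. This is a textbook baseline under stated assumptions, not the structured (sparse, smooth, or longitudinal) methods described in the abstract; the helper names `unfold`, `mode_product`, `hosvd`, and `reconstruct` and the example sizes are illustrative choices.

```python
import numpy as np

def unfold(T, mode):
    """Matricize T: rows index the chosen mode, columns index all other modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    out = np.tensordot(M, T, axes=(1, mode))  # contracted axis is replaced at position 0
    return np.moveaxis(out, 0, mode)          # move it back to its original position

def hosvd(T, ranks):
    """Truncated HOSVD: returns a core tensor and one factor matrix per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])              # leading r left singular vectors
    core = T
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)  # project each mode onto its subspace
    return core, factors

def reconstruct(core, factors):
    """Rebuild the tensor from its core and factor matrices."""
    T = core
    for mode, U in enumerate(factors):
        T = mode_product(T, U, mode)
    return T

# Synthetic tensor with exact multilinear rank (2, 2, 2); sizes are hypothetical.
rng = np.random.default_rng(1)
G = rng.normal(size=(2, 2, 2))
A = [rng.normal(size=(d, 2)) for d in (5, 6, 4)]
T = reconstruct(G, A)

core, factors = hosvd(T, ranks=(2, 2, 2))
T_hat = reconstruct(core, factors)
```

On a tensor whose multilinear rank matches the requested ranks, as in this synthetic example, the truncated HOSVD reproduces the tensor up to floating-point error. On noisy data it yields only an approximation.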
Topic5
- Presenter: Ivo D. Dinov
- Title: Time Complexity, Tensor Modeling and Longitudinal Spacekime Analytics
- Abstract: Many observable processes demand managing, harmonizing, modeling, analyzing, interpreting, and visualizing large and complex information. Spacekime analytics uses complex time for modeling high-dimensional longitudinal data. This approach relies on extending the notions of time, events, particles, and wavefunctions to complex-time (kime), complex-events (kevents), data, and inference-functions. We will illustrate how the kime-magnitude (longitudinal time order) and kime-direction (phase) affect the subsequent predictive analytics and the induced scientific inference. The mathematical foundations of spacekime calculus reveal various statistical implications, including inferential uncertainty, tensor linear modeling, and a Bayesian formulation of spacekime analytics. Complexifying time allows the lifting of all commonly observed processes from the classical 4D Minkowski spacetime to a 5D spacekime manifold, where a number of interesting mathematical problems arise. Direct data science applications of spacekime analytics will be demonstrated using simulated data and clinical observations (e.g., structural and functional MRI). Joint work with Milen V. Velev (Burgas University, Bulgaria) and Yueyang Shen (University of Michigan).
- Slides and Demos.
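The polar description of kime in the abstract above (magnitude as time order, direction as phase) can be sketched numerically as follows. The sketch assumes the representation kappa = t * exp(i * phi) and simply builds a grid of kime values over chosen times and phases; the function name `kime_grid` and the grid sizes are hypothetical, and how observed repeated measurements are assigned to phases is outside the scope of this illustration.

```python
import numpy as np

def kime_grid(times, phases):
    """Complex kime values kappa = t * exp(i * phi) on a (time, phase) grid."""
    t, phi = np.meshgrid(times, phases, indexing="ij")
    return t * np.exp(1j * phi)

# Hypothetical grid: 5 time points (kime-magnitudes) and 8 phases (kime-directions).
times = np.linspace(0.0, 1.0, 5)
phases = np.linspace(-np.pi, np.pi, 8, endpoint=False)
kappa = kime_grid(times, phases)

# The modulus of each kime value recovers the ordinary time coordinate.
magnitudes = np.abs(kappa)
```

A function evaluated over such a grid is defined on a 2D region of the complex plane rather than on a 1D time axis, which is the sense in which a time series becomes a surface.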
Instructors
Maryam Bagherian, University of Michigan, Welch Lab, MIDAS
- Dr. Bagherian is a Michigan Data Science Fellow and an expert in applied and computational mathematics. Her research is focused on developing ML/AI algorithms and data science methods, e.g., multidimensional multimodal big data modeling. The primary applications of her work are in biomedical data science, health informatics, and genomic studies. Dr. Bagherian has developed new online tensor recovery and decomposition methods for coupled tensors with simultaneous auxiliary information. These techniques are applied to multi-omics, spatial transcriptomics, and genomics datasets. Dr. Bagherian's presentation.
Miaoyan Wang, Statistics, Wisconsin-Madison
- Dr. Miaoyan Wang is an assistant professor of statistics at UW-Madison. She is also a faculty affiliate in the Institute for Foundations of Data Science, a multi-university TRIPODS Phase II Initiative. Her research is in machine learning theory, nonparametric statistics, higher-order tensors, and applications to genetics. Her interdisciplinary research efforts are reflected in her training. In 2015-2018, she was a postdoc in the Department of EECS at UC Berkeley and a Simons Math+X postdoc at the University of Pennsylvania. She received a PhD in Statistics from the University of Chicago in 2015. She has won an NSF CAREER Award, a Best Student Paper Award (as advisor) from the American Statistical Association, a Madison Teaching and Learning Excellence Fellowship, and multiple prestigious young researcher awards in statistics, machine learning, and genetics. Joint work with Chanwoo Lee. Dr. Wang's Presentation.
Raj Guhaniyogi, TAMU
- Dr. Rajarshi Guhaniyogi received his PhD in Biostatistics from the University of Minnesota, Twin Cities, under the supervision of Dr. Sudipto Banerjee. He was a Postdoctoral Researcher with Dr. David B. Dunson in the Department of Statistical Science at Duke University prior to joining the Department of Statistics at UC Santa Cruz as an Assistant Professor in 2014. In 2021, Dr. Guhaniyogi was recruited as an Associate Professor in the Department of Statistics at Texas A&M University, where he is developing massive dimensional parametric and non-parametric Bayesian methods motivated by improving practical performance in real-world applications in batch and online data settings, using statistical theory to justify and guide the development of new methods. Dr. Guhaniyogi's research interests lie broadly in the development of Bayesian parametric and non-parametric methodology in complex biomedical and machine learning applications. His ongoing research focuses on scalable and distributed Bayesian inference for big data, dimensionality reduction, and functional and object data (networks, tensors) analysis. Rajarshi draws his motivation primarily from applications in neuroscience, genetics, epidemiology, environmental science, forestry, and social science. Rajarshi is a recipient of the 2016 University of California Hellman Fellowship. Dr. Guhaniyogi's presentation.
Anru Zhang, Duke
- Dr. Anru Zhang is the Eugene Anson Stead, Jr. M.D. Associate Professor in the Department of Biostatistics & Bioinformatics and a secondary faculty member in the Departments of Computer Science, Mathematics, and Statistical Science at Duke University. He was an assistant professor of statistics at the University of Wisconsin-Madison in 2015-2021. He obtained his bachelor’s degree from Peking University in 2010 and his Ph.D. from the University of Pennsylvania in 2015. His work focuses on high-dimensional statistical inference, non-convex optimization, statistical tensor analysis, computational complexity, and applications in genomics, microbiome, electronic health records, and computational imaging. He received the ASA Gottfried E. Noether Junior Award (2021), a Bernoulli Society New Researcher Award (2021), an ICSA Outstanding Young Researcher Award (2021), and an NSF CAREER Award (2020). Dr. Zhang's presentation.
Ivo D. Dinov, University of Michigan, SOCR, MIDAS
- Dr. Dinov is a professor of Health Behavior and Biological Sciences and Computational Medicine and Bioinformatics at the University of Michigan. He is a member of the Michigan Center for Applied and Interdisciplinary Mathematics (MCAIM) and a core member of the University of Michigan Comprehensive Cancer Center. Dr. Dinov serves as Director of the Statistics Online Computational Resource, Director of the Center for Complexity and Self-management of Chronic Disease (CSCD Center), Co-Director of the multi-institutional Probability Distributome Project, Associate Director of the Michigan Institute for Data Science (MIDAS), and Associate Director of the Michigan Neuroscience Graduate Program (NGP). He is a member of the American Physical Society (APS), American Statistical Association (ASA), International Association for Statistical Education (IASE), American Mathematical Society (AMS), American Association for the Advancement of Science (AAAS), and an Elected Member of the International Statistical Institute (ISI). Dr. Dinov's presentation.
Course evaluation
All participants are strongly encouraged to complete this anonymous course post-evaluation survey.
Resources
- 1-Page Course Flyer
- Course (Outline Slidedeck) Handout
- Topic 1 supplementary materials (Maryam Bagherian): Slides
- Topic 2 supplementary materials (Miaoyan Wang): talk and demo; TensorComplete R-package.
- Topic 3 supplementary materials (Raj Guhaniyogi): Slidedeck and Demos
- Topic 4 supplementary materials (TBD)
- Topic 5 supplementary materials (Ivo Dinov): Slides and Demos.