Difference between revisions of "SOCR News ISI WSC DSPA Training 2021"

== Program==
{| class="wikitable"
! Time [https://www.timeanddate.com/time/zones/mt US MT timezone (GMT-7)] !! Presenter/Affiliation !! Title !! Abstract ID
|-
| 8:00 AM || [https://www.carolineuhler.com/ Caroline Uhler (MIT)] || ''Multi-Domain Data Integration: From Observations to Mechanistic Insights'' || [http://www.ams.org/amsmtgs/2247_abstracts/1163-62-32.pdf Abstract 1163-62-32]
|-
| 8:30 AM || [https://luddy.indiana.edu/contact/profile/?profile_id=187 Mehmet (Memo) Dalkilic (Indiana University)] || ''Teaching an Old Dog New Tricks: Making EM work with Big Data using Heaps'' || [http://www.ams.org/amsmtgs/2247_abstracts/1163-03-86.pdf Abstract 1163-03-86]
|-
| 9:00 AM || [https://www.math.fsu.edu/People/faculty.php?id=1783 Tom Needham (Florida State University)] || ''Applications of Gromov-Wasserstein distance to network science'' || [http://www.ams.org/amsmtgs/2247_abstracts/1163-52-68.pdf Abstract 1163-52-68]
|-
| 9:30 AM || [http://www.cs.utah.edu/~jeffp/ Jeff M. Phillips (Utah)] || ''A Primer on the Geometry in Machine Learning'' || [http://www.ams.org/amsmtgs/2247_abstracts/1163-52-52.pdf Abstract 1163-52-52]
|-
| 10:00 AM || [https://www.jonathannilesweed.com/ Jonathan Niles-Weed, NYU/Courant/Center for Data Science] || ''Statistical estimation under group actions'' || [http://www.ams.org/amsmtgs/2247_abstracts/1163-62-41.pdf Abstract 1163-62-41]
|-
| 10:30 AM || [https://ani.stat.fsu.edu/~abarbu/ Adrian Barbu (Florida State University)] || ''A Novel Framework for Online Supervised Learning with Feature Selection'' || [http://www.ams.org/amsmtgs/2247_abstracts/1163-62-50.pdf Abstract 1163-62-50]
|-
| 11:00 AM || [https://arsuaga-vazquez-lab.faculty.ucdavis.edu/team-details/maxime-pouokam/ Maxime G Pouokam (UC Davis)] || ''Statistical Topology of Genome Analysis in Three Dimensions'' || [http://www.ams.org/amsmtgs/2247_abstracts/1163-62-338.pdf Abstract 1163-62-338]
|-
| 11:30 AM || [https://www.umich.edu/~dinov/ Ivo D. Dinov (University of Michigan)] || ''Data Science, Time Complexity, and Spacekime Analytics'' || [http://www.ams.org/amsmtgs/2247_abstracts/1163-62-33.pdf Abstract 1163-62-33]
|}
==Speakers, Titles, and Abstracts==
* [https://www.carolineuhler.com/ Caroline Uhler (MIT)]: ''Multi-Domain Data Integration: From Observations to Mechanistic Insights'' ([http://www.ams.org/amsmtgs/2247_abstracts/1163-62-32.pdf Abstract 1163-62-32])
: Massive data collection holds the promise of a better understanding of complex phenomena and, ultimately, of better decisions. An exciting opportunity in this regard stems from the growing availability of perturbation / intervention data (manufacturing, advertisement, education, genomics, etc.). In order to obtain mechanistic insights from such data, a major challenge is the integration of different data modalities (video, audio, interventional, observational, etc.). Using genomics, and in particular the problem of identifying drugs for repurposing against COVID-19, as an example, I will first discuss our recent work on coupling autoencoders in the latent space to integrate and translate between data of very different modalities such as sequencing and imaging. I will then present a framework for integrating observational and interventional data for causal structure discovery and characterize the causal relationships that are identifiable from such data. We end with a theoretical analysis of autoencoders linking overparameterization to memorization. In particular, I will characterize the implicit bias of overparameterized autoencoders and show that such networks trained using standard optimization methods implement associative memory. Collectively, our results have major implications for planning and learning from interventions in various application domains.
* [https://luddy.indiana.edu/contact/profile/?profile_id=187 Mehmet (Memo) Dalkilic (Indiana University)]: ''Teaching an Old Dog New Tricks: Making EM work with Big Data using Heaps'' ([http://www.ams.org/amsmtgs/2247_abstracts/1163-03-86.pdf Abstract 1163-03-86])
: Contemporary data mining algorithms are easily overwhelmed with truly big data. While parallelism, improved initialization, and ad hoc data reduction are commonly used and necessary strategies, we note that (1) continually revisiting data and (2) visiting all data are two of the most prominent problems, especially for iterative learning techniques like the expectation-maximization algorithm for clustering (EM-T). To the best of our knowledge, there is no freely available software that specifically focuses on improving the original EM-T algorithm in the context of big data. We demonstrate the utility of CRAN package ''DCEM'' that implements an improved version of EM-T, which we call EM* (EM star). DCEM provides an integrated and minimalistic interface to EM-T and EM* algorithms, and can be used as either (1) a stand-alone program or (2) a pluggable component in existing software. We show that EM* can both effectively and efficiently cluster data as we vary size, dimensions, and separability.
* [https://www.math.fsu.edu/People/faculty.php?id=1783 Tom Needham (Florida State University)]: ''Applications of Gromov-Wasserstein distance to network science'' ([http://www.ams.org/amsmtgs/2247_abstracts/1163-52-68.pdf Abstract 1163-52-68])
: Recent years have seen a surge of research activity in network analysis through the lens of optimal transport. This perspective boils down to the following simple idea: when comparing two networks, instead of considering a traditional registration between their nodes, one instead searches for an optimal ‘soft’ or probabilistic correspondence. This perspective has led to state-of-the-art algorithms for robust large-scale network alignment and network partitioning tasks. A rich mathematical theory underpins this work: optimal node correspondences realize the Gromov-Wasserstein (GW) distance between networks. GW distance was originally introduced, independently by K. T. Sturm and Facundo Mémoli, as a tool for studying abstract convergence properties of sequences of metric measure spaces. In particular, Sturm showed that GW distance can be understood as a geodesic distance with respect to a Riemannian structure on the space of isomorphism classes of metric measure spaces (the ‘Space of Spaces’). In this talk, I will describe joint work with Samir Chowdhury, in which we develop computationally efficient implementations of Sturm’s ideas for network science applications. We also derive theoretical results which link this framework to classical notions from spectral network analysis.
* [http://www.cs.utah.edu/~jeffp/ Jeff M. Phillips (Utah)]: ''A Primer on the Geometry in Machine Learning'' ([http://www.ams.org/amsmtgs/2247_abstracts/1163-52-52.pdf Abstract 1163-52-52])
: Machine Learning is a discipline filled with many simple geometric algorithms, the central task of which is usually classification. These varied approaches all take as input a set of n points in d dimensions, each with a label. In learning, the goal is to use this input data to build a function which predicts a label accurately on new data drawn from the same unknown distribution as the input data. The main difference in the many algorithms is largely a result of the chosen class of functions considered. This talk will take a quick tour through many approaches from simple to complex and modern, and show the geometry inherent at each step. Pit stops will include connections to geometric data structures, duality, random projections, range spaces, and core sets.
* [https://www.jonathannilesweed.com/ Jonathan Niles-Weed, NYU/Courant/Center for Data Science]: ''Statistical estimation under group actions'' ([http://www.ams.org/amsmtgs/2247_abstracts/1163-62-41.pdf Abstract 1163-62-41])
: A common challenge in the sciences is the presence of heterogeneity in data. Motivated by problems in signal processing and computational biology, we consider a particular form of heterogeneity where observations are corrupted by random transformations from a group (such as the group of permutations or rotations) before they can be collected and analyzed. We establish the fundamental limits of statistical estimation in such settings and show that the optimal rates of recovery are precisely governed by the invariant theory of the group. As a corollary, we establish rigorously the number of samples necessary to reconstruct the structure of molecules in cryo-electron microscopy. We also give a computationally efficient algorithm for a special case of this problem, and discuss conjectured statistical-computational gaps for the general case.
: Based on joint work with Afonso Bandeira, Ben Blum-Smith, Joe Kileel, Amelia Perry, Philippe Rigollet, Amit Singer, and Alex Wein.
* [https://ani.stat.fsu.edu/~abarbu/ Adrian Barbu (Florida State University)]: ''A Novel Framework for Online Supervised Learning with Feature Selection'' ([http://www.ams.org/amsmtgs/2247_abstracts/1163-62-50.pdf Abstract 1163-62-50])
: Current online learning methods suffer from lower convergence rates and limited capability to recover the support of the true features compared to their offline counterparts. In this work, we present a novel online learning framework based on running averages and introduce online versions of some popular existing offline methods such as Elastic Net, Minimax Concave Penalty and Feature Selection with Annealing. The framework can handle an arbitrarily large number of observations as long as the data dimension is not too large, e.g., p < 50,000. We prove the equivalence between our online methods and their offline counterparts and give theoretical true feature recovery and convergence guarantees for some of them. In contrast to the existing online methods, the proposed methods can extract models of any sparsity level at any time. Numerical experiments indicate that our new methods enjoy high accuracy of true feature recovery and a fast convergence rate, compared with standard online and offline algorithms. We also show how the running averages framework can be used for model adaptation in the presence of model drift. Finally, we present applications to large datasets where again the proposed framework shows competitive results compared to popular online and offline algorithms.
* [https://arsuaga-vazquez-lab.faculty.ucdavis.edu/team-details/maxime-pouokam/ Maxime G Pouokam (UC Davis)]: ''Statistical Topology of Genome Analysis in Three Dimension'' ([http://www.ams.org/amsmtgs/2247_abstracts/1163-62-338.pdf Abstract 1163-62-338])
: The three-dimensional (3D) configuration of chromosomes within the eukaryote nucleus is an important factor for several cellular functions, including gene expression regulation, and has also been linked with many diseases such as cancer-causing translocation events. Recent adaptations of high-throughput sequencing to chromosome conformation capture (3C) techniques allow for genome-wide structural characterization for the first time, with the goal of obtaining a 3D structure of the genome. In this study, we present a novel approach to compute entanglement in open chains in general and apply it to chromosomes. Our metric is termed the linking proportion (Lp). We use the Lp in two different settings. We use the Lp to show that the Rabl configuration, an evolutionarily conserved feature of the 3D nuclear organization, is an essential player in the simplification of the entanglement of chromatin fibers. We show how the Lp incorporates statistical models of inference that can be used to determine the agreement between candidate 3D configuration reconstructions. In the last part of our work, we present Smooth3D, a novel 3D genome reconstruction method via cubic spline approximation.
* [https://www.umich.edu/~dinov/ Ivo D. Dinov (University of Michigan)]: ''Data Science, Time Complexity, and Spacekime Analytics'' ([http://www.ams.org/amsmtgs/2247_abstracts/1163-62-33.pdf Abstract 1163-62-33])
: Human behavior, communication, and social interactions are profoundly augmented by the rapid immersion of digitalization and virtualization of all life experiences. This process presents important challenges of managing, harmonizing, modeling, analyzing, interpreting, and visualizing complex information. There is a substantial need to develop, validate, productize, and support novel mathematical techniques, advanced statistical computing algorithms, transdisciplinary tools, and effective artificial intelligence applications. ''Spacekime analytics'' is a new technique for modeling high-dimensional longitudinal data. This approach relies on extending the notions of time, events, particles, and wavefunctions to complex-time (''kime''), complex-events (''kevents''), data, and inference-functions. We will illustrate how the kime-magnitude (longitudinal time order) and kime-direction (phase) affect the subsequent predictive analytics and the induced scientific inference. The mathematical foundation of spacekime calculus reveals various statistical implications including inferential uncertainty and a Bayesian formulation of spacekime analytics. Complexifying time allows the lifting of all commonly observed processes from the classical 4D Minkowski spacetime to a 5D spacekime manifold, where a number of interesting mathematical problems arise. Direct data science applications of spacekime analytics will be demonstrated using simulated data and clinical observations (e.g., sMRI, fMRI data).
: [https://wiki.socr.umich.edu/images/4/42/AdrianBarbu_Slides-2021-01-09-JMM.pdf (Adrian Barbu) ''A Novel Framework for Online Supervised Learning with Feature Selection''].
: [https://wiki.socr.umich.edu/images/f/fb/JMM_MaximePouokam_UCD_2021.pdf (Maxime Pouokam) ''Statistical Topology of Genome Analysis in Three Dimension''].
: [https://socr.umich.edu/docs/uploads/2021/Dinov_Spacekime_JMM_AMS_2021.pdf (Ivo Dinov) ''Data Science, Time Complexity, and Spacekime Analytics'' (Presentation Slides)].

Revision as of 12:40, 11 March 2021