SOCR News & Events: 2024 HDDA Special Session on Data Science, Artificial Intelligence, and High-Dimensional Spatiotemporal Dynamics

==Overview==

Data analysis methods are rapidly evolving due to the significant expansion of the size and complexity of data and the proliferation of new technologies. From social media networks to public health, bioinformatics to personalized medicine, environmental studies to nanoscience, and even financial analysis, diverse domains are facing new challenges. This surge is not confined to academic research; it has permeated the practical spheres of business and government. In response to this evolving landscape, there is an imperative to craft novel algorithms that scale with the dimensions of these datasets. In parallel, new theoretical tools are essential to understand the statistical properties of these algorithms. Promising breakthroughs in this realm encompass techniques such as variable selection, penalized methods, and variational inference, marking the frontier of advances in data analysis and interpretation.

Since its inception in 2011 at the Fields Institute in Toronto, HDDA has gathered leading researchers in high-dimensional statistics and data analysis. The objectives are: (1) to highlight and expand the breadth of existing methods in high-dimensional data analysis and their potential to advance both the mathematical and statistical sciences; (2) to identify important directions for future research in the theory of regularization methods and variational inference, in algorithmic development, and in methodology for different application areas, and to facilitate collaboration between theoretical and subject-area researchers (econometrics, finance, social science, biostatistics); and (3) to provide opportunities for highly qualified personnel to meet and interact with leading researchers in the area.
 
==Session Logistics==

* '''Title''': ''Data Science, Artificial Intelligence, and High-Dimensional Spatiotemporal Dynamics'' (Organizer: [https://www.socr.umich.edu/people/dinov/ Ivo Dinov (Michigan)])

* '''Date/Time''': August 28-30, 2024 (Singapore Time Zone)

* '''Venue''': [https://www.essec.edu/en/essec-asia-pacific-en/ ESSEC Business School, Asia-Pacific campus, 5 Nepal Park, Singapore 139408]

* '''Registration''': [https://essec.qualtrics.com/jfe/form/SV_25EV2CHly1zbwhg Registration] (capped at 60 participants).

* '''Conference''': [https://sites.google.com/essec.edu/hdda-xiii/ 2024 High-Dimensional Data Analysis (HDDA-13)]. Organizer: [https://brocku.ca/mathematics-science/mathematics/directory/syed-ejaz-ahmed/ S. Ejaz Ahmed (Brock University)].

* '''Session Format''': Three talks (see Session Presenters below).

* '''Conference Program''': [https://pierrealquier.github.io/papers/hddaxiii_program.pdf Complete Program].
  
 
==Session Presenters==
 
==Session Presenters==
Line 21: Line 22:
 
: ''Title'': A survey of opportunities through case studies of Generative AI, adaptation of Large Foundation Models, and Physics-Informed Neural Networks for high-dimensional data analysis

: ''Abstract'': Deep neural networks have a demonstrable ability to learn effective feature representations for state-of-the-art classifiers and regressors from data. From the perspective of data analytics, the task of extracting salient features from data shares similarities with the signal-processing task of learning parsimonious and descriptive feature representations. When paired with sensitivity-analysis methods from the explainable-AI community, these form a powerful new toolbox for exploring the complex associations embedded in high-dimensional data. However, training deep neural networks on high-dimensional data remains challenging, with no guarantee of convergence to a good solution, presumably stymied by the curse of dimensionality. Deep neural network models tend to be overparameterized to facilitate convergence towards a good solution via gradient-descent optimization. It is also often challenging in practice to acquire sufficient high-quality labeled training data. This results in a sparse sampling of the high-dimensional data space, which hinders generalization. Training deep neural networks via supervised learning can be conceived as solving an under-determined system of nonlinear equations, so model overparameterization and the paucity of constraints from training data can be understood as obstacles to training an accurate neural network model. In linear algebra, under-determined systems are solved by imposing additional constraints via regularization. Drawing inspiration from this, I discuss how recent advances in generative AI, adaptation of large foundation models, and physics-informed neural networks can be conceptualized as imposing additional constraints to ameliorate the challenge of sparse sampling in high-dimensional data space.
: While generative AI attempts to learn additional constraints directly from the training data, transfer learning from large foundation models, such as Low-Rank Adaptation (LoRA) of large models, attempts to borrow generalizable constraints from a different data domain, whereas physics-informed neural networks impose constraints expressed as differential equations directly in gradient-descent training. Through case studies, I propose practical approaches for applying these concepts and conclude with a brief overview of a method to reduce the overparameterization of deep neural networks.
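The abstract's linear-algebra analogy, that regularization selects one solution of an under-determined system, can be sketched numerically. This is a minimal illustration, not material from the talk; the matrix, right-hand side, and penalty strength are arbitrary choices:

```python
import numpy as np

# Under-determined system: 3 equations, 5 unknowns, so infinitely many exact
# solutions -- analogous to an overparameterized network with too few
# training constraints.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)

# Ridge (Tikhonov) regularization picks one solution by adding the extra
# constraint "keep ||x|| small": minimize ||Ax - b||^2 + lam * ||x||^2.
lam = 0.1
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ b)

# The lam -> 0 limit is the minimum-norm least-squares solution (pseudoinverse):
x_min_norm = np.linalg.pinv(A) @ b

print(np.linalg.norm(A @ x_min_norm - b))                    # residual is essentially 0
print(np.linalg.norm(x_ridge), np.linalg.norm(x_min_norm))   # ridge shrinks the norm
```

The same idea, adding constraints to select among the many solutions consistent with sparse data, is what the talk frames generative models, LoRA-style transfer, and physics-informed losses as providing.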
* [https://www.socr.umich.edu/people/dinov/ Ivo D. Dinov], [https://www.socr.umich.edu/ Statistics Online Computational Resource], University of Michigan.

: ''Title'': AI and Spacekime Analytics in Health Research and Biomedical Inference

: ''Abstract'': This talk will present a direct connection between quantum mechanical principles, data science foundations, AI, and statistical inference on repeated longitudinal data. By extending the concepts of time, events, particles, and wavefunctions to complex-time (kime), complex-events, data, and inference-functions, spacekime analytics provides a new foundation for representing, modeling, analyzing, and interpreting dynamic high-dimensional data. We will show the effects of kime-magnitude (longitudinal time order) and kime-phase (related to repeated random sampling) on the induced predictive AI analytics, forecasting, regression, and classification.

: The mathematical foundation of spacekime analytics also provides mechanisms to introduce spacekime calculus, expand Heisenberg’s uncertainty principle to reveal the statistical implications of inferential uncertainty, and develop a Bayesian formulation of spacekime inference. Lifting the dimension of time opens a number of challenging theoretical, experimental, and computational data science problems. It leads to a new representation of commonly observed processes, from the classical 4D Minkowski spacetime to a 5D spacekime manifold. Using simulated data and clinical observations (e.g., structural and functional MRI), we will demonstrate alternative strategies to transform time-varying processes (time-series) into kime-surfaces and show examples of spacekime analytics.

: [https://wiki.socr.umich.edu/images/8/86/Dinov_Spacekime_2024_Slidedeck_HDDA.pdf Dinov's HDDA'24 Slidedeck].
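As a purely illustrative sketch (not taken from the talk or the slide deck), repeated time-series measurements can be indexed by complex time, kime = t·e^{iθ}, with the longitudinal order supplying the kime-magnitude and a randomly drawn phase per repeat standing in for the kime-phase; the phase model and all numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical illustration: lift R repeated measurements of a time-series
# into complex time (kime), kime = t * exp(i*theta), where t is the usual
# longitudinal order (kime-magnitude) and theta is a random phase drawn per
# repeat (kime-phase). Stacking repeats over (t, theta) gives a discrete
# sample of a kime-surface.
rng = np.random.default_rng(1)
T, R = 100, 8                                       # time points, repeated samples
t = np.linspace(0.1, 10, T)
series = np.sin(t)[None, :] + 0.2 * rng.standard_normal((R, T))  # R noisy repeats

theta = rng.uniform(-np.pi, np.pi, size=R)          # one phase per repeat
kime = t[None, :] * np.exp(1j * theta[:, None])     # complex-time grid, shape (R, T)

# Each observed value now sits at a point of the kime-surface:
# (|kime| = t, arg(kime) = theta)  ->  series value.
print(kime.shape, series.shape)                     # (8, 100) (8, 100)
```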
* [https://www.oist.jp/research/makoto-yamada Makoto Yamada] (Okinawa Institute of Science and Technology)

: ''Title'': Approximating the 1-Wasserstein distance with trees and its application to KNN and self-supervised learning

: ''Abstract'': The Wasserstein distance, which measures the discrepancy between distributions, has proven effective in a variety of natural language processing and computer vision applications. One of the challenges in estimating the Wasserstein distance is that it is computationally expensive and does not scale well to many distribution-comparison tasks. In this study, we propose a regression-based approach for approximating the 1-Wasserstein distance by the ''tree-Wasserstein distance (TWD)'', where the TWD is a 1-Wasserstein distance with a tree-based embedding that can be computed in linear time with respect to the number of nodes in the tree. We first apply the proposed method to nearest-neighbor search problems in NLP tasks and then introduce the use of TWD in self-supervised learning.
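For intuition, the tree-Wasserstein distance the abstract builds on has a simple closed form: sum, over the tree's edges, the edge weight times the absolute net mass that must cross that edge, which is computable in one pass over the tree. The sketch below is a hypothetical minimal implementation (the tree encoding and function name are invented for illustration, not the authors' code):

```python
# Tree-Wasserstein distance between two distributions on the nodes of a
# weighted tree: sum over edges of edge_weight * |mass imbalance below edge|.
# One depth-first pass over the edges => linear time in the tree size.

def tree_wasserstein(children, weight, mu, nu, root=0):
    """children[v] -> list of v's children; weight[v] -> weight of the edge
    (parent(v), v); mu, nu -> dicts mapping node -> probability mass."""
    dist = 0.0

    def subtree_mass(v):
        nonlocal dist
        m = mu.get(v, 0.0) - nu.get(v, 0.0)       # net mass difference at v
        for c in children.get(v, []):
            m += subtree_mass(c)
        if v != root:
            dist += weight[v] * abs(m)            # mass crossing the edge above v
        return m

    subtree_mass(root)
    return dist

# Path tree 0-1-2 with unit edge weights: moving all mass from node 0 to
# node 2 must cross two edges, so the distance is 2.
children = {0: [1], 1: [2]}
weight = {1: 1.0, 2: 1.0}
print(tree_wasserstein(children, weight, {0: 1.0}, {2: 1.0}))  # 2.0
```

On a path tree this reduces to the ordinary 1D Wasserstein distance, which is why tree embeddings can serve as fast stand-ins for the exact metric.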
  
  
 
<hr>

{{translate|pageName=https://wiki.socr.umich.edu/index.php?title=SOCR_News_HDDA_2024}}






