Difference between revisions of "SOCR News ISI WSC IPS35 2019"
Revision as of 11:40, 8 April 2019
- 1 SOCR News & Events: International Statistical Institute (ISI)
- 2 2019 World Statistics Congress (WSC)
- 3 Materials
- 3.1 Abstracts
- 3.1.1 Predictive Analytics of Big Neuroscience Data
- 3.1.2 A Generative Deep Machine Modeling Framework for Hypothesis Testing and Comparing Neuroimaging Data
- 3.1.3 Phenotypes, Genotypes & Voxels: A playground next to a nuclear power plant
- 3.1.4 The Default Mode Network After 20 Years: Statistical Perspectives
- 3.1.5 Implicit Bias in Big Data Analytics
- 3.2 Short Bios
- 3.3 Materials and Resources
SOCR News & Events: International Statistical Institute (ISI)
2019 World Statistics Congress (WSC)
Invited Paper Session (IPS35): Imaging Statistics and Predictive Data Analytics
- International Statistical Institute (ISI),
- 2019 World Statistics Congress (WSC),
- Session (IPS35): Imaging Statistics and Predictive Data Analytics, 2-hour session including five 20+5 minute talks
- WSC Program
- Date: TBD (18 – 23 August 2019)
- Format: 20-min talks + 5-min Q/A
- Place: Kuala Lumpur Convention Centre (Kuala Lumpur, Malaysia)
- Proceedings: The 2019 WSC proceedings will include titles, abstracts, and papers (6 pages)
- Organizer: Ivo Dinov, University of Michigan
- Chair: Tahir Ekin, Texas State University
- August 1 – November 1, 2018: All presenters must submit abstracts
- December 1, 2018 - May 31, 2019: All presenters must register and pay registration fees
- January 15, 2019 – April 15, 2019: All presenters must submit papers. More information on the paper submission process
- April 15, 2019: Presenters must submit their presentations (PPTX or PDF)
- August 18 – 23, 2019: All presenters must attend and present their papers at the congress.
Petabytes of imaging, clinical, biospecimen, genetics and phenotypic biomedical data are acquired annually. Tens of thousands of new methods and computational algorithms are developed and reported in the literature, and thousands of software tools and data analytic services are introduced each year. This Imaging Statistics and Predictive Data Analytics session will include presentations by leading experts in biomedical imaging, computational neuroscience, and statistical learning, focused on streamlining big biomedical data methodologies as well as techniques for management, aggregation, manipulation, computational modeling, and statistical inference. The session will blend innovative model-based and model-free techniques for representation, analysis and interpretation of large, heterogeneous, multi-source, incomplete and incongruent imaging and phenotypic data elements.
This session will be of interest to many theoretical statisticians and applied biomedical researchers for the following reasons:
- The digital revolution demands substantial quantitative skills, data-literacy, and analytical competence: Health science doctoral programs need to be revised and expanded to build basic-science (STEM) expertise, emphasize team-science, rely on holistic understanding of biomedical systems and health problems, and amplify dexterous abilities to handle, interrogate and interpret complex multisource information.
- The amount of newly acquired biomedical imaging data is increasing exponentially. This demands innovative statistical and computational strategies to aggregate, process, and interpret the deluge of imaging, clinical and phenotypic information.
- Trans-disciplinary training and inter-professional education are critical for ethical and collaborative research involving complex biomedical imaging and health conditions.
- Exploratory and predictive Big Data analytics is pivotally important and complementary to traditional hypothesis-driven confirmatory analyses.
Predictive Analytics of Big Neuroscience Data
This talk will present some of the Big Neuroscience Data research and education challenges and opportunities. Specifically, we will identify the core characteristics of complex neuroscience data, discuss strategies for data harmonization and aggregation, and show case-studies using large normal and pathological cohorts. Examples of methods that will be demonstrated include DataSifter (enabling secure sharing of data), compressive big data analytics (facilitating inference on multi-source heterogeneous datasets), and model-free prediction (forecasting of clinical features or derived computed phenotypes). Simulated data as well as clinical data (UK Biobank, Alzheimer’s Disease Neuroimaging Initiative, and amyotrophic lateral sclerosis case-studies) will be used for testing and validation of the techniques. In support of open-science, result reproducibility, and methodological improvements, all datasets, statistical methods, computational algorithms, and software tools are freely available online.
A Generative Deep Machine Modeling Framework for Hypothesis Testing and Comparing Neuroimaging Data
The challenge of comparing brain networks and multimodal brain imaging data between healthy and diseased cohorts lies in the high dimensionality of brain imaging data. To make statistically significant claims while avoiding false positives and false negatives, prohibitively large sample sizes are needed. This is the main disadvantage of the current framework for hypothesis testing. The t-test, the Mann-Whitney test, and similar hypothesis testing methods based on point statistics make model-based assumptions that the data cluster around a mean value. Significance is ascertained by showing that the observed difference in means or variances between cohorts is sufficiently improbable. Even when statistically significant differences can be established, there is a lack of usable hypotheses that can grant insight into the nature of these differences. We propose a novel approach to comparing high-dimensional brain imaging datasets such as brain networks. We suggest that deep learning algorithms could be applied to create generative models of the underlying dataset, each of which is a type of hypothesis about the data from each cohort. Using different deep learning architectures, training algorithms, or different instances of trained networks, we can generate multiple hypotheses / generative models of the underlying datasets. A family of hypotheses / generative models of a given cohort dataset specifies a bound on the possible hypotheses for the data. Collecting more brain datasets essentially prunes the hypothesis space and shrinks the boundaries of plausible models. We further propose that the families of generative models from different cohorts can be compared via measures of statistical dissimilarity using statistical distance metrics such as the Fisher Information metric. Treating generative models as hypotheses about datasets permits further interaction, which allows researchers to learn the meaning of each hypothesis, thus adding value and insight to the analysis.
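As a rough illustration of comparing cohort-level generative models with a statistical distance, the sketch below swaps the deep generative models described in the abstract for the simplest possible stand-in: a univariate Gaussian fitted to each cohort. It then computes the closed-form Fisher-Rao distance between the two fitted Gaussians. The function names and the sample values are hypothetical, and nothing here is taken from the speakers' implementation.

```python
import math
import statistics


def fit_gaussian(samples):
    """Toy 'generative model': the sample mean and sample standard deviation."""
    return statistics.fmean(samples), statistics.stdev(samples)


def fisher_rao_normal(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao distance between two univariate normal distributions.

    Uses the closed form induced by the Fisher information metric on the
    (mu, sigma) half-plane. Both sigma values must be positive.
    """
    num = (mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2
    den = (mu1 - mu2) ** 2 + 2.0 * (sigma1 + sigma2) ** 2
    return 2.0 * math.sqrt(2.0) * math.atanh(math.sqrt(num / den))


# Hypothetical one-dimensional summary feature for two small cohorts.
healthy = [0.9, 1.1, 1.0, 1.2, 0.8]
patients = [1.4, 1.9, 1.1, 2.2, 1.6]

model_h = fit_gaussian(healthy)
model_p = fit_gaussian(patients)
distance = fisher_rao_normal(*model_h, *model_p)
print(f"Fisher-Rao distance between fitted models: {distance:.3f}")
```

With real imaging data the models would be high dimensional and the distance would generally have to be approximated numerically; the Gaussian case is used only because its distance has a closed form.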
Phenotypes, Genotypes & Voxels: A playground next to a nuclear power plant
Big data provide a playground for researchers to address extremely interesting and novel questions important to deriving a better understanding of both optimal and suboptimal brain health. However, the breadth of available information is also associated with considerable risks when not handled properly. Additionally, despite all of the data that is at our fingertips, sometimes covariate data are missing. This talk will discuss approaches to dealing with large numbers of variables involved in big data, and will address different strategies for inference testing (correction for multiple testing) in genetic, epigenetic, and neuroimaging data. Further, the talk will cover the different instances of missing covariate data and will describe approaches to deal with these missing data. Audience interaction will take place with short quizzes throughout the talk.
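The abstract does not say which multiple-testing corrections will be covered. As a minimal sketch, two standard choices are shown below: the Bonferroni correction (controls the family-wise error rate) and the Benjamini-Hochberg step-up procedure (controls the false discovery rate). The p-values are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m, where m is the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Finds the largest rank k such that p_(k) <= (k / m) * alpha and rejects
    the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


# Illustrative p-values, e.g. one per voxel, SNP, or methylation site.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("Bonferroni rejections:", sum(bonferroni(p_vals)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(p_vals)))
```

Bonferroni is the more conservative of the two, which matters when the number of tests runs into the hundreds of thousands, as in voxel-wise or genome-wide analyses.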
The Default Mode Network After 20 Years: Statistical Perspectives
In the neuroimaging literature, the default mode network (DMN) refers to a group of areas in the human cerebral cortex that consistently shows decreased activity in attention-demanding tasks and increased activity during the resting state with eyes closed or with simple visual fixation. The discovery of the DMN has boosted research interest in self-referential or intrinsic activity in the brain in both patients and healthy controls. Since 1997, related studies have mainly relied on group-averaged responses or seed-based correlations to identify increased/decreased activity in the DMN areas. In this study, we conducted a resting-state experiment with eyes-closed and eyes-open conditions, focusing on the reproducible activity across subjects in the DMN areas (Areas 8, 9, 10, 20, 23, 24, 25, 31, 32, 39, 40 and the entorhinal cortex). The reproducible activity was estimated using standardized intraclass correlations (ICCs); the statistical thresholding of the ICC maps took into account the non-stationarity of ongoing BOLD signals during the resting-state conditions. The DMN areas were parcellated according to the JuBrain cytoarchitectonic atlas. Forty-nine right-handed adults (26 females, mean age: 23.08 ± 3.188 years) participated in the resting-state task, which involved 4 min eyes-closed followed by 4 min eyes-open. The MRI scan was performed using a 3T MAGNETOM Skyra scanner and a standard 20-channel head-neck coil. The echo planar imaging (EPI) scans were acquired with parameters TR/TE = 2000 ms/30 ms, flip angle = 84°, 35 slices, slice thickness = 3.4 mm, FOV = 192 mm, and resolution 3x3x3.74 mm to cover the whole brain including the cerebellum. The results suggested that a variety of brain activity could be found in the DMN areas, including short-term increased/decreased activity after the eyes-closed/open instructions. We suggest caution in using the DMN in cognitive and clinical studies.
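The abstract does not specify which ICC variant or standardization was used. As a hedged sketch, the code below computes the classical one-way random-effects ICC(1,1) from a table in which each row is a target and each column is a repeated measurement. Treating time points as rows and subjects as columns is an assumption made here for illustration only, and the numbers are invented.

```python
def icc_oneway(data):
    """One-way random-effects ICC(1,1).

    data: list of n rows (targets), each a list of k measurements.
    Returns (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the
    between-target and within-target mean squares.
    """
    n = len(data)
    k = len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(data, row_means) for x in row
    )
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented BOLD-like values: 4 time points (rows) x 3 subjects (columns).
signal = [
    [1.0, 1.2, 0.9],
    [2.1, 1.8, 2.0],
    [0.4, 0.7, 0.5],
    [1.5, 1.4, 1.7],
]
print(f"ICC(1,1) = {icc_oneway(signal):.3f}")
```

Values near 1 indicate that subjects follow the same time course; values near or below 0 indicate little shared structure across subjects.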
Implicit Bias in Big Data Analytics
Ivo D. Dinov
Ivo D. Dinov is a professor of Health Behavior and Biological Sciences and Computational Medicine and Bioinformatics at the University of Michigan. He directs the Statistics Online Computational Resource and co-directs the Center for Complexity and Self-management of Chronic Disease (CSCD) and the multi-institutional Probability Distributome Project. Dr. Dinov is an Associate Director of the Michigan Institute for Data Science (MIDAS). He is a member of the American Statistical Association (ASA), the International Association for Statistical Education (IASE), the American Medical Informatics Association (AMIA), as well as an Elected Member of the International Statistical Institute (ISI).
Eric Tatt Wei Ho
Eric Ho Tatt Wei is a senior lecturer at Universiti Teknologi Petronas, as well as a Malaysia Node Coordinator of the International Neuroinformatics Coordinating Facility (INCF). He received his MSc and PhD degrees in Electrical Engineering from Stanford University, USA. During his PhD, he developed an automated robotic system to manipulate fruit flies for live brain imaging. Back in Malaysia, he worked on the development of microfluidic devices for monitoring and manipulating blood cells for immunotherapy in low-resource settings. His current research interests are in developing novel tools at the intersection of deep learning, brain sciences and networks. He is applying these techniques to characterize the effects of addiction and aging on the brain, as well as to enhance the efficacy of brain interventions to improve cognition.
Dr. Qiu is Dean’s Chair Associate Professor at the Department of Biomedical Engineering and the Clinical Imaging Research Centre at the National University of Singapore. She is also a principal investigator at the Singapore Institute for Clinical Sciences of the Agency for Science, Technology and Research (A*STAR). Dr. Qiu received her BS in Biomedical Engineering from Tsinghua University in 1999, and MS degrees in Biomedical Engineering and in Applied Mathematics and Statistics from the University of Connecticut in 2002 and from Johns Hopkins University in 2005, respectively. She obtained her PhD degree at Johns Hopkins University in 2006. After one year of postgraduate training, she joined the National University of Singapore as an assistant professor and launched her own Laboratory for Medical Image Data Sciences at both the Faculty of Engineering and the School of Medicine. Dr. Qiu has been devoted to innovation in computational analyses of complex and informative datasets comprising disease phenotypes, neuroimaging, and genetic data to understand the origins of individual differences in health throughout the lifespan. She has received the Faculty Young Research Award and the 2016 Young Researcher Award of NUS. She has recently been appointed as endowed “Dean’s Chair” associate professor to honor her outstanding research achievements. She serves on the program committee of the Organization for Human Brain Mapping and as an editor of the journals Neuroimage and Frontiers in Neuroscience.
Michelle Liou received her PhD in quantitative psychology from the University of Pittsburgh, and has expertise in medical statistics, functional BOLD signal and EEG oscillations with an emphasis on image/signal processing and scientific inference. She and her lab members initiated the concept of ‘reproducibility’ for bridging functional MRI techniques and scientific inference, and won the 2003 New Perspective in fMRI Research Award from fMRIDC at Dartmouth College, USA. She also won the Outstanding Research Award from the National Science Council (Taiwan) in 1999 and 2003. She is currently a senior research fellow at the Institute of Statistical Science, Academia Sinica, and a visiting professor in the Translational Imaging Research Center, Taipei Medical University. In the past ten years, Dr. Liou has been invited to give talks on the importance of ‘reproducibility’ in brain research in China, Korea, Russia, Singapore, Taiwan, and the USA.
S. Ejaz Ahmed
Prof. Ahmed is Dean of the Brock University School of Mathematics and Statistics. Prior to that, he was Head of Mathematics at the University of Windsor and the University of Regina. He is a fellow of the American Statistical Association and an Elected Member of the International Statistical Institute. Dr. Ahmed is an expert in statistical inference, shrinkage estimation, and asymptotic theory. He serves on the editorial boards of many statistical journals and has served as a member of the Board of Directors and as Chairman of the Education Committee of the Statistical Society of Canada. Dr. Ahmed is a member of an Evaluation Group, Discovery Grants and the Grant Selection Committee, Natural Sciences and Engineering Research Council of Canada (NSERC).
Materials and Resources
Predictive Analytics of Big Neuroscience Data
- SOCR Home page: http://www.socr.umich.edu