Scientific Methods for Health Sciences
The Scientific Methods for Health Sciences EBook is still under active development. When the EBook is complete this banner will be removed.
Contents
- 1 SOCR Wiki: Scientific Methods for Health Sciences
- 2 Preface
- 3 Chapter I: Fundamentals
- 3.1 Exploratory Data Analysis, Plots and Charts
- 3.2 Ubiquitous Variation
- 3.3 Parametric Inference
- 3.4 Probability Theory
- 3.5 Odds Ratio/Relative Risk
- 3.6 Probability Distributions
- 3.7 Resampling and Simulation
- 3.8 Design of Experiments
- 3.9 Intro to Epidemiology
- 3.10 Experiments vs. Observational Studies
- 3.11 Estimation
- 3.12 Hypothesis Testing
- 3.13 Statistical Power, Sensitivity and Specificity
- 3.14 Data Management
- 3.15 Bias and Precision
- 3.16 Association and Causality
- 3.17 Rate-of-change
- 3.18 Clinical vs. Statistical Significance
- 3.19 Correction for Multiple Testing
- 4 Chapter II: Applied Inference
- 4.1 Epidemiology
- 4.2 Correlation and Regression (ρ and slope inference, 1-2 samples)
- 4.3 ROC Curve
- 4.4 ANOVA
- 4.5 Non-parametric inference
- 4.6 Instrument Performance Evaluation: Cronbach's α
- 4.7 Measurement Reliability and Validity
- 4.8 Survival Analysis
- 4.9 Decision Theory
- 4.10 CLT/LLNs – limiting results and misconceptions
- 4.11 Association Tests
- 4.12 Bayesian Inference
- 4.13 PCA/ICA/Factor Analysis
- 4.14 Point/Interval Estimation (CI) – MoM, MLE
- 4.15 Study/Research Critiques
- 4.16 Common mistakes and misconceptions in using probability and statistics, identifying potential assumption violations, and avoiding them
- 5 Chapter III: Linear Modeling
- 5.1 Multiple Linear Regression (MLR)
- 5.2 Generalized Linear Modeling (GLM)
- 5.3 Analysis of Covariance (ANCOVA)
- 5.4 Multivariate Analysis of Variance (MANOVA)
- 5.5 Multivariate Analysis of Covariance (MANCOVA)
- 5.6 Repeated measures Analysis of Variance (rANOVA)
- 5.7 Partial Correlation
- 5.8 Time Series Analysis
- 5.9 Fixed, Randomized and Mixed Effect Models
- 5.10 Hierarchical Linear Models (HLM)
- 5.11 Multi-Model Inference
- 5.12 Mixture Modeling
- 5.13 Surveys
- 5.14 Longitudinal Data
- 5.15 Generalized Estimating Equations (GEE) Models
- 5.16 Model Fitting and Model Quality (KS-test)
- 6 Chapter IV: Special Topics
- 6.1 Scientific Visualization
- 6.2 PCOR/CER methods Heterogeneity of Treatment Effects
- 6.3 Big-Data/Big-Science
- 6.4 Missing data
- 6.5 Genotype-Environment-Phenotype associations
- 6.6 Medical imaging
- 6.7 Data Networks
- 6.8 Adaptive Clinical Trials
- 6.9 Databases/registries
- 6.10 Meta-analyses
- 6.11 Causality/Causal Inference, SEM
- 6.12 Classification methods
- 6.13 Time-series analysis
- 6.14 Scientific Validation
- 6.15 Geographic Information Systems (GIS)
- 6.16 Rasch measurement model/analysis
- 6.17 MCMC sampling for Bayesian inference
- 6.18 Network Analysis
SOCR Wiki: Scientific Methods for Health Sciences
Electronic book (EBook) on Scientific Methods for Health Sciences (under development ...)
Preface
The Scientific Methods for Health Sciences (SMHS) EBook is designed to support a 4-course training curriculum emphasizing the fundamentals, applications and practice of scientific methods specifically for graduate students in the health sciences.
Format
Follow the instructions on this page to expand, revise, or improve the materials in this EBook.
Learning and Instructional Usage
This section describes the means of traversing, searching, discovering and utilizing the SMHS EBook resources in both formal and informal learning settings.
Copyrights
The SMHS EBook is a freely and openly accessible electronic book developed by SOCR and the general health sciences community.
Chapter I: Fundamentals
Exploratory Data Analysis, Plots and Charts
Review of data types, exploratory data analyses and graphical representation of information.
Ubiquitous Variation
There are many ways to quantify variability, which is present in all natural processes.
Parametric Inference
Foundations of parametric (model-based) statistical inference.
Probability Theory
Random variables, stochastic processes, and events are the core concepts necessary to define the likelihood of observing certain outcomes or results. We define event manipulations and present the fundamental principles of probability theory, including conditional probability, the law of total probability, Bayes' rule, and various combinatorial ideas.
Odds Ratio/Relative Risk
The relative risk, RR (a measure of dependence comparing two probabilities in terms of their ratio), and the odds ratio, OR (the ratio of the odds of an event in two groups, where the odds of an event are its probability divided by the probability of its complement), are widely applicable in many healthcare studies.
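As a minimal sketch, the following Python snippet computes both measures from a hypothetical 2×2 exposure-by-disease table (the counts below are illustrative assumptions, not data from the EBook):

```python
# Hypothetical 2x2 table of exposure vs. disease:
#              disease   no disease
# exposed         a=30       b=70
# unexposed       c=10       d=90
a, b, c, d = 30, 70, 10, 90

risk_exposed = a / (a + b)      # P(disease | exposed)
risk_unexposed = c / (c + d)    # P(disease | unexposed)

RR = risk_exposed / risk_unexposed   # relative risk (ratio of two probabilities)
OR = (a / b) / (c / d)               # odds ratio = (a*d)/(b*c)

print(f"RR = {RR:.2f}, OR = {OR:.2f}")   # RR = 3.00, OR = 3.86
```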
Probability Distributions
Probability distributions are mathematical models for processes that we observe in nature. Although there are different types of distributions, they have common features and properties that make them useful in various scientific applications.
Resampling and Simulation
Resampling is a technique for estimating sample statistics (e.g., medians, percentiles) by using subsets of the available data or by drawing randomly, with replacement, from the same data. Simulation is a computational technique for imitating the behavior of a real-world process or system over time, without waiting for the events of interest to occur naturally.
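A minimal bootstrap sketch in Python (assuming numpy and simulated, hypothetical data) illustrates the resampling idea for a median:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=100)   # hypothetical observed data

# Bootstrap: resample with replacement to approximate the sampling
# distribution of the sample median.
boot_medians = np.array([
    np.median(rng.choice(sample, size=sample.size, replace=True))
    for _ in range(5000)
])

# 95% percentile confidence interval for the median
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"median = {np.median(sample):.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```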
Design of Experiments
Design of experiments (DOE) is a technique for systematic and rigorous problem solving that applies data collection principles to ensure the generation of valid, supportable and reproducible conclusions.
Intro to Epidemiology
Epidemiology is the study of the distribution and determinants of disease frequency in human populations. This section presents the basic epidemiology concepts. More advanced epidemiological methodologies are discussed in the next chapter.
Experiments vs. Observational Studies
Experimental and observational studies have different characteristics and are useful in complementary investigations of association and causality.
Estimation
Estimation is a method of using sample data to approximate the values of specific population parameters of interest, such as the population mean, the variability, or the 97th percentile. Estimated parameters are expected to be interpretable, accurate and, in some form, optimal.
Hypothesis Testing
Hypothesis testing is a quantitative decision-making technique for examining the characteristics (e.g., centrality, span) of populations or processes based on observed experimental data.
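For illustration, a two-sample test of equal means might look as follows in Python (assuming scipy and hypothetical, simulated group data; Welch's variant is used here, which does not assume equal variances):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical measurements from two treatment groups
group_a = rng.normal(loc=120, scale=15, size=40)
group_b = rng.normal(loc=112, scale=15, size=40)

# Two-sample t-test of H0: equal population means (Welch's version)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")   # reject H0 if p < 0.05
```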
Statistical Power, Sensitivity and Specificity
The fundamental concepts of type I (false-positive) and type II (false-negative) errors lead to the important study-specific notions of statistical power, sample size, effect size, sensitivity and specificity.
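As an illustrative sketch, a standard power calculation for a two-sample t-test can be done with statsmodels (an assumed dependency; the effect size, significance level, and power targets below are hypothetical choices):

```python
from statsmodels.stats.power import TTestIndPower

# Sample size needed per group to detect a medium effect (Cohen's d = 0.5)
# with 80% power at a two-sided significance level of 0.05.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"n per group = {n_per_group:.0f}")   # about 64
```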
Data Management
All modern data-driven scientific inquiries demand deep understanding of tabular, ASCII, binary, streaming, and cloud data management, processing and interpretation.
Bias and Precision
Bias and precision are two important and complementary characteristics of estimated parameters that quantify the accuracy and variability of approximated quantities.
Association and Causality
An association is a relationship between two, or more, measured quantities that renders them statistically dependent, so that the occurrence of one affects the probability of the other. A causal relation is a specific type of association between an event (the cause) and a second event (the effect) that is considered to be a consequence of the first event.
Rate-of-change
Rate of change is a technical indicator describing the rate at which one quantity changes in relation to another quantity.
Clinical vs. Statistical Significance
Statistical significance addresses the question of whether or not the results of a statistical test meet an accepted quantitative criterion, whereas clinical significance answers the question of whether the observed difference between two treatments (e.g., new and old therapy) found in the study is large enough to alter clinical practice.
Correction for Multiple Testing
Multiple testing refers to analytical protocols involving the testing of several (typically more than two) hypotheses. Multiple-testing studies require control of the type I (false-positive) error rate, for example via Bonferroni's method, Tukey's procedure, family-wise error rate (FWER) control, or false discovery rate (FDR) procedures.
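A minimal sketch of two common corrections, assuming statsmodels and a hypothetical set of p-values:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from 8 simultaneous tests
pvals = np.array([0.001, 0.008, 0.012, 0.030, 0.040, 0.045, 0.20, 0.70])

# Bonferroni control of the family-wise error rate
reject_bonf, p_bonf, _, _ = multipletests(pvals, alpha=0.05, method='bonferroni')

# Benjamini-Hochberg control of the false discovery rate
reject_bh, p_bh, _, _ = multipletests(pvals, alpha=0.05, method='fdr_bh')

print("Bonferroni rejections:", reject_bonf.sum())   # stricter
print("BH-FDR rejections:   ", reject_bh.sum())      # less conservative
```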
Chapter II: Applied Inference
Epidemiology
This section expands the Epidemiology Introduction from the previous chapter. Here we will discuss the number needed to treat (NNT) and various likelihoods related to genetic association studies, including linkage and association, LOD scores, and Hardy-Weinberg equilibrium.
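For example, the number needed to treat is the reciprocal of the absolute risk reduction; a minimal sketch with hypothetical event rates:

```python
# Hypothetical trial: event rates under control and new treatment
control_event_rate = 0.20      # CER
treatment_event_rate = 0.12    # EER

arr = control_event_rate - treatment_event_rate   # absolute risk reduction
nnt = 1 / arr                                     # number needed to treat
print(f"ARR = {arr:.2f}, NNT = {nnt:.1f}")        # treat ~13 patients to prevent one event
```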
Correlation and Regression (ρ and slope inference, 1-2 samples)
Studies of correlations between two, or more, variables and regression modeling are important in many scientific inquiries. The simplest such situation is exploring the association and correlation of bivariate data ($X$ and $Y$).
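A minimal bivariate sketch, assuming scipy and simulated ($X$, $Y$) data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=50)                        # hypothetical predictor
y = 2.0 * x + rng.normal(scale=1.0, size=50)   # linear response plus noise

r, p_r = stats.pearsonr(x, y)   # sample correlation and its p-value
fit = stats.linregress(x, y)    # simple linear regression of y on x

print(f"r = {r:.2f} (p = {p_r:.3g})")
print(f"slope = {fit.slope:.2f} (SE {fit.stderr:.2f}), intercept = {fit.intercept:.2f}")
```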
ROC Curve
The receiver operating characteristic (ROC) curve is a graphical tool for investigating the performance of a binary classifier system as its discrimination threshold varies.
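As a minimal sketch, assuming scikit-learn and hypothetical labels and classifier scores:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and continuous classifier scores
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 1])
scores = np.array([0.1, 0.3, 0.35, 0.4, 0.45, 0.6, 0.7, 0.8, 0.55, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, scores)   # one (FPR, TPR) point per threshold
auc = roc_auc_score(y_true, scores)                # area under the ROC curve
print(f"AUC = {auc:.2f}")
```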
ANOVA
Analysis of Variance (ANOVA) is a statistical method for examining the differences between group means. ANOVA is a generalization of the t-test for more than two groups. It splits the observed variance into components attributed to different sources of variation.
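A minimal one-way ANOVA sketch, assuming scipy and hypothetical measurements from three groups:

```python
from scipy import stats

# Hypothetical outcome measurements in three treatment groups
g1 = [84, 90, 88, 95, 91]
g2 = [78, 82, 80, 85, 79]
g3 = [92, 94, 96, 90, 93]

# One-way ANOVA of H0: all group means are equal
f_stat, p_value = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```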
Non-parametric inference
Nonparametric inference involves a class of methods for descriptive and inferential statistics that are not based on parametrized families of probability distributions, which is the basis of the parametric inference we discussed earlier.
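For instance, the rank-based Mann-Whitney U test is a non-parametric alternative to the two-sample t-test; a minimal sketch with hypothetical data, assuming scipy:

```python
from scipy import stats

# Hypothetical scores from two independent groups (no normality assumed)
g1 = [3.1, 4.2, 2.8, 5.0, 3.9, 4.4]
g2 = [5.5, 6.1, 4.9, 6.8, 5.7, 6.3]

# Mann-Whitney U test compares the groups using ranks only
u_stat, p_value = stats.mannwhitneyu(g1, g2, alternative='two-sided')
print(f"U = {u_stat}, p = {p_value:.4f}")
```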
Instrument Performance Evaluation: Cronbach's α
Cronbach’s alpha (α) is a measure of internal consistency used to estimate the reliability of a cumulative psychometric test.
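Cronbach's α can be computed directly from its definition, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma^2_i}{\sigma^2_{total}}\right)$; a minimal numpy sketch with hypothetical item responses:

```python
import numpy as np

# Hypothetical item responses: rows = respondents, columns = test items
items = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
], dtype=float)

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)       # variance of each item
total_var = items.sum(axis=1).var(ddof=1)   # variance of respondents' total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```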
Measurement Reliability and Validity
Measures of validity include: construct validity (the extent to which the operation actually measures what the theory intends it to), content validity (the extent to which the content of the test matches the content associated with the construct), criterion validity (the correlation between the test and a variable representative of the construct), and experimental validity (the validity of the design of experimental research studies). Similarly, there are many alternative strategies to assess instrument reliability (or repeatability): test-retest reliability, administering different versions of an assessment tool to the same group of individuals, inter-rater reliability, and internal consistency reliability.
Survival Analysis
Survival analysis is used for analyzing longitudinal data on the occurrence of events (e.g., death, injury, onset of illness, recovery from illness). In this section we discuss data structure, survival/hazard functions, parametric versus semi-parametric regression techniques, and an introduction to Kaplan-Meier methods (non-parametric).
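The Kaplan-Meier product-limit estimator, $S(t) = \prod_{t_i \le t} (1 - d_i/n_i)$, can be computed directly; a minimal numpy sketch with hypothetical follow-up times and censoring indicators:

```python
import numpy as np

# Hypothetical follow-up times (months) and event indicators (1 = event, 0 = censored)
time = np.array([5, 6, 6, 8, 10, 12, 12, 15, 18, 20])
event = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])

# Product-limit estimator evaluated at the observed event times
surv = 1.0
for t in np.unique(time[event == 1]):
    n_at_risk = (time >= t).sum()              # subjects still at risk just before t
    d = ((time == t) & (event == 1)).sum()     # events occurring at t
    surv *= 1 - d / n_at_risk
    print(f"t = {t:2d}: S(t) = {surv:.3f}")
```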
Decision Theory
Decision theory helps determine the optimal course of action among a number of alternatives when consequences cannot be forecasted with certainty. There are different types of loss functions and decision principles (e.g., frequentist vs. Bayesian).
CLT/LLNs – limiting results and misconceptions
The Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) are the first and second fundamental laws of probability. The CLT states that, under certain conditions, the arithmetic mean of a sufficiently large number of independent random variables will be approximately normally distributed. The LLN states that, in performing the same experiment a large number of times, the average of the results obtained should be close to the expected value and tends to get closer to the expected value as the number of trials increases.
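Both laws are easy to demonstrate by simulation; a minimal sketch with skewed (exponential) draws, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(3)

# LLN: the running mean of exponential draws approaches E[X] = 2
draws = rng.exponential(scale=2.0, size=100_000)
for n in (10, 1_000, 100_000):
    print(f"n = {n:>7}: running mean = {draws[:n].mean():.3f}")

# CLT: means of n = 50 draws are approximately normal even though
# the underlying exponential distribution is far from normal
sample_means = rng.exponential(scale=2.0, size=(10_000, 50)).mean(axis=1)
print(f"mean of means = {sample_means.mean():.3f}, "
      f"sd of means = {sample_means.std(ddof=1):.3f}")   # close to 2/sqrt(50)
```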
Association Tests
There are alternative methods to measure the association between two quantities (e.g., relative risk, risk ratio, efficacy, prevalence ratio).
Bayesian Inference
Bayes’ rule connects the theories of conditional and compound probability and provides a way to update probability estimates for a hypothesis as additional evidence is observed.
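A classic worked example is a diagnostic test; the prevalence, sensitivity, and specificity below are hypothetical values chosen for illustration:

```python
# Bayes' rule for a diagnostic test:
# P(D | +) = P(+ | D) P(D) / [P(+ | D) P(D) + P(+ | not D) P(not D)]
prevalence = 0.01    # P(D): 1% of the population has the disease
sensitivity = 0.95   # P(+ | D)
specificity = 0.90   # P(- | not D), so P(+ | not D) = 0.10

p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive

print(f"P(disease | positive test) = {p_disease_given_positive:.3f}")   # ~0.088
```

Despite the accurate test, a positive result implies less than a 9% chance of disease here, because the low prevalence means most positives are false positives.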
PCA/ICA/Factor Analysis
Principal component analysis is a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables through a process known as orthogonal transformation. Independent component analysis is a computational tool to separate a multivariate signal into additive subcomponents by assuming that the subcomponents are non-Gaussian signals and are statistically independent from each other. Factor analysis is a statistical method which describes variability among observed, correlated variables in terms of a potentially smaller number of unobserved variables.
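A minimal PCA sketch, assuming scikit-learn and simulated, correlated data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
# Hypothetical data: 100 observations of 5 correlated variables
# driven by 2 latent factors plus a little noise
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(100, 5))

pca = PCA(n_components=2)        # keep the first two principal components
scores = pca.fit_transform(X)    # uncorrelated component scores

print("explained variance ratios:", pca.explained_variance_ratio_.round(3))
```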
Point/Interval Estimation (CI) – MoM, MLE
Estimation of population parameters is critical in many applications. In statistics, estimation is commonly accomplished in terms of point estimates or interval estimates for specific (unknown) population parameters of interest. The method of moments (MoM) and maximum likelihood estimation (MLE) techniques are used frequently in practice.
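For a distribution where the two approaches differ, consider a Gamma model: MoM matches the first two sample moments, while MLE maximizes the likelihood; a minimal sketch with simulated data, assuming scipy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = rng.gamma(shape=2.0, scale=3.0, size=500)   # hypothetical skewed data

# Method of moments for a Gamma(k, theta) model:
# mean = k*theta, var = k*theta^2  =>  k = mean^2/var, theta = var/mean
m, v = data.mean(), data.var(ddof=1)
k_mom, theta_mom = m**2 / v, v / m

# Maximum likelihood fit via scipy (location fixed at 0)
k_mle, _, theta_mle = stats.gamma.fit(data, floc=0)

print(f"MoM: k = {k_mom:.2f}, theta = {theta_mom:.2f}")
print(f"MLE: k = {k_mle:.2f}, theta = {theta_mle:.2f}")
```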
Study/Research Critiques
The scientific rigor in published literature, grant proposals and general reports needs to be assessed and scrutinized to minimize errors in data extraction and meta-analysis. Reporting biases present significant obstacles to collecting relevant information on the effectiveness of an intervention, the strength of relations between variables, or causal associations.
Common mistakes and misconceptions in using probability and statistics, identifying potential assumption violations, and avoiding them
Chapter III: Linear Modeling
Multiple Linear Regression (MLR)
Generalized Linear Modeling (GLM)
Analysis of Covariance (ANCOVA)
First, see the ANOVA section above.
Multivariate Analysis of Variance (MANOVA)
Multivariate Analysis of Covariance (MANCOVA)
Repeated measures Analysis of Variance (rANOVA)
Partial Correlation
Time Series Analysis
Fixed, Randomized and Mixed Effect Models
Hierarchical Linear Models (HLM)
Multi-Model Inference
Mixture Modeling
Surveys
Longitudinal Data
Generalized Estimating Equations (GEE) Models
Model Fitting and Model Quality (KS-test)
Chapter IV: Special Topics
Scientific Visualization
PCOR/CER methods Heterogeneity of Treatment Effects
Big-Data/Big-Science
Missing data
Genotype-Environment-Phenotype associations
Medical imaging
Data Networks
Adaptive Clinical Trials
Databases/registries
Meta-analyses
Causality/Causal Inference, SEM
Classification methods
Time-series analysis
Scientific Validation
Geographic Information Systems (GIS)
Rasch measurement model/analysis
MCMC sampling for Bayesian inference
Network Analysis
- SOCR Home page: http://www.socr.umich.edu