Difference between revisions of "SMHS Epidemiology"

From SOCR
(Core Epidemiological Measures)

Revision as of 16:40, 24 January 2026

Scientific Methods for Health Sciences - Epidemiology

Overview

Epidemiology is the study of the distribution, determinants, and control of health and disease in populations. While early epidemiology focused on infectious agents, modern epidemiology encompasses genetic factors, environmental exposures, and their complex interactions. This section provides an in-depth discussion of these patterns, specifically identifying health-related risk factors and outcomes in terms of person, place, and time.

Motivation

By the end of this module, learners should be able to:

  • Understand Genetic Foundations: Describe basic features of the human genome, the distribution of mutations, and principles of segregation and linkage.
  • Analyze Population Dynamics: Apply quantitative genetic concepts to study the relationship between genetic variation and disease variation, including Hardy-Weinberg equilibrium.
  • Evaluate Associations: Understand prototypical gene-disease relationships, interpret Genome-Wide Association Studies (GWAS), and recognize gene-environment interactions.
  • Apply Computational Methods: Perform basic genetic association analysis using R and interpret key epidemiological measures like NNT and OR.

Theory: The Human Genome and Mutation

SMHS Epidem Fig 1.png

Chromosomal Structure

Chromosomes consist of highly condensed DNA.

  • Banding: Chromosomes can be stained to reveal banding patterns. Dark bands represent regions rich in Adenine (A) and Thymine (T), containing millions of nucleotides.
  • Karyotype: A normal human karyotype consists of 46 chromosomes: 22 pairs of autosomes and 1 pair of sex chromosomes (XX or XY).
  • Functional Elements:
    • Centromeres: Large arrays of repetitive DNA where spindle fibers attach during mitosis.
    • Telomeres: Repetitive sequences acting as a "cap" to provide stability; these shorten with each cell division in somatic cells.
    • Chromatin: Divided into Euchromatin (lightly condensed, gene-rich) and Heterochromatin (highly condensed, often repetitive).

Mutations and Abnormalities

Mutations are alterations in the DNA sequence that can occur in somatic cells or gametes.

  • Structural Abnormalities:

SMHS Epidemiology Fig 2.png

    • Deletion: Loss of genetic material (e.g., terminal deletion).
    • Duplication: Repetition of a chromosomal segment.
    • Translocation: Rearrangement of parts between non-homologous chromosomes.
    • Inversion: A segment of the chromosome is reversed.
  • Point Mutations (DNA Sequence):
    • Nucleotide Substitution: Alteration of a base sequence without changing the number of bases (e.g., Missense, Nonsense).
    • Indels: Insertions or deletions that alter the number of nucleotides, potentially causing frameshifts.
    • Splice Site Variation: Alterations in non-coding regions affecting RNA splicing.

Population Genetics

Allele and Genotype Frequencies

The Gene Pool represents all available genetic variation in a population.

  • Allele Frequency: The proportion of all gene copies in a population that are a particular allele. At a diploid autosomal locus each person carries two copies, so
\[\text{Allele Frequency} = \frac{\text{Number of copies of the allele}}{2 \times \text{Number of people}}.\]
  • Genotype Frequency: The prevalence of a particular genotype (e.g., AA, Aa, aa).
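
As a minimal sketch of the two definitions above, the following R snippet derives genotype and allele frequencies from genotype counts. The counts are illustrative values assumed for this example, not data from a real study.

```r
# Illustrative genotype counts (assumed values, not real data)
counts <- c(AA = 400, Aa = 500, aa = 100)
n_people <- sum(counts)

# Genotype frequencies: share of people carrying each genotype
genotype_freq <- counts / n_people

# Allele frequencies: each person carries two copies at a diploid locus
freq_A <- unname(2 * counts["AA"] + counts["Aa"]) / (2 * n_people)
freq_a <- 1 - freq_A

print(genotype_freq)
cat("freq(A) =", freq_A, " freq(a) =", freq_a, "\n")
```

The heterozygote count contributes one copy of each allele, which is why it enters the numerator once while the homozygote count enters twice.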

Hardy-Weinberg Equilibrium (HWE)

In a large, stable population with random mating, allele frequencies predict genotype frequencies. For a biallelic locus with alleles \(A\) (frequency \(p\)) and \(a\) (frequency \(q\)), where \(p+q=1\):

  • \(P(AA) = p^2\)
  • \(P(Aa) = 2pq\)
  • \(P(aa) = q^2\).

Testing for HWE: To determine if a population is in HWE, use a Chi-squared (\(\chi^2\)) test comparing Observed (\(O\)) vs. Expected (\(E\)) genotype counts: \[\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\] For a biallelic locus the test has 1 degree of freedom (three genotype classes, minus one for the fixed total, minus one for the allele frequency estimated from the data). If \(\chi^2 > 3.84\) (p < 0.05), the null hypothesis of HWE is rejected, suggesting that evolutionary forces (selection, migration, non-random mating) are at play.

R Implementation for HWE:

# Hardy-Weinberg Equilibrium Test
# Method: manual chi-squared calculation in base R
obs <- c(AA = 400, Aa = 500, aa = 100)  # observed genotype counts
total <- sum(obs)
p <- unname(2 * obs["AA"] + obs["Aa"]) / (2 * total)  # frequency of allele A
q <- 1 - p                                             # frequency of allele a

# Expected genotype counts under HWE
expected <- c(AA = p^2, Aa = 2 * p * q, aa = q^2) * total

# Chi-squared test; df = 1 because one allele frequency is estimated from the data
chi2 <- sum((obs - expected)^2 / expected)
p_val <- pchisq(chi2, df = 1, lower.tail = FALSE)
cat("Chi-squared =", round(chi2, 4), "p-value =", format.pval(p_val), "\n")


Pedigree Analysis and Inheritance

Pedigrees trace the transmission of traits through generations.

MSHS IntroEpi Fig 5 .png

Modes of Inheritance

  • Autosomal Dominant:
    • Individuals with the dominant allele (\(D\)) develop the disease.
    • Vertical transmission (every affected child has an affected parent).
    • Occurs in both males and females equally.
  • Autosomal Recessive:
    • Individuals must inherit two copies of the recessive allele (\(d\)) to be affected (\(dd\)).
    • Heterozygotes (\(Dd\)) are carriers.
    • Often appears in siblings of unaffected parents (horizontal pattern).
  • X-Linked Recessive:
    • Females carrying the mutation (\(X^C X^c\)) are usually unaffected carriers.
    • Males with the mutation (\(X^c Y\)) are affected.
    • No male-to-male transmission; mother-to-son transmission is characteristic.

Probability in Pedigrees

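To estimate risk, we calculate the probability (likelihood) of the pedigree under a given genetic hypothesis. For one assignment of genotypes to the \(n\) pedigree members: \[P(\text{pedigree}) = \prod_{i=1}^{n} P(\text{genotype}_i) \times P(\text{phenotype}_i \mid \text{genotype}_i),\] where \(P(\text{genotype}_i)\) comes from population genotype frequencies for founders and from Mendelian transmission given the parents' genotypes for all other members. When genotypes are unobserved, this product is summed over every genotype assignment consistent with the pedigree.

The second factor is the Penetrance: the probability of expressing a phenotype given a genotype. Incomplete penetrance occurs when an individual with a susceptible genotype does not exhibit the phenotype.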
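
To make the role of penetrance concrete, here is a small sketch for a single unrelated individual. It combines genotype probabilities with penetrances to get the probability of being affected, then applies Bayes' rule to get the genotype probabilities given affection. The allele frequency and penetrance values are hypothetical and chosen only for illustration.

```r
# Hypothetical single-locus model; all numbers are assumed for illustration
q_D <- 0.01                                  # frequency of the disease allele D

# Genotype probabilities for an unrelated individual (Hardy-Weinberg proportions)
geno_prob <- c(DD = q_D^2,
               Dd = 2 * q_D * (1 - q_D),
               dd = (1 - q_D)^2)

# Penetrance: P(affected | genotype); incomplete (< 1) for carriers of D
penetrance <- c(DD = 0.90, Dd = 0.90, dd = 0.001)

# P(affected) = sum over genotypes of P(genotype) * P(affected | genotype)
prevalence <- sum(geno_prob * penetrance)

# Bayes' rule: P(genotype | affected)
posterior <- geno_prob * penetrance / prevalence

cat("P(affected) =", round(prevalence, 5), "\n")
print(round(posterior, 4))
```

Each term `geno_prob * penetrance` is one factor of the pedigree product for a founder; a full pedigree likelihood multiplies such factors across relatives and uses transmission probabilities for non-founders.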

Linkage Analysis

Genetic linkage measures the proximity of genes on a chromosome.

  • Recombination Fraction (\(\theta\)): The probability that two loci will recombine during gamete formation.
    • \(\theta = 0.5\): Independent assortment (unlinked).
    • \(\theta < 0.5\): Linkage exists.
    • \(\theta = 0\): Complete linkage.

LOD Score

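The Logarithm of Odds (LOD) score compares the likelihood of the data under linkage at a recombination fraction \(\theta\) with the likelihood under no linkage (\(\theta = 0.5\)): \[Z(\theta) = \log_{10} \frac{L(\theta)}{L(0.5)}.\] The score is usually reported at the maximum likelihood estimate \(\hat{\theta}\), where \(Z\) is largest.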

Interpretation:

LOD Score      Interpretation
\(Z = -2\)     100:1 odds against linkage
\(Z = +3\)     1000:1 odds in favor of linkage (threshold for significance)
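
The LOD definition can be sketched in R for the simplest case, where the phase is known and every meiosis can be scored as recombinant or non-recombinant. This is an illustrative simplification; the function name `lod_score` and the counts are assumptions made for this example.

```r
# LOD score for phase-known meioses (a simplifying assumption).
# n_rec / n_nonrec: counts of recombinant / non-recombinant meioses.
lod_score <- function(theta, n_rec, n_nonrec) {
  lik_linked   <- theta^n_rec * (1 - theta)^n_nonrec
  lik_unlinked <- 0.5^(n_rec + n_nonrec)
  log10(lik_linked / lik_unlinked)
}

# Hypothetical counts, chosen only for illustration
n_rec <- 2
n_nonrec <- 18

# With known phase, the MLE of theta is the observed recombinant fraction
theta_hat <- n_rec / (n_rec + n_nonrec)
max_lod <- lod_score(theta_hat, n_rec, n_nonrec)

cat("theta_hat =", theta_hat, " max LOD =", round(max_lod, 2), "\n")
cat("Exceeds the threshold of 3:", max_lod > 3, "\n")
```

Because the LOD is a base-10 logarithm of a likelihood ratio, a value of 3 corresponds to the data being 1000 times more likely under linkage than under independent assortment.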

Linkage Disequilibrium (LD) and Association

While linkage is observed in families, Linkage Disequilibrium (LD) is a population-based correlation between alleles at different loci.

Measures of LD

  • D (Disequilibrium Coefficient):

\[D_{AB} = p_{AB} - p_A p_B\].

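  • D' (Normalized D): \(D' = D / D_{\max}\), where \(D_{\max}\) is the largest magnitude \(D\) can take given the allele frequencies.
Ranges from -1 to +1. \(|D'| = 1\) implies no evidence of recombination.
  • \(r^2\) (Correlation coefficient):

\[r^2 = \frac{D^2}{p_A(1-p_A)p_B(1-p_B)}\]

\(r^2\) is preferred for association studies because it measures how well one marker predicts the other, which determines the power to detect an association through a proxy marker. \(r^2 = 1\) implies that the two markers are perfect proxies for each other.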

R Implementation for LD:

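# Calculating LD Measures
# Haplotype counts below are illustrative placeholders; substitute observed counts
hap <- c(AB = 50, Ab = 10, aB = 10, ab = 30)
freq <- hap / sum(hap)
p_AB <- freq[["AB"]]                 # frequency of the A-B haplotype
p_A <- freq[["AB"]] + freq[["Ab"]]   # frequency of allele A
p_B <- freq[["AB"]] + freq[["aB"]]   # frequency of allele B
D <- p_AB - p_A * p_B
r_sq <- D^2 / (p_A * (1-p_A) * p_B * (1-p_B))
cat("D =", D, " r-squared =", r_sq, "\n")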
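
The snippet above does not cover D'. A hedged sketch that computes all three measures from haplotype counts follows. The helper name `ld_measures` and the counts are assumptions for illustration, and the sign convention (D' keeps the sign of D) is chosen to match the stated range of -1 to +1.

```r
# Compute D, D' and r^2 from counts of the four haplotypes AB, Ab, aB, ab.
# ld_measures is a hypothetical helper written for this example.
ld_measures <- function(hap_counts) {
  freq <- hap_counts / sum(hap_counts)
  p_A <- freq[["AB"]] + freq[["Ab"]]
  p_B <- freq[["AB"]] + freq[["aB"]]
  D <- freq[["AB"]] - p_A * p_B

  # Largest |D| attainable given the allele frequencies; depends on the sign of D
  D_max <- if (D >= 0) {
    min(p_A * (1 - p_B), (1 - p_A) * p_B)
  } else {
    min(p_A * p_B, (1 - p_A) * (1 - p_B))
  }
  D_prime <- if (D_max > 0) D / D_max else NA_real_

  r_sq <- D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
  list(D = D, D_prime = D_prime, r_sq = r_sq)
}

# Illustrative haplotype counts (assumed, not real data)
hap <- c(AB = 50, Ab = 10, aB = 10, ab = 30)
ld <- ld_measures(hap)
cat("D =", round(ld$D, 3), " D' =", round(ld$D_prime, 3),
    " r^2 =", round(ld$r_sq, 3), "\n")
```

In this example D' is larger than \(r^2\), which is typical: D' only asks whether any of the four haplotypes is missing, while \(r^2\) also requires the allele frequencies at the two loci to match.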


Genome-Wide Association Studies (GWAS)

GWAS tests for correlation between genetic markers (SNPs) and phenotypes across the entire genome in unrelated individuals.

  • Statistical Model: Typically uses logistic regression for case-control studies.

\[\ln\left(\frac{P(Y=1)}{1-P(Y=1)}\right) = \beta_0 + \beta_1 \cdot \text{SNP} + \text{Covariates}.\]

  • Manhattan & QQ Plots: Used to visualize results. Because millions of tests are performed, strict significance thresholds (e.g., \(p < 5 \times 10^{-8}\)) are required to avoid false positives.
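
The genome-wide threshold can be read as a Bonferroni correction. The sketch below assumes roughly one million independent tests, which is the usual rationale for \(5 \times 10^{-8}\); the SNP names and p-values are made up for illustration.

```r
# Bonferroni-style genome-wide threshold
alpha <- 0.05      # desired family-wise error rate
n_tests <- 1e6     # assumed number of independent tests
threshold <- alpha / n_tests

# Hypothetical association p-values for three SNPs
p_values <- c(snp1 = 3e-9, snp2 = 4e-6, snp3 = 0.02)

# Only SNPs below the genome-wide threshold are declared significant
genome_wide_hits <- names(p_values)[p_values < threshold]

cat("Threshold:", threshold, "\n")
cat("Genome-wide significant:", genome_wide_hits, "\n")
```

A SNP with p = 4e-6 would look convincing in a single test but falls well short once the number of tests is taken into account.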

Gene-Environment Interactions

Disease risk is often modeled as a combination of genetics (\(G\)), environment (\(E\)), and their interaction (\(G \times E\)). \[Y = \beta_0 + \beta_1 G + \beta_2 E + \beta_3 (G \times E) + \epsilon.\]

Interaction Models:

  • Synergistic: Genotype exacerbates the risk factor (or vice versa).
  • Independent: Both factors influence risk but do not interact.

SMHS Epi Figure 11.png
Model: Genotype exacerbates the effect of the risk factor
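
As a rough sketch of how the interaction model can be fitted, the following R code simulates a quantitative outcome with a synergistic \(G \times E\) effect and estimates the coefficients with `lm`. The sample size, exposure frequencies, and effect sizes are assumed values for illustration only.

```r
set.seed(42)
n <- 2000

# Simulated binary genotype (carrier vs non-carrier) and binary exposure
G <- rbinom(n, 1, 0.3)
E <- rbinom(n, 1, 0.5)

# Assumed true model: beta0 = 1, beta1 = 0.5, beta2 = 0.8, beta3 = 1.0
Y <- 1 + 0.5 * G + 0.8 * E + 1.0 * G * E + rnorm(n)

# G * E expands to G + E + G:E, so the interaction term is estimated as "G:E"
fit <- lm(Y ~ G * E)
coefs <- coef(fit)
print(round(coefs, 2))
```

Under the independent model \(\beta_3 = 0\), so a test of the `G:E` coefficient is a test of interaction on the additive scale. For a binary disease outcome the same formula would be passed to `glm` with `family = binomial`, which tests interaction on the log-odds scale instead.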

Core Epidemiological Measures

In addition to genetic metrics, standard epidemiological measures provide essential tools for assessing disease risk, evaluating interventions, and guiding public health decisions. These metrics help quantify associations between exposures and outcomes, estimate treatment effects, and inform policy. Below, we outline key measures, including definitions, formulas, interpretations, and examples. Where relevant, we include R code implementations for practical computation.

Absolute Risk Reduction (ARR)

  • Definition: The difference in event rates (incidences) between a control (or unexposed) group and a treatment (or exposed) group. ARR measures the absolute effect of an intervention or exposure on outcome risk.
  • Formula: \[ARR = I_{\text{control}} - I_{\text{treatment}},\] where \(I\) represents incidence (proportion of events).
  • Interpretation: A positive ARR indicates risk reduction (benefit); a negative ARR indicates increased risk (harm). It is straightforward to interpret, but its size depends on the baseline risk in the control group.
  • Example: If the incidence of heart attacks is 10% in the control group and 7% in the treatment group, ARR = 0.10 - 0.07 = 0.03 (an absolute reduction of 3 percentage points).
  • When to Use: Prospective studies like randomized controlled trials (RCTs); useful for communicating tangible benefits to patients.
  • Limitations: Sensitive to baseline risk; not ideal for comparing across populations with different event rates.

Relative Risk (RR)

  • Definition: The ratio of the incidence of an outcome in the exposed group to that in the unexposed group. RR assesses how much an exposure increases or decreases the probability of an event.
  • Formula: \[RR = \frac{I_{\text{exposed}}}{I_{\text{unexposed}}}\].
  • Interpretation: RR > 1 indicates increased risk due to exposure; RR < 1 indicates protective effect; RR = 1 indicates no association. It is a multiplicative measure, expressed relative to the baseline (unexposed) risk.
  • Example: In a cohort study, smokers have a 20% lung cancer incidence, while non-smokers have 2%. RR = 0.20 / 0.02 = 10 (smokers are 10 times as likely to develop lung cancer).
  • When to Use: Cohort studies or RCTs, where incidence can be estimated in each group; preferred over the OR when outcomes are common.
  • Limitations: For rare events, a large RR can correspond to a small absolute difference in risk; it cannot be computed directly from case-control data, because the investigator fixes the ratio of cases to controls and incidence cannot be estimated.

Odds Ratio (OR)

  • Definition: The ratio of the odds of an outcome in the exposed group to the odds in the unexposed group. OR approximates RR when the outcome is rare.
  • Formula: From a 2x2 contingency table (a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases): \[OR = \frac{ad}{bc}\].
  • Interpretation: OR > 1 suggests positive association; OR < 1 suggests inverse association; OR = 1 suggests no association. It is often used in logistic regression.
  • Example: In a case-control study of diabetes and obesity: 80 obese diabetics (a), 20 obese non-diabetics (b), 30 non-obese diabetics (c), 70 non-obese non-diabetics (d). OR = (80 × 70) / (20 × 30) = 9.33 (the odds of diabetes are over 9 times as high in the obese group).
  • When to Use: Case-control studies or when incidence data is unavailable; common in meta-analyses.
  • Limitations: Not directly interpretable as risk for common outcomes; can differ from RR if events are frequent.
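
The rare-outcome approximation can be checked numerically. The sketch below compares RR and OR for the same risk ratio at a low and a high baseline risk; the helper `rr_and_or` and the risk values are assumptions made for this illustration.

```r
# Compare RR and OR for given risks in the exposed and unexposed groups.
# rr_and_or is a hypothetical helper written for this example.
rr_and_or <- function(risk_exposed, risk_unexposed) {
  odds <- function(p) p / (1 - p)
  c(RR = risk_exposed / risk_unexposed,
    OR = odds(risk_exposed) / odds(risk_unexposed))
}

# Rare outcome: risks of 2% vs 1%
rare <- rr_and_or(0.02, 0.01)

# Common outcome: risks of 60% vs 30%
common <- rr_and_or(0.60, 0.30)

print(round(rbind(rare = rare, common = common), 3))
```

Both scenarios have RR = 2, but the OR stays close to 2 only when the outcome is rare. With common outcomes the OR moves further from 1 than the RR, so reading an OR as if it were an RR overstates the effect.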

Number Needed to Treat (NNT) or Harm (NNH)

  • Definition: The average number of patients who need to be treated (or exposed) to prevent (or cause) one additional outcome compared to the control. NNT is based on ARR and translates statistical effects into clinical relevance.
  • Formula: \[NNT = \frac{1}{|ARR|}\] (use absolute value for magnitude; sign of ARR determines benefit vs. harm).
  • Interpretation: Lower NNT indicates greater treatment efficacy. If ARR > 0, the result is an NNT (benefit); if ARR < 0, it is an NNH (harm). When ARR = 0 the NNT is infinite, meaning no effect.
  • Example (Benefit): ARR = 0.03 (from the heart attack example), NNT = 1 / 0.03 ≈ 33.3 (treat about 34 patients, rounding up, to prevent one heart attack).
  • Example (Harm): If treatment increases incidence from 50% to 80%, ARR = 0.50 - 0.80 = -0.30, NNH = 1 / 0.30 ≈ 3.3 (for roughly every 3 to 4 patients treated, one additional bad outcome occurs).
  • When to Use: RCTs or systematic reviews; helps in shared decision-making and cost-benefit analysis.
  • Limitations: Assumes constant ARR; sensitive to time frame and baseline risk. Confidence intervals should be reported for real-world application.

Additional Considerations

  • Confidence Intervals (CI): Always compute 95% CIs for these measures to assess precision (e.g., using bootstrap methods or formulas in R packages like epitools or epiR).
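  • Attributable Risk (AR): The risk difference between exposed and unexposed groups, AR = \(I_{\text{exposed}} - I_{\text{unexposed}}\) (the absolute excess risk associated with exposure). It complements RR by expressing the effect on an absolute scale.
  • Population Attributable Risk (PAR): PAR = \(I_{\text{population}} - I_{\text{unexposed}}\) (the excess incidence in the whole population associated with exposure). Dividing PAR by \(I_{\text{population}}\) gives the population attributable fraction, the proportion of cases in the population attributable to exposure.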
  • Best Practices: Adjust for confounders using multivariable models; interpret in context (e.g., RR may seem large for rare events but have small absolute impact).
  • Software Tools: R (with packages like epiR, Epi, or survival for advanced metrics like Hazard Ratios) or Python (with scipy or lifelines) are commonly used.
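
As one way to attach confidence intervals to RR and OR, the sketch below uses the standard large-sample (Wald) intervals on the log scale in base R. The helper name `ratio_ci` and its argument names are assumptions for this example, and the approximation breaks down when any cell count is zero or very small.

```r
# Wald-type confidence intervals for RR and OR from a 2x2 table.
# ratio_ci is a hypothetical helper; cell counts follow the a/b/c/d layout:
#   exp_cases = a, exp_noncases = b, unexp_cases = c, unexp_noncases = d
ratio_ci <- function(exp_cases, exp_noncases, unexp_cases, unexp_noncases,
                     level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  n_exp <- exp_cases + exp_noncases
  n_unexp <- unexp_cases + unexp_noncases

  rr <- (exp_cases / n_exp) / (unexp_cases / n_unexp)
  or <- (exp_cases * unexp_noncases) / (exp_noncases * unexp_cases)

  # Standard errors of log(RR) and log(OR)
  se_log_rr <- sqrt(1 / exp_cases - 1 / n_exp + 1 / unexp_cases - 1 / n_unexp)
  se_log_or <- sqrt(1 / exp_cases + 1 / exp_noncases +
                    1 / unexp_cases + 1 / unexp_noncases)

  rbind(
    RR = c(estimate = rr,
           lower = exp(log(rr) - z * se_log_rr),
           upper = exp(log(rr) + z * se_log_rr)),
    OR = c(estimate = or,
           lower = exp(log(or) - z * se_log_or),
           upper = exp(log(or) + z * se_log_or))
  )
}

# Illustrative cohort counts: 20/100 exposed cases, 2/100 unexposed cases
ci <- ratio_ci(20, 80, 2, 98)
print(round(ci, 2))
```

With only two unexposed cases the intervals are very wide, which illustrates why a point estimate alone can be misleading.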

R Implementation for Key Measures: This code snippet computes ARR, RR, OR, and NNT/NNH from example data. It guards against division by zero when ARR = 0 and covers both benefit and harm scenarios.

# Uses base R only; no additional packages are required

# Example data: 2x2 table for OR/RR (cohort study assumption)
# Rows: Exposed (1) vs Unexposed (0); Columns: Cases vs Non-cases
a <- 20  # Exposed cases
b <- 80  # Exposed non-cases
c <- 2   # Unexposed cases
d <- 98  # Unexposed non-cases

# Incidences
I_exposed <- a / (a + b)
I_unexposed <- c / (c + d)

# Absolute Risk Reduction (assuming unexposed = control, exposed = treatment)
ARR <- I_unexposed - I_exposed  # Positive if treatment reduces risk

# Relative Risk
RR <- I_exposed / I_unexposed

# Odds Ratio
OR <- (a * d) / (b * c)

# Number Needed to Treat/Harm
NNT <- ifelse(ARR != 0, 1 / abs(ARR), Inf)
type <- ifelse(ARR > 0, "NNT (Benefit)", ifelse(ARR < 0, "NNH (Harm)", "No Effect"))

# Output
cat("Incidence Exposed:", round(I_exposed, 3), "\n")
cat("Incidence Unexposed:", round(I_unexposed, 3), "\n")
cat("ARR:", round(ARR, 3), "\n")
cat("RR:", round(RR, 3), "\n")
cat("OR:", round(OR, 3), "\n")
cat(type, ":", ifelse(is.finite(NNT), round(NNT, 1), "Infinite"), "\n")

# Harm example (swap for treatment increasing risk)
I_control <- 0.50
I_treatment <- 0.80
ARR_harm <- I_control - I_treatment
NNT_harm <- 1 / abs(ARR_harm)
cat("\nHarm Example - ARR:", round(ARR_harm, 3), "\n")
cat("NNH:", round(NNT_harm, 1), "\n")

Applications and Software

Modern epidemiology relies heavily on computational tools.

  • Key R Packages: `epiR` (Epi measures), `genetics` (HWE, LD), `survival` (time-to-event), `qqman` (GWAS visualization).
  • Online Tools: SOCR Distribution Tables.

Problems

Problem 1: Linkage Mapping

Scenario: Analyze the pedigree below under a Dominant Inheritance model. We need to estimate the recombination fraction \(\theta\).

SMHS Epi Figure 8.png

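1. Calculate LOD Scores: The recombination fraction is estimated by maximum likelihood. Because the phase is unknown, we average the likelihoods of the two possible phase arrangements. One arrangement gives 4 non-recombinants and 1 recombinant, the other gives the reverse: \[L(\theta) = \frac{1}{2}\left[(1-\theta)^4 \theta + \theta^4 (1-\theta)\right],\] \[Z(\theta) = \log_{10}\frac{L(\theta)}{L(0.5)}.\] For example, at \(\theta = 0.1\), \(L(0.1) = 0.0329\) and \(L(0.5) = 0.0313\), so \(Z(0.1) = \log_{10}(1.051) = 0.022\).

2. Result Table:

\(\theta\)     \(Z(\theta)\)
0.0            \(-\infty\)
0.10           0.022
0.20           0.124 (Max LOD)
0.50           0.0

Among the tabulated values, the maximum LOD score occurs at \(\hat{\theta} = 0.20\). This maximum is far below the significance threshold of 3, so this pedigree alone provides little evidence for linkage.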
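
The tabulated LOD scores correspond to the phase-unknown likelihood, in which the two phase arrangements are averaged. The sketch below recomputes the table in R and also searches for the maximizing value on a continuous scale; the function names are assumptions made for this example.

```r
# Phase-unknown likelihood: average of the two phase arrangements
# (4 non-recombinants + 1 recombinant, or the reverse)
lik_unknown_phase <- function(theta) {
  0.5 * ((1 - theta)^4 * theta + theta^4 * (1 - theta))
}

# LOD score relative to no linkage (theta = 0.5)
lod <- function(theta) {
  log10(lik_unknown_phase(theta) / lik_unknown_phase(0.5))
}

# Reproduce the table on the same grid of theta values
theta_grid <- c(0, 0.10, 0.20, 0.50)
lod_table <- data.frame(theta = theta_grid, Z = round(lod(theta_grid), 3))
print(lod_table)

# Continuous maximization over (0, 0.5]
theta_hat <- optimize(lod, interval = c(0.001, 0.5), maximum = TRUE)$maximum
cat("theta_hat =", round(theta_hat, 3), " Z =", round(lod(theta_hat), 3), "\n")
```

The continuous maximum lies close to the tabulated value of 0.20, so the coarse grid is adequate here. The LOD at \(\theta = 0\) is \(-\infty\) because a single recombinant is impossible under complete linkage.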

Problem 2: NNT Calculation

Scenario: A trial shows 800/1000 events in Treatment Group A and 600/1200 events in Control Group B.

1. Event rates: \(p_A = 800/1000 = 0.80\), \(p_B = 600/1200 = 0.50\).
2. \(NNT = \frac{1}{p_B - p_A} = \frac{1}{0.5 - 0.8} = -3.33\).

Interpretation: Since the value is negative, this represents a Number Needed to Harm (NNH) of 3.3. For every ~3-4 patients treated, one additional adverse event occurs compared to the control.
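
The same calculation can be done in R directly from the event counts. This is a minimal sketch; the variable names are chosen for this example.

```r
# Event counts from the scenario
events_A <- 800; n_A <- 1000   # Treatment Group A
events_B <- 600; n_B <- 1200   # Control Group B

p_A <- events_A / n_A          # treatment event rate
p_B <- events_B / n_B          # control event rate

ARR <- p_B - p_A               # negative when treatment increases risk
NNT <- 1 / ARR                 # infinite if ARR is exactly 0

# The sign of ARR decides whether the result is read as NNT or NNH
label <- if (ARR < 0) "NNH" else "NNT"
cat("ARR =", ARR, "\n")
cat(label, "=", round(abs(NNT), 1), "\n")
```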

Problem 3: GWAS Power Analysis (R)

Scenario: Simulate a study with 500 cases/500 controls, Minor Allele Frequency (MAF) = 0.2, OR = 1.5.

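# Estimate power by simulation for a case-control design with fixed group sizes.
# Assumptions: maf is the minor allele frequency in controls, genotypes are in
# HWE within each group, and the SNP acts log-additively with per-allele OR.
simulate_gwas_power <- function(n_cases, n_controls, maf, OR, alpha = 0.05, n_sims = 100) {
  significant <- logical(n_sims)
  status <- rep(c(1, 0), times = c(n_cases, n_controls))  # 1 = case, 0 = control

  # Under the log-additive model, allele odds in cases are OR times those in controls
  maf_cases <- OR * maf / (1 - maf + OR * maf)

  for (i in seq_len(n_sims)) {
    # Genotypes coded as number of minor alleles (0, 1, 2)
    geno <- c(rbinom(n_cases, 2, maf_cases), rbinom(n_controls, 2, maf))

    # Test the SNP effect with logistic regression
    model <- glm(status ~ geno, family = binomial)
    p_val <- summary(model)$coefficients["geno", "Pr(>|z|)"]
    significant[i] <- p_val < alpha
  }
  mean(significant)  # proportion of simulations reaching significance
}

# Scenario from the problem; alpha = 0.05 is a single-test level,
# use alpha = 5e-8 for genome-wide significance
set.seed(123)
power <- simulate_gwas_power(n_cases = 500, n_controls = 500, maf = 0.2, OR = 1.5)
cat("Estimated power:", power, "\n")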






