SMHS IntroEpi
Contents
- 1 Scientific Methods for Health Sciences - Introduction to Epidemiology
- 2 Five main goals of epidemiology
- 3 Distinguishing between Endemic, Epidemic, and Pandemic
- 4 Modes of Disease Transmission
- 5 Attack Rates and Ratios (ARR)
- 6 Measuring Disease
- 7 Measuring Mortality Rates
- 8 Additional Measures of Mortality
- 9 Direct and Indirect Adjustment of Rates
- 10 Screening
- 11 Randomized Controlled Trials (RCT):
- 12 Cohort Study:
- 13 Case Control Study:
- 14 Cross-Sectional Studies:
- 15 Ecologic Studies:
- 16 Other Risk Estimates:
- 17 Bias: A barrier to internal validity
Scientific Methods for Health Sciences - Introduction to Epidemiology
Overview
Epidemiology is the study of the distribution and determinants of disease frequency in human populations. It occupies a distinctive place among the sciences: it is the only discipline concerned specifically with the occurrence of disease in human populations and how that occurrence changes over time. This introduction presents the field of Epidemiology and the basic concepts and methodologies applied later. It also aims to help students set up and analyze epidemiological problems and introduces the major types of epidemiological studies.
Motivation
To get an introduction to Epidemiology, we want to:
- learn the basic language of epidemiology and identify key sources of data for epidemiologic purposes
- be able to calculate and interpret measures of disease frequency
- recognize and evaluate epidemiological study designs and their limitations
- be an informed consumer of epidemiological sources of information (journals, websites, government agencies).
Theory
Five main goals of epidemiology
- To identify the cause of disease and its risk factors
- To determine the extent of disease found in the community
- To study the natural history and prognosis of disease
- To evaluate new preventative and therapeutic measures
- To provide a foundation for developing public policy.
Distinguishing between Endemic, Epidemic, and Pandemic
- Endemic: The habitual presence (or usual occurrence) of a disease within a given geographic area;
- Epidemic: The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;
- Pandemic: A worldwide epidemic affecting an exceptionally high proportion of the global population.
Modes of Disease Transmission
- Direct contact: transmission occurs when the pathogen is transferred directly from an infected person to a susceptible person, for example through a sneeze, touch, or sexual intercourse.
- Indirect contact: transmission involves the transfer of the pathogen by contact with a contaminated intermediate inanimate object or vector: (1) vehicle-borne (inanimate object), for example a toy, food, or water; (2) vector-borne (animal or insect), for example mosquitoes, ticks, and mice.
Attack Rates and Ratios (ARR)
Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak investigation. This involves starting with the big picture and the major risk factors for disease ("How many people at the event got ill?"), refining the big picture into smaller questions ("Did they eat the salad? The chicken? The ice cream?"), and formulating a hypothesis such as "Among those who ate at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?"
- Attack Rates (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}.$
- Attack Rate Ratio (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}.$
- $H_{0}:ARR=1$, and a 95% confidence interval can be used to check whether the estimated ARR interval includes the null value of 1. If the ARR is much greater than 1, then people who were exposed are more likely to develop the illness than those who were unexposed.
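A minimal computational sketch of these formulas is shown below; the exposure groups and counts are hypothetical, chosen only to illustrate the calculation.

```python
# Minimal sketch: attack rates and an attack rate ratio for a hypothetical
# foodborne outbreak (all counts below are made up for illustration).

ill_exposed, total_exposed = 40, 60        # ate the Caesar salad
ill_unexposed, total_unexposed = 10, 50    # did not eat the salad

ar_exposed = ill_exposed / total_exposed        # attack rate among the exposed
ar_unexposed = ill_unexposed / total_unexposed  # attack rate among the unexposed
arr = ar_exposed / ar_unexposed                 # attack rate ratio

print(f"AR (exposed)   = {ar_exposed:.2f}")    # 0.67
print(f"AR (unexposed) = {ar_unexposed:.2f}")  # 0.20
print(f"ARR            = {arr:.2f}")           # 3.33 -> exposed ~3x more likely ill
```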
Measuring Disease
To name and calculate two measures of incidence, describe the differences in interpreting these measures, and understand the difference between a proportion and a true rate.
- Incidence: the number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period of time. For example: if there are 2000 persons at risk during the year and 20 develop the disease over that period, the incidence would be 20⁄2000 = 1% for that year.
- Cumulative incidence: $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk}. $
- Incidence rate: $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}.$
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days = 3 + 5 + 8 = 16.
- Prevalence $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}.$
- The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.
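The following sketch computes the measures above; the person-time follow-up values come from the example in the text, while the case counts and prevalence numbers are hypothetical.

```python
# Minimal sketch of the disease-frequency measures above.

# Cumulative incidence: 20 new cases among 2000 persons at risk in one year
new_cases = 20
population_at_risk = 2000
cumulative_incidence = new_cases / population_at_risk   # 0.01 = 1%

# Incidence rate: new cases divided by total person-time at risk
follow_up_days = [3, 5, 8]            # subjects A, B, C from the example
person_days = sum(follow_up_days)     # 16 person-days
incidence_rate = 1 / person_days      # assume 1 new case: 1 case per 16 person-days

# Point prevalence: existing cases divided by the population at that time
existing_cases = 150                  # hypothetical
population = 10_000                   # hypothetical
prevalence = existing_cases / population   # 0.015

print(cumulative_incidence, incidence_rate, prevalence)
```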
Measuring Mortality Rates
- To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates.
- All cause mortality rates=$\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$.
- Cause-specific mortality rate=$\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$.
- Group-specific mortality rate=$\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$.
Additional Measures of Mortality
- Infant mortality: $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011}{Number\,of\,live\,births\,in\,2011}$.
- Proportionate mortality: measures proportion of all deaths occurring in a given place over a given time that is due to a given cause.
- Case fatality: of all people diagnosed with a given disease, the proportion who die of that disease over a certain period.
- Underlying cause of death.
Direct and Indirect Adjustment of Rates
Direct and indirect adjustment of rates are used to compare two populations (or one population at different time periods) that have different age distributions; adjusting for age answers the question of what the mortality rates in the two populations would be if they had the same age distribution.
- Direct age-adjustment: expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.
For each population:
- 1. Calculate age-specific rates
- 2. Multiply age-specific rates by the # of people in corresponding age range in standard population
- 3. Sum expected # of deaths across age groups
- 4. Divide total # of expected deaths by total standard population
The result is an age-adjusted mortality rate for each population of interest.
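Below is a minimal sketch of the direct-adjustment steps for a single study population; all rates and standard-population counts are hypothetical.

```python
# Minimal sketch of direct age adjustment (steps 1-4 above) for one population.

# Step 1: age-specific mortality rates in the study population (per person)
age_specific_rates = {"young": 0.002, "middle": 0.005, "old": 0.020}

# Standard population counts for the same age groups (hypothetical)
standard_population = {"young": 500_000, "middle": 300_000, "old": 200_000}

# Steps 2-3: expected deaths if the standard population had these rates
expected_deaths = sum(age_specific_rates[g] * standard_population[g]
                      for g in standard_population)    # 1000 + 1500 + 4000

# Step 4: age-adjusted rate = total expected deaths / total standard population
adjusted_rate = expected_deaths / sum(standard_population.values())
print(f"Age-adjusted rate: {adjusted_rate * 100_000:.0f} per 100,000")  # 650
```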
- Indirect age-adjustment: the expected number of deaths can be compared to the number of actual deaths using the standardized mortality ratio (SMR). It is especially useful when the group-specific rates are not trustworthy (e.g., when the study population is too small).
- 1. Acquire age-specific mortality rates for standard population
- 2. Multiply standard population’s age-specific rates by # of people in age range in study population
- 3. Sum expected # of deaths across age groups in study population
- 4. Divide observed # of deaths by expected # of deaths in study population
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)
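A minimal sketch of the indirect-adjustment (SMR) steps, with hypothetical standard rates, study-population counts, and observed deaths:

```python
# Minimal sketch of indirect age adjustment (SMR) following steps 1-4 above.

standard_rates = {"young": 0.002, "middle": 0.005, "old": 0.020}  # step 1
study_population = {"young": 10_000, "middle": 6_000, "old": 4_000}
observed_deaths = 150   # deaths actually observed in the study population

# Steps 2-3: expected deaths if the study population experienced standard rates
expected_deaths = sum(standard_rates[g] * study_population[g]
                      for g in study_population)     # 20 + 30 + 80 = 130

# Step 4: standardized mortality ratio
smr = observed_deaths / expected_deaths
print(f"SMR = {smr:.2f}")   # ~1.15 -> more deaths than expected
```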
Screening
Screening is the use of testing to sort apparently well (asymptomatic) persons who probably have a disease from those who probably do not, allowing the disease to be detected early. Examples of screening include fasting blood sugar for diabetes, bone densitometry for osteoporosis, and otoacoustic emissions testing for hearing loss in newborns. Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease earlier, initiate treatment sooner, and achieve better outcomes. However, screening programs must be warranted, and there must be a critical point in the disease's natural history that screening allows treatment to precede.
A. Clinical utility, predictive value & reliability: clinical utility of positive tests. If a patient tests positive, the probability that they actually have the disease is the Positive Predictive Value (PPV); if a patient tests negative, the probability that they actually do not have the disease is the Negative Predictive Value (NPV). PPV and NPV are affected by the prevalence of the disease and by the sensitivity and specificity of the test.
Disease Status:
| Screening Test | Disease | No Disease |
| Positive | a (True positives) | b (False positives) |
| Negative | c (False negatives) | d (True negatives) |
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$
PPV interpretation: given a positive screening test result, the probability that the individual actually has the disease is the PPV.
NPV interpretation: given a negative screening test result, the probability that the individual actually does not have the disease is the NPV.
B. Factors influencing predictive values:
- Disease prevalence: increasing disease prevalence increases PPV (and decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent diseases may waste resources; PPV needs to be presented in the context of disease prevalence.
- Test specificity (the ability of a test to correctly identify those who do not have the disease, $=\frac{d}{b+d}$): higher test specificity increases PPV.
- Test sensitivity (the ability of a test to correctly identify those who do have the disease, $=\frac{a}{a+c}$): higher test sensitivity increases NPV.
Note: the cutpoint of a test will influence its sensitivity and specificity: lowering the cutpoint increases true positives and hence increases sensitivity, but decreases true negatives and hence decreases specificity. Similarly, raising the cutpoint decreases true positives and hence decreases sensitivity, but increases true negatives and hence increases specificity.
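The sketch below ties together the 2x2 screening table and the measures discussed in sections A and B (sensitivity, specificity, PPV, NPV); the cell counts are hypothetical.

```python
# Minimal sketch: sensitivity, specificity, PPV and NPV from the 2x2 screening
# table above (a, b, c, d). The counts are hypothetical.

a, b, c, d = 90, 50, 10, 850   # TP, FP, FN, TN

sensitivity = a / (a + c)   # prob. test is positive given disease      -> 0.90
specificity = d / (b + d)   # prob. test is negative given no disease   -> 0.94
ppv = a / (a + b)           # prob. of disease given a positive test    -> 0.64
npv = d / (c + d)           # prob. of no disease given a negative test -> 0.99

print(sensitivity, specificity, ppv, npv)
```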
C. Validity: validity is the ability of a test to distinguish between who has the disease and who does not; reliability is the ability to replicate results on the same sample if the test is repeated. Three combinations are possible: valid but not reliable, reliable but not valid, and both valid and reliable.
D. Reliability(repeatability) of tests:
Can the results be replicated if the test is redone? The results may be influenced by three factors:
- Intrasubject variation: variation within individual subjects
- Intraobserver variation: variation in reading of results by the same reader
- Interobserver variation: variation between those reading results
E. How does using multiple tests improve screening programs?
Using multiple tests:
- (1) Sequential (2-stage) testing: administer the less expensive, less invasive, or less uncomfortable test first; if the result on the first test is positive, follow up with additional testing.
- (2) Simultaneous (parallel) testing: conduct multiple screening tests at the same time; to be considered positive, the person need only test positive on one of the tests, while to be considered negative, the person must test negative on all tests.
Each test has its own sensitivity and specificity. Using multiple tests can improve either net sensitivity (simultaneous testing) or net specificity (sequential testing): sequential testing decreases net sensitivity and increases net specificity, while simultaneous testing increases net sensitivity and decreases net specificity.
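A minimal sketch of the net sensitivity and net specificity calculations for the two strategies, assuming the two tests are conditionally independent given disease status and using hypothetical test characteristics:

```python
# Minimal sketch of net sensitivity/specificity for sequential vs. simultaneous
# testing, assuming the two tests are independent given disease status.

sens1, spec1 = 0.80, 0.90   # hypothetical test 1
sens2, spec2 = 0.90, 0.85   # hypothetical test 2

# Sequential (2-stage): positive only if positive on BOTH tests
seq_sensitivity = sens1 * sens2                   # 0.72  (lower)
seq_specificity = 1 - (1 - spec1) * (1 - spec2)   # 0.985 (higher)

# Simultaneous (parallel): positive if positive on EITHER test
par_sensitivity = 1 - (1 - sens1) * (1 - sens2)   # 0.98  (higher)
par_specificity = spec1 * spec2                   # 0.765 (lower)

print(seq_sensitivity, seq_specificity, par_sensitivity, par_specificity)
```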
Randomized Controlled Trials (RCT):
The investigator assigns exposure at random to study participants, then observes whether there are differences in health outcomes between people who were exposed to the factor (treatment group) and those who were not (comparison group). Special care is taken to ensure that follow-up is done in an identical way in both groups. The essence of a good comparison is that the compared groups are the same except for the "treatment".
- Steps of an RCT: a hypothesis is formed; study participants are recruited based on specific criteria and their informed consent is sought; eligible and willing participants are randomly allocated to a particular study group; the study groups are monitored for the outcome under study; the rates of the outcome in the various groups are compared.
External and internal validity:
- External validity: generalization of the study to the larger source population. Influenced by factors such as demographic differences between eligible and ineligible subgroups, and whether the intervention mirrors what will happen in the community or source population.
- Internal validity: Ability to reach correct conclusion in study. Influenced by factors like: ability of subjects to provide valid and reliable data; expected compliance with a regimen; low probability of dropping out.
Measures of Association and Effect in RCT:
Ratio of two measures of disease incidence (relative measures) - Risk Ratio (Relative Risk), Rate Ratio. Difference between two measures of disease incidence: Risk difference, efficacy.
Disease Status:
| Treatment | Disease | No Disease |
| Drug A | a | b |
| Placebo | c | d |
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed}{Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)}{c/(c+d)}=\frac{CI_{Drug\,A}}{CI_{Placebo}}$
$Rate\,Ratio=\frac{Incidence\,rate\,in\,exposed}{Incidence\,rate\,in\,unexposed}$
Interpretation: RR > 1, the outcome is RR times more likely to occur in group A than in group B; RR = 1, the null value (no difference between groups); RR < 1, either report the reduction in risk (100%-xx%) or invert (1/RR) and interpret as "less likely".
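A minimal sketch of the relative-risk calculation from the 2x2 trial table above, with hypothetical cell counts:

```python
# Minimal sketch: relative risk from the 2x2 trial table above (a, b, c, d).
# The counts are hypothetical.

a, b = 30, 170   # Drug A: developed disease / did not
c, d = 60, 140   # Placebo: developed disease / did not

ci_drug = a / (a + b)       # cumulative incidence in the treated group -> 0.15
ci_placebo = c / (c + d)    # cumulative incidence in the placebo group -> 0.30
rr = ci_drug / ci_placebo   # relative risk -> 0.5

# RR < 1: invert for a "less likely" interpretation, or report the risk reduction
print(f"RR = {rr:.2f}; risk reduction = {(1 - rr):.0%}")   # RR = 0.50; 50%
```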
- Situations that favor the use of RCT:
- (1) Exposure of interest is a modifiable factor over which individuals are willing to relinquish control;
- (2) Legitimate uncertainty exists regarding the effect of interventions on the outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks;
- (3) Effect of intervention on outcome is of sufficient importance to justify a large study.
Cohort Study:
Populations of exposed and unexposed individuals at risk of developing the outcome are followed over time to compare the development of disease in each group.
- Steps: establish the study population: identify a study population that is reflective of the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; follow both groups and compare the outcomes in the exposed and unexposed groups.
- Types:
Prospective (concurrent) and retrospective (non-concurrent) cohort studies, distinguished by when the data are collected. Retrospective designs have benefits: more cost effective; good for diseases with long latency. Prospective designs have benefits: data quality is presumably higher. Both designs need to be cautious of ascertainment bias if the outcome or exposure is already known.
- Measures of Association in Cohort Study:
Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio. Difference between two measures of disease incidence: Risk Difference, Rate Difference.
- Strengths and weaknesses of the Cohort Design:
Strengths:
- (1) Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information.
- (2) Excellent for studying known adverse exposures or those that cannot practically be randomized.
- (3) Like RCTs, excellent for studying rare exposures.
- (4) Multiple outcomes, and sometimes multiple exposures, can be studied.
Disadvantages:
- (1) Long-term follow-up required and expensive;
- (2) Not effective at capturing rare outcomes, and it can be challenging to study diseases that take a long time to develop;
- (3) Loss to follow-up can be a problem;
- (4) Changes over time in criteria and methods can lead to problems with inferences;
- (5) People self-select exposures so exposed and unexposed may differ with respect to important characteristics.
- Situations favor a Cohort Study:
- (1) When there is evidence of an association between the exposure and the disease from other studies;
- (2) When the exposure is rare but the incidence of disease among the exposed is high;
- (3) When time between exposure and development of the disease is relatively short or historical data is available;
- (4) When good follow-up can be ensured.
Case Control Study:
A case-control study compares cases (people with the disease) and controls (people without the disease) to see which group had greater exposure to the risk factor of interest.
- Measures of Association: Odds Ratio.
| Exposed | Case | Control |
| Yes | a | b |
| No | c | d |
$Odds\,Ratio=\frac{odds\,of\,a\,case\,being\,exposed}{odds\,of\,a\,control\,being\,exposed}=\frac{a/c}{b/d}=\frac{ad}{bc}.$
Interpretation: if OR > 1, the odds of being exposed are OR times higher in the cases than in the controls; if OR < 1, the odds of being exposed are 1/OR times lower in the cases than in the controls; if OR = 1, there is no association (the odds of exposure are the same in cases and controls).
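A minimal sketch of the odds-ratio calculation from the 2x2 case-control table above, with hypothetical cell counts:

```python
# Minimal sketch: odds ratio from the case-control 2x2 table above.
# The counts are hypothetical.

a, b = 40, 20   # exposed: cases / controls
c, d = 60, 80   # unexposed: cases / controls

odds_cases_exposed = a / c       # odds of exposure among cases    -> 0.667
odds_controls_exposed = b / d    # odds of exposure among controls -> 0.25
odds_ratio = (a * d) / (b * c)   # equivalent cross-product form   -> 2.67

print(f"OR = {odds_ratio:.2f}")  # cases have ~2.7 times the odds of exposure
```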
- Strengths and weaknesses of a Case Control Study:
- Strengths: the case-control design is efficient and can evaluate many risk factors for the same disease, so it is good for diseases about which little is known; it is observational – we don't ask people to change their behavior, we just collect information on events that happen "naturally".
- Weaknesses: inefficient for rare exposures; can study only one outcome at a time; cannot calculate the incidence of disease, only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in the study is artificial and does not represent the natural distribution of disease in the population.
- Avoiding Recall / Reporting Bias: adjusting timing so that the time between the event/illness and the study is as short as possible; use standardized questionnaires that obtain complete information; using existing information if/when possible (e.g. medical record); mask participants to study hypothesis.
- Conditions under which an OR from a case-control study can approximate an RR (OR ≈ RR):
- (1) when the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn;
- (2) when the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn;
- (3) when the disease being studied does not occur frequently.
Cross-Sectional Studies:
A cross-sectional study is an observational study in which a subject's exposure and disease status are measured at the same time; prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, and temporality cannot be determined).
- Strengths and Limitations of Cross-Sectional Studies:
Strengths:
- (1) good for generating hypotheses;
- (2) easily sets up other analytic designs;
- (3) temporality is not a problem for time invariant exposures (genetic markers);
- (4) relatively low cost.
Weaknesses:
- (1) temporality – exposure or disease which happened first;
- (2) prevalent cases may not be the same as incident cases;
- (3) not useful for rare disease;
- (4) subject to selection bias.
- Measures of Association in Cross Sectional Studies
| Exposed | Case | Control |
| Yes | a | b |
| No | c | d |
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\,disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$
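A minimal sketch of the prevalence-ratio calculation from the cross-sectional 2x2 table above, with hypothetical cell counts:

```python
# Minimal sketch: prevalence ratio from the cross-sectional 2x2 table above.
# The counts are hypothetical.

a, b = 25, 75    # exposed: with disease / without disease
c, d = 10, 90    # unexposed: with disease / without disease

prevalence_exposed = a / (a + b)      # 0.25
prevalence_unexposed = c / (c + d)    # 0.10
prevalence_ratio = prevalence_exposed / prevalence_unexposed   # 2.5
print(prevalence_ratio)
```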
Ecologic Studies:
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place and time (mixed study). An error can occur when an association identified from group-level (ecologic) characteristics is ascribed to individuals even though the association does not exist at the individual level.
- Strengths and Disadvantages of Ecologic Studies:
Strengths:
- (1) data is relatively easy and/or cheap to obtain;
- (2) a good place to start;
- (3) many relevant social, occupational and environmental exposures cannot be ascribed to an individual.
Weaknesses: reliance on group-level data may not correctly represent individual-level associations.
- The ecologic fallacy occurs when an association between variables based on group characteristics is used to make inferences about individuals, even though that association does not exist at the individual level.
- Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.
Other Risk Estimates:
Attributable risk estimates of effect – if an exposure causes an increased risk of disease, then we can estimate how many cases of disease could be eliminated if the exposure were completely eliminated.
- Attributable Risk (AR): $AR=CI_{Exposed}-CI_{Not\,exposed}$. This is simply the risk difference. The group of interest is the exposed, and the AR quantifies the risk of disease in the exposed group that is attributable to the exposure.
- Attributable Risk Percent $(AR\%)$: $AR\%=\frac{CI_{Exposed}-CI_{Not\,exposed}}{CI_{Exposed}}\times 100\%$
- Population Attributable Risk (PAR): $PAR=CI_{Total}-CI_{Not\,exposed}$
- Population Attributable Risk Percent $(PAR\%)$: $PAR\%=\frac{CI_{Total}-CI_{Not\,exposed}}{CI_{Total}}\times 100\%$
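A minimal sketch of these attributable-risk measures, using hypothetical cumulative incidences:

```python
# Minimal sketch of the attributable-risk measures above. The cumulative
# incidences are hypothetical.

ci_exposed = 0.20      # cumulative incidence among the exposed
ci_unexposed = 0.05    # cumulative incidence among the unexposed
ci_total = 0.08        # cumulative incidence in the whole population

ar = ci_exposed - ci_unexposed    # attributable risk (risk difference)  -> 0.15
ar_pct = ar / ci_exposed * 100    # attributable risk percent            -> 75%
par = ci_total - ci_unexposed     # population attributable risk         -> 0.03
par_pct = par / ci_total * 100    # population attributable risk percent -> 37.5%

print(ar, ar_pct, par, par_pct)
```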
Bias: A barrier to internal validity
- Definition of bias: any systematic error in the design, conduct, or analysis of a study that results in a distorted estimate of the relationship between an exposure and an outcome; the observed results differ from the true results.
- Impact of bias: it can make it appear as if there is an association when there really is none (bias away from the null), or it can mask an association when there really is one (bias toward the null).
- Reasons why we get the wrong answer:
- (1) Selection bias: the way subjects are selected or retained in a study distorts the estimate of the truth. An example is selection bias due to differential retention in the study.
- Mechanisms to reduce bias:
- Ensure proper selection of study subjects (choose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).
- Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.
- (2) Information bias: the quality of the information collected distorts the estimate of the true association. Examples include surveillance bias, non-differential misclassification (e.g., of hypertension), reporting bias, and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in the measure, and error due to subconscious or conscious decisions by the participant or investigator.
- (3) Confounding bias: differences between cases and controls, or between exposed and unexposed, distort the estimate of the truth. A variable is a confounder if it is a known risk factor for the outcome, it is associated with the exposure, and it is not a result of the exposure; all three conditions are necessary for a variable to be considered a confounder.
- (4) Chance: the luck of the draw gets you a study sample that is not representative of the larger population.
- Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment.
Matching in a case-control study:
| | Control Exposed | Control Unexposed |
| Case Exposed | a | b |
| Case Unexposed | c | d |
Concordant pairs: both case and control exposed; neither case nor control exposed. Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.
- Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$
Interpretation: if there is an association between exposure and outcome, it is not due to any of the factors that were matched on; however, you cannot analyze the association between the matched variables and the outcome.
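A minimal sketch of the matched (discordant-pair) odds ratio, with hypothetical pair counts:

```python
# Minimal sketch: matched odds ratio from the discordant pairs in the table
# above. Pair counts are hypothetical.

b = 30   # pairs with case exposed, control unexposed
c = 12   # pairs with case unexposed, control exposed

matched_or = b / c   # 2.5: odds of exposure 2.5 times higher in cases
print(matched_or)
```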
- Randomization: random allocation of the exposure/"treatment" by the investigator ensures that the two groups (exposed and unexposed) are the same except for the exposure of interest; it controls for both known and unknown confounders because these "third variables" should be equally distributed between the groups.
- Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant.
- Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure.
A common example of adjustment is the age adjustment described earlier; a stratified-analysis sketch follows.
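As an illustration of stratification followed by adjustment, the sketch below computes stratum-specific odds ratios and combines them with a Mantel-Haenszel weighted summary (a standard approach, though not the only one); the stratified 2x2 counts are hypothetical.

```python
# Minimal sketch of stratification and adjustment: examine the exposure-outcome
# odds ratio within each stratum of the confounder, then combine the strata
# with a Mantel-Haenszel weighted summary. All counts are hypothetical.

# Each stratum: (a, b, c, d) = (exposed cases, exposed controls,
#                               unexposed cases, unexposed controls)
strata = {"young": (10, 40, 5, 45), "old": (30, 20, 20, 30)}

def stratum_or(a, b, c, d):
    return (a * d) / (b * c)

# Stratification: stratum-specific odds ratios (confounder held constant)
for name, (a, b, c, d) in strata.items():
    print(name, round(stratum_or(a, b, c, d), 2))

# Adjustment: Mantel-Haenszel summary odds ratio across strata
num = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
print("MH OR:", round(num / den, 2))
```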
Applications
[This article] reviews, through several important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has benefited greatly in the past from the development of epidemiological research; however, the association-versus-causation objection is now commonly raised against observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research over the past 10 years. Causality of these associations has some special characteristics when genes are involved, such as the necessity of replication and Mendelian randomization, which might prove to be important in future research.
[This article] retrospectively studies the relationship between surveillance, staffing, and serious adverse events in children on general-care postoperative units. The paper investigates two hypotheses: (1) the relationship between patient factors and surveillance is moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events is mediated by surveillance.
Software
Problems
How do we learn about existence of outbreaks?
- a. cases call health departments directly
- b. clinicians
- c. laboratories
- d. all of the above
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?
- a. host
- b. agent
- c. vector
- d. environment
- e. all of the above
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?
- a. 0.002 lung cancer cases per 100,000 person years
- b. 200 lung cancer cases per 100,000 person years
- c. 270 lung cancer cases per 100,000 person years
- d. 243 lung cancer cases per 100,000 person years
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?
- a. The prevalence increases if the duration of disease is increasing or stays the same
- b. The prevalence increases if the duration of disease is decreasing rapidly
- c. The prevalence decreases if the duration of disease is increasing
- d. The prevalence decreases if the duration of disease stays the same
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002-2012.
| Age groups (years) | Age-specific rates (per 100,000) | Michigan standard population | Expected number of deaths |
| <20 | 20 | 2,000,000 | |
| 20-39 | 10 | 3,000,000 | |
| 40-59 | 5 | 1,000,000 | |
| >60 | 30 | 4,000,000 | |
| Total | | 10,000,000 | |
- SOCR Home page: http://www.socr.umich.edu