Difference between revisions of "SMHS IntroEpi"
===Overview===
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction presents the field and explains the basic concepts and methodologies that are applied later. It also aims to help students solve and analyze epidemiological problems and to introduce them to the main types of epidemiological studies.
===Motivation===
In this introduction to epidemiology, we will:
*Study the language of epidemiology and identify key sources of data for epidemiological purposes
*Be able to calculate and interpret measures of disease frequency
*Recognize and evaluate epidemiological study designs and their limitations
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).
===Theory===
*Five main goals of epidemiology:
# To identify the cause of disease and its risk factors
# To determine the extent of disease found in the community
# To study the natural history and prognosis of disease
# To evaluate new preventative and therapeutic measures
# To provide a foundation for developing public policy
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.
*Modes of Disease Transmission
#''Direct contact'': Transmission occurs when the pathogen is transferred by contact from an infected person to a susceptible person, for example through a sneeze, touch or sexual intercourse
#''Indirect contact'': Transmission involves the transfer of the pathogen through a contaminated intermediate inanimate object or a vector
##''Inanimate (object or vehicle)'': Examples include toys, food or water
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice
*Attack Rates and Ratios (ARR)
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves:
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)
#Formulating a hypothesis (e.g., “Among those who ate at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)
:''Attack Rate'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$
*To test $H_{0}:ARR=1$, a 95% confidence interval can be used to check whether the estimated ARR interval includes the null value of 1. If the ARR is much greater than 1 and its interval excludes 1, then exposed people are more likely to develop the illness than unexposed people.
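The attack rate and attack rate ratio above can be sketched in a few lines of Python. This is only an illustration: the buffet counts are hypothetical, the function names are made up for this example, and the confidence interval uses the common log-scale (Wald-type) approximation, which is one of several possible choices.

```python
import math

def attack_rate(cases, at_risk):
    """Proportion of people at risk who developed the illness."""
    return cases / at_risk

def attack_rate_ratio(cases_exp, n_exp, cases_unexp, n_unexp, z=1.96):
    """ARR with an approximate 95% confidence interval (log-scale method)."""
    arr = attack_rate(cases_exp, n_exp) / attack_rate(cases_unexp, n_unexp)
    # Approximate standard error of log(ARR) for two independent groups
    se = math.sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    lower = math.exp(math.log(arr) - z * se)
    upper = math.exp(math.log(arr) + z * se)
    return arr, lower, upper

# Hypothetical outbreak: 30 of 50 salad eaters fell ill, 10 of 50 non-eaters did
arr, lower, upper = attack_rate_ratio(30, 50, 10, 50)
print(f"ARR = {arr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print("Interval excludes 1:", lower > 1 or upper < 1)
```

If the printed interval excludes 1, the data are inconsistent with $H_{0}:ARR=1$ at roughly the 5% level.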
====Measuring Disease====
This section names and calculates two measures of incidence, describes how their interpretations differ, and distinguishes a proportion from a true rate.
*''Incidence'': The number of new cases of a disease occurring in the population during a specified period of time divided by the number of persons at risk of developing the disease during that period. For example, if 2000 persons are at risk during the year and 20 develop the disease over that period, the incidence is 20⁄2000 = 1%.
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$
Person-time measures the total amount of time that all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B for 5 days and subject C for 8 days, then person-days $= 3 + 5 + 8 = 16$.
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.
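The difference between a proportion (cumulative incidence) and a true rate (incidence rate) can be made concrete with a small Python sketch. It reuses the numbers from the text (20 cases among 2000 persons; follow-up of 3, 5 and 8 days); the single new case in the person-time example is a hypothetical value added only for illustration.

```python
def cumulative_incidence(new_cases, population_at_risk):
    """Proportion of the at-risk population that develops disease (unitless)."""
    return new_cases / population_at_risk

def incidence_rate(new_cases, person_time):
    """New cases per unit of person-time (a true rate, with time in the denominator)."""
    return new_cases / person_time

# Cumulative incidence from the text: 20 new cases among 2000 persons at risk
ci = cumulative_incidence(20, 2000)
print(f"Cumulative incidence: {ci:.1%}")

# Person-time from the text: subjects followed for 3, 5 and 8 days
follow_up_days = [3, 5, 8]
person_days = sum(follow_up_days)
print("Person-days:", person_days)

# Hypothetical: one of the three subjects develops disease during follow-up
rate = incidence_rate(1, person_days)
print(f"Incidence rate: {rate:.4f} cases per person-day")
```

The cumulative incidence has no time unit, while the incidence rate is expressed per person-day (or per person-year), which is why the two measures are interpreted differently.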
====Measuring Mortality Rates====
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:
*All-cause mortality rate = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$
*Cause-specific mortality rate (example: lung cancer) = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$
*Group-specific mortality rate (example: women) = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$
====Additional Measures of Mortality====
*''Infant mortality'' (example for 2011): $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011} {Number\,of\,live\,births\,in\,2011}$
*''Proportionate mortality'': The proportion of all deaths occurring in a given place over a given time that is due to a given cause
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of that disease over a certain period
*''Underlying cause of death''
====Direct and Indirect Adjustment of Rates====
Direct and indirect adjustment of rates are used to compare two populations, or one population at different time periods, whose age distributions differ. Adjusting for age shows how the mortality rates would compare if both populations had the same age distribution.
*''Direct age-adjustment'': The expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | *Direct age-adjustment: | ||
For each population: | For each population: | ||
− | |||
− | |||
− | |||
− | |||
# Calculate age-specific rates
# Multiply age-specific rates by the number of people in the corresponding age range in the standard population
# Sum the expected number of deaths across age groups
# Divide the total number of expected deaths by the total standard population
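The four steps of direct age-adjustment can be sketched in Python as follows. The age-specific rates and the standard population below are hypothetical values chosen only to make the arithmetic easy to follow, and the function name is an assumption of this example.

```python
def direct_age_adjusted_rate(age_specific_rates, standard_population):
    """Directly standardized rate.

    age_specific_rates: deaths per person in each age group of the study population
    standard_population: number of people in the same age groups of the standard
    """
    # Steps 2 and 3: expected deaths in the standard population, summed over age groups
    expected_deaths = [
        rate * count for rate, count in zip(age_specific_rates, standard_population)
    ]
    # Step 4: divide by the total standard population
    return sum(expected_deaths) / sum(standard_population)

# Hypothetical age-specific rates (step 1) for three age groups
rates = [0.001, 0.002, 0.010]
# Hypothetical standard population for the same three age groups
standard = [50_000, 30_000, 20_000]

adjusted = direct_age_adjusted_rate(rates, standard)
print(f"Age-adjusted rate: {adjusted * 100_000:.0f} deaths per 100,000")
```

Running the same function on a second population with the same standard gives two adjusted rates that can be compared directly, because both are weighted by the same age distribution.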
====Age-adjusted mortality rate for each population of interest====
*Indirect age-adjustment: The expected number of deaths can be compared to the number of actual deaths with the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates cannot be trusted (e.g., if the population is too small).
# Acquire age-specific mortality rates for the standard population
# Multiply the standard population’s age-specific rates by the number of people in each age range in the study population
# Sum the expected number of deaths across age groups in the study population
# Divide the observed number of deaths by the expected number of deaths in the study population
Result: SMR (>1 more deaths than expected, =1 as expected, <1 fewer deaths than expected)
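The indirect-adjustment steps can be sketched in the same way. All numbers below are hypothetical and the function name is only illustrative; the point is that the standard population supplies the rates while the study population supplies the age structure and the observed deaths.

```python
def standardized_mortality_ratio(observed_deaths, standard_rates, study_population):
    """SMR = observed deaths / expected deaths.

    standard_rates: age-specific death rates (per person) in the standard population
    study_population: number of people in each age group of the study population
    """
    # Expected deaths if the study population experienced the standard rates
    expected = sum(rate * count for rate, count in zip(standard_rates, study_population))
    return observed_deaths / expected

# Hypothetical standard-population rates and study-population sizes by age group
standard_rates = [0.001, 0.002, 0.010]
study_population = [2_000, 3_000, 1_000]

smr = standardized_mortality_ratio(27, standard_rates, study_population)
print(f"SMR = {smr:.2f}")
```

With these made-up inputs the expected number of deaths is 18, so 27 observed deaths correspond to more deaths than expected.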
− | |||
− | |||
− | |||
====Screening====
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include:
*Fasting blood sugar for diabetes
*Bone densitometry for osteoporosis
*Otoacoustic emissions testing for hearing loss in newborns
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and achieve better outcomes. However, a screening program must be warranted, and the disease must have a critical point that screening can precede.
=====Predictive Value & Reliability: Clinical Utility of Positive Tests=====
If a patient tests positive, the likelihood that they actually have the disease is called the '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood that they actually do ''not'' have the disease is called the '''Negative Predictive Value''' (NPV). PPV and NPV are affected by the prevalence of disease and by the specificity and sensitivity of the test.
<center>
{|class="wikitable" style="text align:center;width:25%"border="1"
|-
| Test result || Disease || No disease
|-
| Positive || a (True positives)|| b (False positives)
|-
| Negative || c (False negatives)|| d (True negatives)
|}
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$
</center>
'''PPV interpretation:''' Given a positive result on the screening test, the likelihood that an individual truly has the disease is the PPV.

'''NPV interpretation:''' Given a negative result on the screening test, the likelihood that an individual truly does not have the disease is the NPV.

* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].
=====Factors Influencing Predictive Values=====
*''Disease prevalence'': Increasing disease prevalence increases PPV and decreases NPV. Screening programs are most productive and efficient in high-risk populations; screening for an infrequent disease may waste resources. PPV should therefore be presented in the context of disease prevalence.
*''Test specificity'' (ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.
*''Test sensitivity'' (ability of a test to correctly identify those who have the disease $=\frac{a}{a+c}$)
'''Note:''' The cutoff used to call a test positive influences its sensitivity and specificity. When higher values indicate disease, lowering the cutpoint increases true positives (higher sensitivity) and decreases true negatives (lower specificity). Similarly, raising the cutpoint decreases true positives (lower sensitivity) and increases true negatives (higher specificity).
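The following Python sketch computes sensitivity, specificity, PPV and NPV from the four cells of a screening table (a = true positives, b = false positives, c = false negatives, d = true negatives) and then shows how PPV changes with prevalence. The counts are hypothetical, the function names are assumptions of this example, and the prevalence-based PPV is obtained from Bayes' rule.

```python
def screening_metrics(a, b, c, d):
    """Sensitivity, specificity, PPV and NPV from 2x2 screening counts.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }

def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """PPV for a given prevalence, using Bayes' rule."""
    true_positive_fraction = sensitivity * prevalence
    false_positive_fraction = (1 - specificity) * (1 - prevalence)
    return true_positive_fraction / (true_positive_fraction + false_positive_fraction)

# Hypothetical screening results for 1000 people
metrics = screening_metrics(a=80, b=100, c=20, d=800)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Same test applied to populations with different (hypothetical) prevalence
for prevalence in (0.01, 0.10, 0.30):
    ppv = ppv_from_prevalence(metrics["sensitivity"], metrics["specificity"], prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.3f}")
```

The loop illustrates why screening is most efficient in high-risk populations: with sensitivity and specificity held fixed, PPV rises as prevalence rises.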
− | |||
− | |||
− | |||
=====Validity=====
''Validity'': The ability of a test to distinguish between those who have the disease and those who do not

''Reliability'': The ability to replicate results on the same sample if the test is repeated

The following chart shows the three possible outcomes (from left to right): ''valid not reliable'', ''reliable not valid'', and ''valid and reliable''.
<center> | <center> | ||
Line 148: | Line 151: | ||
</center> | </center> | ||
=====Reliability (repeatability) of tests=====
Can the results be replicated if the test is redone? The results may be influenced by three factors:
*''Intrasubject variation'': Variation within individual subjects
*''Intraobserver variation'': Variation in reading of results by the same reader
*''Interobserver variation'': Variation between those reading results
=====How does multiple testing improve screening programs?=====
Using multiple tests:
# ''Sequential tests'' (2-stage) begin with a less expensive, less invasive, and less uncomfortable test. If its result is positive, it must be followed up with additional testing.
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person needs to test positive on only one of the tests; to be considered negative, the person must test negative on all tests.
Each test has its own sensitivity and specificity. Multiple testing can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:
*Sequential testing decreases net sensitivity and increases net specificity
*Simultaneous testing increases net sensitivity and decreases net specificity
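The effect of combining two tests on net sensitivity and net specificity can be sketched as follows. The formulas assume that the two tests err independently of each other given true disease status, which is a simplifying assumption of this example and not something stated in the text; the sensitivities and specificities are hypothetical.

```python
def sequential_net(sens1, spec1, sens2, spec2):
    """Two-stage testing: positive only if BOTH tests are positive."""
    net_sensitivity = sens1 * sens2
    # A non-diseased person is cleared by test 1, or by test 2 after a false positive
    net_specificity = spec1 + (1 - spec1) * spec2
    return net_sensitivity, net_specificity

def simultaneous_net(sens1, spec1, sens2, spec2):
    """Parallel testing: positive if EITHER test is positive."""
    # A diseased person is missed only if both tests miss
    net_sensitivity = 1 - (1 - sens1) * (1 - sens2)
    net_specificity = spec1 * spec2
    return net_sensitivity, net_specificity

# Hypothetical tests: test 1 (sens 0.80, spec 0.60), test 2 (sens 0.90, spec 0.90)
seq = sequential_net(0.80, 0.60, 0.90, 0.90)
sim = simultaneous_net(0.80, 0.60, 0.90, 0.90)
print(f"Sequential:   net sensitivity {seq[0]:.2f}, net specificity {seq[1]:.2f}")
print(f"Simultaneous: net sensitivity {sim[0]:.2f}, net specificity {sim[1]:.2f}")
```

Under these assumptions, sequential testing has lower net sensitivity than either test alone but higher net specificity, and simultaneous testing shows the opposite pattern.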
===Randomized Controlled Trials (RCT)===
In a randomized controlled trial, the investigator assigns exposure at random to study participants. The investigator then observes whether there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken to ensure that follow-up is done in an identical way in both groups. The essence of a good comparison between “treatments” is that the compared groups are as similar as possible, except for their “treatment.”
====Steps of a RCT====
RCTs involve the following sequential steps:
#Hypothesis formulation
#Study participant recruitment based on specific criteria
#Gathering informed consent
#Random allocation of eligible and willing participants to study groups
#Monitoring study groups for the outcome under study
#Comparing rates of different outcomes in the various groups
<center>
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]
</center>
====External and internal validity====
*''External validity'': Generalizability of the study to the larger source population, which is influenced by factors like:
:*Demographic differences between eligible and ineligible subgroups
:*Whether the intervention mirrors what will happen in the community or source population
*''Internal validity'': Ability to reach the correct conclusion in the study, which is influenced by factors like:
:*Ability of subjects to provide valid and reliable data
:*Expected compliance with a regimen
:*Low probability of dropping out
====Measures of Association and Effect in RCT====

Ratio of two measures of disease incidence (relative measures):

*Risk Ratio (Relative Risk)
*Rate Ratio

Difference between two measures of disease incidence:

*Risk difference
*Efficacy
<center> | <center> | ||
Line 204: | Line 223: | ||
|} | |} | ||
</center> | </center> | ||
<center>
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$
</center>
<center>
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$
</center>
'''Interpretation''':
*$RR>1$: The outcome is $RR$ times as likely to occur in group A as in group B
*$RR=1$: Null value (no difference between groups)
*$RR<1$: Either report the relative reduction in risk, $(1-RR)\times 100\%$, or invert ($1/RR$) and interpret the result as how many times “less likely” the outcome is
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$
</center>
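The risk ratio, risk difference and efficacy formulas can be checked numerically with a short Python sketch. The trial counts are hypothetical and the function name is an assumption of this example; cumulative incidence is computed as cases divided by group size, matching $a/(a+b)$ and $c/(c+d)$ above.

```python
def rct_measures(cases_treatment, n_treatment, cases_placebo, n_placebo):
    """Risk ratio, risk difference and efficacy from two trial arms."""
    ci_treatment = cases_treatment / n_treatment  # a / (a + b)
    ci_placebo = cases_placebo / n_placebo        # c / (c + d)
    return {
        "risk_ratio": ci_treatment / ci_placebo,
        "risk_difference": ci_treatment - ci_placebo,
        "efficacy": (ci_placebo - ci_treatment) / ci_placebo,
    }

# Hypothetical trial: 10 of 100 treated participants and 25 of 100 placebo
# participants develop the outcome
results = rct_measures(10, 100, 25, 100)
print(f"Risk ratio:      {results['risk_ratio']:.2f}")
print(f"Risk difference: {results['risk_difference']:.2f}")
print(f"Efficacy:        {results['efficacy']:.0%}")
```

Note that efficacy equals $1-RR$ when the treatment arm is the numerator of the risk ratio, so a risk ratio below 1 corresponds to a positive efficacy.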
*Situations that favor the use of RCT:
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks.
# Effect of intervention on outcome is of sufficient importance to justify a large study.
===Cohort Study===
A population of exposed and unexposed individuals at risk of developing the outcome is followed over time to compare the development of disease in each group.
*Steps: Establish the study population by identifying one that reflects the base population of interest and has a distribution of exposure. Identify the groups of exposed and unexposed individuals. Follow both groups and compare their outcomes.
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]
*Types:
**Prospective (concurrent) and retrospective (non-concurrent) cohort studies, distinguished by when the data are collected
**Retrospective studies are more cost effective and are good for diseases of long latency
**Prospective studies presumably have higher data quality
Both designs must guard against ascertainment bias when outcome or exposure status is known.
*Measures of Association in Cohort Study:
**Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio
**Difference between two measures of disease incidence: Risk Difference, Rate Difference
*Strengths and weaknesses of Cohort Design:
: Strengths:
# Maintains temporal sequence: can estimate incidence of disease; exposure precedes development of disease; can also explore time-varying information
# Excellent for studying known adverse exposures or those that cannot practically be randomized
# Like RCT, excellent for studying rare exposures
# Multiple outcomes and sometimes multiple exposures can be studied
: Disadvantages:
# Long-term follow-up is required and is expensive
# Not effective at capturing rare outcomes, and it can be challenging to study diseases that take a long time to develop
# Loss to follow-up can be a problem
# Changes over time in criteria and methods can lead to problems with inferences
# People self-select exposures, so exposed and unexposed groups may differ with respect to important characteristics
*Situations that favor a Cohort Study:
# When there is evidence of an association between the exposure and the disease from other studies
# When the exposure is rare but the incidence of disease among the exposed is high
# When the time between exposure and development of the disease is relatively short or historical data are available
# When good follow-up can be ensured
===Case Control Study===
A case control study compares cases (people with the disease) and controls (people without it) to see which group has had greater exposure to the suspected risk factor.
*Measures of Association: Odds Ratio
<center>
{|class="wikitable" style="text align:center;width:25%"border="1"
|-
| colspan=2| || Case || Control
|-
|rowspan=2 |Exposed || Yes || a || b
|-
| No || c ||d
|-
|}
</center>
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}$
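As a small numerical illustration, the odds ratio $ad/bc$ can be computed from the four cells of the case-control table. The counts are hypothetical, the function name is an assumption of this example, and the confidence interval uses Woolf's log-based approximation, which the text does not prescribe.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with an approximate 95% CI (Woolf's method).

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper

# Hypothetical study: 40 of 100 cases and 20 of 100 controls were exposed
estimate, lower, upper = odds_ratio(a=40, b=20, c=60, d=80)
print(f"OR = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 suggests that the odds of exposure differ between cases and controls.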
====Interpretation====
*$OR>1$: The odds of being exposed are OR times higher in the cases than in the controls
*$OR=1$: No association; the odds of being exposed are the same in cases and controls
*$OR<1$: The odds of being exposed are 1/OR times lower in the cases than in the controls
*Strengths and weaknesses of Case Control Study:
**Strengths: The Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so it is good for diseases about which little is known. It is observational: we do not ask people to change their behavior, we just collect information on events that happen “naturally”.
**Weaknesses: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in the study is artificial and does not represent the natural distribution of disease in the population.
*Avoiding Recall / Reporting Bias. Ways to avoid recall and reporting bias include:
# Adjusting timing so that the time between the event/illness and the study is as short as possible; using standardized questionnaires that obtain complete information
# Using existing information if/when possible (e.g., medical records)
# Masking participants to the study hypothesis
*Conditions under which an OR from a Case-Control Study can approximate a RR ($OR\approx RR$):
# When the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn
# When the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases were drawn
# When the disease being studied does not occur frequently
===Cross-Sectional Studies===
A cross-sectional study is an observational study in which a subject’s exposure and disease data are measured at the same time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence. Because no incident cases are observed, temporality cannot be determined.
====Strengths and Limitations of Cross-Sectional Studies====
* '''Strengths:'''
# Good for generating hypotheses
# Easily sets up other analytic designs
# Temporality is not a problem for time-invariant exposures (e.g., genetic markers)
# Relatively low cost
*'''Weaknesses:'''
# Temporality: it is unclear whether the exposure or the disease happened first
# Prevalent cases may not be the same as incident cases
# Not useful for rare diseases
# Subject to selection bias
====Measures of Association in Cross Sectional Studies====
<center>
{|class="wikitable" style="text align:center;width:25%"border="1"
|-
| colspan=2| || Case || Control
|-
|rowspan=2 |Exposed || Yes || a || b
|-
| No || c ||d
|-
|}
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\, disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$
</center>
===Ecologic Studies===
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place & time (mixed study). One error that can occur is that an association identified from group-level (ecological) characteristics is ascribed to individuals when no such association exists at the individual level.
====Strengths and Disadvantages of Ecologic Studies====
*'''Strengths:'''
# Data is relatively easy and/or cheap to obtain.
# Ecological studies are a good place to start.
# Many relevant social, occupational and environmental exposures cannot be ascribed to an individual.
*'''Weaknesses:'''
#Reliance on group-level data may not correctly represent individual-level associations.
#Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.
#Ecologic studies are therefore best suited to generating new hypotheses (they are relatively easy and low-cost to conduct) rather than to confirming individual-level associations.
===Other Risk Estimates===
*''Attributable Risk Estimates of Effect'': If exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminated the exposure.
*''Attributable Risk'' ($AR$): $AR=CI_{Exposed} - CI _{Not\,exposed}$. This is simply the risk difference. The group of interest is the exposed group: AR quantifies the risk of disease in the “exposed” group that is attributable to the exposure.
*''Attributable Risk Percent'' $(AR\%)$: $ AR\%$ = $\frac{(CI_{Exposed} - CI_{Not\,exposed})}{CI_{Exposed}}\times 100\%$
*''Population Attributable Risk'' ($PAR$): $PAR= CI_{Total} - CI_{Not\,exposed}$
*''Population Attributable Risk Percent'' $(PAR\%)$: $PAR\%$ = $\frac{(CI_{Total}-CI_{Not\,exposed})} {CI_{Total}}\times 100\%$.
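The four attributable-risk measures can be computed together, as in the Python sketch below. All inputs are hypothetical. In particular, the total cumulative incidence is derived here from an assumed exposure prevalence of 40%, which is an assumption of this example; in practice $CI_{Total}$ would be measured in the whole population.

```python
def attributable_risks(ci_exposed, ci_unexposed, ci_total):
    """AR, AR%, PAR and PAR% from cumulative incidences (given as proportions)."""
    ar = ci_exposed - ci_unexposed
    par = ci_total - ci_unexposed
    return {
        "AR": ar,
        "AR%": 100 * ar / ci_exposed,
        "PAR": par,
        "PAR%": 100 * par / ci_total,
    }

# Hypothetical cumulative incidences in exposed and unexposed groups
ci_exposed, ci_unexposed = 0.30, 0.10
# Assumed share of the population that is exposed (illustrative only)
proportion_exposed = 0.40
ci_total = proportion_exposed * ci_exposed + (1 - proportion_exposed) * ci_unexposed

for name, value in attributable_risks(ci_exposed, ci_unexposed, ci_total).items():
    print(f"{name}: {value:.3f}")
```

AR and AR% describe the exposed group only, whereas PAR and PAR% depend on how common the exposure is in the whole population.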
===Bias===
Bias is a barrier to internal validity.
*''Causes of bias'': Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and an outcome, so that the observed results differ from the true results.
*''Impact of bias'': Makes it appear as if there is an association when there really is none (bias away from the null), or masks an association when there really is one (bias toward the null).
*''Reasons we get wrong answers'': Selection bias, information bias, confounding, and chance.
*''Selection bias'': Who is selected or retained in a study distorts your estimates of the truth. An example is selection bias due to differential retention in the study.
*Mechanisms to reduce selection bias:
**Ensure proper selection of study subjects (choose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).
**Minimize loss to follow-up: keep participants happy and in touch with the study team; review non-respondents to understand their characteristics.
*''Information bias'': The quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification (e.g., of hypertension), reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in the measure, and error due to subconscious or conscious decisions by the participant or investigator.
*''Confounding bias'': Differences between cases and controls, or between exposed and unexposed, distort your estimates of the truth. A variable is a confounder if (1) it is a known risk factor for the outcome, (2) it is associated with the exposure, and (3) it is not a result of the exposure. All three conditions are necessary for a variable to be considered a confounder.
*''Chance'': The luck of the draw gets you a study sample that is not representative of the larger population.
*''Strategies to handle confounding'': (1) In study design: individual matching, group matching, randomization (experimental studies); (2) in data analysis: stratification, adjustment. Matching in a case-control study:
<center>
{|class="wikitable" style="text align:center;width:25%"border="1"
|-
| || Control Exposed || Control Unexposed
|-
| Case Exposed || a || b
|-
|Case Unexposed || c ||d
|-
|}
</center>
*''Concordant pairs'': Both case and control exposed; neither case nor control exposed.
*''Discordant pairs'': Case exposed but control not exposed; control exposed but case not exposed.
*''Matched analysis'': Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}$
''Interpretation'': If there is an association between exposure and outcome, it is not due to any of the factors that were matched on. The association between the matched variables and the outcome can no longer be analyzed.
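The matched odds ratio $b/c$ can be obtained by tallying the discordant pairs, as in this Python sketch. The pair data are hypothetical and the function name is an assumption of this example; each pair records whether the case and its matched control were exposed.

```python
def matched_pair_odds_ratio(pairs):
    """Matched odds ratio b/c from (case_exposed, control_exposed) pairs.

    b = pairs with case exposed and control unexposed
    c = pairs with case unexposed and control exposed
    Concordant pairs do not contribute to the estimate.
    """
    pairs = list(pairs)
    b = sum(1 for case_exposed, control_exposed in pairs
            if case_exposed and not control_exposed)
    c = sum(1 for case_exposed, control_exposed in pairs
            if control_exposed and not case_exposed)
    if c == 0:
        raise ValueError("No pairs with control exposed and case unexposed")
    return b / c

# Hypothetical study of 100 matched pairs
pairs = (
    [(True, True)] * 10      # a: both exposed (concordant)
    + [(True, False)] * 30   # b: only the case exposed (discordant)
    + [(False, True)] * 10   # c: only the control exposed (discordant)
    + [(False, False)] * 50  # d: neither exposed (concordant)
)
print("Matched odds ratio:", matched_pair_odds_ratio(pairs))
```

Adding or removing concordant pairs leaves the estimate unchanged, which is why only the discordant cells appear in the formula.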
*''Randomization'': Random allocation of exposure/“treatment” by the investigator ensures that the two groups (exposed & unexposed) are the same except for the exposure of interest. It controls for both known and unknown confounders because these “3rd variables” should be equally distributed between the groups.
*''Stratification'': Examine the relationship between exposure and outcome within each stratum of a potential confounding variable, holding the confounding variable constant.
*''Adjustment'': A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder were not associated with the exposure.
An example of age-adjustment:
[[Image:MSHS_IntroEpi_Fig4.png]]
===Applications===
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, using some important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has largely benefited in the past from the development of epidemiological research; however, the distinction between association and causation is currently raised for observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved, such as the necessity of replication and Mendelian randomization, which might prove to be important in future research.
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article] retrospectively studies the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.
===Software===
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]
+ | ===Problems=== | ||
+ | How do we learn about existence of outbreaks? | ||
+ | :a. Cases call health departments directly | ||
+ | :b. Clinicians | ||
+ | :c. Laboratories | ||
+ | :d. All of the above | ||
+ | In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad? | ||
+ | :a. Host | ||
+ | :b. Agent | ||
+ | :c. Vector | ||
+ | :d. Environment | ||
+ | :e. All of the above | ||
+ | The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)? | ||
+ | :a. 0.002 lung cancer cases per 100,000 person years | ||
+ | :b. 200 lung cancer cases per 100,000 person years | ||
+ | :c. 270 lung cancer cases per 100,000 person years | ||
+ | :d. 243 lung cancer cases per 100,000 person years | ||
+ | In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below? | ||
+ | :a. The prevalence increases if the duration of disease is increasing or stays the same. | ||
+ | :b. The prevalence increases if the duration of disease is decreasing rapidly. | ||
+ | :c. The prevalence decreases if the duration of disease is increasing. | ||
+ | :d. The prevalence decreases if the duration of disease stays the same. | ||
+ | Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002 - 2012. | ||
+ | <center> | ||
+ | {| class="wikitable" style="text-align:center; width:25%" border="1" | ||
+ | |- | ||
+ | |Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths | ||
+ | |- | ||
+ | |<20|| 20 ||2,000,000|| | ||
+ | |- | ||
+ | |20-39|| 10 || 3,000,000 || | ||
+ | |- | ||
+ | |40-59 ||5 ||1,000,000|| | ||
+ | |- | ||
+ | |>60|| 30|| 4,000,000|| | ||
+ | |- | ||
+ | |Total || || 10,000,000 || | ||
+ | |} | ||
+ | </center> | ||
+ | What is the age-adjusted mortality rate from diabetes among whites according to the table above? | ||
+ | :a. 40.2 deaths per 100,000 | ||
+ | :b. 19.5 deaths per 100,000 | ||
+ | :c. 1.9 deaths per 100,000 | ||
+ | :d. 20.4 deaths per 100,000 | ||
+ | Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000? | ||
+ | :a. 1.54 | ||
+ | :b. 5.02 | ||
+ | :c. 1.69 | ||
+ | :d. 0.65 | ||
+ | When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity. | ||
+ | :a. True | ||
+ | :b. False | ||
+ | Sequential testing tends to have higher net specificity than specificity of a single test. | ||
+ | :a. True | ||
+ | :b. False | ||
+ | A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions: | ||
+ | <center> | ||
+ | {| class="wikitable" style="text-align:center; width:25%" border="1" | ||
+ | |- | ||
+ | |colspan=2 rowspan=2| || colspan=2|Gold standard | ||
+ | |- | ||
+ | |Condition Positive||Condition negative | ||
+ | |- | ||
+ | |rowspan=2| Result of New Test|| Test Positive ||80||70 | ||
+ | |- | ||
+ | |Test Negative ||10 ||240 | ||
+ | |- | ||
+ | |} | ||
+ | </center> | ||
+ | |||
+ | What is the sensitivity of the new test? | ||
+ | :a. 77% | ||
+ | :b. 89% | ||
+ | :c. 80% | ||
+ | :d. 53% | ||
+ | What is the specificity of the test? | ||
+ | :a. 77% | ||
+ | :b. 89% | ||
+ | :c. 80% | ||
+ | :d. 53% | ||
+ | What is the positive predictive value of the test? | ||
+ | :a. 77% | ||
+ | :b. 89% | ||
+ | :c. 80% | ||
+ | :d. 53% | ||
+ | Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they had the flu), so these were not enrolled in the research study. The students who were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of this follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing. | ||
+ | :a. What type of study is this? | ||
+ | :b. Why is this type of study adequate for this particular situation? | ||
+ | :c. Imagine that you are the investigator picking the appropriate study design to answer this question; what might you have worried about in picking this design? | ||
+ | :d. What is the best measure of association to test the relationship between hand washing and incident flu? Why? | ||
+ | :e. Calculate and interpret the above measure of association using a 2X2 table. | ||
+ | :f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings. | ||
+ | Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use. | ||
+ | :a. What type of study design was used in this example? | ||
+ | :b. Why is this type of study appropriate for this particular situation? | ||
+ | :c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? | ||
+ | :d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use. | ||
+ | :e. What is the appropriate measure of association for this study? Explain why. | ||
+ | :f. Calculate and interpret your measure of association. | ||
+ | A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants. | ||
+ | :a. What type of study is this? | ||
+ | :b. What type of measure of association is appropriate for this study? Why? | ||
+ | :c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why? | ||
+ | Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias? | ||
+ | :a. Interviewer bias | ||
+ | :b. Recall bias | ||
+ | :c. Loss to follow-up | ||
+ | :d. Non-differential misclassification | ||
+ | Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns (especially if they were concerned that sun exposure was a reason they got melanoma). This is an example of: | ||
+ | :a. Interviewer bias | ||
+ | :b. Loss to follow-up | ||
+ | :c. Differential misclassification | ||
+ | :d. Non-differential misclassification | ||
+ | ===References=== | ||
+ | *[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia] | ||
Latest revision as of 08:11, 27 April 2015
Contents
- 1 Scientific Methods for Health Sciences - Introduction to Epidemiology
- 1.1 Overview
- 1.2 Motivation
- 1.3 Theory
- 1.4 Randomized Controlled Trials (RCT)
- 1.5 Cohort Study
- 1.6 Case Control Study
- 1.7 Cross-Sectional Studies
- 1.8 Ecologic Studies
- 1.9 Other Risk Estimates
- 1.10 Bias
- 1.11 Applications
- 1.12 Software
- 1.13 Problems
- 1.14 References
Scientific Methods for Health Sciences - Introduction to Epidemiology
Overview
Epidemiology is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.
Motivation
In this introduction to epidemiology, we will:
- Study the language of epidemiology and identify key sources of data for epidemiological purposes
- Be able to calculate and interpret measures of disease frequency
- Recognize and evaluate epidemiological study designs and their limitations
- Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).
Theory
- Five main goals of epidemiology:
- To identify the cause of disease and its risk factors
- To determine the extent of disease found in the community
- To study the natural history and prognosis of disease
- To evaluate new preventative and therapeutic measures
- To provide a foundation for developing public policy
- Distinguishing between endemic, epidemic, and pandemic:
- Endemic: The habitual presence (or usual occurrence) of a disease within a given geographic area;
- Epidemic: The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;
- Pandemic: A worldwide epidemic affecting an exceptionally high proportion of the global population.
- Modes of Disease Transmission
- Direct contact: Transmission occurs when the pathogen is transferred directly from an infected person to a susceptible person, for example through sneezing, touch, or sexual intercourse
- Indirect contact: Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector
- Inanimate (object or vehicle): Examples may be toy, food or water
- Vector-borne (animal or insect): Examples include mosquitoes, ticks and mice
- Attack Rates and Ratios (ARR)
- Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves:
- Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)
- Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)
- Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)
- Attack Rates (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$
- Attack Rate Ratio (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$
- The null hypothesis is $H_{0}:ARR=1$. A 95% confidence interval can be used to check whether the estimated ARR interval includes the null value of 1. If the ARR is much greater than 1, then people who were exposed are more likely to develop the illness than those who were unexposed.
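The attack rate and attack rate ratio formulas above can be sketched in a few lines of Python. This is only an illustration: the outbreak counts are hypothetical, the function names are made up for this example, and the confidence interval uses a common large-sample approximation on the log scale, which is an assumption rather than a method prescribed in this section.

```python
import math


def attack_rate(ill: int, at_risk: int) -> float:
    """Proportion of people at risk who developed the illness."""
    if at_risk <= 0:
        raise ValueError("at_risk must be positive")
    return ill / at_risk


def attack_rate_ratio(ill_exp: int, n_exp: int, ill_unexp: int, n_unexp: int) -> float:
    """Attack rate in the exposed divided by attack rate in the unexposed."""
    return attack_rate(ill_exp, n_exp) / attack_rate(ill_unexp, n_unexp)


def arr_confidence_interval(ill_exp, n_exp, ill_unexp, n_unexp, z=1.96):
    """Approximate 95% CI for the ARR (assumed large-sample log method)."""
    arr = attack_rate_ratio(ill_exp, n_exp, ill_unexp, n_unexp)
    se_log = math.sqrt(1 / ill_exp - 1 / n_exp + 1 / ill_unexp - 1 / n_unexp)
    return (math.exp(math.log(arr) - z * se_log),
            math.exp(math.log(arr) + z * se_log))


# Hypothetical buffet outbreak: 30 of 50 people who ate the salad became ill,
# compared with 10 of 50 people who did not eat it.
arr = attack_rate_ratio(30, 50, 10, 50)
lower, upper = arr_confidence_interval(30, 50, 10, 50)
print(f"ARR = {arr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

In this made-up example the whole interval lies above 1, so the data would be inconsistent with $H_{0}:ARR=1$.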
Measuring Disease
The goals here are to name and calculate two measures of incidence, to describe how their interpretations differ, and to understand the difference between a proportion and a true rate.
- Incidence: the number of new cases of a disease occurring in the population during a specified period of time divided by the number of persons at risk of developing the disease during that period. For example, if 2000 persons are at risk during the year and 20 develop the disease over that period, the cumulative incidence is 20⁄2000 = 1%.
- Cumulative incidence: $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $
- Incidence rate: $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.
- Prevalence: $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$
- The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.
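As a minimal sketch of the three measures above, the following Python snippet reuses the numbers from the text (20 new cases among 2000 persons at risk; subjects followed for 3, 5 and 8 days). The function names, the single new case in the person-time example, and the prevalence counts are hypothetical.

```python
def cumulative_incidence(new_cases: int, population_at_risk: int) -> float:
    """Proportion of the at-risk population that develops disease."""
    return new_cases / population_at_risk


def incidence_rate(new_cases: int, person_time: float) -> float:
    """New cases per unit of person-time (a true rate)."""
    return new_cases / person_time


def prevalence(existing_cases: int, population: int) -> float:
    """Proportion of the population with disease at the specified time."""
    return existing_cases / population


# Example from the text: 20 new cases among 2000 persons at risk.
ci = cumulative_incidence(20, 2000)

# Person-time from the text: subjects followed for 3, 5 and 8 days.
person_days = sum([3, 5, 8])

# Hypothetical: one of those three subjects developed the disease.
rate_per_person_day = incidence_rate(1, person_days)

# Hypothetical: 150 existing cases in a population of 3000 on a given day.
point_prevalence = prevalence(150, 3000)

print(ci, person_days, rate_per_person_day, point_prevalence)
```

The cumulative incidence is a proportion (it has no time unit in its denominator), whereas the incidence rate is expressed per person-day, which is what makes it a true rate.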
Measuring Mortality Rates
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:
- All cause mortality rates = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$
- Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$
- Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$
Additional Measures of Mortality
- Infant mortality: $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011} {Number\,of\,live\,births\,in\,2011}$
- Proportionate mortality: Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause
- Case fatality: Of all people diagnosed with a given disease, the proportion who die of that disease over a certain period
- Underlying cause of death
Direct and Indirect Adjustment of Rates
Direct and indirect adjustment of rates are used to compare two populations, or one population at different time periods, when their age distributions differ. Adjusting for age shows how the mortality rates would compare if both populations had the same age distribution.
- Direct age-adjustment: Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.
For each population:
- Calculate age-specific rates
- Multiply age-specific rates by the # of people in corresponding age range in standard population
- Sum expected # of deaths across age groups
- Divide total # of expected deaths by total standard population
Age-adjusted mortality rate for each population of interest
- Indirect age-adjustment: The expected number of deaths can be compared to the number of observed deaths using the standardized mortality ratio (SMR). It is especially useful when the group-specific rates of the study population cannot be trusted (e.g., when the population is too small).
- Acquire age-specific mortality rates for standard population
- Multiply standard population’s age-specific rates by # of people in age range in study population
- Sum expected # of deaths across age groups in study population
- Divide observed # of deaths by expected # of deaths in study population
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)
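The two adjustment procedures above can be sketched as follows. All age groups, rates, population sizes and death counts are hypothetical and chosen only to keep the arithmetic easy to follow; the function names are illustrative.

```python
def direct_adjusted_rate(age_specific_rates, standard_population):
    """Direct adjustment: apply the study population's age-specific rates
    to the standard population and divide expected deaths by its total."""
    expected = [r * n for r, n in zip(age_specific_rates, standard_population)]
    return sum(expected) / sum(standard_population)


def standardized_mortality_ratio(observed_deaths, standard_rates, study_population):
    """Indirect adjustment: observed deaths divided by the deaths expected
    if the standard population's rates applied to the study population."""
    expected = sum(r * n for r, n in zip(standard_rates, study_population))
    return observed_deaths / expected


# Hypothetical age-specific mortality rates (deaths per person) for three age groups.
study_rates = [0.001, 0.002, 0.010]
standard_pop = [50_000, 30_000, 20_000]
adjusted = direct_adjusted_rate(study_rates, standard_pop)
print(f"Age-adjusted rate: {adjusted * 100_000:.0f} per 100,000")

# Hypothetical indirect adjustment: standard rates applied to a small study population
# in which 30 deaths were observed.
standard_rates = [0.001, 0.002, 0.010]
study_pop = [1_000, 2_000, 1_000]
smr = standardized_mortality_ratio(30, standard_rates, study_pop)
print(f"SMR: {smr:.2f}")
```

Here the expected number of deaths in the study population is 15, so 30 observed deaths correspond to twice as many deaths as expected.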
Screening
Screening is the use of testing to sort out apparently well persons (asymptomatic) who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include:
- Fasting blood sugar for diabetes
- Bone densitometry for osteoporosis
- Otoacoustic emissions testing for hearing loss in newborns
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and achieve better outcomes. However, a screening program must be warranted, and there must be a critical point in the course of the disease that screening can precede.
Clinical utility Predictive Value & Reliability: Clinical Utility of Positive Tests
If a patient is tested positive, the likelihood that they actually have the disease is called Positive Predictive Value (PPV). If a patient tests negative, the likelihood they actually do not have the disease is called Negative Predictive Value (NPV). PPV and NPV are affected by prevalence of disease, specificity and sensitivity of the test.
Disease Status | |||
Disease | No Disease | ||
Screening Test | Positive | a (True positives) | b (False positives) |
Negative | c (False negatives) | d (True negatives) |
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$
PPV interpretation: Given a positive screening test result, the likelihood that the individual actually has the disease is the PPV.
NPV interpretation: Given a negative screening test result, the likelihood that the individual actually does not have the disease is the NPV.
Factors Influence Predictive Values
- Disease prevalence: Increasing disease prevalence increases PPV and decreases NPV. Screening programs are most productive and efficient in high-risk populations; screening for infrequent disease may waste resources. PPV should therefore be presented in the context of disease prevalence.
- Test specificity (ability of a test to correctly identify those who do not have the disease, $=\frac{d}{b+d}$): Higher test specificity increases PPV.
- Test sensitivity (ability of a test to correctly identify those who have the disease, $=\frac{a}{a+c}$)
Note: The cutoff value of a test influences its sensitivity and specificity. When higher values indicate disease, lowering the cutpoint increases true positives and hence sensitivity, but decreases true negatives and hence specificity. Similarly, raising the cutpoint decreases true positives and hence sensitivity, but increases true negatives and hence specificity.
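To make the 2x2 notation concrete, here is a small Python sketch that computes sensitivity $\frac{a}{a+c}$, specificity $\frac{d}{b+d}$, PPV and NPV from the cell counts a, b, c, d, and then shows how PPV changes with prevalence for a fixed test. The counts are hypothetical, and the prevalence-based PPV formula is the standard Bayes' rule identity rather than something stated explicitly in this section.

```python
def screening_metrics(a: int, b: int, c: int, d: int) -> dict:
    """a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    return {
        "sensitivity": a / (a + c),  # diseased people correctly identified
        "specificity": d / (b + d),  # non-diseased people correctly identified
        "ppv": a / (a + b),          # P(disease | positive test)
        "npv": d / (c + d),          # P(no disease | negative test)
    }


def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' rule for a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical screening results for 1000 people.
metrics = screening_metrics(a=90, b=50, c=10, d=850)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Same hypothetical test (90% sensitive, 90% specific) at two prevalences.
ppv_low = ppv_from_prevalence(0.9, 0.9, 0.01)
ppv_high = ppv_from_prevalence(0.9, 0.9, 0.20)
print(f"PPV at 1% prevalence: {ppv_low:.3f}; at 20% prevalence: {ppv_high:.3f}")
```

The last two lines illustrate why screening is more productive in high-risk populations: the same test yields a much higher PPV when the disease is more common.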
Validity
Validity: The ability of a test to distinguish between who has disease and who does not
Reliability: The ability to replicate results on the same sample if the test is repeated
The following charts show the three possible outcomes (from left to right): valid but not reliable, reliable but not valid, and both valid and reliable.
Reliability (repeatability) of tests
Can the results be replicated if the test is redone? The results may be influenced by three factors:
- Intrasubject variation: Variation within individual subjects
- Intraobserver variation: Variation in reading of results by the same reader
- Interobserver variation: Variation between those reading results
How does multiple testing improve screening programs?
Using multiple tests:
- Sequential (2-stage) testing starts with a less expensive, less invasive, and less uncomfortable test. Those whose results are positive must be followed up with additional testing.
- Simultaneous tests (parallel) involve multiple screening tests at the same time. To be considered positive, a person can test positive on either test; to be considered negative, the person must test negative on all tests.
Each test has own sensitivity and specificity. Utilization of multiple testing can improve net sensitivity (simultaneous testing) or net specificity (sequential testing). In other words:
- Sequential testing decreases net sensitivity and increases net specificity
- Simultaneous testing increases net sensitivity and decreases net specificity
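A rough sketch of these two statements, assuming the two tests err independently of each other given true disease status (an assumption made here for simplicity, not something stated in this section). Under that assumption, a person is positive in sequential testing only if both tests are positive, and positive in simultaneous testing if either test is positive. The test characteristics are hypothetical.

```python
def sequential_net(sens1, spec1, sens2, spec2):
    """Two-stage testing: positive only if BOTH tests are positive.
    Assumes the tests are conditionally independent."""
    net_sensitivity = sens1 * sens2
    net_specificity = spec1 + spec2 - spec1 * spec2
    return net_sensitivity, net_specificity


def simultaneous_net(sens1, spec1, sens2, spec2):
    """Parallel testing: positive if EITHER test is positive.
    Assumes the tests are conditionally independent."""
    net_sensitivity = sens1 + sens2 - sens1 * sens2
    net_specificity = spec1 * spec2
    return net_sensitivity, net_specificity


# Hypothetical tests: test 1 is 80% sensitive and 90% specific,
# test 2 is 90% sensitive and 95% specific.
seq_sens, seq_spec = sequential_net(0.80, 0.90, 0.90, 0.95)
sim_sens, sim_spec = simultaneous_net(0.80, 0.90, 0.90, 0.95)
print(f"Sequential:   sensitivity {seq_sens:.3f}, specificity {seq_spec:.3f}")
print(f"Simultaneous: sensitivity {sim_sens:.3f}, specificity {sim_spec:.3f}")
```

With these numbers, sequential testing has lower sensitivity but higher specificity than either test alone, and simultaneous testing shows the opposite pattern.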
Randomized Controlled Trials (RCT)
In an RCT, the investigator assigns exposure at random to study participants. The investigator then observes whether there are any differences in health outcomes between people who were exposed to the factor (i.e., the treatment group) and those who were not (i.e., the comparison group). Special care is taken to ensure that follow-up is done in an identical way in both groups. The essence of a good comparison between “treatments” is that the compared groups are as similar as possible, except for their “treatment.”
Steps of a RCT
RCTs involve the following sequential steps:
- Hypothesis formulation
- Study participant recruitment based on specific criteria
- Gathering informed consent
- Allocation of eligible and willing participants into random assignment study groups
- Monitoring study groups for outcome under study
- Comparing rates of different outcomes in various groups
External and internal validity
- External validity: Generalization of study to larger source population, which is influenced by factors like:
- Demographic differences between eligible and ineligible subgroups
- Whether the intervention mirrors what will happen in the community or source population
- Internal validity: Ability to reach correct conclusion in study, which is influenced by factors like:
- Ability of subjects to provide valid and reliable data
- Expected compliance with a regimen
- Low probability of dropping out
Measures of Association and Effect in RCT
Ratio of two measures of disease incidence (relative measures):
- Risk Ratio (Relative Risk)
- Rate Ratio
Difference between two measures of disease incidence:
- Risk difference
- Efficacy
Disease Status | |||
Disease | No Disease | ||
Treatment | Drug A | a | b |
Placebo | c | d |
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$
Interpretation:
- $RR>1$, Outcome $X$ is $RR$ times as likely to occur in group A as in group B
- $RR=1$, Null value (no difference between groups)
- $RR<1$, Either calculate the reduction in risk ratios (100%-$X$%) or invert ($1/RR$) to be interpreted as “less likely” risk
- Situations that favor the use of RCT:
- Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.
- Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks.
- Effect of intervention on outcome is of sufficient importance to justify a large study.
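The relative risk formula given earlier in this section, $\frac{a/(a+b)}{c/(c+d)}$, can be illustrated with a short Python sketch. The trial counts are hypothetical. Efficacy is computed here as the relative reduction in risk, $1-RR$, which is a common definition assumed for this example rather than one given in the text.

```python
def rct_measures(a: int, b: int, c: int, d: int) -> dict:
    """2x2 table: a, b = diseased / not diseased on drug A;
    c, d = diseased / not diseased on placebo."""
    ci_drug = a / (a + b)      # cumulative incidence in the treatment group
    ci_placebo = c / (c + d)   # cumulative incidence in the comparison group
    risk_ratio = ci_drug / ci_placebo
    return {
        "ci_drug": ci_drug,
        "ci_placebo": ci_placebo,
        "risk_ratio": risk_ratio,
        "risk_difference": ci_drug - ci_placebo,
        "efficacy": 1 - risk_ratio,  # assumed definition: relative risk reduction
    }


# Hypothetical trial: 10 of 100 participants on drug A and
# 20 of 100 participants on placebo develop the disease.
result = rct_measures(a=10, b=90, c=20, d=80)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

In this made-up trial the risk ratio is below 1, so it can be read either as a 50% reduction in risk or, after inverting, as the placebo group being twice as likely to develop the disease.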
Cohort Study
Population of exposed and unexposed individuals at risk of developing outcomes are followed over time to compare the development of disease in each group.
- Steps: Establish a study population that is reflective of the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; follow both groups and compare their outcomes.
- Types:
- Prospective (concurrent) and retrospective (non-concurrent) cohort studies, distinguished by when the data are collected
- Retrospective has benefits: more cost effective; good for disease of long latency
- Prospective has benefits: data quality presumably higher
Both designs need to be cautious of ascertainment biases if outcomes or exposure is known.
- Measures of Association in Cohort Study:
- Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio
- Difference between two measures of disease incidence: Risk Difference, Rate Difference
- Strengths and weakness of Cohort Design:
- Strengths:
- Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information
- Excellent for studying known adverse exposures or those that cannot practically be randomized
- Like RCT, excellent for studying rare exposures
- Multiple outcomes and sometimes multiple exposures can be studied
- Disadvantages:
- Long-term follow-up required and expensive
- Not effective at capturing rare outcomes, and it can be challenging to study diseases that take a long time to develop
- Loss to follow-up can be a problem
- Changes over time in criteria and methods can lead to problems with inferences
- People self-select exposures so exposed and unexposed may differ with respect to important characteristics
- Situations that favor a Cohort Study:
- When there is evidence of an association between the exposure and the disease from other studies
- When the exposure is rare but incidence of disease among the exposed is high
- When time between exposure and development of the disease is relatively short or historical data is available
- When good follow-up can be ensured
Case Control Study
A case control study compares cases (people with the disease) and controls (people without it) to see which group had greater exposure to the suspected risk factor.
- Measures of Association: Odds Ratio
Case | Control | ||
Exposed | Yes | a | b |
No | c | d |
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}$
Interpretation
If OR > 1, the odds of being exposed are OR times higher in the cases than in the controls. If OR < 1, the odds of being exposed are lower in the cases than in the controls by a factor of 1/OR. If OR = 1, there is no association: the odds are the same in cases and controls.
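As a minimal illustration of the odds ratio formula $\frac{ad}{bc}$, the following Python sketch uses hypothetical counts; the function name is illustrative only.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical study: 100 cases (40 exposed) and 100 controls (20 exposed).
a, b = 40, 20   # exposed cases, exposed controls
c, d = 60, 80   # unexposed cases, unexposed controls

odds_cases = a / c        # odds that a case was exposed
odds_controls = b / d     # odds that a control was exposed
or_estimate = odds_ratio(a, b, c, d)
print(f"Odds of exposure in cases: {odds_cases:.3f}")
print(f"Odds of exposure in controls: {odds_controls:.3f}")
print(f"Odds ratio: {or_estimate:.2f}")
```

The cross-product $ad/bc$ gives the same value as dividing the two odds directly, which is why the table can be analyzed without knowing the incidence of disease.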
- Strengths and weakness of Case Control Study:
- Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.
- Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in study is artificial and does not represent the natural distribution of disease in the population.
- Avoiding Recall / Reporting Bias. Ways to avoid recall and report bias include:
- Adjusting timing so that the time between the event/illness and the study is as short as possible; use standardized questionnaires that obtain complete information
- Using existing information if/when possible (e.g. medical record)
- Masking participants to study hypothesis
- Conditions under which an OR from a Case-Control Study can approximate an RR (OR ≈ RR):
- When the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn
- When the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn
- When the disease being studied does not occur frequently
Cross-Sectional Studies
A cross-sectional study is an observational study in which each subject’s exposure and disease status are measured at the same time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence. Because there are no incident cases, temporality cannot be determined.
Strengths and Limitations of Cross-Sectional Studies
- Strengths:
- Good for generating hypotheses
- Easily sets up other analytic designs
- Temporality is not a problem for time invariant exposures (genetic markers)
- Relatively low cost
- Weakness:
- Temporality: it is unclear whether exposure or disease happened first
- Prevalent cases may not be the same as incident cases
- Not useful for rare disease
- Subject to selection bias
Measures of Association in Cross Sectional Studies
Disease | No Disease | ||
Exposed | Yes | a | b |
No | c | d |
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\,disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$
Ecologic Studies
An ecological study is an observational study in which group-level data is used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place & time (mixed study). However, one error that could occur is when an association is identified based on group level (ecological) characteristics that are ascribed to individuals when such associations do not exist at the individual level.
Strengths and Disadvantages of Ecologic Studies
- Strengths:
- Data is relatively easy and/or cheap to obtain.
- Ecological studies are a good place to start.
- Many relevant social, occupational and environmental exposures cannot be ascribed to an individual.
- Weaknesses:
- Reliance on group-level data may not correctly represent individual-level associations.
- Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.
- Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.
Other Risk Estimates
- Attributable Risk Estimates of Effect: If exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.
- Attributable Risk ($AR$): $AR=CI_{Exposed} - CI_{Not\,exposed}$. This is just the risk difference. The group of interest is the exposed; $AR$ quantifies the risk of disease in the “exposed” group that is attributable to the exposure.
- Attributable Risk Percent $(AR\%)$: $AR\% = \frac{CI_{Exposed} - CI_{Not\,exposed}}{CI_{Exposed}} \times 100$
- Population Attributable Risk ($PAR$): $PAR= CI_{Total} - CI_{Not\,exposed}$
- Population Attributable Risk Percent $(PAR\%)$: $PAR\% = \frac{CI_{Total}-CI_{Not\,exposed}} {CI_{Total}} \times 100$.
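The four attributable risk estimates above can be computed together, as in this Python sketch. The cumulative incidences and the proportion of the population exposed are hypothetical, and the total cumulative incidence is obtained as the exposure-weighted average of the two group incidences, which is an assumption about how $CI_{Total}$ would be derived when it is not observed directly.

```python
def attributable_risks(ci_exposed: float, ci_unexposed: float,
                       proportion_exposed: float) -> dict:
    """Attributable risk estimates from cumulative incidences (CI)."""
    # CI in the total population as a weighted average of the two groups.
    ci_total = (proportion_exposed * ci_exposed
                + (1 - proportion_exposed) * ci_unexposed)
    ar = ci_exposed - ci_unexposed   # risk difference among the exposed
    par = ci_total - ci_unexposed    # excess risk in the whole population
    return {
        "CI_total": ci_total,
        "AR": ar,
        "AR%": 100 * ar / ci_exposed,
        "PAR": par,
        "PAR%": 100 * par / ci_total,
    }


# Hypothetical cohort: 30% risk in the exposed, 10% in the unexposed,
# and 40% of the population exposed.
estimates = attributable_risks(0.30, 0.10, 0.40)
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

In this example an AR of 0.20 means that 200 cases per 1000 exposed persons would be prevented if the exposure were eliminated, assuming the exposure is causal.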
Bias
Bias is a barrier to internal validity.
- Causes of bias: Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and outcome; observed results different than true results.
- Impact of bias: Makes it appear as if there is an association when there really is none (bias away from the null); masks an association when there really is one (bias toward the null).
- Reasons we get wrong answers: Selection bias: who is selected or retained in a study distorts your estimates of the truth. Example may be selection bias due to different retention in the study.
- Mechanisms to reduce bias:
- Ensure proper selection of study subjects (choose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).
- Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.
- Information bias: The quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.
- Confounding bias: Differences between cases and controls or exposed and unexposed distort your estimates of the truth. A variable is a confounder if (1) it is a known risk factor for the outcome, (2) it is associated with the exposure, and (3) it is not a result of the exposure. All three conditions are necessary for a variable to be considered a confounder.
- Chance: The luck of the draw gets you a study sample that is not representative of the larger population.
- Strategies to handle confounding: (1) In study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study:
Control Exposed | Control Unexposed | |
Case Exposed | a | b |
Case Unexposed | c | d |
- Concordant pairs: Both case and control exposed; neither case nor control exposed.
- Discordant pairs: Case exposed but control not exposed; control exposed but case not exposed.
- Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}$
Interpretation: If there is an association between exposure and outcome, it is not due to any of the factors that were matched on. However, you cannot analyze the association between the matched variables and the outcome.
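As a minimal sketch of the matched-pair calculation above, the Python function below computes the odds ratio from the discordant cells b and c of the pair table. The function name and the pair counts are hypothetical, chosen only for illustration; they do not come from any real study.

```python
def matched_pair_odds_ratio(b: int, c: int) -> float:
    """Odds ratio for a 1:1 matched case-control study.

    b: number of pairs with the case exposed and the control unexposed
    c: number of pairs with the case unexposed and the control exposed
    Concordant pairs (cells a and d) do not enter the estimate.
    """
    if b < 0 or c < 0:
        raise ValueError("pair counts must be non-negative")
    if c == 0:
        raise ValueError("odds ratio is undefined when c = 0")
    return b / c


# Hypothetical pair counts (illustrative assumption only)
pairs = {"a": 30, "b": 40, "c": 20, "d": 110}
odds_ratio = matched_pair_odds_ratio(pairs["b"], pairs["c"])
print(f"Matched odds ratio = {odds_ratio:.2f}")
```

With these assumed counts, the exposure is twice as common among cases as among their matched controls in the discordant pairs; the 140 concordant pairs carry no information about the association.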
- Randomization: Random allocation of exposure/"treatment" by the investigator ensures that the two groups (exposed and unexposed) are the same except for the exposure of interest. It can control for both known and unknown confounders because these "third variables" should be equally distributed between the groups.
- Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable, that is, holding the confounding variable constant.
- Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be if the confounder were not associated with the exposure.
An example of age-adjustment:
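One way to illustrate age-adjustment is direct standardization: apply the age-specific rates of the study population to a standard population, sum the expected deaths, and divide by the total standard population. The sketch below assumes this direct method; the function name, the three age groups, and all numbers are hypothetical and are not taken from this page.

```python
def age_adjusted_rate(age_specific_rates, standard_population, per=100_000):
    """Direct age standardization.

    age_specific_rates: rates per `per` persons, one per age group
    standard_population: standard-population counts for the same age groups
    Returns (expected deaths per age group, age-adjusted rate per `per`).
    """
    if len(age_specific_rates) != len(standard_population):
        raise ValueError("rates and populations must cover the same age groups")
    # Expected deaths if the standard population experienced the study rates
    expected = [
        rate * pop / per
        for rate, pop in zip(age_specific_rates, standard_population)
    ]
    adjusted = sum(expected) * per / sum(standard_population)
    return expected, adjusted


# Hypothetical inputs (illustrative assumption only)
rates = [2.0, 10.0, 50.0]                # deaths per 100,000, youngest to oldest
standard = [500_000, 300_000, 200_000]   # standard population by age group

expected, adjusted = age_adjusted_rate(rates, standard)
print("Expected deaths by age group:", expected)
print(f"Age-adjusted rate: {adjusted:.1f} per 100,000")
```

Because every population being compared is weighted by the same standard age distribution, differences between their adjusted rates cannot be attributed to differences in age structure.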
Applications
- This article reviews, using some important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has benefited greatly in the past from the development of epidemiological research; however, the question of association versus causation is currently being raised about observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved, such as the necessity of replication and Mendelian randomization, which might prove to be important in future research.
- This article retrospectively studies the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.
Software
Problems
How do we learn about existence of outbreaks?
- a. Cases call health departments directly
- b. Clinicians
- c. Laboratories
- d. All of the above
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?
- a. Host
- b. Agent
- c. Vector
- d. Environment
- e. All of the above
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?
- a. 0.002 lung cancer cases per 100,000 person years
- b. 200 lung cancer cases per 100,000 person years
- c. 270 lung cancer cases per 100,000 person years
- d. 243 lung cancer cases per 100,000 person years
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?
- a. The prevalence increases if the duration of disease is increasing or stays the same.
- b. The prevalence increases if the duration of disease is decreasing rapidly.
- c. The prevalence decreases if the duration of disease is increasing.
- d. The prevalence decreases if the duration of disease stays the same.
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002 - 2012.
Age groups (years) | Age-specific rates (per 100,000) | Michigan standard population | Expected number of deaths |
<20 | 20 | 2,000,000 | |
20-39 | 10 | 3,000,000 | |
40-59 | 5 | 1,000,000 | |
>60 | 30 | 4,000,000 | |
Total | | 10,000,000 | |
What is the age-adjusted mortality rate from diabetes among whites according to the table above?
- a. 40.2 deaths per 100,000
- b. 19.5 deaths per 100,000
- c. 1.9 deaths per 100,000
- d. 20.4 deaths per 100,000
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?
- a. 1.54
- b. 5.02
- c. 1.69
- d. 0.65
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.
- a. True
- b. False
Sequential testing tends to have higher net specificity than specificity of a single test.
- a. True
- b. False
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:
 | | Gold standard: condition positive | Gold standard: condition negative |
Result of new test | Test positive | 80 | 70 |
Result of new test | Test negative | 10 | 240 |
What is the sensitivity of the new test?
- a. 77%
- b. 89%
- c. 80%
- d. 53%
What is the specificity of the test?
- a. 77%
- b. 89%
- c. 80%
- d. 53%
What is the positive predictive value of the test?
- a. 77%
- b. 89%
- c. 80%
- d. 53%
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they had the flu), so these were not enrolled in the research study. The students who were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of this follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.
- a. What type of study is this?
- b. Why is this type of study adequate for this particular situation?
- c. Imagine that you are the investigator picking the appropriate study design to answer this question; what might you have worried about in picking this design?
- d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?
- e. Calculate and interpret the above measure of association using a 2X2 table.
- f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.
- a. What type of study design was used in this example?
- b. Why is this type of study appropriate for this particular situation?
- c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study?
- d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.
- e. What is the appropriate measure of association for this study? Explain why.
- f. Calculate and interpret your measure of association.
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.
- a. What type of study is this?
- b. What type of measure of association is appropriate for this study? Why?
- c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?
- a. Interviewer bias
- b. Recall bias
- c. Loss to follow-up
- d. Non-differential misclassification
Using data from the Nurses' Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns (especially if they were concerned that sun exposure was a reason they got melanoma). This is an example of:
- a. Interviewer bias
- b. Loss to follow-up
- c. Differential misclassification
- d. Non-differential misclassification
References
- SOCR Home page: http://www.socr.umich.edu