Difference between revisions of "SMHS IntroEpi"

Latest revision as of 08:11, 27 April 2015

Scientific Methods for Health Sciences - Introduction to Epidemiology

Overview

Epidemiology is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.

Motivation

In this introduction to epidemiology, we will:

Study the language of epidemiology and identify key sources of data for epidemiological purposes
Be able to calculate and interpret measures of disease frequency
Recognize and evaluate epidemiological study designs and their limitations
Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).

Theory

Five main goals of epidemiology:

To identify the cause of disease and its risk factors
To determine the extent of disease found in the community
To study the natural history and prognosis of disease
To evaluate new preventative and therapeutic measures
To provide a foundation for developing public policy

Distinguishing between endemic, epidemic, and pandemic:

Endemic: The habitual presence (or usual occurrence) of a disease within a given geographic area;
Epidemic: The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;
Pandemic: A worldwide epidemic affecting an exceptionally high proportion of the global population.

Modes of Disease Transmission

Direct contact: Transmission occurs when the pathogen is transferred directly from an infected person to another person, for example through sneezing, touch, or sexual intercourse
Indirect contact: Transmission involves the transfer of the pathogen by contact with a contaminated intermediate inanimate object or vector
1. Inanimate (object or vehicle): Examples include toys, food, or water
2. Vector-borne (animal or insect): Examples include mosquitoes, ticks and mice

Attack Rates and Ratios (ARR)

Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves:

Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)
Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)
Formulating a hypothesis (e.g., “Among those who ate at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)

Attack Rates (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$

Attack Rate Ratio (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$

The null hypothesis $H_{0}:ARR=1$ can be assessed with a 95% confidence interval by checking whether the estimated ARR interval includes the null value of 1. If ARR is much greater than 1, then people who were exposed are more likely to develop the illness than those who were unexposed.
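
As a rough illustration of the AR and ARR formulas above, the following Python sketch computes both for a hypothetical buffet outbreak. The counts are invented for illustration, and the confidence interval uses a large-sample log-scale approximation for a ratio of proportions; that choice of interval is an assumption, since the text does not specify one.

```python
import math

def attack_rate(ill, at_risk):
    """Proportion of people at risk who developed the illness."""
    return ill / at_risk

def attack_rate_ratio(ill_exp, n_exp, ill_unexp, n_unexp, z=1.96):
    """Return (ARR, lower, upper) with a log-scale Wald interval (assumed method)."""
    arr = attack_rate(ill_exp, n_exp) / attack_rate(ill_unexp, n_unexp)
    # Standard error of log(ARR) for two independent proportions
    se_log = math.sqrt(1 / ill_exp - 1 / n_exp + 1 / ill_unexp - 1 / n_unexp)
    lower = math.exp(math.log(arr) - z * se_log)
    upper = math.exp(math.log(arr) + z * se_log)
    return arr, lower, upper

# Hypothetical data: 30 of 50 salad eaters became ill, 10 of 50 non-eaters did.
arr, lower, upper = attack_rate_ratio(30, 50, 10, 50)
print(f"ARR = {arr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

If the printed interval excludes 1, the data are inconsistent with $H_{0}:ARR=1$ at roughly the 5% level.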

Measuring Disease

The goals of this section are to name and calculate two measures of incidence, to describe differences in interpreting these measures, and to understand the difference between a proportion and a true rate.

Incidence: number of new cases of a disease occurring in the population during a specified period of time divided by the number of persons at risk of developing the disease during that period of time. For example, if 2000 persons are at risk during the year and 20 develop the disease over that period, the cumulative incidence would be 20⁄2000=1%.

Cumulative incidence: $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $

Incidence rate: $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$

Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.

Prevalence: $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$

The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.
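
The three measures above can be sketched in a few lines of Python. The first two calls reuse the numbers from the text (20 new cases among 2000 persons; 3 + 5 + 8 person-days); the number of cases among subjects A, B, and C and the prevalence counts are hypothetical values chosen only for illustration.

```python
def cumulative_incidence(new_cases, population_at_risk):
    """Proportion of the at-risk population that develops disease."""
    return new_cases / population_at_risk

def incidence_rate(new_cases, person_time):
    """New cases per unit of person-time (a true rate)."""
    return new_cases / person_time

def prevalence(existing_cases, population):
    """Proportion of the population with disease at the specified time."""
    return existing_cases / population

ci = cumulative_incidence(20, 2000)      # example from the text
person_days = 3 + 5 + 8                  # subjects A, B, and C
# Hypothetical: suppose one of the three subjects developed the disease.
ir = incidence_rate(1, person_days)
# Hypothetical: 50 existing cases in a population of 1000 on a given day.
prev = prevalence(50, 1000)
print(ci, person_days, ir, prev)
```

Note that the cumulative incidence and prevalence are proportions (unitless), while the incidence rate carries units of cases per person-day.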

Measuring Mortality Rates

To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:

All cause mortality rates = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$

Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$

Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$

Additional Measures of Mortality

Infant mortality: $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011} {Number\,of\,live\,births\,in\,2011}$

Proportionate mortality: Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause

Case fatality: Of all people diagnosed with a given disease, the proportion who die of it over a certain period

Underlying cause of death

Direct and Indirect Adjustment of Rates

Direct and indirect adjustment of rates are used to compare two populations, or one population at different time periods, whose age distributions differ. Adjusting for age lets us compare the mortality rates as if both populations had the same age distribution.

Direct age-adjustment: Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.

For each population:

Calculate age-specific rates
Multiply age-specific rates by the # of people in corresponding age range in standard population
Sum expected # of deaths across age groups
Divide total # of expected deaths by total standard population

Age-adjusted mortality rate for each population of interest
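
The four steps of direct age-adjustment can be sketched as follows. All rates and population counts here are hypothetical, and the function name is illustrative rather than taken from any library.

```python
def direct_age_adjusted_rate(age_specific_rates, standard_population, per=100_000):
    """Directly standardized rate; age_specific_rates are given per `per` persons."""
    # Steps 2 and 3: expected deaths in each age group of the standard population
    expected = [rate / per * n
                for rate, n in zip(age_specific_rates, standard_population)]
    # Step 4: total expected deaths divided by total standard population
    return sum(expected) / sum(standard_population) * per

# Hypothetical age-specific mortality rates (per 100,000) for two populations
rates_a = [10, 50, 200]
rates_b = [12, 40, 180]
# Hypothetical standard population for the same three age groups
standard = [500_000, 300_000, 200_000]

adj_a = direct_age_adjusted_rate(rates_a, standard)
adj_b = direct_age_adjusted_rate(rates_b, standard)
print(f"A: {adj_a:.1f} per 100,000; B: {adj_b:.1f} per 100,000")
```

Because both populations are weighted by the same standard age distribution, the two adjusted rates can be compared directly.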

Indirect age-adjustment: The expected number of deaths can be compared to the number of actual deaths using the standardized mortality ratio (SMR). It is especially useful when the group-specific rates cannot be trusted (e.g., if the population is too small).

Acquire age-specific mortality rates for standard population
Multiply standard population’s age-specific rates by # of people in age range in study population
Sum expected # of deaths across age groups in study population
Divide observed # of deaths by expected # of deaths in study population

Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)
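
A minimal sketch of the indirect method, again with hypothetical numbers: the standard population supplies the age-specific rates, and the study population supplies only its age structure and its observed death count.

```python
def standardized_mortality_ratio(observed_deaths, standard_rates,
                                 study_population, per=100_000):
    """SMR = observed deaths / expected deaths in the study population."""
    # Expected deaths if the study population experienced the standard rates
    expected = sum(rate / per * n
                   for rate, n in zip(standard_rates, study_population))
    return observed_deaths / expected

# Hypothetical standard-population rates (per 100,000) by age group
standard_rates = [10, 50, 200]
# Hypothetical study population counts in the same age groups
study_population = [20_000, 10_000, 5_000]

smr = standardized_mortality_ratio(34, standard_rates, study_population)
print(f"SMR = {smr:.2f}")
```

With these inputs the expected count is 2 + 5 + 10 = 17 deaths, so 34 observed deaths means twice as many deaths as expected.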

Screening

Screening is the use of testing to sort out apparently well persons (asymptomatic) who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include:

Fasting blood sugar for diabetes
Bone densitometry for osteoporosis
Otoacoustic emissions testing for hearing loss in newborns

Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and provide better outcomes. However, it is critical that screening programs must be warranted, and there must be a critical point that can be preceded by screening.

Clinical utility Predictive Value & Reliability: Clinical Utility of Positive Tests

If a patient tests positive, the likelihood that they actually have the disease is called the Positive Predictive Value (PPV). If a patient tests negative, the likelihood that they actually do not have the disease is called the Negative Predictive Value (NPV). PPV and NPV are affected by the prevalence of the disease and by the specificity and sensitivity of the test.

		Disease Status
		Disease	No Disease
Screening Test	Positive	a (True positives)	b (False positives)
Screening Test	Negative	c (False negatives)	d (True negatives)

$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$

PPV interpretation: Given a positive result on the screening test, the likelihood that an individual actually has the disease is PPV.

NPV interpretation: Given a negative result on the screening test, the likelihood that an individual actually does not have the disease is NPV.
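
The PPV and NPV formulas can be applied to a 2x2 screening table as in the sketch below. The cell counts are hypothetical; the argument names a, b, c, d follow the table above.

```python
def predictive_values(a, b, c, d):
    """PPV and NPV from a 2x2 table.

    a: true positives, b: false positives,
    c: false negatives, d: true negatives.
    """
    ppv = a / (a + b)   # of those who test positive, fraction with disease
    npv = d / (c + d)   # of those who test negative, fraction without disease
    return ppv, npv

# Hypothetical screening results for 1000 people
ppv, npv = predictive_values(a=90, b=40, c=10, d=860)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```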

See the section on McNemar Test.

Factors Influencing Predictive Values

Disease prevalence: Increasing disease prevalence increases PPV and decreases NPV. Screening programs are most productive and efficient in high-risk populations; screening for infrequent disease may waste resources; PPV needs to be presented in the context of disease prevalence.

Test specificity (ability of a test to correctly identify those who do not have the disease, $=\frac{d}{b+d}$): Higher test specificity increases PPV.
Test sensitivity (ability of a test to correctly identify those who have the disease, $=\frac{a}{a+c}$)

Note: The cutoff of a test will influence its sensitivity and specificity. Assuming higher values indicate disease, lowering the cutpoint increases true positives and hence sensitivity, but decreases true negatives and hence specificity. Similarly, raising the cutpoint decreases true positives and hence sensitivity, but increases true negatives and hence specificity.
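
To see how prevalence, sensitivity, and specificity drive the predictive values, PPV and NPV can be rewritten with Bayes' rule. The sketch below is one way to express that; the 90% sensitivity and specificity and the prevalence values are hypothetical.

```python
def ppv(sensitivity, specificity, prevalence):
    """P(disease | positive test) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def npv(sensitivity, specificity, prevalence):
    """P(no disease | negative test) via Bayes' rule."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

# Hypothetical test with 90% sensitivity and 90% specificity
for prev in (0.01, 0.05, 0.20):
    print(f"prevalence={prev:.2f}  "
          f"PPV={ppv(0.9, 0.9, prev):.3f}  NPV={npv(0.9, 0.9, prev):.3f}")
```

Running the loop shows PPV rising and NPV falling as prevalence increases, which is why screening a low-prevalence population yields many false positives even with a fairly accurate test.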

Validity

Validity: The ability of a test to distinguish between who has disease and who does not

Reliability: The ability to replicate results on the same sample if the test is repeated

The following charts show the three possible outcomes (from left to right): valid not reliable, reliable not valid, and valid and reliable.

Reliability (repeatability) of tests

Can the results be replicated if the test is redone? The results may be influenced by three factors:

Intrasubject variation: Variation within individual subjects
Intraobserver variation: Variation in reading of results by the same reader
Interobserver variation: Variation between those reading results

How does multiple testing improve screening programs?

Using multiple tests:

Sequential (2-stage) testing starts with a less expensive, less invasive, and less uncomfortable test. Those who test positive must be followed up with additional testing.
Simultaneous (parallel) testing involves multiple screening tests at the same time. To be considered positive, a person need only test positive on any one test; to be considered negative, the person must test negative on all tests.

Each test has its own sensitivity and specificity. Utilization of multiple testing can improve net sensitivity (simultaneous testing) or net specificity (sequential testing). In other words:

Sequential testing decreases net sensitivity and increases net specificity
Simultaneous testing increases net sensitivity and decreases net specificity
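
The direction of these changes can be checked numerically. The sketch below assumes two tests whose errors are independent given true disease status (a simplifying assumption not stated in the text) and uses hypothetical sensitivity and specificity values.

```python
def sequential_net(sens1, spec1, sens2, spec2):
    """Two-stage testing: final positive only if both tests are positive."""
    net_sens = sens1 * sens2
    # True negatives: cleared by test 1, or false positive on test 1 then cleared by test 2
    net_spec = spec1 + (1 - spec1) * spec2
    return net_sens, net_spec

def simultaneous_net(sens1, spec1, sens2, spec2):
    """Parallel testing: final positive if either test is positive."""
    # True positives: caught by test 1, or missed by test 1 but caught by test 2
    net_sens = sens1 + (1 - sens1) * sens2
    net_spec = spec1 * spec2
    return net_sens, net_spec

# Hypothetical tests: test 1 (80% sens, 90% spec), test 2 (90% sens, 95% spec)
seq_sens, seq_spec = sequential_net(0.80, 0.90, 0.90, 0.95)
sim_sens, sim_spec = simultaneous_net(0.80, 0.90, 0.90, 0.95)
print(f"sequential:   sens={seq_sens:.3f}, spec={seq_spec:.3f}")
print(f"simultaneous: sens={sim_sens:.3f}, spec={sim_spec:.3f}")
```

Under these assumptions the sequential scheme has lower sensitivity but higher specificity than either single test, and the simultaneous scheme shows the reverse.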

Randomized Controlled Trials (RCT)

In these procedures, the investigator assigns exposure at random to study participants. The investigator then observes if there are any differences in health outcomes between people who were exposed to the factor (i.e., the treatment group) and those who were not (i.e., the comparison group). Special care is taken in ensuring that the follow-up is done in an identical way with both groups. The essence of a good comparison between “treatments” is that the compared groups are as much the same as possible, except for their “treatment.”

Steps of a RCT

RCTs involve the following sequential steps:

Hypothesis formulation
Study participant recruitment based on specific criteria
Gathering informed consent
Allocation of eligible and willing participants into random assignment study groups
Monitoring study groups for outcome under study
Comparing rates of different outcomes in various groups

External and internal validity

External validity: Generalization of study to larger source population, which is influenced by factors like:

Demographic differences between eligible and ineligible subgroups
Whether the intervention mirrors what will happen in the community or source population

Internal validity: Ability to reach correct conclusion in study, which is influenced by factors like:

Ability of subjects to provide valid and reliable data
Expected compliance with a regimen
Low probability of dropping out

Measures of Association and Effect in RCT

Ratio of two measures of disease incidence (relative measures):

Risk Ratio (Relative Risk)
Rate Ratio

Difference between two measures of disease incidence:

Risk difference
Efficacy

		Disease Status
		Disease	No Disease
Treatment	Drug A	a	b
Treatment	Placebo	c	d

$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$

$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$

Interpretation:

$RR>1$, Outcome $X$ is $RR$ times as likely to occur in group A as in group B
$RR=1$, Null value (no difference between groups)
$RR<1$, Either calculate the reduction in risk ratios (100%-$X$%) or invert ($1/RR$) to be interpreted as “less likely” risk

$Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$
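
The relative risk, risk difference, and efficacy formulas can be computed directly from the 2x2 trial table. The counts below are hypothetical, and the function name is illustrative.

```python
def trial_measures(a, b, c, d):
    """Measures of effect from a 2x2 RCT table.

    Row 1 (Drug A): a with disease, b without.
    Row 2 (Placebo): c with disease, d without.
    """
    ci_drug = a / (a + b)        # cumulative incidence on Drug A
    ci_placebo = c / (c + d)     # cumulative incidence on placebo
    return {
        "risk_ratio": ci_drug / ci_placebo,
        "risk_difference": ci_drug - ci_placebo,
        "efficacy": (ci_placebo - ci_drug) / ci_placebo,
    }

# Hypothetical trial: 10 of 100 on Drug A and 20 of 100 on placebo develop disease
measures = trial_measures(a=10, b=90, c=20, d=80)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

Note that efficacy equals 1 minus the risk ratio, so a risk ratio of 0.5 corresponds to 50% efficacy.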

Situations that favor the use of RCT:

Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.
Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks.
Effect of intervention on outcome is of sufficient importance to justify a large study.

Cohort Study

A population of exposed and unexposed individuals at risk of developing the outcome is followed over time to compare the development of disease in each group.

Steps: Establish the study population by identifying a population that is reflective of the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; then follow both groups and compare their outcomes.

Types:
- Prospective (concurrent) and retrospective (non-concurrent) cohort studies are distinguished by when the data are collected
- Retrospective has benefits: more cost effective; good for diseases of long latency
- Prospective has benefits: data quality presumably higher

Both designs need to be cautious of ascertainment biases if outcomes or exposure is known.

Measures of Association in Cohort Study:
- Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio
- Difference between two measures of disease incidence: Risk Difference, Rate Difference

Strengths and weakness of Cohort Design:

Strengths:

Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information
Excellent for studying known adverse exposures or those that cannot practically be randomized
Like RCT, excellent for studying rare exposures
Multiple outcomes and sometimes multiple exposures can be studied

Disadvantages:

Long-term follow-up required and expensive
Not effective at capturing rare outcomes and can be challenging to study diseases that take a long time to develop
Loss to follow-up can be a problem
Changes over time in criteria and methods can lead to problems with inferences
People self-select exposures so exposed and unexposed may differ with respect to important characteristics

Situations that favor a Cohort Study:

When there is evidence of an association between the exposure and the disease from other studies
When the exposure is rare but incidence of disease among the exposed is high
When time between exposure and development of the disease is relatively short or historical data is available
When good follow-up can be ensured

Case Control Study

A case control study compares cases (people with the disease) and controls (people without it) to see which group had greater exposure to the suspected risk factor.

Measures of Association: Odds Ratio

		Case	Control
Exposed	Yes	a	b
Exposed	No	c	d

$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}$

Interpretation

If OR > 1, the odds of being exposed are OR times higher in the cases than in the controls. If OR < 1, the odds of being exposed are 1/OR times lower in the cases than in the controls. If OR = 1, there is no association: the odds are the same in cases and controls.
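
A short sketch of the odds ratio calculation for an unmatched case control table follows. The counts are hypothetical, and the confidence interval uses the common log-scale (Woolf) approximation, which is an assumption added here and not part of the text above.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """OR = ad / bc with a log-scale Wald interval (assumed method).

    a: exposed cases, b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

# Hypothetical study: 100 cases (40 exposed) and 100 controls (20 exposed)
or_est, or_lower, or_upper = odds_ratio(a=40, b=20, c=60, d=80)
print(f"OR = {or_est:.2f}, 95% CI = ({or_lower:.2f}, {or_upper:.2f})")
```

An interval that lies entirely above 1 would suggest that cases had higher odds of exposure than controls.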

Strengths and weakness of Case Control Study:
- Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.
- Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in study is artificial and does not represent the natural distribution of disease in the population.

Avoiding Recall / Reporting Bias. Ways to avoid recall and reporting bias include:

Adjusting timing so that the time between the event/illness and the study is as short as possible; use standardized questionnaires that obtain complete information
Using existing information if/when possible (e.g. medical record)
Masking participants to study hypothesis

Conditions under which an OR from a Case-Control Study can approximate an RR (OR≈RR):

When the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn
When the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn
When the disease being studied does not occur frequently

Cross-Sectional Studies

A cross-sectional study is an observational study in which a subject’s exposure and disease data are measured at the same time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence. Because there are no incident cases, temporality cannot be determined.

Strengths and Limitations of Cross-Sectional Studies

Strengths:

Good for generating hypotheses
Easily sets up other analytic designs
Temporality is not a problem for time invariant exposures (genetic markers)
Relatively low cost

Weakness:

Temporality: it is unclear whether exposure or disease happened first
Prevalent cases may not be the same as incident cases
Not useful for rare disease
Subject to selection bias

Measures of Association in Cross Sectional Studies

		Disease	No Disease
Exposed	Yes	a	b
Exposed	No	c	d

$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\, disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$

Ecologic Studies

An ecological study is an observational study in which group-level data is used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place & time (mixed study). However, one error that could occur is when an association is identified based on group level (ecological) characteristics that are ascribed to individuals when such associations do not exist at the individual level.

Strengths and Disadvantages of Ecologic Studies

Strengths:

Data is relatively easy and/or cheap to obtain.
Ecological studies are a good place to start: they are useful for generating new hypotheses because they are relatively easy and low-cost to conduct.
Many relevant social, occupational and environmental exposures cannot be ascribed to an individual.

Weaknesses:

Reliance on group-level data may not correctly represent individual-level associations.
Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.

Other Risk Estimates

Attributable Risk Estimates of Effect: If exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.
Attributable Risk ($AR$): $AR=CI_{Exposed} - CI_{Not\,exposed}$. This is just the risk difference. The group of interest is the exposed group: AR aims to quantify the risk of disease in the “exposed” group that is attributable to the exposure.
Attributable Risk Percent $(AR\%)$: $AR\% = \frac{CI_{Exposed} - CI_{Not\,exposed}}{CI_{Exposed}}$
Population Attributable Risk ($PAR$): $PAR= CI_{Total} - CI_{Not\,exposed}$
Population Attributable Risk Percent $(PAR\%)$: $PAR\% = \frac{CI_{Total}-CI_{Not\,exposed}}{CI_{Total}}$.
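
The four attributable-risk measures can be computed together, as in the sketch below. The cumulative incidences and the proportion exposed are hypothetical, and the total-population incidence is derived as a weighted average, which assumes everyone in the population is either exposed or unexposed.

```python
def attributable_risks(ci_exposed, ci_unexposed, prop_exposed):
    """AR, AR%, PAR, and PAR% from group-level cumulative incidences."""
    # Total-population incidence as a weighted average of the two groups
    ci_total = prop_exposed * ci_exposed + (1 - prop_exposed) * ci_unexposed
    ar = ci_exposed - ci_unexposed
    par = ci_total - ci_unexposed
    return {
        "AR": ar,
        "AR%": ar / ci_exposed,
        "PAR": par,
        "PAR%": par / ci_total,
    }

# Hypothetical: 30% risk if exposed, 10% if unexposed, 40% of population exposed
risks = attributable_risks(ci_exposed=0.30, ci_unexposed=0.10, prop_exposed=0.40)
for name, value in risks.items():
    print(f"{name}: {value:.3f}")
```

The population measures are smaller than the exposed-group measures because only part of the population carries the excess risk.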

Bias

Bias is a barrier to internal validity.

Causes of bias: Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and outcome; observed results different than true results.
Impact of bias: Makes it appear as if there is an association when there really is none (bias away from the null); masks an association when there really is one (bias toward the null).
Reasons we get wrong answers: Selection bias: who is selected or retained in a study distorts your estimates of the truth. Example may be selection bias due to different retention in the study.

Mechanisms to reduce bias:
- Ensure proper selection of study subjects (choose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).
- Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.
Information bias: The quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.
Confounding bias: Differences between cases and controls, or between exposed and unexposed, distort your estimates of the truth. A variable is a confounder if it is a known risk factor for the outcome, it is associated with the exposure, and it is not a result of the exposure. All three conditions are necessary for a variable to be considered a confounder.
Chance: The luck of the draw gets you a study sample that is not representative of the larger population.
Strategies to handle confounding: (1) In study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study:

	Control Exposed	Control Unexposed
Case Exposed	a	b
Case Unexposed	c	d

Concordant pairs: Both case and control exposed; neither case nor control exposed.
Discordant pairs: Case exposed but control not exposed; control exposed but case not exposed.
Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}$

Interpretation: If there is an association between exposure and outcome, it is not due to any factors that were matched on; you cannot conduct analyses for matched variables and outcome.
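
For the matched-pairs table above, the odds ratio depends only on the discordant cells b and c. The sketch below uses hypothetical pair counts; the McNemar chi-square statistic (without continuity correction) is included as one common way to test whether b and c differ, which is an addition here and not derived in the text.

```python
def matched_odds_ratio(b, c):
    """Matched-pairs OR: case-exposed/control-unexposed pairs over the reverse."""
    return b / c

def mcnemar_chi_square(b, c):
    """McNemar statistic without continuity correction (1 degree of freedom)."""
    return (b - c) ** 2 / (b + c)

# Hypothetical 100 matched pairs: a=20, b=30, c=10, d=40
b_pairs, c_pairs = 30, 10
matched_or = matched_odds_ratio(b_pairs, c_pairs)
chi_square = mcnemar_chi_square(b_pairs, c_pairs)
print(f"matched OR = {matched_or:.1f}, McNemar chi-square = {chi_square:.1f}")
```

The concordant pairs (a and d) do not enter either calculation because they carry no information about the exposure difference within pairs.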

Randomization: Random allocation of exposure (“treatment”) by the investigator ensures that the two groups (exposed and unexposed) are the same except for the exposure of interest. It controls for both known and unknown confounders because these “3rd variables” should be equally distributed between the groups.
Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant.
Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure.

An example of age-adjustment:

Applications

This article reviews, from some important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has largely benefited in the past from the development of epidemiological research; however, the question of association versus causation is currently raised by observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved, such as the necessity of replication and Mendelian randomization, which might prove to be important in future research.

This article studies retrospectively the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.

Software

Problems

How do we learn about existence of outbreaks?

a. Cases call health departments directly

b. Clinicians

c. Laboratories

d. All of the above

In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?

a. Host

b. Agent

c. Vector

d. Environment

e. All of the above

The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?

a. 0.002 lung cancer cases per 100,000 person years

b. 200 lung cancer cases per 100,000 person years

c. 270 lung cancer cases per 100,000 person years

d. 243 lung cancer cases per 100,000 person years

In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?

a. The prevalence increases if the duration of disease is increasing or stays the same.

b. The prevalence increases if the duration of disease is decreasing rapidly.

c. The prevalence decreases if the duration of disease is increasing.

d. The prevalence decreases if the duration of disease stays the same.

Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002 - 2012.

Age groups (years)	Age-specific rates (per 100,000)	Michigan standard population	Expected number of deaths
<20	20	2,000,000
20-39	10	3,000,000
40-59	5	1,000,000
>60	30	4,000,000
Total		10,000,000

What is the age-adjusted mortality rate from diabetes among whites according to the table above?

a. 40.2 deaths per 100,000

b. 19.5 deaths per 100,000

c. 1.9 deaths per 100,000

d. 20.4 deaths per 100,000

Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?

a. 1.54

b. 5.02

c. 1.69

d. 0.65

When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.

a. True

b. False

Sequential testing tends to have higher net specificity than specificity of a single test.

a. True

b. False

A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:

		Gold standard
		Condition Positive	Condition negative
Result of New Test	Test Positive	80	70
Result of New Test	Test Negative	10	240

What is the sensitivity of the new test?

a. 77%

b. 89%

c. 80%

d. 53%

What is the specificity of the test?

a. 77%

b. 89%

c. 80%

d. 53%

What is the positive predictive value of the test?

a. 77%

b. 89%

c. 80%

d. 53%

Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they had the flu), so these were not enrolled in the research study. The students who were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of this follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.

a. What type of study is this?

b. Why is this type of study adequate for this particular situation?

c. Imagine that you are the investigator picking the appropriate study design to answer this question; what might you have worried about in picking this design?

d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?

e. Calculate and interpret the above measure of association using a 2X2 table.

f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.

Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.

a. What type of study design was used in this example?

b. Why is this type of study appropriate for this particular situation?

c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study?

d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.

e. What is the appropriate measure of association for this study? Explain why.

f. Calculate and interpret your measure of association.

A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.

a. What type of study is this?

b. What type of measure of association is appropriate for this study? Why?

c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?

Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?

a. Interviewer bias

b. Recall bias

c. Loss to follow-up

d. Non-differential misclassification

Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns (especially if they were concerned that sun exposure was a reason they got melanoma). This is an example of:

a. Interviewer bias

b. Loss to follow-up

c. Differential misclassification

d. Non-differential misclassification

References

Epidemiology Wikipedia

SOCR Home page: http://www.socr.umich.edu

@@ Line 2: / Line 2: @@
 ===Overview===
-Epidemiology is the study of the distribution and determinants of disease frequency in human populations. It serves as an important area in the scientific field: it is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. The introduction to Epidemiology aims to introduce the filed of Epidemiology and study the basic concepts and methodologies we are going to apply later. It also aims to help students solve and analyze Epidemiological problems and introduce students to various Epidemiological studies.
+[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.
 ===Motivation===
-To get an introduction to Epidemiology, we want to:
+In this introduction to epidemiology, we will:
-*study on the basis of the language of epidemiology and identify key sources of data for epidemiologic purposes
+*Study the language of epidemiology and identify key sources of data for epidemiological purposes
-*be able to calculate and interpret measures of disease frequency
+*Be able to calculate and interpret measures of disease frequency
-*recognize and evaluate epidemiological study designs and their limitations
+*Recognize and evaluate epidemiological study designs and their limitations
-*be an informed consumer of  epidemiological sources of information (journals, websites, government agencies).
+*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).
 ===Theory===
-==Five main goals of epidemiology==
+*Five main goals of epidemiology:
-*To identify the cause of disease and its risk factors
+# To identify the cause of disease and its risk factors
-*To determine the extent of disease found in the community
+# To determine the extent of disease found in the community
-*To study the natural history and prognosis of disease
+# To study the natural history and prognosis of disease
-*To evaluate new preventative and therapeutic measures
+# To evaluate new preventative and therapeutic measures
-*To provide a foundation for developing public policy.
+# To provide a foundation for developing public policy
-==Distinguishing between Endemic, Epidemic, and Pandemic==
+*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':
-*Endemic: The habitual presence (or usual occurrence) of a disease within a given geographic area;
+#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;
-*Epidemic: The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;
+#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;
-*Pandemic: A worldwide epidemic affecting an exceptionally high proportion of the global population.
+#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.
-==Modes of Disease Transmission==
+*Modes of Disease Transmission
-*Direct contact: transmission occurs when the pathogen is transferred by contact from an infected person to contaminated intermediate object such as sneeze, touch or sexual intercourse.
+#''Direct contact'': Transmission occurs when the pathogen is transferred by contact from an infected person to contaminated intermediate object such as sneeze, touch or sexual intercourse
-*Indirect contact: transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector. (1) Inanimate object vehicle), examples may be toy, food or water; (2) Vector-borne (animal or insect), examples include mosquito, tick and mice.
+#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector
+##''Inanimate (object or vehicle)'': Examples may be toy, food or water
+##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice
-==Attack Rates and Ratios (ARR)==
+*Attack Rates and Ratios (ARR)
-Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak involves: starting with the big picture and big risk factors for disease such as “How many people at the event got ill?”; refining the big picture into smaller questions of “Did they eat the salad? Chicken? Or ice cream?”; formulating a hypothesis such as “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”
+:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves:
+#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)
+#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)
+#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)
-*Attack Rates (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\,  illness} {Total\,number\,of\,people\,at\,risk}.$
+:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\,  illness} {Total\,number\,of\,people\,at\,risk}$
+:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$
+*The null hypothesis $H_{0}:ARR=1$ can be assessed by checking whether the 95% confidence interval of the estimated ARR includes the null value of 1. If the ARR is greater than 1 and its interval excludes 1, then people who were exposed are more likely to develop the illness than those who were unexposed.
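The attack rate ratio and its confidence interval can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the counts are hypothetical, the function names are invented for this example, and the interval uses the common log-based (Wald-type) approximation, which the text above does not specify.

```python
import math


def attack_rate(ill, at_risk):
    """Attack rate: people at risk who became ill / all people at risk."""
    return ill / at_risk


def attack_rate_ratio_ci(ill_exp, n_exp, ill_unexp, n_unexp, z=1.96):
    """Return (ARR, lower, upper) using a log-based 95% interval (assumed method)."""
    arr = attack_rate(ill_exp, n_exp) / attack_rate(ill_unexp, n_unexp)
    # Approximate standard error of log(ARR)
    se_log = math.sqrt(1 / ill_exp - 1 / n_exp + 1 / ill_unexp - 1 / n_unexp)
    lower = math.exp(math.log(arr) - z * se_log)
    upper = math.exp(math.log(arr) + z * se_log)
    return arr, lower, upper


# Hypothetical buffet outbreak: 30 of 50 salad eaters became ill,
# compared with 10 of 50 people who did not eat the salad.
arr, lower, upper = attack_rate_ratio_ci(30, 50, 10, 50)
print(f"ARR = {arr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print("Interval excludes 1" if lower > 1 or upper < 1 else "Interval includes 1")
```

If the printed interval excludes 1, the data are inconsistent with $H_{0}:ARR=1$ at roughly the 5% level under this approximation.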
-*Attack Rate Ratio (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}.$
+====Measuring Disease====
+This section names and calculates two measures of incidence, describes the differences in interpreting them, and explains the difference between a proportion and a true rate.
-*$H_{0}:ARR=1$,and 95% confidence intervals can be used to see whether estimated ARR interval includes the null value of 1. If ARR is much greater than 1, then people exposed are more likely to develop the illness compared to those unexposed.
+*''Incidence'': The number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2000 persons at risk during the year and 20 develop the disease over that period, the incidence would be 20⁄2000=1%.
+*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $
-==Measuring Disease==
+*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$
-To name and calculate two measures of incidence and describe differences in interpreting these measures as well as to understand the difference of the difference between proportion and a true rate.
+Person-time measures the total amount of time that all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B for 5 days, and subject C for 8 days, then person-days $= 3 + 5 + 8 = 16$.
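The two incidence measures can be contrasted with a short Python sketch. The follow-up times (3, 5, and 8 days) and the 20-in-2000 example come from the text above; the number of new cases among the three followed subjects and all function names are assumptions made for illustration.

```python
def cumulative_incidence(new_cases, population_at_risk):
    """Proportion of the at-risk population that develops disease."""
    return new_cases / population_at_risk


def person_time(follow_up_times):
    """Total time at risk contributed by all followed subjects."""
    return sum(follow_up_times)


def incidence_rate(new_cases, follow_up_times):
    """New cases per unit of person-time (a true rate)."""
    return new_cases / person_time(follow_up_times)


# Example from the text: 20 new cases among 2000 persons at risk.
print(cumulative_incidence(20, 2000))

# Subjects A, B, and C followed for 3, 5, and 8 days; assume 2 new cases.
days = [3, 5, 8]
print(person_time(days), "person-days")
print(incidence_rate(2, days), "cases per person-day")
```

The cumulative incidence is a unitless proportion, whereas the incidence rate carries units of inverse time, which is why the two are interpreted differently.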
-*Incidence: number of new cases of a disease occurring in the population during a special period of time divided by the number of persons at risk of developing the disease during that period of time. For example: if there are 2000 persons at risk during the year and 20 develop disease over that period. The incidence rate would be 20⁄2000=1%.
+*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$
-**Cumulative incidence: $\frac{Number\,of\,new\,cases} {Total\population\at\risk}$.
+*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.
-**Incidence rate: $\frac{Number\of\new\cases}{Total\,person-time\contributed\by\the\persons\followed}$.
+====Measuring Mortality Rates====
+To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:
-Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days = 3 + 5 + 8 = 16.
+*All-cause mortality rate = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$
+*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$
-*Prevalence
+*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$
-**$\frac{Number\of\cases\of\a\disease\in\the\population\at\a\specified\time}{Number\of\persons\in\the\population\at\that\time}$.
-**The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3^{rd}, 2013.
+====Additional Measures of Mortality====
+*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011} {Number\,of\,live\,births\,in\,2011}$
-==Measuring Mortality Rates==
+*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause
-*To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates.
+*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of that disease over a certain period
+*''Underlying cause of death''
-*All cause mortality rates=$\frac{Number\of\deaths\in\a\specified\time\period}{Number\in\population\in\the\middle\of\the\year}$.
+====Direct and Indirect Adjustment of Rates====
+Direct and indirect adjustment of rates are used to compare two populations, or one population at different time periods, when their age distributions differ. Adjusting for age shows how the mortality rates would compare if both populations had the same age distribution.
+*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.
-*Cause-specific mortality rate=$\frac{Total\number\of\deaths\in\1\year\from\lung\cancer\in\US}{Population\of\the\US\in\the\middle\of\the\year}$.
-*Group-specific mortality rate=$\frac{Total\number\of\deaths\in\1\year\among \women\in\US} {Female\population\of\the\US\in\the\middle\of\the\year}$.
-==Additional Measures of Mortality==
-**Infant mortality: $\frac{Number\of\deaths\in\children\under\1\year\of\age\in\, 2011} {(Number\of\live\births\in\2011}$.
-**Proportionate mortality: measures proportion of all deaths occurring in a given place over a given time that is due to a given cause.
-**Case fatality: Of all people diagnosed with a given disease, the proportion of persons die of a case over a certain period.
-**Underlying cause of death.
-==Direct and Indirect Adjustment of Rates==
-Direct and indirect adjustment of rates are used to compare two populations or one population at different time periods with different age distributions by adjust for age to compare the mortality rates in two populations if they both have the same age distribution.
-*Direct age-adjustment: expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.
 For each population:
-:1. Calculate age-specific rates
-:2. Multiply age-specific rates by the # of people in corresponding age range in standard population
-:3. Sum expected # of deaths across age groups
-:4. Divide total # of expected deaths by total standard population
-Age-adjusted mortality rate for each population of interest.
+# Calculate age-specific rates
+# Multiply age-specific rates by the # of people in corresponding age range in standard population
+# Sum expected # of deaths across age groups
+# Divide total # of expected deaths by total standard population
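The four steps of direct age-adjustment can be sketched as follows. This is a minimal illustration under stated assumptions: the three age groups, all counts, and the function name are hypothetical and do not come from the original text.

```python
def direct_adjusted_rate(deaths, population, standard_population):
    """Directly age-adjusted mortality rate for one population of interest."""
    # Step 1: age-specific rates in the population of interest
    rates = [d / n for d, n in zip(deaths, population)]
    # Step 2: expected deaths if the standard population had these rates
    expected = [r * s for r, s in zip(rates, standard_population)]
    # Steps 3 and 4: sum expected deaths, divide by total standard population
    return sum(expected) / sum(standard_population)


# Hypothetical data for three age groups (young, middle, old)
deaths = [10, 50, 200]
population = [10000, 5000, 2000]
standard = [5000, 3000, 2000]

crude = sum(deaths) / sum(population)
adjusted = direct_adjusted_rate(deaths, population, standard)
print(f"crude rate = {crude:.4f}, age-adjusted rate = {adjusted:.4f}")
```

Applying the same standard population to a second population of interest gives a second adjusted rate that can be compared with this one.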
-*Indirect age-adjustment: expected number of deaths can be compared to the number of actual deaths with the standardized mortality rate (SMR). It is especially useful when I don’t trust the group-specific rates (i.e. if the population is too small).
+====Age-adjusted mortality rate for each population of interest====
+*Indirect age-adjustment: The expected number of deaths can be compared to the number of actual deaths using the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates of the study population cannot be trusted (e.g., if the population is too small).
+# Acquire age-specific mortality rates for standard population
+# Multiply standard population’s age-specific rates by # of people in age range in study population
+# Sum expected # of deaths across age groups in study population
+# Divide observed # of deaths by expected # of deaths in study population
-:1. Acquire age-specific mortality rates for standard population
+Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)
-:2. Multiply standard population’s age-specific rates by # of people in age range in study population
-:3. Sum expected # of deaths across age groups in study population
-:4. Divide observed # of deaths by expected # of deaths in study population
-Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)
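The indirect adjustment steps can be sketched in the same way. The standard rates, study population sizes, observed deaths, and function name below are hypothetical values chosen only to illustrate the arithmetic.

```python
def standardized_mortality_ratio(observed_deaths, standard_rates, study_population):
    """SMR = observed deaths / deaths expected under the standard rates."""
    # Steps 1 and 2: apply the standard age-specific rates to the study population
    expected_by_age = [r * n for r, n in zip(standard_rates, study_population)]
    # Step 3: total expected deaths in the study population
    expected_deaths = sum(expected_by_age)
    # Step 4: ratio of observed to expected deaths
    return observed_deaths / expected_deaths


# Hypothetical standard rates and study population for three age groups
standard_rates = [0.001, 0.01, 0.1]
study_population = [2000, 1000, 500]

smr = standardized_mortality_ratio(93, standard_rates, study_population)
print(f"SMR = {smr:.2f}")
if smr > 1:
    print("More deaths than expected")
elif smr < 1:
    print("Fewer deaths than expected")
else:
    print("As many deaths as expected")
```

Only the total number of observed deaths is needed from the study population, which is why this approach suits small populations with unstable age-specific rates.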
+====Screening====
+''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include:
-==Screening==
+*Fasting blood sugar for diabetes
+*Bone densitometry for osteoporosis
+*Otoacoustic emissions testing for hearing loss in newborns
-Screening is the use of testing to sort out apparently well persons (asymptomatic) who probably have disease from those who probably do not and allows to detect the disease early. Examples of screening include: fasting blood sugar for diabetes, bone densitometry for osteoporosis and Otoacoustic emissions testing for hearing loss new borns. It is done during the preclinical phase and is a secondary prevention strategy. Screening increases lead time, thereby allows us to detect disease early, initiate treatment sooner and provide better outcomes. However, it is critical that screening programs must be warranted and there must be a critical point that can be preceded by screening.
+Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and provide better outcomes. However, a screening program must be warranted, and there must be a critical point in the course of the disease that screening can precede.
+=====Clinical utility Predictive Value & Reliability: Clinical Utility of Positive Tests=====
-'''A. Clinical utility predictive value & reliability: clinical utility of positive tests.'''
+If a patient tests positive, the likelihood that they actually have the disease is called the '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood that they actually do ''not'' have the disease is called the '''Negative Predictive Value''' (NPV). PPV and NPV are affected by the prevalence of the disease and by the sensitivity and specificity of the test.
-If a patient is tested positive, the likelihood they actually have the disease is called '''Positive Predictive Value (PPV'''), if a patient tests negative, the likelihood they actually do not have the disease is called '''Negative Predictive Value (NPV).''' PPV and NPV are affected by prevalence of disease, specificity and sensitivity of the test.
+<center>
 {|class="wikitable" style="text align:center;width:25%"border="1"
 |-
@@ Line 123: / Line 121: @@
 | Negative ||	c (False negatives)||	d (True negatives)
 |}
-<center>
 $PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$
 </center>
+'''PPV interpretation:''' Given a positive result on the screening test, the likelihood that an individual actually has the disease is PPV.
+'''NPV interpretation:''' Given a negative result on the screening test, the likelihood that an individual actually does not have the disease is NPV.
+* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].
-PPV interpretation: Given a positive result on the disease, the likelihood that an individual is positive in the screening test is PPV.
+===== Factors Influence Predictive Values=====
-NPV interpretation: Given a negative result on the disease, the likelihood that an individual is negative in the screening test is NPV.
+*''Disease prevalence'': Increasing disease prevalence increases PPV and decreases NPV. Screening programs are most productive and efficient in high-risk populations; screening for an infrequent disease may waste resources; PPV needs to be presented in the context of disease prevalence.
+*''Test specificity'' (ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.
+*''Test sensitivity'' (ability of a test to correctly identify those who have the disease $=\frac{a}{a+c}$)
-'''B. Factors influence predictive values''':
+'''Note:''' The cutoff chosen for a test influences its sensitivity and specificity. For a test in which higher values indicate disease, lowering the cutpoint increases true positives (higher sensitivity) but decreases true negatives (lower specificity). Similarly, raising the cutpoint decreases true positives (lower sensitivity) but increases true negatives (higher specificity).
-*Disease prevalence: increasing disease prevalence increases PPV (or decreases NPV). Screening program most productive and efficient in high-risk populations; screening for infrequent disease may waste resources; need to present PPV in context of disease prevalence.
-*Test specificity (ability of a test to correctly identify those who have the disease $=\frac{d}{b+d}$): higher test specificity increases PPV.
-*Test sensitivity (ability of a test to correctly identify those who do not have the disease =$\frac{a}{a+c}).
-'''Note:''' the cutoff of a disease will influence test sensitivity and specificity: lowering the cutpoint will increase true positive hence increases sensitivity; decreases true negative hence decreases specificity. Similarly, raising the cutpoint will decrease true positives hence decreases sensitivity; increase true negatives hence increases specificity.
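The dependence of the predictive values on prevalence can be illustrated with a short Python sketch. It assumes a hypothetical test with 90% sensitivity and 90% specificity, where sensitivity is $\frac{a}{a+c}$ and specificity is $\frac{d}{b+d}$ as in the table above; the function name and all numeric values are illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence                # cell a, as a fraction
    false_neg = (1 - sensitivity) * prevalence         # cell c
    true_neg = specificity * (1 - prevalence)          # cell d
    false_pos = (1 - specificity) * (1 - prevalence)   # cell b
    ppv = true_pos / (true_pos + false_pos)            # a / (a + b)
    npv = true_neg / (true_neg + false_neg)            # d / (c + d)
    return ppv, npv


# Same hypothetical test applied to populations with different prevalence
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence = {prevalence:.2f}  PPV = {ppv:.3f}  NPV = {npv:.3f}")
```

With the test characteristics held fixed, PPV rises and NPV falls as prevalence increases, which is why screening tends to be most efficient in high-risk populations.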
+=====Validity=====
-'''C. Validity:''' validity is the ability of a test to distinguish between who has disease and who does not; reliability is the ability to replicate results on same sample if test if repeated. The following charts shows the three possible outcomes: (from left to right) valid not reliable, reliable not valid and valid and reliable.
+''Validity'': The ability of a test to distinguish between who has the disease and who does not
+''Reliability'': The ability to replicate results on the same sample if the test is repeated
+The following chart shows the three possible outcomes (from left to right): ''valid but not reliable'', ''reliable but not valid'', and ''valid and reliable''.
 <center>
@@ Line 148: / Line 151: @@
 </center>
+=====Reliability (repeatability) of tests=====
-'''D. Reliability(repeatability) of tests:'''
 Can the results be replicated if the test is redone? The results may be influenced by three factors:
-*Intrasubject variation: variation within individual subjects
-*Intraobserver variation: variation in reading of results by the same reader
-*Interobserver variation: variation between those reading results
+*''Intrasubject variation'': Variation within individual subjects
+*''Intraobserver variation'': Variation in reading of results by the same reader
+*''Interobserver variation'': Variation between those reading results
-'''E. How do multiple testing improve screening programs?'''
+=====How does multiple testing improve screening programs?=====
+Using multiple tests:
-Using multiple tests:
+# ''Sequential tests'' (2-stage) start with the less expensive, less invasive, and less uncomfortable test. Those who test positive on the first test are followed up with additional testing.
+# ''Simultaneous tests'' (parallel) involve conducting multiple screening tests at the same time. To be considered positive, a person needs to test positive on any one test; to be considered negative, the person must test negative on all tests.
-:(1) sequential tests(2-stage) is less expensive, less invasive, less uncomfortable test first; if positive on first test, then follow-up with additional testing.
-:(2) simultaneous tests (parallel) conduct multiple screening tests at the same time; to be considered positive, the person can test positive on either test, to be considered negative, the person must test negative on all tests.
+Each test has its own sensitivity and specificity. Using multiple tests can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:
-Each test has own sensitivity and specificity. Utilization of multiple testing can improve net sensitivity (simultaneous testing) or net specificity (sequential testing), that is sequential testing decreases net sensitivity and increases net specificity while simultaneous testing increases net sensitivity and decreases net specificity.
+*Sequential testing decreases net sensitivity and increases net specificity
+*Simultaneous testing increases net sensitivity and decreases net specificity
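The direction of these changes can be checked numerically. The sketch below assumes two hypothetical tests whose errors are independent given true disease status (a simplifying assumption that the text does not state); the function names and the sensitivity and specificity values are illustrative.

```python
def simultaneous(sens1, spec1, sens2, spec2):
    """Parallel testing: positive if EITHER test is positive."""
    net_sens = 1 - (1 - sens1) * (1 - sens2)  # missed only if both tests miss
    net_spec = spec1 * spec2                  # negative only if both are negative
    return net_sens, net_spec


def sequential(sens1, spec1, sens2, spec2):
    """Two-stage testing: positive only if BOTH tests are positive."""
    net_sens = sens1 * sens2                  # must be detected at both stages
    net_spec = 1 - (1 - spec1) * (1 - spec2)  # false positive only if both err
    return net_sens, net_spec


# Hypothetical tests: test 1 (80% sens, 90% spec), test 2 (90% sens, 95% spec)
args = (0.80, 0.90, 0.90, 0.95)
sim_sens, sim_spec = simultaneous(*args)
seq_sens, seq_spec = sequential(*args)
print(f"simultaneous: sensitivity = {sim_sens:.3f}, specificity = {sim_spec:.3f}")
print(f"sequential:   sensitivity = {seq_sens:.3f}, specificity = {seq_spec:.3f}")
```

Under the independence assumption, the simultaneous strategy has a higher net sensitivity than either test alone, and the sequential strategy has a higher net specificity than either test alone.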
+===Randomized Controlled Trials (RCT)===
+In these studies, the investigator assigns exposure at random to study participants. The investigator then observes whether there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken to ensure that the follow-up is done in an identical way in both groups. The essence of a good comparison between “treatments” is that the compared groups are as similar as possible, except for their “treatment.”
-==Randomized Controlled Trials (RCT):==
+====Steps of a RCT====
+RCTs involve the following sequential steps:
-The investigator assigns exposure at random to study participants, investigator then observes if there are differences in health outcomes between people who were (treatment group) and were not (comparison group) exposed to the facto. Special care is taken in ensuring that the follow-up is done in an identical way in both groups. The essence of good comparison between “treatment” is that the compared groups are the same except for the “treatment”.
+#Hypothesis formulation
+#Study participant recruitment based on specific criteria
-'''*Steps of a RCT:''' hypothesis formed; study participant recruited based on specific criteria and their informed consent is sought; eligible and willing participants randomly allocated to receive assignment to a particular study group; study groups are monitored for outcome under study; rates of outcome in the various groups are compared.
+#Gathering informed consent
+#Random allocation of eligible and willing participants to study groups
+#Monitoring study groups for the outcome under study
+#Comparing rates of the outcome across the groups
 <center>
-[[Image:MSHS_IntroEpi_Fig_3_actually2.png]]
+[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]
 </center>
+====External and internal validity====
-'''External and internal validity:'''
+*''External validity'': Generalization of study to larger source population, which is influenced by factors like:
+:*Demographic differences between eligible and ineligible subgroups
-*External validity: Generalization of study to larger source population. Influenced by factors like: demographic differences between eligible and ineligible subgroups; intervention mirror what will happen in the community or source population.
+:*Whether the intervention mirrors what will happen in the community or source population
-*Internal validity: Ability to reach correct conclusion in study. Influenced by factors like: ability of subjects to provide valid and reliable data; expected compliance with a regimen; low probability of dropping out.
+*''Internal validity'': Ability to reach correct conclusion in study, which is influenced by factors like:
+:*Ability of subjects to provide valid and reliable data
+:*Expected compliance with a regimen
+:*Low probability of dropping out
-'''Measures of Association and Effect in RCT:'''
+====Measures of Association and Effect in RCT====
-Ratio of two measures of disease incidence (relative measures) - Risk Ratio (Relative Risk), Rate Ratio.
+Ratio of two measures of disease incidence (relative measures):
-Difference between two measures of disease incidence: Risk difference, efficacy.
+*Risk Ratio (Relative Risk)
+*Rate Ratio
+Difference between two measures of disease incidence:
+*Risk difference
+*Efficacy
 <center>
@@ Line 204: / Line 223: @@
 |}
 </center>
+$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$
+<center>
+$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$
+</center>
-$ Relative\,Risk={Cumulative\,Incidence\in\exposed} {Cumulative\Incidence\in\unexposed}=ratio\of\risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_drugA}{CI_placebo}$
+'''Interpretation''':
-$Rate Ratio=\frac{Incidence\rate\in\exposed\} {Incidence\rate\in\unexposed}$
+*$RR>1$: Outcome $X$ is $RR$ times as likely to occur in group A as in group B
+*$RR=1$: Null value (no difference between groups)
+*$RR<1$: Either report the percentage reduction in risk, $(1-RR)\times 100$, or invert the ratio ($1/RR$) to express how much “less likely” the outcome is
+<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$
+</center>
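The relative risk, risk difference, and efficacy formulas can be tied together in a short Python sketch. The cell labels follow the formula above (a and b for the treated group with and without the outcome, c and d for the placebo group); the trial counts and function names are hypothetical.

```python
def cumulative_incidence(events, non_events):
    """Cumulative incidence within one study group."""
    return events / (events + non_events)


def rct_measures(a, b, c, d):
    """Return (relative risk, risk difference, efficacy) for a two-arm trial.

    a, b: treated participants with / without the outcome
    c, d: placebo participants with / without the outcome
    """
    ci_treated = cumulative_incidence(a, b)
    ci_placebo = cumulative_incidence(c, d)
    relative_risk = ci_treated / ci_placebo
    risk_difference = ci_treated - ci_placebo
    efficacy = (ci_placebo - ci_treated) / ci_placebo
    return relative_risk, risk_difference, efficacy


# Hypothetical trial: 10 of 100 treated and 20 of 100 placebo participants
# develop the outcome.
rr, rd, eff = rct_measures(10, 90, 20, 80)
print(f"RR = {rr:.2f}, risk difference = {rd:.2f}, efficacy = {eff:.0%}")
```

Because efficacy is defined relative to the placebo group's cumulative incidence, it always equals $1-RR$ when both are computed from the same table.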
-Interpretation: RR>1, The risk of X is RR times more likely to occur in group A than in group B; RR=1, Null value (no difference between groups); RR<1, Either calculate the reduction in risk ratios (100%-xx%) or invert (1/RR) to be interpreted as “less likely” risk.
+*Situations that favor the use of RCT:
+# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.
+# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks.
+# Effect of intervention on outcome is of sufficient importance to justify a large study.
+===Cohort Study===
+A population of exposed and unexposed individuals at risk of developing the outcome is followed over time to compare the development of disease in each group.
+*Steps: Establish a study population that reflects the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; then follow both groups and compare their outcomes.
+[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]
+*Types:
+**Prospective (concurrent) and retrospective (non-concurrent) cohort studies are distinguished by when the data are collected
+**Benefits of retrospective studies: more cost-effective; good for diseases with long latency
+**Benefits of prospective studies: data quality is presumably higher
-$Efficacy=\frac{C.\,I.\,in\,placebo-C.I.\,rate\,in\,the\,treatment} {C.I.Rate\,in\,placebo\,group}$
+Both designs need to guard against ascertainment bias when the outcome or exposure status is already known.
+*Measures of Association in Cohort Study:
+**Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio
+**Difference between two measures of disease incidence: Risk Difference, Rate Difference
-*Situations that favor the use of RCT:
+*Strengths and weaknesses of Cohort Design:
+: Strengths:
+# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information
+# Excellent for studying known adverse exposures or those that cannot practically be randomized
+# Like RCT, excellent for studying rare exposures
+# Multiple outcomes and sometimes multiple exposures can be studied
+: Disadvantages:
+# Long-term follow-up required and expensive
+# Not effective at capturing rare outcomes and can be challenging for studying diseases that take a long time to develop
+# Loss to follow-up can be a problem
+# Changes over time in criteria and methods can lead to problems with inferences
+# People self-select exposures so exposed and unexposed may differ with respect to important characteristics
-(1) Exposure of interest is a modifiable factor over which individuals are willing to relinquish control;
+*Situations that favor a Cohort Study:
+# When there is evidence of an association between the exposure and the disease from other studies
+# When the exposure is rare but incidence of disease among the exposed is high
+# When time between exposure and development of the disease is relatively short or historical data is available
+# When good follow-up can be ensured
-(2) Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question overweight the risks;
+===Case Control Study===
+A case control study compares cases (people with the disease) and controls (people without the disease) to see which group has had greater exposure to the risk factor of interest.
+*Measures of Association: Odds Ratio
+<center>
+{|class="wikitable" style="text align:center;width:25%"border="1"
+|-
+| colspan=2| || Case || Control
+|-
+|rowspan=2 |Exposed || Yes || a || b
+|-
+| No ||	c ||d
+|-
+|}
+</center>
+$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}$
-(3) Effect of intervention on outcome is of sufficient importance to justify a large study.
+====Interpretation====
+The odds of having been exposed among the cases are OR times the odds among the controls. If OR > 1, the odds of exposure are higher in the cases than in the controls; if OR < 1, they are lower in the cases (by a factor of 1/OR); if OR = 1, the odds are the same in cases and controls (no association).
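The odds ratio from the 2x2 table can be computed directly, as in the following Python sketch. The cell labels a, b, c, and d follow the table above; the counts, the function names, and the log-based (Woolf-type) 95% confidence interval are assumptions added for illustration and are not part of the original text.

```python
import math


def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d) = ad / bc for a case-control 2x2 table."""
    return (a * d) / (b * c)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% interval for the OR on the log scale (assumed method)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


# Hypothetical study: 40 of 100 cases and 20 of 100 controls were exposed.
a, b, c, d = 40, 20, 60, 80
or_estimate = odds_ratio(a, b, c, d)
lower, upper = odds_ratio_ci(a, b, c, d)
print(f"OR = {or_estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 suggests an association between exposure and disease, although it does not by itself address recall bias or other study design limitations.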
+*Strengths and weaknesses of the Case Control Study:
+**Strengths: The case control design is efficient and can evaluate many risk factors for the same disease, so it is well suited to diseases about which little is known; it is observational, so we do not ask people to change their behavior but only collect information on events that happen “naturally”.
+**Weaknesses: It is inefficient for rare exposures; it can study only one outcome at a time; it cannot calculate the incidence of disease and can only estimate the odds of being exposed in cases vs. controls; the numbers of cases and controls in the study are artificial and do not represent the natural distribution of disease in the population.
-.11) Cohort Study: Population of exposed and unexposed individuals at risk of developing outcomes are followed over time to compare the development of disease in each group.
+*Avoiding recall and reporting bias. Ways to reduce recall and reporting bias include:
+# Timing the study so that the interval between the event/illness and data collection is as short as possible, and using standardized questionnaires that obtain complete information
-*Steps: Establish the study population. Identify a study population that is reflective of base population of interest and has a distribution of exposure; identify group of exposed and unexposed individuals. Study on the outcomes of exposed and not exposed groups.
+# Using existing information if/when possible (e.g., medical records)
+# Masking participants to the study hypothesis
+*Conditions under which an OR from a Case-Control Study can approximate a RR ($OR \approx RR$):
+# When the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn
+# When the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases were drawn
+# When the disease being studied does not occur frequently
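The third condition (a rare disease) can be illustrated numerically. The sketch below uses two hypothetical full-population tables, not case control samples, in which a and b are the exposed people with and without disease and c and d are the unexposed people with and without disease. The counts and function names are assumptions made for illustration only.

```python
def risk_ratio(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc."""
    return (a * d) / (b * c)


# Hypothetical populations: the risk ratio is 2 in both,
# but the disease is rare in the first and common in the second.
rare = dict(a=20, b=9980, c=10, d=9990)
common = dict(a=400, b=600, c=200, d=800)

for label, table in (("rare disease", rare), ("common disease", common)):
    rr = risk_ratio(**table)
    or_ = odds_ratio(**table)
    print(f"{label}: RR = {rr:.3f}, OR = {or_:.3f}")
```

When the disease is rare, b is close to a+b and d is close to c+d, so ad/bc is close to the risk ratio; when the disease is common, the odds ratio lies further from 1 than the risk ratio.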
-[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]
+===Cross-Sectional Studies===
+A cross-sectional study is an observational study in which each subject’s exposure and disease status are measured at the same point in time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence. Because no incident cases are observed, temporality (whether the exposure preceded the disease) cannot be determined.
-*Types:
+====Strengths and Limitations of Cross-Sectional Studies====
-Prospective (concurrent) and Retrospective Cohort Studies (non-concurrent) based on when is the data collected.
+* '''Strengths:'''
-Retrospective has benefits: more cost effective; good for disease of long latency.
+# Good for generating hypotheses
-Prospective has benefits: data quality presumably higher.
+# Easily sets up other analytic designs
-Both designs need to be cautious of ascertainment biases if outcomes or exposure is known.
+# Temporality is not a problem for time-invariant exposures (e.g., genetic markers)
+# Relatively low cost
-*Measures of Association in Cohort Study:
+*'''Weaknesses:'''
+# Temporality: it is unclear whether the exposure or the disease happened first
+# Prevalent cases may not be the same as incident cases
+# Not useful for rare diseases
+# Subject to selection bias
-Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio.
+====Measures of Association in Cross Sectional Studies====
-Difference between two measures of disease incidence: Risk Difference, Rate Difference.
+<center>
+{|class="wikitable" style="text-align:center; width:25%" border="1"
+|-
+| colspan=2| || Disease || No Disease
+|-
+|rowspan=2 |Exposed || Yes || a || b
+|-
+| No ||	c ||d
+|-
+|}
+$Prevalence\, Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\, disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$
+</center>
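As a small worked sketch of the prevalence ratio, the Python below applies the formula to hypothetical survey counts. The function name `prevalence_ratio` and the numbers are illustrative assumptions; a and b are the exposed subjects with and without the disease, and c and d are the unexposed subjects with and without the disease.

```python
def prevalence_ratio(a, b, c, d):
    """Prevalence of disease in the exposed divided by prevalence in the unexposed."""
    prevalence_exposed = a / (a + b)
    prevalence_unexposed = c / (c + d)
    return prevalence_exposed / prevalence_unexposed


# Hypothetical cross-sectional survey: 100 exposed and 100 unexposed subjects.
pr = prevalence_ratio(a=30, b=70, c=15, d=85)
print(f"Prevalence ratio = {pr:.2f}")
```

Because exposure and disease are measured at the same time, a ratio like this describes an association in prevalent cases and says nothing about which came first.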
-*Strengths and weakness of Cohort Design:
+===Ecologic Studies===
-Strengths: (1) Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information. (2) Excellent for studying known adverse exposures or those that cannot practically be randomized. (3) Like RCT, excellent for studying rare exposures. (4) Multiple outcomes and sometimes multiple exposures can be studied.
+An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place and time (mixed study). One error that can occur is that an association identified from group-level (ecological) characteristics is ascribed to individuals when no such association exists at the individual level.
-Disadvantages: (1) Long-term follow-up required and expensive; (2) Not effective at capturing rare outcomes and can be challenging to study disease that take a long time to develop; (3) Loss to follow-up can be a problem; (4) Changes over time in criteria and methods can lead to problems with inferences; (5) People self-select exposures so exposed and unexposed may differ with respect to important characteristics.
-*Situations favor a Cohort Study:
-(1) When there is evidence of an association between the exposure and the disease from other studies;|(2) When the exposure is rare but incidence of disease among the exposure is high;|
-(3) When time between exposure and development of the disease is relatively short or historical data is available;
-(4) When good follow-up can be ensured.
+====Strengths and Disadvantages of Ecologic Studies====
+*'''Strengths:'''
+# Data are relatively easy and/or cheap to obtain.
+# Ecological studies are a good place to start and are useful for generating new hypotheses, because they are relatively easy and low-cost to conduct.
+# Many relevant social, occupational and environmental exposures cannot be ascribed to an individual.
+*'''Weaknesses:'''
+# Reliance on group-level data may not correctly represent individual-level associations.
+# Ecologic fallacy: an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist at the individual level.
+===Other Risk Estimates===
+*''Attributable Risk Estimates of Effect'': If exposure causes an increased risk of disease, we can estimate how many cases of disease could be eliminated if the exposure were completely removed.
+*''Attributable Risk'' ($AR$): $AR=CI_{Exposed} - CI_{Not\,exposed}$. This is simply the risk difference. The group of interest is the exposed; $AR$ quantifies the risk of disease in the “exposed” group that is attributable to the exposure.
+*''Attributable Risk Percent'' ($AR\%$): $AR\% = \frac{CI_{Exposed} - CI_{Not\,exposed}}{CI_{Exposed}} \times 100$
+*''Population Attributable Risk'' ($PAR$): $PAR= CI_{Total} - CI_{Not\,exposed}$
+*''Population Attributable Risk Percent'' ($PAR\%$): $PAR\% = \frac{CI_{Total}-CI_{Not\,exposed}}{CI_{Total}} \times 100$
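The four attributable risk measures can be computed together, as in the sketch below. The input values are hypothetical, the function name is illustrative, and $CI$ is taken here to mean cumulative incidence, which is an assumption about the notation. The total incidence is built as the exposure-weighted average of the two group incidences, and the percent measures are expressed on a 0 to 100 scale.

```python
def attributable_risks(ci_exposed, ci_unexposed, proportion_exposed):
    """Return AR, AR%, PAR and PAR% from group incidences (hypothetical helper)."""
    # Incidence in the whole population is the weighted average of the two groups.
    ci_total = (proportion_exposed * ci_exposed
                + (1 - proportion_exposed) * ci_unexposed)
    ar = ci_exposed - ci_unexposed          # risk difference
    ar_percent = 100 * ar / ci_exposed      # share of risk in the exposed due to exposure
    par = ci_total - ci_unexposed           # excess risk in the whole population
    par_percent = 100 * par / ci_total      # share of population risk due to exposure
    return {"AR": ar, "AR%": ar_percent, "PAR": par, "PAR%": par_percent}


# Hypothetical values: 30% incidence in exposed, 10% in unexposed, 40% of people exposed.
results = attributable_risks(ci_exposed=0.30, ci_unexposed=0.10, proportion_exposed=0.40)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

The population measures depend on how common the exposure is: with the same group incidences, a rarer exposure gives a smaller $PAR$ even though $AR$ is unchanged.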
+===Bias===
+Bias is a barrier to internal validity.
+*''Causes of bias'': Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and an outcome, so that the observed results differ from the true results.
+*''Impact of bias'': Bias can make it appear as if there is an association when there really is none (bias away from the null), or mask an association when there really is one (bias toward the null).
+*''Reasons we get wrong answers'': The main reasons are selection bias, information bias, confounding and chance. In ''selection bias'', who is selected for or retained in a study distorts the estimate of the truth. An example is selection bias due to differing retention in the study.
+*Mechanisms to reduce bias:
+**Ensure proper selection of study subjects (choose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).
+**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.
+*''Information bias'': The quality of the information collected distorts the estimate of the true association. Examples include surveillance bias, reporting bias, and differential or non-differential misclassification (e.g., of hypertension). Sources of measurement error/misclassification include normal variability or imprecision in the measure, and error due to subconscious or conscious decisions by the participant or investigator.
+*''Confounding bias'': Differences between cases and controls, or between exposed and unexposed, distort the estimate of the truth. A variable is a confounder if (1) it is a known risk factor for the outcome, (2) it is associated with the exposure, and (3) it is not a result of the exposure. All three conditions are necessary for a variable to be considered a confounder.
+*''Chance'': The luck of the draw yields a study sample that is not representative of the larger population.
+*''Strategies to handle confounding'': (1) In study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study:
+<center>
+{|class="wikitable" style="text-align:center; width:25%" border="1"
+|-
+|  || Control Exposed || Control Unexposed
+|-
+| Case Exposed || a || b
+|-
+|Case Unexposed || c ||d
+|-
+|}
+</center>
+*''Concordant pairs'': Both case and control exposed; neither case nor control exposed.
+*''Discordant pairs'': Case exposed but control not exposed; control exposed but case not exposed.
+*''Matched analysis'': Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}$
+*''Interpretation'': If there is an association between exposure and outcome, it is not due to any of the factors that were matched on. Note that you cannot analyze the association between the matched variables and the outcome.
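The matched-pairs odds ratio can be sketched as follows. Each pair is recorded as a tuple (case exposed, control exposed); the function name and the pair counts are hypothetical and only illustrate that the concordant pairs drop out of the calculation.

```python
from collections import Counter


def matched_odds_ratio(pairs):
    """Matched-pairs odds ratio b/c, using only the discordant pairs.

    pairs: iterable of (case_exposed, control_exposed) booleans.
    """
    counts = Counter(pairs)
    b = counts[(True, False)]   # case exposed, control unexposed
    c = counts[(False, True)]   # case unexposed, control exposed
    if c == 0:
        raise ValueError("no pairs with only the control exposed; b/c is undefined")
    return b / c


# Hypothetical study of 100 matched pairs.
pairs = (
    [(True, True)] * 10      # concordant: both exposed
    + [(True, False)] * 30   # discordant: only the case exposed
    + [(False, True)] * 15   # discordant: only the control exposed
    + [(False, False)] * 45  # concordant: neither exposed
)
print(matched_odds_ratio(pairs))
```

Changing the number of concordant pairs in this example leaves the result unchanged, which mirrors the statement that the matched odds ratio is based only on discordant pairs.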
+*''Randomization'': The investigator randomly allocates the exposure/“treatment”, which ensures that the two groups (exposed and unexposed) are the same, on average, except for the exposure of interest. Randomization can control for both known and unknown confounders because these “3rd variables” should be equally distributed between the groups.
+*''Stratification'': Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant.
+*''Adjustment'': A statistical technique that can be used to examine what the association between exposure and outcome would be if the confounder were not associated with the exposure.
+An example of age-adjustment:
+[[Image:MSHS_IntroEpi_Fig4.png]]
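Stratification can also be illustrated numerically. In the hypothetical data below, age group is associated with both the exposure and the outcome: the crude risk ratio suggests an association, but within each age stratum the risk ratio is 1. All counts and names are invented for this sketch and are not taken from the figure above.

```python
# Each entry is (number of cases, number of people); the data are hypothetical.
strata = {
    "younger": {"exposed": (5, 100), "unexposed": (20, 400)},
    "older": {"exposed": (80, 400), "unexposed": (20, 100)},
}


def risk_ratio(exposed, unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    (cases_e, n_e), (cases_u, n_u) = exposed, unexposed
    return (cases_e / n_e) / (cases_u / n_u)


def crude_risk_ratio(strata):
    """Risk ratio after pooling all strata, i.e. ignoring the stratifying variable."""
    cases_e = sum(s["exposed"][0] for s in strata.values())
    n_e = sum(s["exposed"][1] for s in strata.values())
    cases_u = sum(s["unexposed"][0] for s in strata.values())
    n_u = sum(s["unexposed"][1] for s in strata.values())
    return risk_ratio((cases_e, n_e), (cases_u, n_u))


print(f"crude RR = {crude_risk_ratio(strata):.3f}")
for name, s in strata.items():
    print(f"{name}: RR = {risk_ratio(s['exposed'], s['unexposed']):.3f}")
```

Holding age constant removes the apparent association here, which is the pattern expected when the stratifying variable is a confounder.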
+===Applications===
+* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, using some important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has largely benefited in the past from the development of epidemiological research; however, the question of association versus causation is currently raised for observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved, such as the necessity of replication and Mendelian randomization, which might prove to be important in future research.
+* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article] retrospectively studies the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.
+===Software===
+*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]
+*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]
+===Problems===
+How do we learn about existence of outbreaks?
+:a. Cases call health departments directly
+:b. Clinicians
+:c. Laboratories
+:d. All of the above
+In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?
+:a. Host
+:b. Agent
+:c. Vector
+:d. Environment
+:e. All of the above
+The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010.  During that time period, 17,000 people were newly diagnosed with lung cancer.  What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?
+:a. 0.002 lung cancer cases per 100,000 person years
+:b. 200 lung cancer cases per 100,000 person years
+:c. 270 lung cancer cases per 100,000 person years
+:d. 243 lung cancer cases per 100,000 person years
+In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?
+:a. The prevalence increases if the duration of disease is increasing or stays the same.
+:b. The prevalence increases if the duration of disease is decreasing rapidly.
+:c. The prevalence decreases if the duration of disease is increasing.
+:d. The prevalence decreases if the duration of disease stays the same.
+Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002 - 2012.
+<center>
+{| class="wikitable" style="text-align:center; width:25%" border="1"
+|-
+|Age groups (years)	||Age-specific rates (per 100,000)||	Michigan standard population ||	Expected number of deaths
+|-
+|<20||	20	||2,000,000||
+|-
+|20-39||	10 ||	3,000,000 ||
+|-
+|40-59	||5	||1,000,000||
+|-
+|>60||	30||	4,000,000||
+|-
+|Total	|| ||	10,000,000 ||
+|}
+</center>
+What is the age-adjusted mortality rate from diabetes among whites according to the table above?
+:a. 40.2 deaths per 100,000
+:b. 19.5 deaths per 100,000
+:c. 1.9 deaths per 100,000
+:d. 20.4 deaths per 100,000
+Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?
+:a. 1.54
+:b. 5.02
+:c. 1.69
+:d. 0.65
+When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.
+:a. True
+:b. False
+Sequential testing tends to have higher net specificity than specificity of a single test.
+:a. True
+:b. False
+A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:
+<center>
+{| class="wikitable" style="text-align:center; width:25%" border="1"
+|-
+|colspan=2 rowspan=2| || colspan=2|Gold standard
+|-
+|Condition Positive||Condition negative
+|-
+|rowspan=2| Result of New Test||	Test Positive ||80||70
+|-
+|Test Negative	||10	||240
+|-
+|}
+</center>
+What is the sensitivity of the new test?
+:a. 77%
+:b. 89%
+:c. 80%
+:d. 53%
+What is the specificity of the test?
+:a. 77%
+:b. 89%
+:c. 80%
+:d. 53%
+What is the positive predictive value of the test?
+:a. 77%
+:b. 89%
+:c. 80%
+:d. 53%
+Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they had the flu), so these were not enrolled in the research study. The students who were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of this follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.
+:a. What type of study is this?
+:b. Why is this type of study adequate for this particular situation?
+:c. Imagine that you are the investigator picking the appropriate study design to answer this question; what might you have worried about in picking this design?
+:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?
+:e. Calculate and interpret the above measure of association using a 2X2 table.
+:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.
+Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.
+:a. What type of study design was used in this example?
+:b. Why is this type of study appropriate for this particular situation?
+:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study?
+:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.
+:e. What is the appropriate measure of association for this study? Explain why.
+:f. Calculate and interpret your measure of association.
+A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.
+:a. What type of study is this?
+:b. What type of measure of association is appropriate for this study? Why?
+:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?
+Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?
+:a. Interviewer bias
+:b. Recall bias
+:c. Loss to follow-up
+:d. Non-differential misclassification
+Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns (especially if they were concerned that sun exposure was a reason they got melanoma). This is an example of:
+:a. Interviewer bias
+:b. Loss to follow-up
+:c. Differential misclassification
+:d. Non-differential misclassification
+===References===
+*[http://en.wikipedia.org/wiki/Epidemiology  Epidemiology Wikipedia]


