SMHS HypothesisTesting (SOCR wiki) — revision of 2015-04-29 by Glenbrau (/* Motivation */)
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Hypothesis Testing ==<br />
<br />
===Overview===<br />
Hypothesis testing is a statistical technique for making decisions about populations or processes based on experimental data. It quantifies the probability that chance alone might be responsible for the observed discrepancies between a theoretical model and the empirical observations. In this class, we introduce the fundamental terminology of hypothesis testing, including null and alternative hypotheses, Type I and Type II errors, sensitivity, specificity, and statistical power. We then discuss hypothesis testing of means, proportions, and variances under various assumptions, with the goal of giving students enough background to apply hypothesis testing in real data analysis.<br />
<br />
Important parts included in Hypothesis testing: <br />
* Decision (significance or no significance)<br />
* Parameter of interest<br />
* Variable of interest<br />
* Population under study<br />
* p-value<br />
<br />
===Motivation===<br />
In statistical data analysis, we often encounter the problem of making statistical decisions about populations or processes based on experimental data. Hypothesis testing will be the direct answer to questions like:<br />
<br />
*How well the findings fit the possibility that chance alone might be responsible for the observed discrepancy between the theoretical model and the empirical observations<br />
<br />
*The likelihood of the observed summary statistics, assuming that the data comes from the distribution specified by the null hypothesis<br />
<br />
*Whether the data follows the distribution stated in the alternative hypothesis<br />
<br />
In fact, one use of hypothesis testing is to decide whether experimental results contain enough information to cast doubt on conventional wisdom. <br />
<br />
Consider an example of testing whether newly produced purses from a factory contain radioactive material. The null hypothesis is that there is no radioactive material in the purse and that all measured counts are due to ambient radioactivity typical of the surrounding air and harmless objects in the purse. We can then calculate how likely it is that the null hypothesis would produce 10 counts per minute. If this is likely (e.g., if the null hypothesis predicts on average 9 counts per minute), we say the purse is compatible with the null hypothesis. If, on the other hand, the null hypothesis predicts only 3 counts per minute, then the purse is not compatible with the null hypothesis, and other factors must be responsible for producing the increased radioactive counts.<br />
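This compatibility argument can be made concrete with a simple probability model. Assuming, purely as an illustration (the Poisson model is our assumption, not part of the original example), that counts per minute follow a Poisson distribution, a short R sketch compares how surprising 10 or more counts would be under each predicted rate:

```r
# Illustrative assumption: radioactive counts per minute follow a Poisson distribution.
# Compare P(X >= 10) when the null model predicts an average of 9 vs. 3 counts per minute.
p_rate9 <- 1 - ppois(9, lambda = 9)  # probability of >= 10 counts if the true rate is 9
p_rate3 <- 1 - ppois(9, lambda = 3)  # probability of >= 10 counts if the true rate is 3
p_rate9  # about 0.41: observing 10 counts is unremarkable, compatible with the null
p_rate3  # about 0.001: observing 10 counts would be very surprising under this null
```

Under the rate-9 null, 10 counts happens over 40% of the time; under the rate-3 null, it happens about once in a thousand minutes, so we would doubt that null.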
<br />
===Theory===<br />
<br />
====Fundamentals of Hypothesis testing (statistical significance testing)====<br />
Null and alternative hypothesis: a null hypothesis is a statement set up to be nullified or refuted in order to support an alternative (research) hypothesis. The null hypothesis is presumed true until statistical evidence, in the form of a hypothesis test, indicates otherwise. In science, the null hypothesis is used to test differences between treatment and control groups, and the assumption at the outset of the experiment is that no difference exists between the two groups for the variable of interest (e.g., population means). The null hypothesis proposes something initially presumed true, and it is rejected only when it becomes evidently false; that is, when a researcher has a certain degree of confidence, usually 95% to 99%, that the data do not support the null hypothesis. In the radioactive-purse example above, the null hypothesis is that there is no radioactive material in the purse and that all measured counts are due to ambient radioactivity typical of the surrounding air and harmless objects in the purse. Formulation of the null hypothesis is a vital step in testing statistical significance. Having formulated such a hypothesis, one can establish the probability of observing the obtained data under the prediction of the null hypothesis, if the null hypothesis is true. That probability is what is commonly called the significance level of the results.<br />
<br />
In many scientific experimental designs we predict that a particular factor will produce an effect on our dependent variable — this is our alternative hypothesis. We then consider how often we would expect to observe our experimental results, or results even more extreme, if we were to take many samples from a population where there was no effect (i.e. we test against our null hypothesis). If we find that this happens rarely (up to, say, 5% of the time), we can conclude that our results support our experimental prediction — we reject our null hypothesis.<br />
<br />
====Type I Error, Type II Error and Power====<br />
*Type I error: the false positive (Type I) error of rejecting the null hypothesis given that it is actually true; e.g., a purse is flagged as containing radioactive material when it actually does not.<br />
*Type II error: the false negative (Type II) error of failing to reject the null hypothesis given that the alternative hypothesis is actually true; e.g., a purse is flagged as not containing radioactive material when it actually does.<br />
*Statistical power: the probability that the test will reject a false null hypothesis (that it will not make a Type II error). When power increases, the chances of a Type II error decrease. <br />
The table below gives an example of calculating specificity, sensitivity, the false positive rate $\alpha$, the false negative rate $\beta$, and power, given the cell probabilities TN, FN, FP, and TP.<br />
<br />
<center><br />
{| class="wikitable" style="text-align:center;width: 25%"border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| '''Actual Condition'''<br />
|-<br />
| '''Absent ($H_{0}$ is true)''' || '''Present ($H_{1}$ is true)''' <br />
|-<br />
| rowspan=2| '''Test Result'''|| '''Negative(fail to reject $H_{0})$''' || Condition absent + Negative result = True (accurate) Negative ('''TN''', 0.98505) || ''Condition present + Negative result = False (invalid) Negative ('''FN''', 0.00025) '''Type II error''' ($\beta$)<br />
|-<br />
| '''Positive (reject $H_{0})$''' || Condition absent + Positive result = False Positive ('''FP''', 0.00995) '''Type I error''' ($\alpha$) || Condition Present + Positive result = True Positive ('''TP''', 0.00475)<br />
|-<br />
|'''Test Interpretation''' || $Power$= $1-\beta$= $1-0.05$ = $0.95$ ||'''Specificity''':$\frac{TN}{(TN+FP)}=\frac{0.98505}{(0.98505+ 0.00995)}= 0.99$ ||'''Sensitivity''':$\frac {TP} {(TP+FN)} = \frac {0.00475} {(0.00475+ 0.00025)}= 0.95$<br />
|-<br />
|}<br />
</center><br />
<br />
Thus, $Specificity = \frac{TN}{TN + FP}$, $Sensitivity= \frac {TP} {TP+FN}$, $\alpha= \frac {FP}{FP+TN},\beta=\frac {FN} {FN+TP}$, $power=1-\beta.$ Note that $\alpha$ and $\beta$ are the false-positive and false-negative ''rates'' (which are directly proportional to the FP and FN counts), respectively.<br />
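These rates can be verified numerically from the table's cell probabilities; a minimal R sketch:

```r
# Cell probabilities from the table above
TN <- 0.98505; FN <- 0.00025; FP <- 0.00995; TP <- 0.00475
specificity <- TN / (TN + FP)  # 0.99
sensitivity <- TP / (TP + FN)  # 0.95
alpha       <- FP / (FP + TN)  # false-positive rate: 0.01
beta        <- FN / (FN + TP)  # false-negative rate: 0.05
power       <- 1 - beta        # 0.95
```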
<br />
====Testing a claim about a mean with large sample size====<br />
<br />
Recall the random sample ${X_1,X_2,\ldots,X_n}$ of the process, where the population mean is estimated by the sample average $\bar{X}_{n}=\frac{1}{n}\sum_{i=1}^{n}X_{i}$. For a given small significance level, say $\alpha=0.025$, the $(1-\alpha)100\% $ confidence interval for the mean is constructed by $ CI(\alpha)$: $\bar x\pm z_\frac{\alpha}{2}E$, where the margin of error ''E'' is defined as <br />
<br />
: $$E = \begin{cases}{\sigma\over\sqrt{n}},& \text{for known } \sigma,\\<br />
{{1\over \sqrt{n}} \sqrt{\sum_{i=1}^n{(x_i-\overline{x})^2\over n-1}}},& \text{for unknown } \sigma.\end{cases}$$<br />
: and $z_{\frac{\alpha}{2}}$ is the [[AP_Statistics_Curriculum_2007_Normal_Critical | critical value]] for a [[AP_Statistics_Curriculum_2007_Normal_Std |Standard Normal]] distribution at $\frac{\alpha}{2}.$<br />
<br />
*Hypothesis testing about a mean: large samples<br />
<br />
: $ H_{0}: \mu=\mu_{0}$ (e.g., $\mu_{0}=0)$; one-sided $H_{1}:\mu>\mu_{0}$ or $ \mu<\mu_{0} $; two-sided $H_{1}:\mu \neq \mu_{0} $.<br />
:Test statistics: <br />
::(1) with known variance: $Z_0=\frac{\bar x -\mu_{0}} {\frac{\sigma}{\sqrt n}}$ $ \thicksim N (0,1)$ <br />
<br />
::(2) with unknown variance $T_{0}=\frac{\bar x -\mu_{0}} {SE(\bar x)}=\frac{\bar x -\mu_{0}} {{\frac{1} {\sqrt n} \sqrt {\displaystyle \sum_{i=1}^{n} \frac{(x_{i}-\bar x)^{2}}{n-1}}}}$ $ \thicksim T_{df=n-1}$<br />
<br />
*Example: suppose we are testing whether the population mean equals 20 at $\alpha$=0.05 using a two-sided alternative: $H_{0}$:$\mu=20$ vs. $H_{1}:\mu \neq 20$. The sample data are: 16, 9, 14, 11, 17, 12, 99, 18, 13, 12, 5, 9, 17, 6, 11, 17, 18, 20, 6, 14, 7, 11, 12, 5, 18, 6, 4, 13, 11, and 12. The population variance is not given.<br />
<br />
a <- c( 16, 9, 14, 11, 17, 12, 99, 18, 13, 12, 5, 9, 17, 6, 11, 17, 18, 20, 6, 14, 7, 11, 12, 5, 18, 6, 4, 13, 11, 12)<br />
summary(a)<br />
Min. 1st Qu. Median Mean 3rd Qu. Max. <br />
4.00 9.00 12.00 14.77 16.75 99.00 <br />
sd(a)<br />
[1] 16.53561<br />
<br />
: $T_{0}=\frac{\bar x -\mu_{0}} {SE(\bar x)}=\frac{\bar x -\mu_{0}} {{\frac{1} {\sqrt {n}} \sqrt {\displaystyle \sum_{i=1}^{n} \frac{(x_{i}-\bar x)^{2}}{n-1}}}}$<br />
<br />
:: From the sample we have $\bar x=14.77, s=16.54$.<br />
<br />
: $T_{0}=\frac{\bar x -\mu_{0}} {SE(\bar x)}=\frac{\bar x -\mu_{0}} {{\frac{1} {\sqrt{n}} \sqrt {\displaystyle \sum_{i=1}^{n} \frac{(x_{i}-\bar x)^{2}}{n-1}}}}$ $=\frac{14.77-20}{{\frac{1} {\sqrt{30}} \sqrt {\displaystyle \sum_{i=1}^{30} \frac{(x_{i}-14.77)^{2}}{30-1}}}}$ $= -1.733$<br />
<br />
: $P(T_{df=29}<T_{0}=-1.733)=0.047$, hence the p-value $=2\times 0.047=0.094$, and we cannot reject the null hypothesis at the $\alpha=0.05$ level of significance.<br />
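The entire calculation can be reproduced with R's built-in `t.test()`, which returns the same statistic and two-sided p-value:

```r
a <- c(16, 9, 14, 11, 17, 12, 99, 18, 13, 12, 5, 9, 17, 6, 11, 17, 18, 20,
       6, 14, 7, 11, 12, 5, 18, 6, 4, 13, 11, 12)
res <- t.test(a, mu = 20, alternative = "two.sided")
res$statistic  # t = -1.733 on df = 29
res$p.value    # about 0.094 > 0.05, so we fail to reject H0
```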
<br />
====Comparing the means of two samples====<br />
When comparing the means of two samples, one needs to identify whether the samples are paired or independent. In the [[AP_Statistics_Curriculum_2007_NonParam_2MedianPair|paired samples case, or single sample case, the paired test should be used]]. When the [[AP_Statistics_Curriculum_2007_NonParam_2MedianIndep|two samples are independent, the independent sample test]] needs to be used.<br />
<br />
====Testing a claim about a mean with small sample size====<br />
<br />
Recall the random sample ${X_1,X_2,\ldots,X_n}$ of the process, where the population mean is estimated by the sample average $\bar X_{n}=\frac{1}{n}\sum_{i=1}^{n} X_{i}$. For a given small significance level, say $\alpha=0.025$, the $(1-\alpha)100\%$ confidence interval for the mean is constructed by $ CI(\alpha)$: $\bar{x}\pm t_{\{df=n-1,\frac{\alpha}{2}\}} \frac{1}{\sqrt {n}} \sqrt{\sum_{i=1}^{n} {\frac{(x_{i}-\bar x)^{2}}{n-1}}} $, where $E=\frac{1}{\sqrt {n}} \sqrt{\sum_{i=1}^{n} {\frac{(x_{i}-\bar x)^{2}}{n-1}}}$ is the margin of error and $t_{df=n-1,\frac{\alpha} {2}}$ is the critical value of the T distribution with ''df=sample size-1'' at $\frac{\alpha}{2}$.<br />
<br />
*Hypothesis testing about a mean: small samples<br />
<br />
: $H_{0}:\mu=\mu_{0}$ (e.g., $\mu_{0}=0$); one-sided $H_{1}:\mu>\mu_{0}$ or $\mu<\mu_{0}$; two-sided $H_{1}:\mu \neq \mu_{0}$.<br />
<br />
: Test statistics: <br />
:: (1) with known variance: $Z_0=\frac{\bar x -\mu_{0}} {\frac{\sigma}{\sqrt n}}$ $ \thicksim N (0,1)$ <br />
<br />
::(2) with unknown variance $T_{0}=\frac{\bar x -\mu_{0}} {SE(\bar x)}=\frac{\bar x -\mu_{0}} {{\frac{1} {\sqrt {n}} \sqrt {\sum_{i=1}^{n} \frac{(x_{i}-\bar x)^{2}}{n-1}}}}$ $ \thicksim T_{df=n-1}$<br />
<br />
: Example: suppose we are testing whether the population mean equals 12 at $\alpha=0.01$ using a one-sided alternative: $H_{0}: \mu=12$ vs. $H_{1}:\mu>12$. The sample data are: 16, 9, 14, 11, 17, 12, 99, 18, 13, and 12. The population variance is not given.<br />
<br />
: From the sample we have $\bar x=22.1,s=27.164$<br />
: $T_{0}=\frac{\bar x -\mu_{0}} {SE(\bar x)}=\frac{\bar x -\mu_{0}} {{\frac{1} {\sqrt{n}} \sqrt {\sum_{i=1}^{n} \frac{(x_{i}-\bar x)^{2}}{n-1}}}}$ $=\frac{22.1-12}{{\frac{1} {\sqrt{10}} \sqrt {\sum_{i=1}^{10} \frac{(x_{i}-22.1)^{2}}{10-1}}}}$ $= 1.176 $<br />
<br />
: p-value $=P(T_{df=9}>T_{0}=1.176)=0.13488$, hence we cannot reject the null hypothesis at the $\alpha=0.01$ level of significance.<br />
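Again, this can be checked with `t.test()`, this time using a one-sided alternative:

```r
b <- c(16, 9, 14, 11, 17, 12, 99, 18, 13, 12)
res <- t.test(b, mu = 12, alternative = "greater")
res$statistic  # t = 1.176 on df = 9
res$p.value    # about 0.1349 > 0.01, so we fail to reject H0
```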
<br />
====Testing a claim about a proportion====<br />
<br />
Recall that for large samples, the sampling distribution of the sample proportion $ \hat p $ is approximately normal by the CLT, as the sample proportion may be presented as a sample average of Bernoulli random variables. When the sample size is small, the normal approximation is inadequate. To accommodate this, we modify the sample proportion $\hat p $ slightly and obtain the corrected sample proportion $\tilde p$:<br />
<br />
$$\hat{p}={y\over n} \longrightarrow \tilde{p}={y+0.5z_{\alpha \over 2}^2 \over n+z_{\alpha \over 2}^2},$$<br />
where $z_\frac{\alpha}{2}$ is the [[AP_Statistics_Curriculum_2007_Normal_Critical | critical value of a standard normal distribution]] at $\alpha/2$.<br />
<br />
: The standard error of <math>\hat{p}</math> (and <math>\tilde{p}</math>) also needs a slight modification:<br />
$$SE_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})\over n} \longrightarrow SE_{\tilde{p}} = \sqrt{\tilde{p}(1-\tilde{p})\over n+z_{\alpha \over 2}^2}.$$<br />
<br />
*Hypothesis testing about a single sample proportion:<br />
: Null Hypothesis: $H_o: p=p_o$ (e.g., $p_o=\frac{1}{2}$), where $p$ is the population proportion of interest.<br />
: Alternative Research Hypotheses:<br />
:: One sided (uni-directional): $H_1: p >p_o$, or $H_1: p<p_o$<br />
:: Double sided: $H_1: p \not= p_o.$<br />
: Test Statistics: $Z_o={\tilde{p} -p_o \over SE_{\tilde{p}}} \sim N(0,1).$<br />
<br />
* Example: suppose we are testing the effect of some medicine. 500 patients with evidence of early-stage disease are randomly recruited and scheduled to take one pill daily for two years. At the end of the two years, only 17 patients had the disease. Use $\alpha=0.05$ to test the research hypothesis against the null that the proportion of patients on this treatment who develop the disease within 2 years of treatment is $p_0=0.04$.<br />
: $\tilde{p} = {17+0.5z_{0.025}^2\over 500+z_{0.025}^2}= {17+1.92\over 500+3.84}=0.038$ <br />
: $SE_{\tilde{p}}= \sqrt{0.038(1-0.038)\over 500+3.84}=0.0085$<br />
<br />
: And the corresponding test statistics is<br />
:: $Z_o={\tilde{p} - 0.04 \over SE_{\tilde{p}}}={-0.002 \over 0.0085}=-0.2353$<br />
<br />
: The p-value of this test (the two-sided normal tail probability of $|Z_o|=0.2353$, about 0.81) is clearly insignificant, so we cannot reject the null hypothesis at the $\alpha=0.05$ level of significance.<br />
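A short R sketch of the corrected-proportion test; note that carrying full precision (rather than the rounded 0.038 and 0.0085 above) shifts the statistic slightly, without changing the conclusion:

```r
y <- 17; n <- 500; p0 <- 0.04
z <- qnorm(0.975)                        # critical value z_{0.025}, ~1.96
p_tilde <- (y + 0.5 * z^2) / (n + z^2)   # corrected sample proportion, ~0.0376
se_tilde <- sqrt(p_tilde * (1 - p_tilde) / (n + z^2))
Z0 <- (p_tilde - p0) / se_tilde          # ~ -0.29 with un-rounded values
p_value <- 2 * pnorm(-abs(Z0))           # ~0.77, clearly insignificant
```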
<br />
* When [[AP_Statistics_Curriculum_2007_Infer_2Proportions|comparing 2 proportions, you can use a similar protocol to infer whether they are distinct]].<br />
<br />
====Testing a claim about variance (or standard deviation)====<br />
*Recall that the sample variance $s^2$ is an unbiased point estimate of the population variance $\sigma^2$; similarly, the sample standard deviation serves as a point estimate of the population standard deviation. For samples from a normal population, the scaled sample variance follows a chi-square distribution: $\chi_0^2=\frac{(n-1) s^2}{\sigma_0^2} \sim \chi_{df=n-1}^2.$<br />
*Hypothesis testing about variance<br />
:: $H_0: \sigma^2=\sigma_0^2$ vs. $H_1:\sigma^2 \neq \sigma_0^2$. Given that the chi-square distribution is not symmetric, there are two critical values $\chi_L^2$ and $\chi_R^2$.<br />
*Test statistics: $\chi_0^2 =\frac{(n-1) s^2}{\sigma_0^2} \sim \chi_{df=n-1}^2$<br />
*Example: we have a random sample of 30 objects drawn from a normal distribution with sample variance $s^2=5$. Test at the $\alpha=0.05$ level of significance whether this is consistent with $H_0: \sigma^2=2$.<br />
:: $\chi_0^2=\frac{(n-1) s^2}{\sigma_0^2} =\frac{29 \times 5}{2}=72.5$, $\chi_L^2=16.047$ and $\chi_R^2=45.722$. Since $\chi_0^2 > \chi_R^2$, we reject the null hypothesis at the 5% level of significance. The image below illustrates this calculation using the [http://socr.umich.edu/html/dist/SOCR_Distributions.html SOCR $\chi_{df=29}^2$ calculator]. Notice the left and right limits for the central 95% interval, $(\chi_L^2=16.047 : \chi_R^2=45.722)$.<br />
<br />
<center><br />
[[Image:SMHS_HypothesisTesting_Fig1.png|500px]]<br />
</center><br />
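The same critical values and decision can be computed in R:

```r
n <- 30; s2 <- 5; sigma2_0 <- 2
chi0 <- (n - 1) * s2 / sigma2_0               # test statistic: 72.5
crit <- qchisq(c(0.025, 0.975), df = n - 1)   # chi_L^2 = 16.047, chi_R^2 = 45.722
chi0 > crit[2]                                # TRUE: reject H0 at the 5% level
```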
<br />
* [[SMHS_NonParamInference#Fligner-Killeen_test:_Variance_Homogeneity_.28Differences_of_Variances_of_Independent_Samples.29| See also the non-parametric Fligner-Killeen test for Variance Homogeneity]]<br />
<br />
===Applications===<br />
*[http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_AnalysisActivities_OneT This article] illustrates a SOCR analyses example of the one-sample t-test. It presents the background for the one-sample t-test and demonstrates the process of performing it in the SOCR one-sample t-test applet. <br />
*[http://socr.umich.edu/html/SOCR_ChoiceOfStatisticalTest.html This article], titled Choosing the Right Test, presents the procedure for selecting a statistical test. It starts with getting the right hypotheses and then develops the topic based on the choice and characteristics of the data. It offers a broad sense of what type of test to choose based on the hypothesis and the data. The article is also accompanied by several exercises for students to practice on their own.<br />
*[http://wiki.stat.ucla.edu/socr/uploads/3/32/Thomson_SOCR_ECON261_HypothesisDifferenceMeans_VIII.pdf This article] titled The Hypothesis Testing For Difference Of Population Parameters presents a comprehensive introduction to the hypothesis testing of difference of population parameters with the background information as well as the application. It also presents the steps to apply hypothesis testing using SOCR analyses.<br />
<br />
===Software===<br />
*[http://socr.ucla.edu/htmls/SOCR_Analyses.html SOCR Analysis]<br />
*[http://www.socr.ucla.edu/htmls/ana/OneSampleTTest_Analysis.html One Sample T Test Analysis]<br />
*[http://socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts]<br />
<br />
===Problems===<br />
USA Today's AD Track examined the effectiveness of the new ads involving the Pets.com Sock Puppet (which is now extinct). In particular, they conducted a nationwide poll of 428 adults who had seen the Pets.com ads and asked for their opinions. They found that 36% of the respondents said they liked the ads. Suppose you increased the sample size for this poll to 1000, but you had the same sample percentage who like the ads (36%). How would this change the p-value of the hypothesis test you want to conduct?<br />
:(a) No way to tell<br />
:(b) The new p-value would be the same as before<br />
:(c) The new p-value would be smaller than before<br />
:(d) The new p-value would be larger than before<br />
<br />
<br />
If we want to estimate the mean difference in scores on a pre-test and post-test for a sample of students, how should we proceed?<br />
:(a) We should construct a confidence interval or conduct a hypothesis test<br />
:(b) We should collect one sample, two samples, or conduct a paired data procedure<br />
:(c) We should calculate a z or a t statistic<br />
<br />
<br />
The paint used to make lines on roads must reflect enough light to be clearly visible at night. Let mu denote the true average reflectometer reading for a new type of paint under consideration. A test of the null hypothesis that mu = 20 versus the alternative hypothesis that mu > 20 will be based on a random sample of size n from a normal population distribution. In which of the following scenarios is there significant evidence that mu is larger than 20?<br />
:(i) n=15, t=3.2, alpha=0.05<br />
:(ii) n=9, t=1.8, alpha=0.01<br />
:(iii) n=24, t=-0.2, alpha=0.01<br />
<br />
:(a) (ii) and (iii)<br />
:(b) (i)<br />
:(c) (iii)<br />
:(d) (ii)<br />
<br />
<br />
We observe the math self-esteem scores from a random sample of 25 female students. How should we determine the probable values of the population mean score for this group?<br />
:(a) Test the difference in means between two paired or dependent samples.<br />
:(b) Test that a correlation coefficient is not equal to 0 (correlation analysis).<br />
:(c) Test the difference between two means (independent samples).<br />
:(d) Test for a difference in more than two means (one way ANOVA).<br />
:(e) Construct a confidence interval.<br />
:(f) Test one mean against a hypothesized constant.<br />
:(g) Use a chi-squared test of association.<br />
<br />
<br />
Food inspectors inspect samples of food products to see if they are safe. This can be thought of as a hypothesis test where H0: the food is safe, and H1: the food is not safe. If you are a consumer, which type of error would be the worst one for the inspector to make, the type I or type II error?<br />
:(a) Type I<br />
:(b) Type II<br />
<br />
<br />
A college admissions officer is concerned that their admission criteria might not treat men and women with equal weight. To test this, the college took a random sample of male and female high school seniors from a very large local school district and determined the percent of males and females who would be eligible for admission at the college. Which of the following is a suitable null hypothesis for this test?<br />
:(a) p = 0.5<br />
:(b) The proportion of all eligible men in the district will not equal the proportion of all eligible women in the district.<br />
:(c) The proportion of all eligible men in the school district should be equal to the proportion of all eligible women in the school district.<br />
:(d) The proportion of eligible men sampled should equal the proportion of eligible women sampled.<br />
<br />
<br />
We want to determine if college GPAs differ for male athletes in major sports (e.g., football), minor sports (e.g., swimming), and intramural sports. What statistical method is most likely to be used to answer this question? Assume that all necessary assumptions have been met for using this procedure.<br />
:(a) Test one mean against a hypothesized constant<br />
:(b) Test the difference in means between two paired or dependent samples<br />
:(c) test for a difference in more than two means (one way ANOVA)<br />
:(d) Test that a correlation coefficient is not equal to 0, correlation analysis<br />
:(e) Test the difference between two means (independent samples)<br />
<br />
<br />
Statistics show that the average level of a mother's education for a city of 300,000 people is 14 years with a standard deviation of 1.5 years. A major state university is located in this town. The administrators in this university think that the average level of a mother's education for the freshmen who are admitted to this school is higher than 14 years. The average education level of mothers for a random sample of 100 freshmen who were admitted to this university within the last two years was 14.7 years.<br />
We want to test the null at the level of alpha = 0.001. What is the best answer?<br />
:(a) We reject the alternative and believe that the level of a mother's education for university freshmen is not higher than the overall population average.<br />
:(b) We reject the null at 0.001 and conclude that the average level of a mother's education is higher for university freshmen.<br />
:(c) We fail to reject the null and conclude that the level of a mother's education for university freshmen is not higher than the overall population average.<br />
:(d) In order to be certain about the conclusion we reach, a larger sample size is needed to increase the power of the test and the margin of error.<br />
<br />
<br />
The average length of time required to complete a certain aptitude test is claimed to be 80 minutes. A random sample of 25 students yielded an average of 86.5 minutes and a standard deviation of 15.4 minutes. If we assume normality of the population distribution, is there evidence to reject the claim? Choose at least one answer.<br />
:(a) Yes, because the observed 86.5 did not happen by chance<br />
:(b) Yes, because the t-test statistic is 2.11<br />
:(c) Yes, because the observed 86.5 happened by chance<br />
:(d) No, because the probability that the null is true is > 0.05<br />
<br />
<br />
Based on past experience, a bank believes that 4% of the people who receive loans will not make payments on time. The bank has recently approved 300 loans. What is the probability that over 6% of these clients will not make timely payments?<br />
:(a) 0.096<br />
:(b) 0.038<br />
:(c) 0.962<br />
:(d) 0.904<br />
<br />
<br />
Many people sleep in on the weekends to make up for short nights during the work week. The Better Sleep Council reports that 61% of us get more than 7 hours of sleep per night on the weekend. A random sample of 350 adults found that 235 had more than seven hours each night last weekend. At the 0.05 level of significance, does this evidence show that more than 61% of us get more than 7 hours of sleep per night on the weekend?<br />
:(a) That cannot be determined without more information<br />
:(b) No<br />
:(c) Yes<br />
<br />
===References===<br />
*[http://wiki.stat.ucla.edu/socr/index.php/Probability_and_statistics_EBook#Chapter_VIII:_Hypothesis_Testing SOCR]<br />
*[http://en.wikipedia.org/wiki/Statistical_hypothesis_testing Statistical Hypothesis Testing Wikipedia]<br />
*[http://stattrek.com/hypothesis-test/power-of-test.aspx?tutorial=ap Stat Trek Tutorials]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_HypothesisTesting}}</div>Glenbrauhttps://wiki.socr.umich.edu/index.php?title=SMHS_HypothesisTesting&diff=14902SMHS HypothesisTesting2015-04-27T17:49:58Z<p>Glenbrau: /* Overview */</p>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Hypothesis Testing ==<br />
<br />
===Overview===<br />
Hypothesis testing is a statistical technique for decision-making regarding populations or processes based on experimental data. It quantitatively answers the probability that chance along might be responsible for the observed discrepancies between a theoretical model and the empirical observations. In this class, we are going to introduce the fundamental terminologies we are going to discuss in Hypothesis Testing include null and alternative hypotheses, Type I and Type II errors, sensitivity, specificity and statistical power and we are going to discuss about hypothesis testing of mean, proportion and mean under various assumptions and hope to prepare students with enough background information of Hypothesis testing in real data analysis.<br />
<br />
Important parts included in Hypothesis testing: <br />
* Decision (significance or no significance)<br />
* Parameter of interest<br />
* Variable of interest<br />
* Population under study<br />
* p-value<br />
<br />
===Motivation===<br />
In statistical data analysis, we are often encountered with the problem of making statistical decisions about populations or processes based on experimental data. And hypothesis testing will be the direct answer to questions like how well the findings fit he possibility that the chance along might be responsible for the observed discrepancy between theoretical model and empirical observations or what is the likelihood of the observed summary statistics if the data did come from the distribution specified by the null hypothesis? And what if it follows the distribution stated in the alternative hypothesis? In fact, one use of hypothesis testing is to decide whether experimental results contain enough information to cast doubt on conventional wisdom. <br />
Consider an example of testing whether the new production purse from a factory contains radioactive material. The null hypothesis is that there is no radioactive material in the purse and that all measured counts are due to ambient radioactivity typical of the surrounding air and harmless objects in the purse. We can then calculate how likely it is that the null hypothesis produce 10 count per minute. If it is likely, for example if the null hypothesis predicts on average 9 counts per minute, we say the purse is compatible with the null hypothesis, on the other hand, if the null hypothesis predicts for example 3 count per minute, then the purse is not compatible with the null hypothesis and there must be other factors responsible to produce the increased radioactive counts.<br />
<br />
===Theory===<br />
<br />
====Fundamentals of Hypothesis testing (statistical significance testing)====<br />
Null and alternative hypothesis: a null hypothesis a thesis set up to be nullified or refuted in order to support an Alternate (research) Hypothesis. The null hypothesis is presumed true until statistical evidence, in the form of a hypothesis test, indicates otherwise. In science, the null hypothesis is used to test differences between treatment and control groups, and the assumption at the outset of the experiment is that no difference exists between the two groups for the variable of interest (e.g., population means). The null hypothesis proposes something initially presumed true, and it is rejected only when it becomes evidently false. That is, when a researcher has a certain degree of confidence, usually 95% to 99%, that the data do not support the null hypothesis. In the example of testing radioactive purse above, the null hypothesis is there is no radioactive material in the purse and that all measured counts are due to ambient radioactivity typical of the surrounding air and harmless objects in the purse. Formulation of the null hypothesis is a vital step in testing statistical significance. Having formulated such a hypothesis, one can establish the probability of observing the obtained data from the prediction of the null hypothesis, if the null hypothesis is true. That probability is what commonly called the significance level of the results.<br />
<br />
In many scientific experimental designs we predict that a particular factor will produce an effect on our dependent variable — this is our alternative hypothesis. We then consider how often we would expect to observe our experimental results, or results even more extreme, if we were to take many samples from a population where there was no effect (i.e. we test against our null hypothesis). If we find that this happens rarely (up to, say, 5% of the time), we can conclude that our results support our experimental prediction — we reject our null hypothesis.<br />
<br />
====Type I Error, Type II Error and Power====<br />
*Type I error: the false positive (Type I) error of rejecting the null hypothesis given that it is actually true; e.g., the purses are detected to containing the radioactive material while they actually do not.<br />
*Type II error: the false negative (Type II) error of failing to reject the null hypothesis given that the alternative hypothesis is actually true; e.g., the purses are detected to not containing the radioactive material while they actually do.<br />
*Statistical power: the probability that the test will reject a false null hypothesis (that it will not make a Type II error). When power increases, the chances of a Type II error decrease. <br />
The table below gives an example of calculating specificity, sensitivity, False positive rate $ α$, False Negative Rate $ β $ and power given the information of TN and FN.<br />
<br />
<center><br />
{| class="wikitable" style="text-align:center;width: 25%"border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| '''Actual Condition'''<br />
|-<br />
| '''Absent ($H_{0}$ is true)''' || '''Present ($H_{1}$ is true)''' <br />
|-<br />
| rowspan=2| '''Test Result'''|| '''Negative(fail to reject $H_{0})$''' || Condition absent + Negative result = True (accurate) Negative ('''TN''', 0.98505) || ''Condition present + Negative result = False (invalid) Negative ('''FN''', 0.00025) '''Type II error''' ($\beta$)<br />
|-<br />
| '''Positive (reject $H_{0})$''' || Condition absent + Positive result = False Positive ('''FP''', 0.00995) '''Type I error''' ($\alpha$) || Condition Present + Positive result = True Positive ('''TP''', 0.00475)<br />
|-<br />
|'''Test Interpretation''' || $Power$= $1-\beta$= $1-0.05$ = $0.95$ ||'''Specificity''':$\frac{TN}{(TN+FP)}=\frac{0.98505}{(0.98505+ 0.00995)}= 0.99$ ||'''Sensitivity''':$\frac {TP} {(TP+FN)} = \frac {0.00475} {(0.00475+ 0.00025)}= 0.95$<br />
|-<br />
|}<br />
</center><br />
<br />
Thus, $Specificity = \frac{TN}{TN + FP}$, $Sensitivity= \frac {TP} {TP+FN}$, $\alpha= \frac {FP}{FP+TN},\beta=\frac {FN} {FN+TP}$, $power=1-\beta.$ Note that $\alpha$ and $\beta$ are the false-positive and false-negative ''rates'' (which are directly proportional to the FP and FN counts), respectively.<br />
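These definitions can be checked numerically from the joint probabilities in the table above (a minimal R sketch; the variable names are our own):

```r
# Joint probabilities from the table above
TN <- 0.98505; FP <- 0.00995; FN <- 0.00025; TP <- 0.00475

specificity <- TN / (TN + FP)  # 0.99
sensitivity <- TP / (TP + FN)  # 0.95
alpha <- FP / (FP + TN)        # false-positive rate, 0.01
beta  <- FN / (FN + TP)        # false-negative rate, 0.05
power <- 1 - beta              # 0.95
```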
<br />
====Testing a claim about a mean with large sample size====<br />
<br />
Recall the random sample ${X_1,X_2,\ldots,X_n}$ of the process where the population mean is estimated by the sample average $\bar{X}_{n}=\frac{1}{n}\sum_{i=1}^{n}X_{i}$. For a given small significance level, say $\alpha=0.025$, the $(1-\alpha)100\%$ confidence interval for the mean is constructed as $CI(\alpha)$: $\bar x\pm z_\frac{\alpha}{2}E$, where the margin of error ''E'' is defined as <br />
<br />
: $$E = \begin{cases}{\sigma\over\sqrt{n}},& \text{for known } \sigma,\\<br />
{{1\over \sqrt{n}} \sqrt{\sum_{i=1}^n{(x_i-\overline{x})^2\over n-1}}},& \text{for unknown } \sigma.\end{cases}$$<br />
: and $z_{\frac{\alpha}{2}}$ is the [[AP_Statistics_Curriculum_2007_Normal_Critical | critical value]] for a [[AP_Statistics_Curriculum_2007_Normal_Std |Standard Normal]] distribution at $\frac{\alpha}{2}.$<br />
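A minimal R sketch of this interval construction (the helper name `ci_mean` and the toy data are our own illustration, not part of the SOCR materials):

```r
# (1 - alpha)*100% CI for the mean: x_bar +/- z_{alpha/2} * E
ci_mean <- function(x, alpha = 0.05, sigma = NULL) {
  n <- length(x)
  # margin of error E: known sigma vs. estimated from the sample
  E <- if (is.null(sigma)) sd(x) / sqrt(n) else sigma / sqrt(n)
  mean(x) + c(-1, 1) * qnorm(1 - alpha / 2) * E
}

ci_mean(c(16, 9, 14, 11, 17, 12), alpha = 0.05)
```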
<br />
*Hypothesis testing about a mean: large samples<br />
<br />
: $ H_{0}: \mu=\mu_{0}$ (e.g., $\mu_{0}=0)$; one sided $H_{1}:\mu>\mu_{0}$ or $ \mu<\mu_{0} $; two sided $H_{1}:\mu≠\mu_{0} $.<br />
:Test statistics: <br />
::(1) with known variance: $Z_0=\frac{\bar x -\mu_{0}} {\frac{\sigma}{\sqrt n}}$ $ \thicksim N (0,1)$ <br />
<br />
::(2) with unknown variance $T_{0}=\frac{\bar x -\mu_{0}} {SE(\bar x)}=\frac{\bar x -\mu_{0}} {{\frac{1} {\sqrt n} \sqrt {\displaystyle \sum_{i=1}^{n} \frac{(x_{i}-\bar x)^{2}}{n-1}}}}$ $ \thicksim T_{df=n-1}$<br />
<br />
*Example: suppose we are testing whether the population mean equals 20 at $\alpha=0.05$ using a two-sided alternative: $H_{0}$:$\mu=20$ vs. $H_{1}:\mu≠20$. The sample data are: 16, 9, 14, 11, 17, 12, 99, 18, 13, 12, 5, 9, 17, 6, 11, 17, 18, 20, 6, 14, 7, 11, 12, 5, 18, 6, 4, 13, 11, and 12. The population variance is not given.<br />
<br />
a <- c( 16, 9, 14, 11, 17, 12, 99, 18, 13, 12, 5, 9, 17, 6, 11, 17, 18, 20, 6, 14, 7, 11, 12, 5, 18, 6, 4, 13, 11, 12)<br />
summary(a)<br />
Min. 1st Qu. Median Mean 3rd Qu. Max. <br />
4.00 9.00 12.00 14.77 16.75 99.00 <br />
sd(a)<br />
[1] 16.53561<br />
<br />
: $T_{0}=\frac{\bar x -\mu_{0}} {SE(\bar x)}=\frac{\bar x -\mu_{0}} {{\frac{1} {\sqrt {n}} \sqrt {\displaystyle \sum_{i=1}^{n} \frac{(x_{i}-\bar x)^{2}}{n-1}}}}$<br />
<br />
:: From the sample we have $\bar x=14.77, s=16.54$.<br />
<br />
: $T_{0}=\frac{\bar x -\mu_{0}} {SE(\bar x)}=\frac{\bar x -\mu_{0}} {{\frac{1} {\sqrt n} \sqrt {\displaystyle \sum_{i=1}^{n} \frac{(x_{i}-\bar x)^{2}}{n-1}}}}$ $=\frac{14.77-20}{{\frac{1} {\sqrt 30} \sqrt {\displaystyle \sum_{i=1}^{30} \frac{(x_{i}-14.77)^{2}}{30-1}}}}$ $= -1.733$<br />
<br />
: $P(T_{df=29}<T_{0}=-1.733)=0.047$, hence the two-sided p-value is $2\times 0.047=0.094$, so we cannot reject the null hypothesis at the $\alpha=0.05$ level of significance.<br />
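The entire calculation can be reproduced with R's built-in `t.test`, using the same sample `a` as above:

```r
a <- c(16, 9, 14, 11, 17, 12, 99, 18, 13, 12, 5, 9, 17, 6, 11, 17,
       18, 20, 6, 14, 7, 11, 12, 5, 18, 6, 4, 13, 11, 12)
# Two-sided one-sample t-test of H0: mu = 20
tt <- t.test(a, mu = 20, alternative = "two.sided")
tt$statistic  # t is about -1.733
tt$p.value    # about 0.094 > 0.05, so do not reject H0
```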
<br />
====Comparing the means of two samples====<br />
When comparing the means of two samples one needs to determine whether the samples are paired or independent. In the [[AP_Statistics_Curriculum_2007_NonParam_2MedianPair|paired samples case, or single sample case, the paired test should be used]]. When the [[AP_Statistics_Curriculum_2007_NonParam_2MedianIndep|two samples are independent, the independent sample test]] needs to be used.<br />
<br />
====Testing a claim about a mean with small sample size====<br />
<br />
Recall the random sample ${X_1,X_2,\ldots,X_n}$ of the process where the population mean is estimated by the sample average $\bar X_{n}=\frac{1}{n}\sum_{i=1}^{n} X_{i}$. For a given small significance level, say $\alpha=0.025$, the $(1-\alpha)100\%$ confidence interval for the mean is constructed as $ CI(\alpha)$: $\bar{x}\pm t_{\{df=n-1,\frac{\alpha}{2}\}} \frac{1}{\sqrt {n}} \sqrt{\sum_{i=1}^{n} {\frac{(x_{i}-\bar x)^{2}}{n-1}}} $, where $E=\frac{1}{\sqrt {n}} \sqrt{\sum_{i=1}^{n} {\frac{(x_{i}-\bar x)^{2}}{n-1}}}$ is the margin of error and $t_{df=n-1,\frac{\alpha} {2}}$ is the critical value of the T distribution with ''df=sample size - 1'' at $\frac{\alpha}{2}$.<br />
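This t-based interval can be sketched in R (the helper name `ci_mean_t` is our own; the data are those of the example below):

```r
# (1 - alpha)*100% t-based CI for the mean (sigma unknown, small n)
ci_mean_t <- function(x, alpha = 0.05) {
  n <- length(x)
  E <- sd(x) / sqrt(n)  # margin of error
  mean(x) + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * E
}

ci_mean_t(c(16, 9, 14, 11, 17, 12, 99, 18, 13, 12))
```

For a 95% interval this reproduces exactly the confidence interval reported by `t.test(x)`.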
<br />
*Hypothesis testing about a mean: small samples<br />
<br />
: $H_{0}:\mu=\mu_{0}$ (e.g., $\mu_{0}=0$); one sided $H_{1}:\mu>\mu_{0}$ or $\mu<\mu_{0}$; two sided $H_{1}:\mu≠\mu_{0}$.<br />
<br />
: Test statistics: <br />
:: (1) with known variance: $Z_0=\frac{\bar x -\mu_{0}} {\frac{\sigma}{\sqrt n}}$ $ \thicksim N (0,1)$ <br />
<br />
::(2) with unknown variance $T_{0}=\frac{\bar x -\mu_{0}} {SE(\bar x)}=\frac{\bar x -\mu_{0}} {{\frac{1} {\sqrt {n}} \sqrt {\sum_{i=1}^{n} \frac{(x_{i}-\bar x)^{2}}{n-1}}}}$ $ \thicksim T_{df=n-1}$<br />
<br />
: Example: suppose we are testing whether the population mean equals 12 at $\alpha=0.01$ using a one-sided alternative: $H_{0}: \mu=12$ vs. $H_{1}:\mu>12$. The sample data are: 16, 9, 14, 11, 17, 12, 99, 18, 13, and 12. The population variance is not given.<br />
<br />
: From the sample we have $\bar x=22.1,s=27.164$<br />
: $T_{0}=\frac{\bar x -\mu_{0}} {SE(\bar x)}=\frac{\bar x -\mu_{0}} {{\frac{1} {\sqrt n} \sqrt {\sum_{i=1}^{n} \frac{(x_{i}-\bar x)^{2}}{n-1}}}}$ $=\frac{22.1-12}{{\frac{1} {\sqrt 10} \sqrt {\sum_{i=1}^{10} \frac{(x_{i}-22.1)^{2}}{10-1}}}}$ $= 1.176 $<br />
<br />
: p-value $=P(T_{df=9}>T_{0}=1.176)=0.13488$, hence we cannot reject the null hypothesis at the $\alpha=0.01$ level of significance.<br />
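The same result follows from R's `t.test` with a one-sided alternative:

```r
x <- c(16, 9, 14, 11, 17, 12, 99, 18, 13, 12)
# One-sided one-sample t-test of H0: mu = 12 vs. H1: mu > 12
tt <- t.test(x, mu = 12, alternative = "greater")
tt$statistic  # about 1.176
tt$p.value    # about 0.135 > 0.01, so do not reject H0
```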
<br />
====Testing a claim about a proportion====<br />
<br />
Recall that for large samples, the sampling distribution of the sample proportion $ \hat p $ is approximately normal by the CLT, as the sample proportion may be represented as a sample average of Bernoulli random variables. When the sample size is small, the normal approximation is inadequate. To accommodate this, we modify the sample proportion $\hat p $ slightly and obtain the corrected-sample-proportion $\tilde p$:<br />
<br />
$$\hat{p}={y\over n} \longrightarrow \tilde{p}={y+0.5z_{\alpha \over 2}^2 \over n+z_{\alpha \over 2}^2},$$<br />
where $z_\frac{\alpha}{2}$ is the [[AP_Statistics_Curriculum_2007_Normal_Critical | critical value of a standard normal distribution]] at $\alpha/2$.<br />
<br />
: The standard error of <math>\hat{p}</math> (and <math>\tilde{p}</math>) also needs a slight modification:<br />
$$SE_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})\over n} \longrightarrow SE_{\tilde{p}} = \sqrt{\tilde{p}(1-\tilde{p})\over n+z_{\alpha \over 2}^2}.$$<br />
<br />
*Hypothesis testing about a single sample proportion:<br />
: Null Hypothesis: $H_o: p=p_o$ (e.g., $p_o=\frac{1}{2}$), where $p$ is the population proportion of interest.<br />
: Alternative Research Hypotheses:<br />
:: One sided (uni-directional): $H_1: p >p_o$, or $H_1: p<p_o$<br />
:: Double sided: $H_1: p \not= p_o.$<br />
: Test Statistics: $Z_o={\tilde{p} -p_o \over SE_{\tilde{p}}} \sim N(0,1).$<br />
<br />
* Example: suppose we are testing the effect of a medicine. 500 patients with evidence of early-stage disease are randomly recruited and scheduled to take one pill daily for two years. At the end of the two years, only 17 patients had the disease. Use $\alpha=0.05$ to test the research hypothesis that the proportion of patients on this treatment who have the disease within 2 years of treatment differs from $p_0=0.04$.<br />
: $\tilde{p} = {17+0.5z_{0.025}^2\over 500+z_{0.025}^2}= {17+1.92\over 500+3.84}=0.038$ <br />
: $SE_{\tilde{p}}= \sqrt{0.038(1-0.038)\over 500+3.84}=0.0085$<br />
<br />
: And the corresponding test statistics is<br />
:: $Z_o={\tilde{p} - 0.04 \over SE_{\tilde{p}}}={-0.002 \over 0.0085}=-0.235$<br />
<br />
: The test statistic is clearly insignificant ($|Z_o|<z_{0.025}=1.96$), so we cannot reject the null hypothesis at the $\alpha=0.05$ level of significance.<br />
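The corrected-proportion test can be checked in R; carrying full precision (rather than rounded intermediate values) gives $Z_o \approx -0.29$, which is equally insignificant:

```r
z  <- qnorm(0.975)                     # critical value, about 1.96
y  <- 17; n <- 500; p0 <- 0.04
p_tilde <- (y + 0.5 * z^2) / (n + z^2)                # corrected proportion, ~ 0.038
se      <- sqrt(p_tilde * (1 - p_tilde) / (n + z^2))  # ~ 0.0085
z0      <- (p_tilde - p0) / se                        # ~ -0.29, |z0| < 1.96
```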
<br />
* When [[AP_Statistics_Curriculum_2007_Infer_2Proportions|comparing 2 proportions, you can use a similar protocol to infer whether they are distinct]].<br />
<br />
====Testing a claim about variance (or standard deviation)====<br />
*Recall that the sample variance $s^2$ is an unbiased point estimate for the population variance $\sigma^2$ (similarly, $s$ serves as a point estimate of the standard deviation $\sigma$). For samples from a normal distribution, the scaled sample variance follows a chi-square distribution: $\chi_0^2=\frac{(n-1) s^2}{\sigma_0^2} \sim \chi_{df=n-1}^2.$<br />
*Hypothesis testing about variance<br />
:: $H_0: σ^2=σ_0^2$ vs. $H_1:σ^2≠σ_0^2$. Given that the chi-square distribution is not symmetric, there are two critical values \(χ_L^2\) and \(χ_R^2.\)<br />
*Test statistics: $\chi_0^2 =\frac{(n-1) s^2}{\sigma_0^2} \sim \chi_{df=n-1}^2$<br />
*Example: we have a random sample of 30 objects drawn from a normal distribution with sample variance \(s^2=5.\) Test at the \(α=0.05\) level of significance whether this is consistent with \(H_0: \sigma^2=2\).<br />
:: $\chi_0^2=\frac{(n-1) s^2}{\sigma_0^2} =(29*5)/2=72.5$, $\chi_L^2=16.047$ and $\chi_R^2=45.722$, since we have \(\chi_0^2 > \chi_R^2\), we reject the null hypothesis at 5% level of significance. The image below illustrates this calculation using the [http://socr.umich.edu/html/dist/SOCR_Distributions.html SOCR $\chi_{df=29}^2$ calculator]. Notice the Left and right limits for the central 95% confidence interval, $(\chi_L^2=16.047 : \chi_R^2=45.722)$.<br />
<br />
<center><br />
[[Image:SMHS_HypothesisTesting_Fig1.png|500px]]<br />
</center><br />
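The same calculation can be done in R, using `qchisq` to obtain the two critical values instead of the SOCR calculator:

```r
n <- 30; s2 <- 5; sigma0_sq <- 2; alpha <- 0.05
chi0 <- (n - 1) * s2 / sigma0_sq           # test statistic, 72.5
chiL <- qchisq(alpha / 2, df = n - 1)      # left critical value, ~ 16.047
chiR <- qchisq(1 - alpha / 2, df = n - 1)  # right critical value, ~ 45.722
chi0 > chiR  # TRUE: reject H0 at the 5% level of significance
```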
<br />
* [[SMHS_NonParamInference#Fligner-Killeen_test:_Variance_Homogeneity_.28Differences_of_Variances_of_Independent_Samples.29| See also the non-parametric Fligner-Killeen test for Variance Homogeneity]]<br />
<br />
===Applications===<br />
*[http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_AnalysisActivities_OneT This article] illustrates a SOCR Analyses example of the one-sample t-test. It presents the background for the one-sample t-test and demonstrates the process of carrying it out in the SOCR one-sample t-test applet. <br />
*[http://socr.umich.edu/html/SOCR_ChoiceOfStatisticalTest.html This article], titled Choosing the Right Test, presents the procedure for selecting a statistical test. It starts with formulating the right hypotheses and then develops the topic based on the choice and characteristics of the data. It offers a broad sense of what type of test to choose based on the hypothesis and the data. The article is also accompanied by several exercises for students to practice on their own.<br />
*[http://wiki.stat.ucla.edu/socr/uploads/3/32/Thomson_SOCR_ECON261_HypothesisDifferenceMeans_VIII.pdf This article] titled The Hypothesis Testing For Difference Of Population Parameters presents a comprehensive introduction to the hypothesis testing of difference of population parameters with the background information as well as the application. It also presents the steps to apply hypothesis testing using SOCR analyses.<br />
<br />
===Software===<br />
*[http://socr.ucla.edu/htmls/SOCR_Analyses.html| SOCR Analysis]<br />
*[http://www.socr.ucla.edu/htmls/ana/OneSampleTTest_Analysis.html One Sample T Test Analysis]<br />
*[http://socr.ucla.edu/htmls/SOCR_Charts.html| SOCR Charts]<br />
<br />
===Problems===<br />
USA Today's AD Track examined the effectiveness of the new ads involving the Pets.com Sock Puppet (which is now extinct). In particular, they conducted a nationwide poll of 428 adults who had seen the Pets.com ads and asked for their opinions. They found that 36% of the respondents said they liked the ads. Suppose you increased the sample size for this poll to 1000, but you had the same sample percentage who like the ads (36%). How would this change the p-value of the hypothesis test you want to conduct?<br />
:(a) No way to tell<br />
:(b) The new p-value would be the same as before<br />
:(c) The new p-value would be smaller than before<br />
:(d) The new p-value would be larger than before<br />
<br />
<br />
If we want to estimate the mean difference in scores on a pre-test and post-test for a sample of students, how should we proceed?<br />
:(a) We should construct a confidence interval or conduct a hypothesis test<br />
:(b) We should collect one sample, two samples, or conduct a paired data procedure<br />
:(c) We should calculate a z or a t statistic<br />
<br />
<br />
The paint used to make lines on roads must reflect enough light to be clearly visible at night. Let mu denote the true average reflectometer reading for a new type of paint under consideration. A test of the null hypothesis that mu = 20 versus the alternative hypothesis that mu > 20 will be based on a random sample of size n from a normal population distribution. In which of the following scenarios is there significant evidence that mu is larger than 20?<br />
:(i) n=15, t=3.2, alpha=0.05<br />
:(ii) n=9, t=1.8, alpha=0.01<br />
:(iii) n=24, t=-0.2, alpha=0.01<br />
<br />
:(a) (ii) and (iii)<br />
:(b) (i)<br />
:(c) (iii)<br />
:(d) (ii)<br />
<br />
<br />
We observe the math self-esteem scores from a random sample of 25 female students. How should we determine the probable values of the population mean score for this group?<br />
:(a) Test the difference in means between two paired or dependent samples.<br />
:(b) Test that a correlation coefficient is not equal to 0 (correlation analysis).<br />
:(c) Test the difference between two means (independent samples).<br />
:(d) Test for a difference in more than two means (one way ANOVA).<br />
:(e) Construct a confidence interval.<br />
:(f) Test one mean against a hypothesized constant.<br />
:(g) Use a chi-squared test of association.<br />
<br />
<br />
Food inspectors inspect samples of food products to see if they are safe. This can be thought of as a hypothesis test where H0: the food is safe, and H1: the food is not safe. If you are a consumer, which type of error would be the worst one for the inspector to make, the type I or type II error?<br />
:(a) Type I<br />
:(b) Type II<br />
<br />
<br />
A college admissions officer is concerned that their admission criteria might not treat men and women with equal weight. To test this, the college took a random sample of male and female high school seniors from a very large local school district and determined the percent of males and females who would be eligible for admission at the college. Which of the following is a suitable null hypothesis for this test?<br />
:(a) p = 0.5<br />
:(b) The proportion of all eligible men in the district will not equal the proportion of all eligible women in the district.<br />
:(c) The proportion of all eligible men in the school district should be equal to the proportion of all eligible women in the school district.<br />
:(d) The proportion of eligible men sampled should equal the proportion of eligible women sampled.<br />
<br />
<br />
We want to determine if college GPAs differ for male athletes in major sports (e.g., football), minor sports (e.g., swimming), and intramural sports. What statistical method is most likely to be used to answer this question? Assume that all necessary assumptions have been met for using this procedure.<br />
:(a) Test one mean against a hypothesized constant<br />
:(b) Test the difference in means between two paired or dependent samples<br />
:(c) test for a difference in more than two means (one way ANOVA)<br />
:(d) Test that a correlation coefficient is not equal to 0, correlation analysis<br />
:(e) Test the difference between two means (independent samples)<br />
<br />
<br />
Statistics show that the average level of a mother's education for a city of 300,000 people is 14 years with a standard deviation of 1.5 years. A major state university is located in this town. The administrators in this university think that the average level of a mother's education for the freshmen who are admitted to this school is higher than 14 years. The average education level of mothers for a random sample of 100 freshmen who were admitted to this university within the last two years was 14.7 years.<br />
We want to test the null at the level of alpha = 0.001. What is the best answer?<br />
:(a) We reject the alternative and believe that the level of a mother's education for university freshmen is not higher than the overall population average.<br />
:(b) We reject the null at 0.001 and conclude that the average level of a mother's education is higher for university freshmen.<br />
:(c) We fail to reject the null and conclude that the level of a mother's education for university freshmen is not higher than the overall population average.<br />
:(d) In order to be certain about the conclusion we reach, a larger sample size is needed to increase the power of the test and the margin of error.<br />
<br />
<br />
The average length of time required to complete a certain aptitude test is claimed to be 80 minutes. A random sample of 25 students yielded an average of 86.5 minutes and a standard deviation of 15.4 minutes. If we assume normality of the population distribution, is there evidence to reject the claim? Choose at least one answer.<br />
:(a) Yes, because the observed 86.5 did not happen by chance<br />
:(b) Yes, because the t-test statistic is 2.11<br />
:(c) Yes, because the observed 86.5 happened by chance<br />
:(d) No, because the probability that the null is true is > 0.05<br />
<br />
<br />
Based on past experience, a bank believes that 4% of the people who receive loans will not make payments on time. The bank has recently approved 300 loans. What is the probability that over 6% of these clients will not make timely payments?<br />
:(a) 0.096<br />
:(b) 0.038<br />
:(c) 0.962<br />
:(d) 0.904<br />
<br />
<br />
Many people sleep in on the weekends to make up for short nights during the work week. The Better Sleep Council reports that 61% of us get more than 7 hours of sleep per night on the weekend. A random sample of 350 adults found that 235 had more than seven hours each night last weekend. At the 0.05 level of significance, does this evidence show that more than 61% of us get seven or more hours of sleep per night on the weekend?<br />
:(a) That cannot be determined without more information<br />
:(b) No<br />
:(c) Yes<br />
<br />
===References===<br />
*[http://wiki.stat.ucla.edu/socr/index.php/Probability_and_statistics_EBook#Chapter_VIII:_Hypothesis_Testing SOCR]<br />
*[http://en.wikipedia.org/wiki/Statistical_hypothesis_testing Statistical Hypothesis Testing Wikipedia]<br />
*[http://stattrek.com/hypothesis-test/power-of-test.aspx?tutorial=ap Stat Trek Tutorials]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_HypothesisTesting}}</div>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Hypothesis Testing ==<br />
<br />
===Overview===<br />
Hypothesis testing is a statistical technique for decision-making regarding populations or processes based on experimental data. It quantitatively answers the probability that chance along might be responsible for the observed discrepancies between a theoretical model and the empirical observations. In this class, we are going to introduce the fundamental terminologies we are going to discuss in Hypothesis Testing include null and alternative hypotheses, Type I and Type II errors, sensitivity, specificity and statistical power and we are going to discuss about hypothesis testing of mean, proportion and mean under various assumptions and hope to prepare students with enough background information of Hypothesis testing in real data analysis.<br />
<br />
Important parts included in Hypothesis testing: <br />
* Decision (significance or no significance);<br />
* Parameter of interest; <br />
* Variable of interest;<br />
* Population under study; <br />
* p-value.<br />
<br />
===Motivation===<br />
In statistical data analysis, we are often encountered with the problem of making statistical decisions about populations or processes based on experimental data. And hypothesis testing will be the direct answer to questions like how well the findings fit he possibility that the chance along might be responsible for the observed discrepancy between theoretical model and empirical observations or what is the likelihood of the observed summary statistics if the data did come from the distribution specified by the null hypothesis? And what if it follows the distribution stated in the alternative hypothesis? In fact, one use of hypothesis testing is to decide whether experimental results contain enough information to cast doubt on conventional wisdom. <br />
Consider an example of testing whether the new production purse from a factory contains radioactive material. The null hypothesis is that there is no radioactive material in the purse and that all measured counts are due to ambient radioactivity typical of the surrounding air and harmless objects in the purse. We can then calculate how likely it is that the null hypothesis produce 10 count per minute. If it is likely, for example if the null hypothesis predicts on average 9 counts per minute, we say the purse is compatible with the null hypothesis, on the other hand, if the null hypothesis predicts for example 3 count per minute, then the purse is not compatible with the null hypothesis and there must be other factors responsible to produce the increased radioactive counts.<br />
<br />
===Theory===<br />
<br />
====Fundamentals of Hypothesis testing (statistical significance testing)====<br />
Null and alternative hypothesis: a null hypothesis a thesis set up to be nullified or refuted in order to support an Alternate (research) Hypothesis. The null hypothesis is presumed true until statistical evidence, in the form of a hypothesis test, indicates otherwise. In science, the null hypothesis is used to test differences between treatment and control groups, and the assumption at the outset of the experiment is that no difference exists between the two groups for the variable of interest (e.g., population means). The null hypothesis proposes something initially presumed true, and it is rejected only when it becomes evidently false. That is, when a researcher has a certain degree of confidence, usually 95% to 99%, that the data do not support the null hypothesis. In the example of testing radioactive purse above, the null hypothesis is there is no radioactive material in the purse and that all measured counts are due to ambient radioactivity typical of the surrounding air and harmless objects in the purse. Formulation of the null hypothesis is a vital step in testing statistical significance. Having formulated such a hypothesis, one can establish the probability of observing the obtained data from the prediction of the null hypothesis, if the null hypothesis is true. That probability is what commonly called the significance level of the results.<br />
<br />
In many scientific experimental designs we predict that a particular factor will produce an effect on our dependent variable — this is our alternative hypothesis. We then consider how often we would expect to observe our experimental results, or results even more extreme, if we were to take many samples from a population where there was no effect (i.e. we test against our null hypothesis). If we find that this happens rarely (up to, say, 5% of the time), we can conclude that our results support our experimental prediction — we reject our null hypothesis.<br />
<br />
====Type I Error, Type II Error and Power====<br />
*Type I error: the false positive (Type I) error of rejecting the null hypothesis given that it is actually true; e.g., the purses are detected to containing the radioactive material while they actually do not.<br />
*Type II error: the false negative (Type II) error of failing to reject the null hypothesis given that the alternative hypothesis is actually true; e.g., the purses are detected to not containing the radioactive material while they actually do.<br />
*Statistical power: the probability that the test will reject a false null hypothesis (that it will not make a Type II error). When power increases, the chances of a Type II error decrease. <br />
The table below gives an example of calculating specificity, sensitivity, False positive rate $ α$, False Negative Rate $ β $ and power given the information of TN and FN.<br />
<br />
<center><br />
{| class="wikitable" style="text-align:center;width: 25%"border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| '''Actual Condition'''<br />
|-<br />
| '''Absent ($H_{0}$ is true)''' || '''Present ($H_{1}$ is true)''' <br />
|-<br />
| rowspan=2| '''Test Result'''|| '''Negative(fail to reject $H_{0})$''' || Condition absent + Negative result = True (accurate) Negative ('''TN''', 0.98505) || ''Condition present + Negative result = False (invalid) Negative ('''FN''', 0.00025) '''Type II error''' ($\beta$)<br />
|-<br />
| '''Positive (reject $H_{0})$''' || Condition absent + Positive result = False Positive ('''FP''', 0.00995) '''Type I error''' ($\alpha$) || Condition Present + Positive result = True Positive ('''TP''', 0.00475)<br />
|-<br />
|'''Test Interpretation''' || $Power$= $1-\beta$= $1-0.05$ = $0.95$ ||'''Specificity''':$\frac{TN}{(TN+FP)}=\frac{0.98505}{(0.98505+ 0.00995)}= 0.99$ ||'''Sensitivity''':$\frac {TP} {(TP+FN)} = \frac {0.00475} {(0.00475+ 0.00025)}= 0.95$<br />
|-<br />
|}<br />
</center><br />
<br />
Thus, $Specificity = \frac{TN}{TN + FP}$, $Sensitivity= \frac {TP} {TP+FN}$, $\alpha= \frac {FP}{FP+TN},\beta=\frac {FN} {FN+TP}$, $power=1-\beta.$ Note that $\alpha$ and $\beta$ are the false-positive and false-negative ''rates'' (which are directly proportional to the FP and FN counts), respectefully.<br />
<br />
====Testing a claim about a mean with large sample size====<br />
<br />
Recall the random sample $ {X_1,X_2,…,X_n} $ of the process where the population mean is estimated by the sample average $ \bar{X}_{n}=\frac{1}{n}∑_{i=1}^{n}X_{i}$. For a given small significant level say $ \alpha=0.025 $, the $(1-\alpha)100\% $ confidence interval for the mean is constructed by $ CI(\alpha)$: $\bar x\pm z_\frac{\alpha}{2}E$, where the margin of error ''E'' is defined as <br />
<br />
: $$E = \begin{cases}{\sigma\over\sqrt{n}},& \texttt{for-known}-\sigma,\\<br />
{{1\over \sqrt{n}} \sqrt{\sum_{i=1}^n{(x_i-\overline{x})^2\over n-1}}},& \texttt{for-unknown}-\sigma.\end{cases}$$<br />
: and $z_{\frac{\alpha}{2}}$ is the [[AP_Statistics_Curriculum_2007_Normal_Critical | critical value]] for a [[AP_Statistics_Curriculum_2007_Normal_Std |Standard Normal]] distribution at $\frac{\alpha}{2}.$<br />
<br />
*Hypothesis testing about a mean: large samples<br />
<br />
: $ H_{0}: \mu=\mu_{0}$ (e.g., $\mu_{0}=0)$; one sided $H_{1}:\mu>\mu_{0}$ or $ \mu<\mu_{0} $; two sided $H_{1}:\mu≠\mu_{0} $.<br />
:Test statistics: <br />
::(1) with known variance: $Z_0=\frac{\bar x -\mu_{0}} {\frac{\sigma}{\sqrt n}}$ $ \thicksim N (0,1)$ <br />
<br />
::(2) with unknown variance $T_{0}=\frac{\bar x -\mu_{0}} {SE(\bar x)}=\frac{\bar x -\mu_{0}} {{\frac{1} {\sqrt n} \sqrt {\displaystyle \sum_{i=1}^{n} \frac{(x_{i}-\bar x)^{2}}{n-1}}}}$ $ \thicksim T_{df=n-1}$<br />
<br />
*Example: consider we are testing if the population mean equal to 20 at $\alpha$=0.05 using a double sided alternative test. $H_{0}$:$\mu=20$ vs.$H_{1}:\mu≠20$. The sample data is given: 16, 9, 14, 11, 17, 12, 99, 18, 13, 12, 5, 9, 17, 6, 11, 17, 18, 20, 6, 14, 7, 11, 12, 5, 18, 6, 4, 13, 11, and 12. Population variance not given.<br />
<br />
a <- c( 16, 9, 14, 11, 17, 12, 99, 18, 13, 12, 5, 9, 17, 6, 11, 17, 18, 20, 6, 14, 7, 11, 12, 5, 18, 6, 4, 13, 11, 12)<br />
summary(a)<br />
Min. 1st Qu. Median Mean 3rd Qu. Max. <br />
4.00 9.00 12.00 14.77 16.75 99.00 <br />
sd(a)<br />
[1] 16.53561<br />
<br />
: $T_{0}=\frac{\bar x -\mu_{0}} {SE(\bar x)}=\frac{\bar x -\mu_{0}} {{\frac{1} {\sqrt {n}} \sqrt {\displaystyle \sum_{i=1}^{n} \frac{(x_{i}-\bar x)^{2}}{n-1}}}}$<br />
<br />
:: From the sample we have $\bar x=14.77, s=16.54$.<br />
<br />
: $T_{0}=\frac{\bar x -\mu_{0}} {SE(\bar x)}=\frac{\bar x -\mu_{0}} {{\frac{1} {\sqrt n} \sqrt {\displaystyle \sum_{i=1}^{n} \frac{(x_{i}-\bar x)^{2}}{n-1}}}}$ $\frac{14.77-20}{{\frac{1} {\sqrt 30} \sqrt {\displaystyle \sum_{i=1}^{30} \frac{(x_{i}-14.77)^{2}}{30-1}}}}$ $= 1.176$<br />
<br />
: $P(T_{df=29}<T_{0}=-1.733)=0.047$, hence we have p-value $=2*0.047=0.094$, we can’t reject the null hypothesis at $\alpha=0.05$ level of significance.<br />
<br />
====Comparing the means of two samples====<br />
When comparing the means of 2 samples one need to identify if the samples are paired or independent. In the [[AP_Statistics_Curriculum_2007_NonParam_2MedianPair|paired samples case, or single sample case, the paired test should be used]]. When the [[AP_Statistics_Curriculum_2007_NonParam_2MedianIndep|two samples are independent, the independent sample test]] needs to be used.<br />
<br />
====Testing a claim about a mean with small sample size====<br />
<br />
Recall the random sample ${X_1,X_2,…,X_n}$ of the process where the population mean is estimated by the sample average $\bar X_{n}=\frac{1}{n}\sum_{i=1}^{n} X_{i}$. For a given small significant level say $\alpha=0.025$, the $(1-\alpha)100\%$ confidence interval for the mean is constructed by $ CI(\alpha)$: $\bar{x}\pm t_{\{df=n-1,\frac{\alpha}{2}\}} \frac{1}{\sqrt {n}} \sqrt{\sum_{i=1}^{n} {\frac{(x_{i}-\bar x)^{2}}{n-1}}} $, where $E=\frac{1}{\sqrt {n}} \sqrt{\sum_{i=1}^{n} {\frac{(x_{i}-\bar x)^{2}}{n-1}}}$ is the margin of error and $t_{df=n-1,\frac{\alpha} {2}}$ is the critical value of T distribution of ''df=sample size-1'' at $\frac{\alpha}{2}$.<br />
<br />
*Hypothesis testing about a mean: small samples<br />
<br />
: $H_{0}:\mu=\mu_{0}$(e.g.,$\mu_{0}=0$); one sided $H_{1}:\mu>\mu_{0}$ or $\mu<\mu_{0}$;two sided $H_{1}:\mu≠\mu_{0}$.<br />
<br />
: Test statistics: <br />
:: (1) with known variance: $Z_0=\frac{\bar x -\mu_{0}} {\frac{\sigma}{\sqrt n}}$ $ \thicksim N (0,1)$ <br />
<br />
::(2) with unknown variance $T_{0}=\frac{\bar x -\mu_{0}} {SE(\bar x)}=\frac{\bar x -\mu_{0}} {{\frac{1} {\sqrt {n}} \sqrt {\sum_{i=1}^{n} \frac{(x_{i}-\bar x)^{2}}{n-1}}}}$ $ \thicksim T_{df=n-1}$<br />
<br />
: Example: consider we are testing if the population mean equal to 20 at α=0.01 using a one sided alternative test. $H_{0}: \mu=12$ vs.$H_{1}:\mu>12$. The sample data is given: 16, 9, 14, 11, 17, 12, 99, 18, 13, and 12. Population variance is not given.<br />
<br />
: From the sample we have $\bar x=22.1,s=27.164$<br />
: $T_{0}=\frac{\bar x -\mu_{0}} {SE(\bar x)}=\frac{\bar x -\mu_{0}} {{\frac{1} {\sqrt n} \sqrt {\sum_{i=1}^{n} \frac{(x_{i}-\bar x)^{2}}{n-1}}}}$ $\frac{22.1-12}{{\frac{1} {\sqrt 10} \sqrt {\sum_{i=1}^{10} \frac{(x_{i}-22.1)^{2}}{10-1}}}}$ $= 1.176 $<br />
<br />
: $p-value=P(T_{df=29}>T_{0}=1.176)=0.13488$, hence we can’t reject the null hypothesis at $\alpha=0.01$ level of significance.<br />
<br />
====Testing a claim about a proportion====<br />
<br />
Recall that for large samples, the sample distribution of the sample proportion $ \hat p $ is approximately normal by CLT, as the sample proportion may be presented as a sample average or Bernoulli random variables. When sample size is small, the normal approximation is inadequate. The accommodate this, we modify the sample proportion $\hat p $ slightly and obtain the corrected-sample-proportion $\tilde p$.<br />
<br />
$$\hat{p}={y\over n} \longrightarrow \tilde{p}={y+0.5z_{\alpha \over 2}^2 \over n+z_{\alpha \over 2}^2},$$<br />
where $z_\frac{\alpha}{2}$ is the [[AP_Statistics_Curriculum_2007_Normal_Critical | critical value of a standard normal distribution]] at $\alpha/2$.<br />
<br />
: The standard error of <math>\hat{p}</math> (and <math>\tilde{p}</math>) also needs a slight modification:<br />
$$SE_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})\over n} \longrightarrow SE_{\tilde{p}} = \sqrt{\tilde{p}(1-\tilde{p})\over n+z_{\alpha \over 2}^2}.$$<br />
<br />
*Hypothesis testing about a single sample proportion:<br />
: Null Hypothesis: $H_o: p=p_o$ (e.g., $p_o=\frac{1}{2}$), where $p$ is the population proportion of interest.<br />
: Alternative Research Hypotheses:<br />
:: One sided (uni-directional): $H_1: p >p_o$, or $H_1: p<p_o$<br />
:: Double sided: $H_1: p \not= p_o.$<br />
: Test Statistics: $Z_o={\tilde{p} -p_o \over SE_{\tilde{p}}} \sim N(0,1).$<br />
<br />
* Example: consider testing the effect of some medicine. 500 patients with early evidence of the disease are randomly recruited and scheduled to take one pill daily for two years. At the end of the two years, only 17 patients had the disease. Use $\alpha=0.05$ to test the null hypothesis that the proportion of patients on this treatment who have the disease within 2 years of treatment is $p_0=0.04$.<br />
: $\tilde{p} = {17+0.5z_{0.025}^2\over 500+z_{0.025}^2}= {17+1.92\over 500+3.84}=0.038$ <br />
: $SE_{\tilde{p}}= \sqrt{0.038(1-0.038)\over 500+3.84}=0.0085$<br />
<br />
: And the corresponding test statistics is<br />
:: $Z_o={\tilde{p} - 0.04 \over SE_{\tilde{p}}}={-0.002 \over 0.0085}=-0.2353$<br />
<br />
: The magnitude of this test statistic is well below the critical value $z_{0.025}=1.96$, so the result is clearly insignificant and we cannot reject the null hypothesis at the $\alpha=0.05$ level of significance.<br />
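This corrected-proportion calculation can be replicated in a short Python sketch (standard library only). Note that keeping $\tilde p$ unrounded makes the numerator slightly negative, but the magnitude of $Z_o$ stays far below 1.96 either way.<br />

```python
# Sketch of the corrected-proportion test: y = 17 events out of n = 500,
# testing against p0 = 0.04 with z_{0.025} = 1.96.
from math import sqrt

y, n, p0, z = 17, 500, 0.04, 1.96
p_tilde = (y + 0.5 * z**2) / (n + z**2)           # corrected sample proportion
se = sqrt(p_tilde * (1 - p_tilde) / (n + z**2))   # its standard error
z0 = (p_tilde - p0) / se                          # test statistic
# |z0| is well below 1.96, so H0 is not rejected at alpha = 0.05
```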
<br />
* When [[AP_Statistics_Curriculum_2007_Infer_2Proportions|comparing 2 proportions, you can use a similar protocol to infer whether they are distinct]].<br />
<br />
====Testing a claim about variance (or standard deviation)====<br />
*Recall that the sample variance $s^2$ is an unbiased point estimate for the population variance $\sigma^2$; similarly, the sample standard deviation $s$ estimates the population standard deviation. The scaled sample variance follows a chi-square distribution: $\chi_0^2=\frac{(n-1) s^2}{\sigma_0^2} \sim \chi_{df=n-1}^2.$<br />
*Hypothesis testing about variance<br />
:: $H_0: \sigma^2=\sigma_0^2$ vs. $H_1:\sigma^2 \neq \sigma_0^2$. Given that the chi-square distribution is not symmetric, there are two critical values $\chi_L^2$ and $\chi_R^2$.<br />
*Test statistics: $\chi_0^2 =\frac{(n-1) s^2}{\sigma_0^2} \sim \chi_{df=n-1}^2$<br />
*Example: we have a random sample of 30 objects drawn from a normal distribution with sample variance \(s^2=5.\) Test at the \(α=0.05\) level of significance whether this is consistent with \(H_0: \sigma^2=2\).<br />
:: $\chi_0^2=\frac{(n-1) s^2}{\sigma_0^2} =(29*5)/2=72.5$, $\chi_L^2=16.047$ and $\chi_R^2=45.722$, since we have \(\chi_0^2 > \chi_R^2\), we reject the null hypothesis at 5% level of significance. The image below illustrates this calculation using the [http://socr.umich.edu/html/dist/SOCR_Distributions.html SOCR $\chi_{df=29}^2$ calculator]. Notice the Left and right limits for the central 95% confidence interval, $(\chi_L^2=16.047 : \chi_R^2=45.722)$.<br />
<br />
<center><br />
[[Image:SMHS_HypothesisTesting_Fig1.png|500px]]<br />
</center><br />
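The rejection decision in this example is simple enough to sketch directly; the critical values below are the tabulated $\chi^2_{df=29}$ limits quoted in the text.<br />

```python
# Sketch of the chi-square variance test: n = 30, s^2 = 5, H0: sigma^2 = 2.
n, s2, sigma2_0 = 30, 5, 2
chi0 = (n - 1) * s2 / sigma2_0          # = (29 * 5) / 2 = 72.5
chi_L, chi_R = 16.047, 45.722           # central 95% limits of chi^2_{df=29}
reject = chi0 < chi_L or chi0 > chi_R   # True: reject H0 at the 5% level
```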
<br />
* [[SMHS_NonParamInference#Fligner-Killeen_test:_Variance_Homogeneity_.28Differences_of_Variances_of_Independent_Samples.29| See also the non-parametric Fligner-Killeen test for Variance Homogeneity]]<br />
<br />
===Applications===<br />
*[http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_AnalysisActivities_OneT This article] illustrates a SOCR analyses example of the one-sample t-test. It presents the background information for the one-sample t-test and demonstrates the process of performing it in the SOCR one sample t-test applet. <br />
*[http://socr.umich.edu/html/SOCR_ChoiceOfStatisticalTest.html This article] titled Choosing the Right Test presents the procedure for selecting a statistical test. It starts with stating the right hypotheses and then develops the topic based on the choice and characteristics of the data. It offers a broad sense of what type of test to choose based on the hypothesis and the data. The article is also accompanied by several exercises for students to practice on their own.<br />
*[http://wiki.stat.ucla.edu/socr/uploads/3/32/Thomson_SOCR_ECON261_HypothesisDifferenceMeans_VIII.pdf This article] titled The Hypothesis Testing For Difference Of Population Parameters presents a comprehensive introduction to the hypothesis testing of difference of population parameters with the background information as well as the application. It also presents the steps to apply hypothesis testing using SOCR analyses.<br />
<br />
===Software===<br />
*[http://socr.ucla.edu/htmls/SOCR_Analyses.html| SOCR Analysis]<br />
*[http://www.socr.ucla.edu/htmls/ana/OneSampleTTest_Analysis.html One Sample T Test Analysis]<br />
*[http://socr.ucla.edu/htmls/SOCR_Charts.html| SOCR Charts]<br />
<br />
===Problems===<br />
USA Today's AD Track examined the effectiveness of the new ads involving the Pets.com Sock Puppet (which is now extinct). In particular, they conducted a nationwide poll of 428 adults who had seen the Pets.com ads and asked for their opinions. They found that 36% of the respondents said they liked the ads. Suppose you increased the sample size for this poll to 1000, but you had the same sample percentage who like the ads (36%). How would this change the p-value of the hypothesis test you want to conduct?<br />
:(a) No way to tell<br />
:(b) The new p-value would be the same as before<br />
:(c) The new p-value would be smaller than before<br />
:(d) The new p-value would be larger than before<br />
<br />
<br />
If we want to estimate the mean difference in scores on a pre-test and post-test for a sample of students, how should we proceed?<br />
:(a) We should construct a confidence interval or conduct a hypothesis test<br />
:(b) We should collect one sample, two samples, or conduct a paired data procedure<br />
:(c) We should calculate a z or a t statistic<br />
<br />
<br />
The paint used to make lines on roads must reflect enough light to be clearly visible at night. Let mu denote the true average reflectometer reading for a new type of paint under consideration. A test of the null hypothesis that mu = 20 versus the alternative hypothesis that mu > 20 will be based on a random sample of size n from a normal population distribution. In which of the following scenarios is there significant evidence that mu is larger than 20?<br />
:(i) n=15, t=3.2, alpha=0.05<br />
:(ii) n=9, t=1.8, alpha=0.01<br />
:(iii) n=24, t=-0.2, alpha=0.01<br />
<br />
:(a) (ii) and (iii)<br />
:(b) (i)<br />
:(c) (iii)<br />
:(d) (ii)<br />
<br />
<br />
We observe the math self-esteem scores from a random sample of 25 female students. How should we determine the probable values of the population mean score for this group?<br />
:(a) Test the difference in means between two paired or dependent samples.<br />
:(b) Test that a correlation coefficient is not equal to 0 (correlation analysis).<br />
:(c) Test the difference between two means (independent samples).<br />
:(d) Test for a difference in more than two means (one way ANOVA).<br />
:(e) Construct a confidence interval.<br />
:(f) Test one mean against a hypothesized constant.<br />
:(g) Use a chi-squared test of association.<br />
<br />
<br />
Food inspectors inspect samples of food products to see if they are safe. This can be thought of as a hypothesis test where H0: the food is safe, and H1: the food is not safe. If you are a consumer, which type of error would be the worst one for the inspector to make, the type I or type II error?<br />
:(a) Type I<br />
:(b) Type II<br />
<br />
<br />
A college admissions officer is concerned that their admission criteria might not treat men and women with equal weight. To test this, the college took a random sample of male and female high school seniors from a very large local school district and determined the percent of males and females who would be eligible for admission at the college. Which of the following is a suitable null hypothesis for this test?<br />
:(a) p = 0.5<br />
:(b) The proportion of all eligible men in the district will not equal the proportion of all eligible women in the district.<br />
:(c) The proportion of all eligible men in the school district should be equal to the proportion of all eligible women in the school district.<br />
:(d) The proportion of eligible men sampled should equal the proportion of eligible women sampled.<br />
<br />
<br />
We want to determine if college GPAs differ for male athletes in major sports (e.g., football), minor sports (e.g., swimming), and intramural sports. What statistical method is most likely to be used to answer this question? Assume that all necessary assumptions have been met for using this procedure.<br />
:(a) Test one mean against a hypothesized constant<br />
:(b) Test the difference in means between two paired or dependent samples<br />
:(c) test for a difference in more than two means (one way ANOVA)<br />
:(d) Test that a correlation coefficient is not equal to 0, correlation analysis<br />
:(e) Test the difference between two means (independent samples)<br />
<br />
<br />
Statistics show that the average level of a mother's education for a city of 300,000 people is 14 years with a standard deviation of 1.5 years. A major state university is located in this town. The administrators in this university think that the average level of a mother's education for the freshmen who are admitted to this school is higher than 14 years. The average education level of mothers for a random sample of 100 freshmen who were admitted to this university within the last two years was 14.7 years.<br />
We want to test the null at the level of alpha = 0.001. What is the best answer?<br />
:(a) We reject the alternative and believe that the level of a mother's education for university freshmen is not higher than the overall population average.<br />
:(b) We reject the null at 0.001 and conclude that the average level of a mother's education is higher for university freshmen.<br />
:(c) We fail to reject the null and conclude that the level of a mother's education for university freshmen is not higher than the overall population average.<br />
:(d) In order to be certain about the conclusion we reach, a larger sample size is needed to increase the power of the test and the margin of error.<br />
<br />
<br />
The average length of time required to complete a certain aptitude test is claimed to be 80 minutes. A random sample of 25 students yielded an average of 86.5 minutes and a standard deviation of 15.4 minutes. If we assume normality of the population distribution, is there evidence to reject the claim? Choose at least one answer.<br />
:(a) Yes, because the observed 86.5 did not happen by chance<br />
:(b) Yes, because the t-test statistic is 2.11<br />
:(c) Yes, because the observed 86.5 happened by chance<br />
:(d) No, because the probability that the null is true is > 0.05<br />
<br />
<br />
Based on past experience, a bank believes that 4% of the people who receive loans will not make payments on time. The bank has recently approved 300 loans. What is the probability that over 6% of these clients will not make timely payments?<br />
:(a) 0.096<br />
:(b) 0.038<br />
:(c) 0.962<br />
:(d) 0.904<br />
<br />
<br />
Many people sleep in on the weekends to make up for short nights during the work week. The Better Sleep Council reports that 61% of us get more than 7 hours of sleep per night on the weekend. A random sample of 350 adults found that 235 had more than seven hours each night last weekend. At the 0.05 level of significance, does this evidence show that more than 61% of us get seven or more hours of sleep per night on the weekend?<br />
:(a) That cannot be determined without more information<br />
:(b) No<br />
:(c) Yes<br />
<br />
===References===<br />
*[http://wiki.stat.ucla.edu/socr/index.php/Probability_and_statistics_EBook#Chapter_VIII:_Hypothesis_Testing SOCR]<br />
*[http://en.wikipedia.org/wiki/Statistical_hypothesis_testing Statistical Hypothesis Testing Wikipedia]<br />
*[http://stattrek.com/hypothesis-test/power-of-test.aspx?tutorial=ap Stat Trek Tutorials]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_HypothesisTesting}}</div>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Parameter Estimation ==<br />
<br />
===Overview===<br />
Estimation is an important concept in the field of statistics and is widely applied in various areas. It deals with estimating values of population parameters based on sample data. The parameters describe an underlying physical setting, and their values affect the distribution of the measured data. Two major approaches are commonly used in estimation: <br />
# The probabilistic approach assumes that the measured data is random with probability distribution dependent on the parameters. <br />
# The set-membership approach assumes that the measured data vector belongs to a set which depends on the parameter vector. <br />
<br />
The purpose of estimation is to find an estimator that can be interpreted, that is accurate, and that exhibits some form of optimality. Criteria such as minimum variance among unbiased estimators are usually applied to measure estimator optimality, although an optimal estimator may not always exist. Here we present the fundamentals of estimation theory and illustrate how to apply estimation in real studies.<br />
<br />
===Motivation===<br />
To obtain a desired estimator or estimation, we need to first determine a probability distribution with parameters of interest based on the data. After deciding the probabilistic model, we need to find the theoretically achievable precision available to any estimator based on the model and then develop an estimator based on this model. There is a variety of methods and criteria to develop and choose between estimators based on their performance: <br />
#Maximum likelihood estimators<br />
#Bayes estimators<br />
#Method of moments estimators<br />
#Minimum mean square error estimators<br />
#Minimum variance unbiased estimator<br />
#Best linear unbiased estimator, etc. <br />
<br />
Experiments or simulations can also be run to test estimators’ performance.<br />
<br />
===Theory===<br />
An estimate of a population parameter may be expressed in two ways:<br />
*''Point estimate'': A single value of estimate. For example, sample mean is a point estimate of the population mean.<br />
*''Interval estimate'': An interval estimate is defined by two numbers, between which a population parameter is said to lie.<br />
<br />
====Confidence Intervals (CIs)====<br />
CIs describe the uncertainty of a sampling method and contain a ''confidence level'', a ''statistic'' and a ''margin of error''. The statistic and the margin of error define an interval estimate, which represents the precision of the method. A confidence interval is expressed as the sample statistic plus or minus the margin of error.<br />
<br />
The interpretation of a confidence interval at the 95% confidence level is that, if we repeatedly drew samples and computed an interval each time, about 95% of the resulting intervals would contain the true parameter.<br />
<br />
* ''Confidence level'': The probability part of a confidence interval. It describes the likelihood that a particular sampling method will produce a confidence interval that includes the true population parameter. <br />
<br />
* ''Margin of error'': Range of the values above and below the sample statistic in confidence interval; ''margin of error $=$ critical value $*$ standard deviation of the statistic''<br />
<br />
* ''Critical value'': The central limit theorem states that the sampling distribution of a statistic will be normal or nearly normal, and that the critical value can be expressed as a $t$ score or as a $z$ score provided that ANY of the following conditions apply:<br />
**The population distribution is normal.<br />
**The sampling distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.<br />
**The sampling distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.<br />
**The sample size is greater than 40, without outliers.<br />
<br />
To find the critical value, follow these steps.<br />
*Compute alpha $(\alpha): \alpha = 1 - \left(\frac{confidence\ level}{100}\right)$<br />
*Find the critical probability $(p^*): p^* = 1 -\frac {\alpha} {2}$<br />
*To express the critical value as a $z$ score, find the $z$ score having a cumulative probability equal to the critical probability $(p^*)$.<br />
*To express the critical value as a $t$ score, follow these steps. Find the degrees of freedom (DF): when estimating a mean score or a proportion from a single sample, DF is equal to the sample size minus one. For other applications, the degrees of freedom may be calculated differently. We will describe those computations as they come up.<br />
<br />
: The critical $t$ score $(t^*)$ is the $t$ score having degrees of freedom equal to DF and a cumulative probability equal to the critical probability $(p^*)$.<br />
<br />
: Should you express the critical value as a $t$ score or as a $z$ score? As a practical matter, when the sample size is large (greater than 40), it doesn't make much difference. Both approaches yield similar results. Strictly speaking, when the population standard deviation is unknown or when the sample size is small, the $t$ score is preferred. Nevertheless, many introductory statistics texts use the $z$ score exclusively. <br />
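The z-score branch of the steps above can be sketched with Python's standard library; the $t$-score branch would additionally need a $t$ quantile function (e.g. a table or a statistics package), so only the z-score case is shown.<br />

```python
# Sketch of the critical-value steps above for the z-score case, using the
# standard library's NormalDist (Python 3.8+).
from statistics import NormalDist

confidence_level = 95
alpha = 1 - confidence_level / 100      # step 1: alpha = 0.05
p_star = 1 - alpha / 2                  # step 2: critical probability 0.975
z_star = NormalDist().inv_cdf(p_star)   # step 3: z score with CDF = p_star
```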
<br />
* ''Standard error'': an estimate of the standard deviation of a statistic. When the values of population parameters are unknown, it is valuable to compute the standard error as an unbiased estimate of the standard deviation of a statistic. It is computed from known sample statistics. The table below shows how to compute the standard error for simple random samples, assuming that the population size is at least 10 times larger than the sample size.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%"border="1"<br />
|-<br />
|Statistic || Standard error<br />
|-<br />
|Sample mean, $\bar{x}$ || $SE_{\bar{x}}=\frac{s}{\sqrt{n}}$<br />
|-<br />
|Sample proportion, $p$ || $SE_{p}=\sqrt{\frac{p(1-p)}{n}}$<br />
|-<br />
|Difference between means,$\bar{x}_{1} -\bar{x}_{2}$ || $ SE_{\bar{x}_1 -\bar{x}_2} = \sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}$<br />
|-<br />
|Difference between proportions, $\bar{p}_{1} - \bar{p}_{2}$ || $SE_{\bar{p}_{1} - \bar{p}_{2}} = \sqrt{ \frac{p_1 (1-p_1)}{n_1} +\frac{p_{2}(1-p_{2})}{n_{2}}}$<br />
|}<br />
</center><br />
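The formulas in the table above can be sketched as small Python helpers (standard library only); the argument values in the usage note are illustrative assumptions.<br />

```python
# Helpers mirroring the standard-error formulas in the table above.
from math import sqrt

def se_mean(s, n):
    """SE of a sample mean."""
    return s / sqrt(n)

def se_proportion(p, n):
    """SE of a sample proportion."""
    return sqrt(p * (1 - p) / n)

def se_diff_means(s1, n1, s2, n2):
    """SE of a difference between two sample means."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

def se_diff_proportions(p1, n1, p2, n2):
    """SE of a difference between two sample proportions."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
```

For example, `se_proportion(0.5, 100)` evaluates to 0.05.<br />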
<br />
* ''Degrees of freedom'': the number of independent pieces of information on which the estimate is based<br />
<br />
: In general, the degrees of freedom for an estimate equal the number of values minus the number of parameters estimated in computing the estimate in question. Suppose we have sampled 20 data points; then our estimate of the variance has 20 − 1 = 19 degrees of freedom.<br />
<br />
====Characteristics of Estimators====<br />
* Bias: refers to whether an estimator tends to either overestimate or underestimate the parameter. We say an estimator is biased if the mean of the sampling distribution of the statistic is not equal to the parameter. For example, $\hat{\sigma}^{2}=\frac{\sum_{i}(x_{i}-\bar{x})^{2}} {N}$ is a biased estimator of the population variance, whereas the sample variance $s^{2}=\frac{\sum_{i}(x_{i}-\bar{x})^{2}} {N-1}$ is an unbiased estimate of the population variance.<br />
<br />
*Sampling variability: refers to how much the estimate varies from sample to sample. It is usually measured by its standard error: the smaller the standard error, the less the sampling variability. For example, the standard error of the mean is $\sigma_M=\frac{\sigma}{\sqrt{N}}$. So the larger the sample size $(N)$, the smaller the standard error of the mean, and hence the smaller the sampling variability.<br />
<br />
*Unbiased estimate: if $\delta(X_{1},X_{2},…,X_{n})$ is an unbiased estimate for $g(\theta)$ and $T$ is a complete sufficient statistic for the family of densities, then $\eta (X_{1},X_{2},…,X_{n})=E[\delta(X_{1},X_{2},…,X_{n})|T]$ is also an unbiased estimate for $g(\theta)$, and by the Lehmann–Scheffé theorem it is the UMVUE. <br />
<br />
*(Uniformly) Minimum-variance unbiased estimator ([http://en.wikipedia.org/wiki/Minimum-variance_unbiased_estimator UMVUE], or MVUE) is an unbiased estimator that has lower variance than any other unbiased estimator for all possible values of the parameter. It may not exist. Consider estimation of $g(\theta)$ based on data $X_{1},X_{2},…,X_{n}$ independent and identically distributed from some member of a family with density $p_\theta, \theta \in \Omega $; an unbiased estimator $\delta(X_{1},X_{2},…,X_{n})$ of $g(\theta)$ is UMVUE if $\forall \theta \in \Omega$, $var(\delta(X_{1},X_{2},…,X_{n})) \leq var(\tilde{\delta} (X_{1},X_{2},…,X_{n}))$ for any other unbiased estimator $\tilde{\delta}$. <br />
<br />
: $MSE(\delta)=var(\delta)+(bias(\delta))^{2}$. The MVUE minimizes MSE among unbiased estimators. In some cases biased estimators have lower MSE because they have a smaller variance than does any unbiased estimator.<br />
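A small simulation can illustrate the bias discussion above: dividing the sum of squared deviations by $N$ systematically underestimates the variance, while dividing by $N-1$ does not. The sample size, trial count, and seed are illustrative assumptions.<br />

```python
# Simulation comparing the variance estimator with denominator N (biased)
# against the one with denominator N - 1 (unbiased), for N(0, 1) samples.
import random

random.seed(0)
N, trials = 5, 20000
biased_avg = unbiased_avg = 0.0
for _ in range(trials):
    x = [random.gauss(0, 1) for _ in range(N)]
    xbar = sum(x) / N
    ss = sum((xi - xbar) ** 2 for xi in x)
    biased_avg += ss / N            # divides by N: biased low
    unbiased_avg += ss / (N - 1)    # divides by N - 1: unbiased
biased_avg /= trials
unbiased_avg /= trials
# Expect biased_avg near (N - 1)/N = 0.8 and unbiased_avg near the true 1.0
```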
<br />
===Applications===<br />
* [[AP_Statistics_Curriculum_2007_Estim_MOM_MLE|This article]] presents the MOM and MLE methods of estimation. It illustrates the MOM method in detailed examples and attaches several exercises for students to practice. MOM, which is short for Method Of Moments, is one of the most commonly used methods to estimate population parameters using observed data from the specific process. The idea is to use the sample data to calculate sample moments and then set these equal to their corresponding population counterparts. Steps: (1) determine the $k$ parameters of interest and the specific distribution for this process; (2) compute the first $k$ (or more) sample moments; (3) set the sample moments equal to the population moments and solve the resulting system of $k$ equations with $k$ unknowns. Let’s look at a simple example as an application of the MOM method.<br />
<br />
: Consider estimating the true probability of a head by flipping a (possibly unfair) coin. Suppose we flip the coin 10 times and observe the following outcome: {H,T,H,H,T,T,T,H,T,T}. With MOM: (1) the parameter of interest is $p=P(H)$ and each flip follows a Bernoulli distribution; (2) $np=E[Y]=4$, so $p=2/5$, where $Y$ is the number of heads in the experiment and follows a Binomial distribution; (3) the estimate of the true probability of flipping a head equals $2/5$. This is a simple MOM proportion example.<br />
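The coin example above reduces to a one-line computation, sketched here for concreteness: the first sample moment (the observed fraction of heads) is set equal to the first population moment $E[X]=p$.<br />

```python
# MOM sketch for the coin example: equate the first sample moment
# (fraction of heads) to the first population moment E[X] = p.
flips = list("HTHHTTTHTT")               # the 10 observed flips
p_hat = flips.count("H") / len(flips)    # first sample moment = 4/10
```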
<br />
* [http://onlinestatbook.com/2/estimation/estimation.html This article] presents a fundamental introduction to estimation theory, illustrating basic concepts and applications of estimation. It offers specific examples and exercises on each concept and serves as a good starting point for learning estimation theory. <br />
<br />
* [http://digital-library.theiet.org/content/journals/10.1049/ip-f-2.1993.0015 This article] proposes an algorithm, the bootstrap filter, for implementing recursive Bayesian filters. The required density of the state vector is represented as a set of random samples, which are updated and propagated by the algorithm. The method presented is not restricted by assumptions of linearity or Gaussian noise and may be applied to any state transition or measurement model. It presents a simulation example of the bearings-only tracking problem and includes schemes for improving the efficiency of the basic algorithm.<br />
<br />
===Software===<br />
*[http://socr.ucla.edu/htmls/SOCR_Distributions.html SOCR Distribution]<br />
*[http://socr.ucla.edu/htmls/SOCR_Experiments.html SOCR Simulations & Experiments] <br />
*[http://socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts]<br />
<br />
===Problems===<br />
* Which of the following statements is true?<br />
: a. When the margin of error is small, the confidence level is high.<br />
: b. When the margin of error is small, the confidence level is low.<br />
: c. A confidence interval is a type of point estimate.<br />
: d. A population mean is an example of a point estimate.<br />
: e. None of the above.<br />
<br />
* Which of the following statements is true?<br />
: a. The standard error is computed solely from sample attributes.<br />
: b. The standard deviation is computed solely from sample attributes.<br />
: c. The standard error is a measure of central tendency.<br />
: d. All of the above.<br />
: e. None of the above.<br />
<br />
* 900 students were randomly selected for a national survey. Among survey participants, the mean grade-point average (GPA) was 2.7, and the standard deviation was 0.4. What is the margin of error, assuming a 95% confidence level?<br />
: a. 0.013<br />
: b. 0.025<br />
: c. 0.500<br />
: d. 1.960<br />
<br />
* Suppose we want to estimate the average weight of an adult male in Dekalb County, Georgia. We draw a random sample of 1,000 men from a population of 1,000,000 men and weigh them. We find that the average man in our sample weighs 180 pounds, and the standard deviation of the sample is 30 pounds. What is the 95% confidence interval?<br />
: a. $180 \pm 1.86$<br />
: b. $180 \pm 3.0$ <br />
: c. $180 \pm 5.88$<br />
: d. $180 \pm 30$<br />
<br />
* Suppose that simple random samples of seniors are selected from two colleges: 15 students from school A and 20 students from school B. On a standardized test, the sample from school A has an average score of 1000 with a standard deviation of 100. The sample from school B has an average score of 950 with a standard deviation of 90. What is the 90% confidence interval for the difference in test scores at the two schools, assuming that test scores came from normal distributions in both schools? (Hint: Since the sample sizes are small, use a t score as the critical value.)<br />
: a. 50 + 1.70<br />
: b. 50 + 28.49<br />
: c. 50 + 32.74<br />
: d. 50 + 55.66<br />
<br />
* You know the population mean for a certain test score. You select 10 people from the population to estimate the standard deviation. How many degrees of freedom does your estimation of the standard deviation have?<br />
: a. 8<br />
: b. 9<br />
: c. 10<br />
: d. 11<br />
<br />
* In the population, a parameter has a value of 10. Based on the means and standard errors of their sampling distributions, which of these statistics estimates this parameter with the least sampling variability?<br />
: a. Mean = 10, SE = 5<br />
: b. Mean = 9, SE = 4<br />
: c. Mean = 11, SE = 2<br />
: d. Mean = 13, SE = 3<br />
<br />
===References===<br />
*[http://wiki.stat.ucla.edu/socr/index.php/Probability_and_statistics_EBook#Method_of_Moments_and_Maximum_Likelihood_Estimation SOCR]<br />
* [http://en.wikipedia.org/wiki/Estimation Estimation Wikipedia]<br />
* [http://onlinestatbook.com/2/estimation/characteristics.html OnlineStatBook: Estimation]<br />
* [http://en.wikipedia.org/wiki/Confidence_interval Confidence Interval Wikipedia]<br />
* [http://en.wikipedia.org/wiki/Minimum-variance_unbiased_estimator UMVUE Wikipedia]<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_Estimation}}</div>
|-<br />
|Sample proportion, $p$ || $SE_{p}=\sqrt{\frac{p(1-p)}{n}}$<br />
|-<br />
|Difference between means,$\bar{x}_{1} -\bar{x}_{2}$ || $ SE_{\bar{x}_1 -\bar{x}_2} = \sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}$<br />
|-<br />
|Difference between proportions, $\bar{p}_{1} - \bar{p}_{2}$ || $SE_{\bar{p}_{1} - \bar{p}_{2}} = \sqrt{ \frac{p_1 (1-p_1)}{n_1} +\frac{p_{2}(1-p_{2})}{n_{2}}}$<br />
|}<br />
</center><br />
<br />
* ''Degrees of freedom'': the number of independent pieces of information on which the estimate is based<br />
<br />
: In general, the degrees of freedom for an estimate is equal to the number of values minus the number of parameters estimated to the estimate in question. Suppose we have sampled 20 data points then our estimate of the variance has 20 – 1 = 19 degree of freedom.<br />
<br />
====Characteristics of Estimators====<br />
* Bias: refers to whether an estimator tends to either overestimate or underestimate the parameter. We say an estimator is biased if the mean of the sampling distribution of the statistic is not equal to the parameter. For example, $σ^{2}=\frac{(x-μ)^{2}} {N}$ is a biased estimator of the population variance and sample variance $s^{2}=\frac{(x-\bar{x})^{2}} {N-1}$ is unbiased estimate of the population variance.<br />
<br />
*Sampling variability: refers to how much the estimate varies from sample to sample. It is usually measured by its standard error: the smaller the standard error, the less the sampling variability. For example, the standard error of the mean is $σ_M=\frac{σ}{\sqrt{N}}$. So the larger the sample size $(N)$, the smaller the standard error of the mean, hence the smaller the sample variability.<br />
<br />
*Unbiased estimate: $\eta (X_{1},X_{2},…,X_{n})=E[\delta(X_{1},X_{2},…,X_{n})|T]$ then $\delta(X_{1},X_{2},…,X_{n} )$ is unbiased estimate for $g(\theta)$ and $T$ is a complete sufficient statistic for the family of densities. <br />
<br />
*(Uniformly) Minimum-variance unbiased estimator ([http://en.wikipedia.org/wiki/Minimum-variance_unbiased_estimator UMVUE], or MVUE) is an unbiased estimator that has lower variance than any other unbiased estimator for all possible values of the parameter. It may not exist.Consider estimation of $g(\theta)$ based on data $X_{1},X_{2},…,X_{n}$ independent and identically distributed from some member of a family with density $p_\theta, \theta \in \Omega $, an unbiased estimator $\delta(X_{1},X_{2},…,X_{n})$ of $g(\theta)$ is UMVUE if $∀ \theta \in \Omega$, $var(\delta(X_{1},X_{2},…,X_{n})) \leq var(\tilde{\delta} (X_{1},X_{2},…,X_{n}))$ for any other unbiased estimator $\tilde{\delta}$. <br />
<br />
: $MSE(\delta)=var(\delta)+(bias(\delta))^{2}$. The MVUE minimizes MSE among unbiased estimators. In some cases biased estimators have lower MSE because they have a smaller variance than does any unbiased estimator.<br />
<br />
===Applications===<br />
* [[AP_Statistics_Curriculum_2007_Estim_MOM_MLE|This article]] presents the MOM and MLE methods of estimation. It illustrates the MOM method in detailed examples and attached several exercise for students to practice. MOM, which is short for Method Of Moments, is one of the most commonly used methods to estimate population parameters using observed data from the specific process. The idea is to use the sample data to calculate sample moments and then set these equal to their corresponding population counterparts. Steps: (1) determine the $k$ parameters of interest and specific distribution for this process; (2) compute the first $k$ (or more) sample-moments; (3) set the sample-moments equal to the population moments and solve for a system of $k$ equations with $k$ unknowns. Let’s look at a simple example as application of the MOM method.<br />
<br />
: Consider we want to estimate the true probability of a head by flipping the coins (assume a unfair coin). Suppose we flip the coin 10 times and observe the following outcome: {H,T,H,H,T,T,T,H,T,T}. With MOM: (1) the parameter of interest is $p=P(H)$ and it follows a Bernoulli distribution, (2) $np=E[Y]=4,p=2/5$, where $Y$ is the number of heads for one experiment and it follows a Binomial distribution. (3) estimate of true probability of flipping a head in one experiment equals $2/5$. This is an easy example of MOM proportion example.<br />
<br />
* [http://onlinestatbook.com/2/estimation/estimation.html This article] presents a fundamental introduction to estimation theory and illustrated on basic concepts and application of estimation. It offers specific examples and exercises on each concept and application and works as a good start of introduction to estimation theory. <br />
<br />
* [http://digital-library.theiet.org/content/journals/10.1049/ip-f-2.1993.0015 This article] proposed an algorithm, the bootstrap filter, for implementing recursive Bayesian filters. The required density of the state vector is represented as a set of random samples, which are updated and propagated by the algorithm. The method presented is not restricted by assumptions of linearity or Gaussian noise and it may be applied to any state transition or measurement model. It presents a simulation example of the bearings only tracking problems and includes schemes for improving the efficiency of the basic algorithm.<br />
<br />
===Software===<br />
*[http://socr.ucla.edu/htmls/SOCR_Distributions.html SOCR Distribution]<br />
*[http://socr.ucla.edu/htmls/SOCR_Experiments.html SOCR Simulations & Experiments] <br />
*[http://socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts]<br />
<br />
===Problems===<br />
* Which of the following statements is true.<br />
: a. When the margin of error is small, the confidence level is high.<br />
: b. When the margin of error is small, the confidence level is low.<br />
: c. A confidence interval is a type of point estimate.<br />
: d. A population mean is an example of a point estimate.<br />
: e. None of the above.<br />
<br />
* Which of the following statements is true.<br />
: a. The standard error is computed solely from sample attributes.<br />
: b. The standard deviation is computed solely from sample attributes.<br />
: c. The standard error is a measure of central tendency.<br />
: d. All of the above.<br />
: e. None of the above.<br />
<br />
* 900 students were randomly selected for a national survey. Among survey participants, the mean grade-point average (GPA) was 2.7, and the standard deviation was 0.4. What is the margin of error, assuming a 95% confidence level?<br />
: a. 0.013<br />
: b. 0.025<br />
: c. 0.500<br />
: d. 1.960<br />
<br />
* Suppose we want to estimate the average weight of an adult male in Dekalb County, Georgia. We draw a random sample of 1,000 men from a population of 1,000,000 men and weigh them. We find that the average man in our sample weighs 180 pounds, and the standard deviation of the sample is 30 pounds. What is the 95% confidence interval?<br />
: a. $180 \pm 1.86$<br />
: b. $180 \pm 3.0$ <br />
: c. $180 \pm 5.88$<br />
: d. $180 \pm 30$<br />
<br />
* Suppose that simple random samples of seniors are selected from two colleges: 15 students from school A and 20 students from school B. On a standardized test, the sample from school A has an average score of 1000 with a standard deviation of 100. The sample from school B has an average score of 950 with a standard deviation of 90. What is the 90% confidence interval for the difference in test scores at the two schools, assuming that test scores came from normal distributions in both schools? (Hint: Since the sample sizes are small, use a t score as the critical value.)<br />
: a. 50 + 1.70<br />
: b. 50 + 28.49<br />
: c. 50 + 32.74<br />
: d. 50 + 55.66<br />
<br />
* You know the population mean for a certain test score. You select 10 people from the population to estimate the standard deviation. How many degrees of freedom does your estimation of the standard deviation have?<br />
: a. 8<br />
: b. 9<br />
: c. 10<br />
: d. 11<br />
<br />
* In the population, a parameter has a value of 10. Based on the means and standard errors of their sampling distributions, which of these statistics estimates this parameter with the least sampling variability?<br />
: a. Mean = 10, SE = 5<br />
: b. Mean = 9, SE = 4<br />
: c. Mean = 11, SE = 2<br />
: d. Mean = 13, SE = 3<br />
<br />
===References===<br />
*[http://wiki.stat.ucla.edu/socr/index.php/Probability_and_statistics_EBook#Method_of_Moments_and_Maximum_Likelihood_Estimation SOCR]<br />
* [http://en.wikipedia.org/wiki/Estimation Estimation Wikipedia]<br />
* [http://onlinestatbook.com/2/estimation/characteristics.html OnlineStatBook: Estimation]<br />
* [http://en.wikipedia.org/wiki/Confidence_interval Confidence Interval Wikipedia]<br />
* [http://en.wikipedia.org/wiki/Minimum-variance_unbiased_estimator UMVUE Wikipedia]<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_Estimation}}</div>
Glenbrau
https://wiki.socr.umich.edu/index.php?title=SMHS_Estimation&diff=14898 SMHS Estimation 2015-04-27T15:37:33Z
<p>Glenbrau: /* Theory */</p>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Parameter Estimation ==<br />
<br />
===Overview===<br />
Estimation is a central concept in statistics, with applications in many areas. It deals with estimating the values of population parameters based on sample data. The parameters describe an underlying physical setting, and their values affect the distribution of the measured data. Two major approaches are commonly used in estimation: <br />
# The probabilistic approach assumes that the measured data is random with probability distribution dependent on the parameters. <br />
# The set-membership approach assumes that the measured data vector belongs to a set which depends on the parameter vector. <br />
<br />
The purpose of estimation is to find an estimator that can be interpreted, that is accurate, and that exhibits some form of optimality. Criteria such as minimum variance among unbiased estimators are usually applied to measure estimator optimality, although an optimal estimator does not always exist. Here we present the fundamentals of estimation theory and illustrate how to apply estimation in real studies.<br />
<br />
===Motivation===<br />
To obtain a desired estimator, we first determine a probability distribution, with parameters of interest, based on the data. After deciding on the probabilistic model, we find the precision theoretically achievable by any estimator under that model, and then develop an estimator accordingly. There are a variety of methods and criteria for developing and choosing between estimators based on their performance: <br />
#Maximum likelihood estimators<br />
#Bayes estimators<br />
#Method of moments estimators<br />
#Minimum mean square error estimators<br />
#Minimum variance unbiased estimator<br />
#Best linear unbiased estimator, etc. <br />
<br />
Experiment or simulations can also be run to test estimators’ performance.<br />
<br />
===Theory===<br />
An estimate of a population parameter may be expressed in two ways:<br />
*''Point estimate'': A single value of estimate. For example, sample mean is a point estimate of the population mean.<br />
*''Interval estimate'': An interval estimate is defined by two numbers, between which a population parameter is said to lie.<br />
<br />
====Confidence Intervals (CIs)====<br />
CIs describe the uncertainty of a sampling method and contain a ''confidence level'', a ''statistic'' and a ''margin of error''. The statistic and the margin of error define an interval estimate, which represents the precision of the method. A confidence interval is expressed as the sample statistic plus or minus the margin of error.<br />
<br />
The interpretation of a confidence interval at the 95% confidence level is that, if the sampling procedure were repeated many times, about 95% of the intervals constructed this way would contain the true parameter.<br />
<br />
* ''Confidence level'': The probability part of a confidence interval. It describes the likelihood that a particular sampling method will produce a confidence interval that includes the true population parameter. <br />
<br />
* ''Margin of error'': Range of the values above and below the sample statistic in confidence interval; ''margin of error $=$ critical value $*$ standard deviation of the statistic''<br />
<br />
* ''Critical value'': The central limit theorem states that the sampling distribution of a statistic will be normal or nearly normal, and that the critical value can be expressed as a $t$ score or as a $z$ score provided that ANY of the following conditions apply:<br />
**The population distribution is normal.<br />
**The sampling distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.<br />
**The sampling distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.<br />
**The sample size is greater than 40, without outliers.<br />
<br />
To find the critical value, follow these steps.<br />
*Compute alpha $(\alpha): \alpha = 1 - \left(\frac{confidence\ level}{100}\right)$<br />
*Find the critical probability $(p^*): p^* = 1 -\frac {\alpha} {2}$<br />
*To express the critical value as a $z$ score, find the $z$ score having a cumulative probability equal to the critical probability $(p^*)$.<br />
*To express the critical value as a $t$ score, follow these steps. Find the degrees of freedom (DF): when estimating a mean score or a proportion from a single sample, DF is equal to the sample size minus one. For other applications, the degrees of freedom may be calculated differently. We will describe those computations as they come up.<br />
<br />
: The critical $t$ score $(t^*)$ is the $t$ score having degrees of freedom equal to DF and a cumulative probability equal to the critical probability $(p^*)$.<br />
<br />
: Should you express the critical value as a $t$ score or as a $z$ score? As a practical matter, when the sample size is large (greater than 40), it doesn't make much difference. Both approaches yield similar results. Strictly speaking, when the population standard deviation is unknown or when the sample size is small, the $t$ score is preferred. Nevertheless, many introductory statistics texts use the $z$ score exclusively. <br />
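The steps above can be sketched in code. The following is a minimal illustration we add here (not part of the SOCR tools), using only the Python standard library (''statistics.NormalDist'', Python 3.8+); it covers the $z$-score case and combines the critical value with the standard error of the mean to form an interval estimate.<br />

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+

def z_critical(confidence_level):
    """Critical z* for a confidence level given in percent (e.g. 95)."""
    alpha = 1 - confidence_level / 100   # step 1: compute alpha
    p_star = 1 - alpha / 2               # step 2: critical probability
    return NormalDist().inv_cdf(p_star)  # step 3: z with cumulative prob p*

def mean_confidence_interval(x_bar, s, n, confidence_level=95):
    """Interval estimate: sample statistic +/- margin of error."""
    margin = z_critical(confidence_level) * s / sqrt(n)  # critical value * SE
    return x_bar - margin, x_bar + margin

z95 = z_critical(95)                              # about 1.96
lo, hi = mean_confidence_interval(180, 30, 1000)  # about (178.14, 181.86)
```

For a small sample, one would instead use a critical $t$ score with the appropriate degrees of freedom (e.g. via ''scipy.stats.t.ppf'' if SciPy is available); as noted above, for large samples the two approaches yield similar results.<br />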
<br />
* ''Standard error'': an estimate of the standard deviation of a statistic. When the values of population parameters are unknown, it is valuable to compute the standard error as an estimate of the standard deviation of a statistic. It is computed from known sample statistics. The table below shows how to compute the standard error for simple random samples, assuming that the population size is at least 10 times larger than the sample size.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%"border="1"<br />
|-<br />
|Statistic || Standard error<br />
|-<br />
|Sample mean, $\bar{x}$ || $SE_{\bar{x}}=\frac{s}{\sqrt{n}}$<br />
|-<br />
|Sample proportion, $p$ || $SE_{p}=\sqrt{\frac{p(1-p)}{n}}$<br />
|-<br />
|Difference between means,$\bar{x}_{1} -\bar{x}_{2}$ || $ SE_{\bar{x}_1 -\bar{x}_2} = \sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}$<br />
|-<br />
|Difference between proportions, $\bar{p}_{1} - \bar{p}_{2}$ || $SE_{\bar{p}_{1} - \bar{p}_{2}} = \sqrt{ \frac{p_1 (1-p_1)}{n_1} +\frac{p_{2}(1-p_{2})}{n_{2}}}$<br />
|}<br />
</center><br />
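As an illustration (ours, not part of the original SOCR materials), the four formulas in the table translate directly into short Python functions:<br />

```python
from math import sqrt

def se_mean(s, n):
    """SE of the sample mean: s / sqrt(n)."""
    return s / sqrt(n)

def se_proportion(p, n):
    """SE of a sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

def se_diff_means(s1, n1, s2, n2):
    """SE of a difference between two sample means."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

def se_diff_proportions(p1, n1, p2, n2):
    """SE of a difference between two sample proportions."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

se_gpa = se_mean(0.4, 900)               # 0.4 / 30, about 0.0133
se_ab = se_diff_means(100, 15, 90, 20)   # about 32.74
```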
<br />
* ''Degrees of freedom'': the number of independent pieces of information on which the estimate is based<br />
<br />
: In general, the degrees of freedom for an estimate equal the number of values minus the number of parameters estimated en route to the estimate in question. Suppose we have sampled 20 data points; then our estimate of the variance has $20 - 1 = 19$ degrees of freedom, since one parameter (the mean) is estimated along the way.<br />
<br />
====Characteristics of Estimators====<br />
* Bias: refers to whether an estimator tends to either overestimate or underestimate the parameter. We say an estimator is biased if the mean of the sampling distribution of the statistic is not equal to the parameter. For example, $\hat{\sigma}^{2}=\frac{\sum_{i=1}^{N}(x_{i}-\bar{x})^{2}} {N}$ is a biased estimator of the population variance, whereas the sample variance $s^{2}=\frac{\sum_{i=1}^{N}(x_{i}-\bar{x})^{2}} {N-1}$ is an unbiased estimator of the population variance.<br />
<br />
*Sampling variability: refers to how much the estimate varies from sample to sample. It is usually measured by the standard error: the smaller the standard error, the less the sampling variability. For example, the standard error of the mean is $\sigma_M=\frac{\sigma}{\sqrt{N}}$, so the larger the sample size $(N)$, the smaller the standard error of the mean, and hence the smaller the sampling variability.<br />
<br />
*Unbiased estimate: if $\delta(X_{1},X_{2},…,X_{n})$ is an unbiased estimator of $g(\theta)$ and $T$ is a complete sufficient statistic for the family of densities, then $\eta (X_{1},X_{2},…,X_{n})=E[\delta(X_{1},X_{2},…,X_{n})|T]$ is also an unbiased estimator of $g(\theta)$, and by the Lehmann–Scheffé theorem it is the UMVUE. <br />
<br />
*(Uniformly) Minimum-variance unbiased estimator ([http://en.wikipedia.org/wiki/Minimum-variance_unbiased_estimator UMVUE], or MVUE) is an unbiased estimator that has lower variance than any other unbiased estimator for all possible values of the parameter. It may not exist. Consider estimation of $g(\theta)$ based on data $X_{1},X_{2},…,X_{n}$, independent and identically distributed from some member of a family with density $p_\theta, \theta \in \Omega $. An unbiased estimator $\delta(X_{1},X_{2},…,X_{n})$ of $g(\theta)$ is UMVUE if, $\forall \theta \in \Omega$, $var(\delta(X_{1},X_{2},…,X_{n})) \leq var(\tilde{\delta} (X_{1},X_{2},…,X_{n}))$ for any other unbiased estimator $\tilde{\delta}$. <br />
<br />
: $MSE(\delta)=var(\delta)+(bias(\delta))^{2}$. The MVUE minimizes MSE among unbiased estimators. In some cases biased estimators have lower MSE because they have a smaller variance than does any unbiased estimator.<br />
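The bias of the $N$-divisor variance estimator can be seen empirically with a small simulation (a sketch we add for illustration; the seed, sample size, and number of trials are arbitrary choices):<br />

```python
import random
from statistics import mean

random.seed(42)                        # reproducible
N, trials, true_var = 10, 20000, 4.0   # sigma = 2, so sigma^2 = 4

biased, unbiased = [], []
for _ in range(trials):
    x = [random.gauss(0, 2) for _ in range(N)]
    x_bar = mean(x)
    ss = sum((xi - x_bar) ** 2 for xi in x)
    biased.append(ss / N)          # divisor N: expectation (N-1)/N * 4 = 3.6
    unbiased.append(ss / (N - 1))  # divisor N-1: expectation 4.0

avg_biased, avg_unbiased = mean(biased), mean(unbiased)
# avg_biased falls systematically below 4.0; avg_unbiased is close to 4.0
```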
<br />
===Applications===<br />
* [[AP_Statistics_Curriculum_2007_Estim_MOM_MLE|This article]] presents the MOM and MLE methods of estimation. It illustrates the MOM method in detailed examples and attaches several exercises for students to practice. MOM, which is short for Method Of Moments, is one of the most commonly used methods to estimate population parameters using observed data from a specific process. The idea is to use the sample data to calculate sample moments and then set these equal to their corresponding population counterparts. Steps: (1) determine the $k$ parameters of interest and the specific distribution for this process; (2) compute the first $k$ (or more) sample moments; (3) set the sample moments equal to the population moments and solve the resulting system of $k$ equations with $k$ unknowns. Let’s look at a simple example as an application of the MOM method.<br />
<br />
: Suppose we want to estimate the true probability of heads for a (possibly unfair) coin. We flip the coin 10 times and observe the following outcome: {H,T,H,H,T,T,T,H,T,T}. With MOM: (1) the parameter of interest is $p=P(H)$, and each flip follows a Bernoulli distribution; (2) set $np=E[Y]$ equal to the observed number of heads, $4$, so $p=2/5$, where $Y$ is the number of heads in one experiment and follows a Binomial distribution; (3) the estimate of the true probability of flipping a head in one experiment equals $2/5$.<br />
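The coin-flipping example above can be checked numerically. This short snippet (ours, for illustration) computes the first sample moment, the proportion of heads, directly:<br />

```python
# Observed outcome from the example above
flips = ['H', 'T', 'H', 'H', 'T', 'T', 'T', 'H', 'T', 'T']

# MOM: set the first sample moment (proportion of heads) equal to E[X] = p
p_hat = sum(1 for f in flips if f == 'H') / len(flips)
# p_hat = 4/10 = 2/5
```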
<br />
* [http://onlinestatbook.com/2/estimation/estimation.html This article] presents a fundamental introduction to estimation theory and illustrates its basic concepts and applications. It offers specific examples and exercises on each concept and application and works as a good starting point for estimation theory. <br />
<br />
* [http://digital-library.theiet.org/content/journals/10.1049/ip-f-2.1993.0015 This article] proposes an algorithm, the bootstrap filter, for implementing recursive Bayesian filters. The required density of the state vector is represented as a set of random samples, which are updated and propagated by the algorithm. The method presented is not restricted by assumptions of linearity or Gaussian noise and may be applied to any state transition or measurement model. It presents a simulation example of the bearings-only tracking problem and includes schemes for improving the efficiency of the basic algorithm.<br />
<br />
===Software===<br />
*[http://socr.ucla.edu/htmls/SOCR_Distributions.html SOCR Distribution]<br />
*[http://socr.ucla.edu/htmls/SOCR_Experiments.html SOCR Simulations & Experiments] <br />
*[http://socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts]<br />
<br />
===Problems===<br />
* Which of the following statements is true?<br />
: a. When the margin of error is small, the confidence level is high.<br />
: b. When the margin of error is small, the confidence level is low.<br />
: c. A confidence interval is a type of point estimate.<br />
: d. A population mean is an example of a point estimate.<br />
: e. None of the above.<br />
<br />
* Which of the following statements is true?<br />
: a. The standard error is computed solely from sample attributes.<br />
: b. The standard deviation is computed solely from sample attributes.<br />
: c. The standard error is a measure of central tendency.<br />
: d. All of the above.<br />
: e. None of the above.<br />
<br />
* 900 students were randomly selected for a national survey. Among survey participants, the mean grade-point average (GPA) was 2.7, and the standard deviation was 0.4. What is the margin of error, assuming a 95% confidence level?<br />
: a. 0.013<br />
: b. 0.025<br />
: c. 0.500<br />
: d. 1.960<br />
<br />
* Suppose we want to estimate the average weight of an adult male in Dekalb County, Georgia. We draw a random sample of 1,000 men from a population of 1,000,000 men and weigh them. We find that the average man in our sample weighs 180 pounds, and the standard deviation of the sample is 30 pounds. What is the 95% confidence interval?<br />
: a. $180 \pm 1.86$<br />
: b. $180 \pm 3.0$ <br />
: c. $180 \pm 5.88$<br />
: d. $180 \pm 30$<br />
<br />
* Suppose that simple random samples of seniors are selected from two colleges: 15 students from school A and 20 students from school B. On a standardized test, the sample from school A has an average score of 1000 with a standard deviation of 100. The sample from school B has an average score of 950 with a standard deviation of 90. What is the 90% confidence interval for the difference in test scores at the two schools, assuming that test scores came from normal distributions in both schools? (Hint: Since the sample sizes are small, use a t score as the critical value.)<br />
: a. 50 + 1.70<br />
: b. 50 + 28.49<br />
: c. 50 + 32.74<br />
: d. 50 + 55.66<br />
<br />
* You know the population mean for a certain test score. You select 10 people from the population to estimate the standard deviation. How many degrees of freedom does your estimation of the standard deviation have?<br />
: a. 8<br />
: b. 9<br />
: c. 10<br />
: d. 11<br />
<br />
* In the population, a parameter has a value of 10. Based on the means and standard errors of their sampling distributions, which of these statistics estimates this parameter with the least sampling variability?<br />
: a. Mean = 10, SE = 5<br />
: b. Mean = 9, SE = 4<br />
: c. Mean = 11, SE = 2<br />
: d. Mean = 13, SE = 3<br />
<br />
===References===<br />
*[http://wiki.stat.ucla.edu/socr/index.php/Probability_and_statistics_EBook#Method_of_Moments_and_Maximum_Likelihood_Estimation SOCR]<br />
* [http://en.wikipedia.org/wiki/Estimation Estimation Wikipedia]<br />
* [http://onlinestatbook.com/2/estimation/characteristics.html OnlineStatBook: Estimation]<br />
* [http://en.wikipedia.org/wiki/Confidence_interval Confidence Interval Wikipedia]<br />
* [http://en.wikipedia.org/wiki/Minimum-variance_unbiased_estimator UMVUE Wikipedia]<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_Estimation}}</div>
Glenbrau
https://wiki.socr.umich.edu/index.php?title=SMHS_ExpObsStudies&diff=14897 SMHS ExpObsStudies 2015-04-27T15:30:09Z
<p>Glenbrau: /* Degree of usefulness and reliability */</p>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Experiments vs. Observational Studies ==<br />
<br />
===Overview===<br />
In an experiment, investigators apply treatments to experimental units and then observe the effect of the treatments on those units. It is an ordered procedure carried out with the goal of verifying, refuting, or establishing the validity of a hypothesis. This lecture presents a general introduction to experiments and the different types of experiments we may apply in research. We also discuss observational studies, which draw inferences about the possible effect of a treatment on subjects when the assignment of subjects into a treated group versus a control group is outside the control of the investigators. A general comparison between these two types of studies is presented thereafter.<br />
<br />
===Motivation===<br />
Experimental and observational studies are among the most commonly applied study designs in research. Consider a simple example of an experimental study. Suppose we enroll 200 women aged 30 who are not smokers, assign half of them to a smoking treatment of one pack per day and the other half to a no-smoking treatment, and maintain this status for 10 years. The lung capacity of all the women is then measured, and the data are analyzed and interpreted. For the other study, we find 200 women aged 30, of whom 100 are smoke-free and the other 100 have been smoking one pack per day for 10 years, and measure their lung capacity; the data are then analyzed and interpreted. This is a simple example of an observational study. The difference is easily drawn from the comparison between the two studies: in an observational study, the assignment of subjects into a treated group versus a control group is outside the control of the investigators, whereas in an experimental study each subject is randomly assigned to a treated group or a control group. So, what characteristics define experimental and observational studies in general, and what are the major differences between these two types of studies?<br />
<br />
===Theory===<br />
====Experimental study====<br />
An experimental study is an empirical method that arbitrates between models or hypotheses and is used to test existing theories or new hypotheses in order to support or disprove them. Controlled experiments provide researchers with insight into causal relationships by demonstrating what outcome occurs when a particular factor is manipulated. Experiments may vary from personal and informal natural comparisons to highly controlled ones. <br />
=====Types of experimental studies=====<br />
*''Controlled experiments'': Compare the results obtained from experimental samples against control samples, which are practically identical to the experimental samples except for the one factor whose effect is being tested.<br />
*''Natural experiments'': Rely solely on observations of the variables of the system under study, rather than manipulation of just one or a few variables as occurs in controlled experiments. <br />
*''Field experiments'': Named to draw a contrast with laboratory experiments, which enforce scientific control by testing a hypothesis in the artificial and highly controlled setting of a laboratory. It is often used in social sciences, and especially in economic analyses of education.<br />
<br />
====Observational Study====<br />
An observational study draws inferences about the possible effect of a treatment on subjects when the assignment of subjects into a treated group versus a control group is outside the control of the investigator.<br />
=====Types of observational studies=====<br />
*''Case-control study'': Originally developed in epidemiology, where two existing groups differing in outcome are identified and compared on the basis of some supposed causal attribute.<br />
*''Cross-sectional study'': Involves data collection from a population, or a representative subset, at one specific point in time.<br />
*''Longitudinal study'': Correlational research study that involves repeated observations of the same variables over long periods of time.<br />
*''Cohort study (panel study)'': A particular form of longitudinal study where a group of patients is closely monitored over a span of time.<br />
*''Ecological study'': An observational study in which at least one variable is measured at the group level.<br />
<br />
=====Degree of usefulness and reliability=====<br />
Although observational studies can’t be used as reliable sources to make statements of fact about the ‘safety, efficacy, or effectiveness’ of a practice, they can still be used to:<br />
*Provide information on real world use and practice <br />
*Detect signals about the benefits and risk of use in the general population<br />
*Help formulate hypotheses to be tested in subsequent experiments<br />
*Provide part of the community-level data needed to design more informative pragmatic clinical trials<br />
*Inform clinical practice<br />
<br />
====Bias and compensating methods====<br />
When a randomized experiment cannot be carried out, the alternative line of investigation suffers from the problem that the decision of which subjects receive the treatment is not entirely random and thus is a potential source of bias. A major challenge in conducting observational studies is to draw inferences that are acceptably free from influences by overt biases, as well as to assess the influence of potential hidden biases. An observer of an uncontrolled experiment records potential factors and the data output: the goal is to determine the effects of the factors. Sometimes the recorded factors may not be directly causing the differences in the output. Also, recorded or unrecorded factors may be correlated which may yield incorrect conclusions. Finally, as the number of recorded factors increases, the likelihood increases that at least one of the recorded factors will be highly correlated with the data output simply by chance.<br />
<br />
====Comparisons between experimental and observational studies====<br />
*An observational study is used when it is impractical, cost-prohibitive, or inefficient to fit a physical or social system into a laboratory setting, to completely control confounding factors, or to apply random assignment. It can also be used when confounding factors are either limited or known well enough to analyze the data in light of them. In order for an observational study to be valid, confounding factors must be known and accounted for. <br />
*Fundamentally, however, observational studies are not experiments. In addition, observational studies often involve variables that are difficult to quantify or control. Observational studies are limited because they lack the statistical properties of randomized experiments. In a randomized experiment, the method of randomization specified in the experimental protocol guides the statistical analysis, which is usually specified also by the experimental protocol.<br />
*A particular problem with observational studies involving human subjects is the great difficulty of attaining fair comparisons between treatments, because such studies are prone to selection bias: groups receiving different treatments (exposures) may differ greatly in their covariates. In contrast, randomization tends to balance both measured and unmeasured covariates across groups, so the experimental groups have covariate means that are close (a consequence of the law of large numbers and the central limit theorem). With inadequate randomization or low sample size, the systematic variation in covariates between the treatment (exposure) groups makes it difficult to separate the effect of the treatment from the effects of the other covariates, most of which have not been measured. <br />
*To avoid conditions that render an experiment far less useful, physicians conducting medical trials will quantify and randomize the covariates that can be identified. Researchers attempt to reduce the biases of observational studies with complicated statistical methods such as propensity score matching methods, which require large populations of subjects and extensive information on covariates. Outcomes are also quantified when possible and not based on a subject's or a professional observer's opinion. In this way, the design of an observational study can render the results more objective and therefore more convincing.<br />
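The balancing effect of randomization described above can be illustrated with a small simulation contrasting random assignment with self-selection. This is a hypothetical sketch: the age distribution, sample size, and self-selection probabilities are all invented for the purpose of the comparison.<br />

```python
import random
import statistics

random.seed(0)
ages = [random.gauss(50, 10) for _ in range(10_000)]

# Randomized assignment: a fair coin flip for every subject
coin = [random.random() < 0.5 for _ in ages]
rand_gap = abs(
    statistics.fmean([a for a, t in zip(ages, coin) if t])
    - statistics.fmean([a for a, t in zip(ages, coin) if not t])
)

# Self-selection: older subjects are far more likely to opt into "treatment"
opt_in = [random.random() < (0.8 if a > 50 else 0.2) for a in ages]
obs_gap = abs(
    statistics.fmean([a for a, t in zip(ages, opt_in) if t])
    - statistics.fmean([a for a, t in zip(ages, opt_in) if not t])
)

print(f"mean-age gap between arms, randomized:    {rand_gap:.2f} years")
print(f"mean-age gap between arms, self-selected: {obs_gap:.2f} years")
```

Under the coin flip, the two arms have nearly identical mean ages; under self-selection, age is badly imbalanced between the arms and would confound any treatment comparison.<br />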
<br />
====Applications====<br />
*[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC534936/ This article], titled Observational Versus Experimental Studies: What’s the Evidence for a Hierarchy, presents information that contradicts and discourages such a rigid approach to evaluating the quality of research design. It argued that the popular belief that randomized, controlled trials inherently produce gold standard results, and that all observational studies are inferior, does a disservice to patient care, clinical investigation, and education of health care professionals. It proposed that, as a more balanced strategy evolves, new claims of methodological authority may prove just as problematic as the traditional claims of medical authority that have been criticized by proponents of evidence-based medicine.<br />
<br />
*[http://www.nber.org/papers/w13516 This article], titled Observational Learning: Evidence From A Randomized Natural Field Experiment, presents results on the effects of observing others' choices (observational learning) on individuals' behavior and subjective well-being, in the context of restaurant dining, from a randomized natural field experiment. The authors designed the experiment to distinguish the observational-learning effect from the saliency effect (observing others' choices also makes those choices more salient). They found that, depending on the specification, demand for the top 5 dishes increased by an average of about 13 to 18 percent when these popularity rankings were revealed to customers; in contrast, merely mentioning dishes as samples did not significantly boost their demand. Consistent with theoretical predictions, they also found modest evidence that the observational-learning effect was stronger among infrequent customers. Finally, customers' subjective dining experiences improved when they were presented with information about other consumers' top choices, but not when they were presented with the names of sample dishes.<br />
<br />
====Problems====<br />
The next five problems are based on the following scenario: <br />
You want to study whether exercise decreases the risk of having a cardiovascular event. One of your friends says that they have access to data on 5,000 US adults (aged 65-85) who were surveyed once in 2010 about their current exercise patterns and past cardiovascular events. You look at their data and find that people with cardiovascular events reported higher levels of exercise than those without cardiovascular events. This runs counter to your expectation, since other studies have shown that exercise should put you at a lower risk of an event.<br />
<br />
# What could be a plausible explanation(s) for this unexpected finding?<br />
# What kind of study is this? Experimental or Observational?<br />
# What feature of a cohort study design might improve your ability to look at this association?<br />
# Now being impressed by cohort studies, you decide to recruit the current student body of UM SPH to prospectively study exercise as a risk factor for many diseases. One of your friends asks if they can use your data to study exercise as a risk factor for MERRF syndrome, an extremely rare disease of the mitochondria with an estimated prevalence of 9 per 1,000,000. Why might this be, or not be, a good idea?<br />
# Your real interest is maternal fitness level during pregnancy and cardiovascular disease of the offspring in adulthood. What might be the challenges that you will face in conducting a cohort study to answer this question?<br />
<br />
<br />
The next six problems are based on the following abstract:<br />
Abstract from [http://www.nejm.org/doi/full/10.1056/NEJM199704173361601 DOI: 10.1056/NEJM199704173361601 (Appel et al. 1997)].<br />
<br />
Background <br />
It is known that obesity, sodium intake, and alcohol consumption<br />
influence blood pressure. In this clinical trial, Dietary Approaches to Stop<br />
Hypertension, we assessed the effects of dietary patterns on blood pressure. <br />
<br />
Methods<br />
We enrolled 459 adults with systolic blood pressures of less than 160 mm Hg<br />
and diastolic blood pressures of 80 to 95 mm Hg. For three weeks, the subjects<br />
were fed a control diet that was low in fruits, vegetables, and dairy products,<br />
with a fat content typical of the average diet in the United States. They were <br />
then randomly assigned to receive for eight weeks the control diet, a diet rich<br />
in fruits and vegetables, or a "combination" diet rich in fruits, vegetables, and<br />
low-fat dairy products and with reduced saturated and total fat. Sodium intake<br />
and body weight were maintained at constant levels.<br />
<br />
Results<br />
At baseline, the mean ($\pm$SD) systolic and diastolic blood pressures were <br />
131.3 $\pm$ 10.8 mm Hg and 84.7 $\pm$ 4.7 mm Hg, respectively. The combination<br />
diet reduced systolic and diastolic blood pressure by 5.5 and 3.0 mm Hg more,<br />
respectively, than the control diet (P<0.001 for each); the fruits-and-vegetables<br />
diet reduced systolic blood pressure by 2.8 mm Hg more (P<0.001) and diastolic <br />
blood pressure by 1.1 mm Hg more than the control diet (P=0.07). Among the 133<br />
subjects with hypertension (systolic pressure $\geq$ 140 mm Hg; diastolic<br />
pressure $\geq$ 90 mm Hg; or both), the combination diet reduced systolic and<br />
diastolic blood pressure by 11.4 and 5.5 mm Hg more, respectively, than the<br />
control diet (P<0.001 for each); among the 326 subjects without hypertension,<br />
the corresponding reductions were 3.5 mm Hg (P<0.001) and 2.1 mm Hg (P=0.003). <br />
<br />
Conclusions<br />
A diet rich in fruits, vegetables, and low-fat dairy foods and with reduced<br />
saturated and total fat can substantially lower blood pressure. This diet offers<br />
an additional nutritional approach to preventing and treating hypertension. <br />
<br />
# What type of study is this?<br />
# What was the purpose of the trial?<br />
# Losing weight causes blood pressure to drop. Why do you think the investigators made an effort to keep participants’ weight stable in this trial?<br />
# What was the purpose of randomization in this study (i.e. why is randomization done)?<br />
# How would you evaluate whether randomization worked or not?<br />
# Did randomization work?<br />
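For questions 5 and 6, a standard approach is to compare baseline covariates between the randomized arms, for example with standardized mean differences. The sketch below uses simulated data: the baseline mean and SD (131.3 $\pm$ 10.8 mm Hg) come from the abstract above, but the arm sizes and the |d| &lt; 0.1 rule of thumb are assumptions made for illustration.<br />

```python
import random
import statistics

def standardized_diff(x, y):
    """Standardized mean difference for one baseline covariate across two arms."""
    pooled_sd = ((statistics.pvariance(x) + statistics.pvariance(y)) / 2) ** 0.5
    return (statistics.fmean(x) - statistics.fmean(y)) / pooled_sd

# Simulated baseline systolic BP for two randomized arms, drawn from the
# trial's reported baseline distribution (131.3 +/- 10.8 mm Hg);
# the arm sizes (150 each) are invented for this sketch
random.seed(1)
arm_a = [random.gauss(131.3, 10.8) for _ in range(150)]
arm_b = [random.gauss(131.3, 10.8) for _ in range(150)]

d = standardized_diff(arm_a, arm_b)
print(f"standardized difference in baseline SBP: {d:+.3f}")
# Rule of thumb: |d| < 0.1 on each baseline covariate suggests adequate balance;
# large |d| values flag residual imbalance between the arms
```

In practice this check would be run on every baseline covariate the trial reports, not just blood pressure.<br />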
<br />
===References===<br />
*[http://www.public.iastate.edu/~dnett/S401/nexpvsobs.pdf Online resource] <br />
*[http://en.wikipedia.org/wiki/Experiment Experiment Wikipedia] <br />
*[http://en.wikipedia.org/wiki/Observational_study Observational Study Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_ExpObsStudies}}</div>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Experiments vs. Observational Studies ==<br />
<br />
===Overview===<br />
In an experiment investigators apply treatment to experimental units and then proceed to observe the effect of the treatments on the experimental units. It’s an ordered procedure carried out with the goal of verifying, refuting, or establishing the validity of a hypothesis. This lecture will present a general introduction to the field of experiment and different types of experiment that we may apply in researches later. We are also going to talk about observational study, which draws inferences about the possible effect of a treatment on subjects where the assignment of subjects into a treated group versus a control group outside the control of the investigators. A general comparison between these two types of studies will be presented thereafter.<br />
<br />
===Motivation===<br />
Experimental and observational studies are among the most commonly applied studies in researches. Consider a simple example of experimental study. Suppose we enrolled 200 women aged 30 who aren’t smokers and assign half of them to smoking treatment with one pack per day and the other half to no smoking treatment and kept this status for 10 years. Then the lung capacity of all the women are measured and the data is further analyzed and interpreted. Here we have an experimental study. For the other study, we find 200 women aged 30, of whom 100 are smoke free and the other half having been smoking one pack per day for 10 years and the lung capacity of those women are measured. Then the data is further analyzed and interpreted. This would be an easy example of observational study. The difference can be easily drawn from the comparison between these two studies: the assignment of subjects into a treated group versus a control group is outside the control of the investigators where in an experimental study, each subject is randomly assigned to a treated group or a control group. So, what characteristics would define experimental and observational studies in general and what would be the major difference between these two types of studies?<br />
<br />
===Theory===<br />
====Experimental study====<br />
Experimental study is an empirical method that arbitrates between models or hypothesis and used to test existing theories or new hypotheses in order to support them or disprove them. Controlled experiments provide researches with insight into the causal relationship by demonstrating what outcome occurs when a particular factor is manipulated. Experiments may vary from personal and informal natural comparisons, to highly controlled ones. <br />
=====Types of experimental studies=====<br />
*''Controlled experiments'': Compare the results obtained form experimental samples against control samples, which are practically identical to the experimental sample except for the one whose effect is being tested.<br />
*''Natural experiments'': Rely solely on observations of the variables of the system under study, rather than manipulation of just one or a few variables as occurs in controlled experiments. <br />
*''Field experiments'': Named to draw a contrast with laboratory experiments, which enforce scientific control by testing a hypothesis in the artificial and highly controlled setting of a laboratory. It is often used in social sciences, and especially in economic analyses of education.<br />
<br />
====Observational Study====<br />
Observational Study draws inferences about the possible effect of a treatment on subjects where the assignment of subjects into a treated group versus a control group is outside the control of the investigator.<br />
=====Types of observational studies=====<br />
*Case-control study: originally developed in epidemiology, where two existing groups differing in outcome are identified and compared on the basis of some supposed causal attribute.<br />
*Cross-sectional study: involves data collection from a population, or a representative subset, at one specific point in time.<br />
*Longitudinal study: correlational research study that involves repeated observations of the same variables over long periods of time.<br />
*Cohort study (panel study): a particular form of longitudinal study where a group of patients is closely monitored over a span of time.<br />
*Ecological study: an observational study in which at least one variable is measured at the group level.<br />
<br />
=====Degree of usefulness and reliability=====<br />
Although observational studies can’t be used as reliable sources to make statements of fact about the ‘safety, efficacy, or effectiveness’ of a practice, they can still be used to:<br />
*Provide information on real world use and practice; <br />
*Detect signals about the benefits and risk of use in the general population; <br />
*Help formulate hypotheses to be tested in subsequent experiments; <br />
*Provide part of the community-level data needed to design more informative pragmatic clinical trials;<br />
*Inform clinical practice.<br />
<br />
====Bias and compensating methods====<br />
When a randomized experiment cannot be carried out, the alternative line of investigation suffers from the problem that the decision of which subjects receive the treatment is not entirely random and thus is a potential source of bias. A major challenge in conducting observational studies is to draw inferences that are acceptably free from influences by overt biases, as well as to assess the influence of potential hidden biases. An observer of an uncontrolled experiment records potential factors and the data output: the goal is to determine the effects of the factors. Sometimes the recorded factors may not be directly causing the differences in the output. Also, recorded or unrecorded factors may be correlated which may yield incorrect conclusions. Finally, as the number of recorded factors increases, the likelihood increases that at least one of the recorded factors will be highly correlated with the data output simply by chance.<br />
<br />
====Comparisons between experimental and observational studies====<br />
*An observational study is used when it is impractical, cost-prohibitive, or inefficient to fit a physical or social system into a laboratory setting, to completely control confounding factors, or to apply random assignment. It can also be used when confounding factors are either limited or known well enough to analyze the data in light of them. In order for an observational science to be valid, confounding factors must be known and accounted for. <br />
*Fundamentally, however, observational studies are not experiments. In addition, observational studies often involve variables that are difficult to quantify or control. Observational studies are limited because they lack the statistical properties of randomized experiments. In a randomized experiment, the method of randomization specified in the experimental protocol guides the statistical analysis, which is usually specified also by the experimental protocol.<br />
*A particular problem with observational studies involving human subjects is the great difficulty attaining fair comparisons between treatments because such studies are prone to selection bias, and groups receiving different treatments (exposures) may differ greatly according to their covariates. In contrast, the randomization ensures that the experimental groups have mean values that are close, due to the CLT or Markov’s inequality. With inadequate randomization or low sample size, the systematic variation in covariates between the treatment groups (or exposure groups) makes it difficult to separate the effect of the treatment (exposure) from the effects of the other covariates, most of which have not been measured. <br />
*To avoid conditions that render an experiment far less useful, physicians conducting medical trials will quantify and randomize the covariates that can be identified. Researchers attempt to reduce the biases of observational studies with complicated statistical methods such as propensity score matching methods, which require large populations of subjects and extensive information on covariates. Outcomes are also quantified when possible and not based on a subject's or a professional observer's opinion. In this way, the design of an observational study can render the results more objective and therefore, more convincing.<br />
<br />
====Applications====<br />
*[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC534936/ This article], titled Observational Versus Experimental Studies: What’s the Evidence for a Hierarchy, presents information that contradicts and discourages such a rigid approach to evaluating the quality of research design. It argues that the popular belief that randomized, controlled trials inherently produce gold-standard results, while all observational studies are inferior, does a disservice to patient care, clinical investigation, and the education of health care professionals. It proposes that, as a more balanced strategy evolves, new claims of methodological authority may be just as problematic as the traditional claims of medical authority that have been criticized by proponents of evidence-based medicine.<br />
<br />
*[http://www.nber.org/papers/w13516 This article], titled Observational Learning: Evidence From A Randomized Natural Field Experiment, presents results on the effects of observing others’ choices (observational learning) on individuals' behavior and subjective well-being, in the context of restaurant dining, from a randomized natural field experiment. The experiment was designed to distinguish the observational learning effect from the saliency effect (observing others' choices also makes those choices more salient). Depending on the specification, demand for the top five dishes increased by an average of about 13 to 18 percent when these popularity rankings were revealed to customers; in contrast, merely being mentioned as sample dishes did not significantly boost demand. Consistent with theoretical predictions, there was modest evidence that the observational learning effect was stronger among infrequent customers. Finally, customers' subjective dining experiences improved when they were presented with other consumers' top choices, but not when presented with the names of sample dishes.<br />
<br />
====Problems====<br />
The next five problems are based on the following scenario: <br />
You want to study if exercise decreases your risk of having a cardiovascular event. One of your friends says that they have access to data of 5,000 US adults (aged 65-85) who were surveyed once in 2010 about their current exercise patterns and past cardiovascular events. You look at their data and find that people with cardiovascular events reported higher levels of exercise than those without cardiovascular events. This is counter to your expectation since you know that other studies have shown that exercise should put you at a lower risk of an event.<br />
<br />
# What could be a plausible explanation(s) for this unexpected finding?<br />
# What kind of study is this? Experimental or Observational?<br />
# What feature of a cohort study design might improve your ability to look at this association?<br />
# Now being impressed by cohort studies, you decide to recruit the current student body of UM SPH to prospectively study exercise as a risk factor for many diseases. One of your friends asks if they can use your data to study exercise as a risk factor for MERRF syndrome, an extremely rare disease of the mitochondria with an estimated prevalence of 9 per 1,000,000. Why would or wouldn't this be a good idea?<br />
# Your real interest is maternal fitness level during pregnancy and cardiovascular disease of the offspring in adulthood. What might be the challenges that you will face in conducting a cohort study to answer this question?<br />
<br />
<br />
The next six problems are based on the following article:<br />
Abstract from [http://www.nejm.org/doi/full/10.1056/NEJM199704173361601 DOI: 10.1056/NEJM199704173361601 (Appel et al. 1997)].<br />
<br />
Background <br />
It is known that obesity, sodium intake, and alcohol consumption<br />
influence blood pressure. In this clinical trial, Dietary Approaches to Stop<br />
Hypertension (DASH), we assessed the effects of dietary patterns on blood pressure. <br />
<br />
Methods<br />
We enrolled 459 adults with systolic blood pressures of less than 160 mm Hg<br />
and diastolic blood pressures of 80 to 95 mm Hg. For three weeks, the subjects<br />
were fed a control diet that was low in fruits, vegetables, and dairy products,<br />
with a fat content typical of the average diet in the United States. They were <br />
then randomly assigned to receive for eight weeks the control diet, a diet rich<br />
in fruits and vegetables, or a "combination" diet rich in fruits, vegetables, and<br />
low-fat dairy products and with reduced saturated and total fat. Sodium intake<br />
and body weight were maintained at constant levels.<br />
<br />
Results<br />
At base line, the mean (+/-SD) systolic and diastolic blood pressures were <br />
131.3 $\pm$ 10.8 mm Hg and 84.7 $\pm$ 4.7 mm Hg, respectively. The combination<br />
diet reduced systolic and diastolic blood pressure by 5.5 and 3.0 mm Hg more,<br />
respectively, than the control diet (P<0.001 for each); the fruits-and-vegetables<br />
diet reduced systolic blood pressure by 2.8 mm Hg more (P<0.001) and diastolic <br />
blood pressure by 1.1 mm Hg more than the control diet (P=0.07). Among the 133<br />
subjects with hypertension (systolic pressure, > or =140 mm Hg; diastolic<br />
pressure, > or =90 mm Hg; or both), the combination diet reduced systolic and<br />
diastolic blood pressure by 11.4 and 5.5 mm Hg more, respectively, than the<br />
control diet (P<0.001 for each); among the 326 subjects without hypertension,<br />
the corresponding reductions were 3.5 mm Hg (P<0.001) and 2.1 mm Hg (P=0.003). <br />
<br />
Conclusions<br />
A diet rich in fruits, vegetables, and low-fat dairy foods and with reduced<br />
saturated and total fat can substantially lower blood pressure. This diet offers<br />
an additional nutritional approach to preventing and treating hypertension. <br />
<br />
# What type of study is this?<br />
# What was the purpose of the trial?<br />
# Losing weight causes blood pressure to drop. Why do you think the investigators made an effort to keep participants’ weight stable in this trial?<br />
# What was the purpose of randomization in this study (i.e. why is randomization done)?<br />
# How would you evaluate whether randomization worked or not?<br />
# Did randomization work?<br />
<br />
===References===<br />
*[http://www.public.iastate.edu/~dnett/S401/nexpvsobs.pdf Online resource] <br />
*[http://en.wikipedia.org/wiki/Experiment Experiment Wikipedia] <br />
*[http://en.wikipedia.org/wiki/Observational_study Observational Study Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_ExpObsStudies}}</div>
https://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi&diff=14894 SMHS IntroEpi, 2015-04-27T13:11:06Z <p>Glenbrau: /* Bias */</p>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred directly from an infected person to a susceptible person, e.g., via a sneeze, touch, or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who ate at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*To test $H_{0}:ARR=1$, a 95% confidence interval can be used to see whether the estimated ARR interval includes the null value of 1. If the ARR is much greater than 1, then people who were exposed are more likely to develop the illness than those who were unexposed.<br />
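These formulas can be sketched in Python; the outbreak counts below are hypothetical, chosen only to illustrate the calculation:<br />

```python
# Attack rate (AR) and attack rate ratio (ARR), per the formulas above.
def attack_rate(n_ill, n_at_risk):
    """AR = people at risk who develop the illness / total people at risk."""
    return n_ill / n_at_risk

def attack_rate_ratio(ar_exposed, ar_unexposed):
    """ARR = attack rate in the exposed / attack rate in the unexposed."""
    return ar_exposed / ar_unexposed

# Hypothetical buffet outbreak: 50 of 80 salad eaters became ill,
# versus 10 of 60 who did not eat the salad.
ar_exp = attack_rate(50, 80)      # 0.625
ar_unexp = attack_rate(10, 60)    # ~0.167
print(round(attack_rate_ratio(ar_exp, ar_unexp), 2))  # 3.75
```

An ARR well above 1, as here, suggests the exposed group is at substantially greater risk; a confidence interval for the ARR would still be needed to judge whether the result is distinguishable from the null value of 1.<br />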
<br />
====Measuring Disease====<br />
This section names and calculates two measures of incidence, describes how their interpretations differ, and explains the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': The number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2,000 persons at risk during the year and 20 develop the disease over that period, the cumulative incidence is 20⁄2000 = 1%.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
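A minimal Python sketch of these three measures, using the follow-up times from the person-time example above and otherwise hypothetical counts:<br />

```python
# Cumulative incidence, incidence rate, and prevalence, per the formulas above.
def cumulative_incidence(new_cases, at_risk):
    """New cases / total population at risk (a proportion)."""
    return new_cases / at_risk

def incidence_rate(new_cases, person_time):
    """New cases / total person-time contributed (a true rate)."""
    return new_cases / person_time

def prevalence(existing_cases, population):
    """Cases present at a specified time / population at that time."""
    return existing_cases / population

person_days = 3 + 5 + 8                 # subjects A, B, C followed 3, 5, and 8 days
print(cumulative_incidence(20, 2000))   # 0.01, i.e., 1% over the year
print(incidence_rate(1, person_days))   # 0.0625 cases per person-day (hypothetical)
print(prevalence(50, 10000))            # 0.005 (hypothetical point prevalence)
```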
<br />
====Measuring Mortality Rates====<br />
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:<br />
<br />
*All cause mortality rates = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011} {Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of it over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare mortality in two populations, or in one population at different time periods, when the age distributions differ; adjusting for age allows the mortality rates to be compared as if both populations had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
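The four steps can be sketched in Python; the age groups, death counts, and population sizes below are hypothetical:<br />

```python
# Direct age adjustment, following the four steps above.
# Age groups, deaths, and population sizes are hypothetical.
study = {                    # age group: (deaths, population) in the population of interest
    "under 65": (10, 5000),
    "65+":      (90, 3000),
}
standard = {"under 65": 6000, "65+": 4000}   # standard population sizes

# Steps 1-2: compute age-specific rates and apply them to the standard population
expected_deaths = sum((deaths / pop) * standard[g] for g, (deaths, pop) in study.items())
# Steps 3-4: total expected deaths divided by the total standard population
adjusted_rate = expected_deaths / sum(standard.values())
print(round(adjusted_rate, 4))  # 0.0132
```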
<br />
Result: an age-adjusted mortality rate for each population of interest.<br />
<br />
====Indirect age-adjustment====<br />
The expected number of deaths can be compared to the number of actual deaths with the '''standardized mortality ratio (SMR)'''. Indirect adjustment is especially useful when the group-specific rates are not trustworthy (e.g., when the study population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
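A short Python sketch of the indirect method, with hypothetical standard rates and study-population counts:<br />

```python
# Indirect age adjustment (SMR), following the four steps above.
# Rates and counts are hypothetical.
standard_rates = {"under 65": 0.002, "65+": 0.03}   # standard population's age-specific rates
study_pop = {"under 65": 1000, "65+": 500}          # study population sizes by age group
observed_deaths = 20                                # deaths observed in the study population

# Steps 1-3: expected deaths if the study population had the standard rates
expected_deaths = sum(standard_rates[g] * n for g, n in study_pop.items())  # 2 + 15 = 17
# Step 4: observed over expected
smr = observed_deaths / expected_deaths
print(round(smr, 2))  # 1.18 -> more deaths than expected
```

Here about 17 deaths would be expected, so 20 observed deaths give an SMR of about 1.18 (more deaths than expected).<br />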
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and provide better outcomes. However, screening programs must be warranted, and there must be a critical point in the natural history of the disease before which detection and treatment lead to better outcomes. <br />
<br />
=====Predictive Value: Clinical Utility of Positive and Negative Tests=====<br />
<br />
If a patient is tested positive, the likelihood that they actually have the disease is called '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood they actually do ''not'' have the disease is called '''Negative Predictive Value''' (NPV). PPV and NPV are affected by prevalence of disease, specificity and sensitivity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
<br />
'''PPV interpretation:''' Given a positive result on the screening test, the probability that the individual actually has the disease is the PPV.<br />
<br />
'''NPV interpretation:''' Given a negative result on the screening test, the probability that the individual actually does not have the disease is the NPV.<br />
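A short Python sketch computing PPV, NPV, sensitivity, and specificity from a hypothetical 2×2 screening table (cell labels as above):<br />

```python
# PPV, NPV, sensitivity, and specificity from a 2x2 screening table,
# with cells labeled as above (a = true positives, b = false positives,
# c = false negatives, d = true negatives). Counts are hypothetical.
a, b, c, d = 90, 40, 10, 860

ppv = a / (a + b)            # P(disease | positive test)
npv = d / (c + d)            # P(no disease | negative test)
sensitivity = a / (a + c)    # P(positive test | disease)
specificity = d / (b + d)    # P(negative test | no disease)

print(round(ppv, 3), round(npv, 3))                  # 0.692 0.989
print(round(sensitivity, 3), round(specificity, 3))  # 0.9 0.956
```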
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
<br />
=====Factors Influencing Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV and decreases NPV. Screening programs are most productive and efficient in high-risk populations; screening for infrequent diseases may waste resources; PPV should be presented in the context of disease prevalence.<br />
<br />
*''Test specificity'' (the ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
*''Test sensitivity'' (the ability of a test to correctly identify those who have the disease $=\frac{a}{a+c}$): Higher test sensitivity increases NPV.<br />
<br />
'''Note:''' The cutoff used to define disease influences test sensitivity and specificity: lowering the cutpoint increases true positives (raising sensitivity) and decreases true negatives (lowering specificity); similarly, raising the cutpoint decreases true positives (lowering sensitivity) and increases true negatives (raising specificity).<br />
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid not reliable'', ''reliable not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability (repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?===== <br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage): a less expensive, less invasive, or less uncomfortable test is given first; if its result is positive, it must be followed up with additional testing.<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person can test positive on either test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has its own sensitivity and specificity. Using multiple tests can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
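These combination rules can be sketched in Python; the two tests' sensitivities and specificities below are hypothetical:<br />

```python
# Net sensitivity and specificity when combining two screening tests,
# following the simultaneous/sequential rules above.
def parallel(se1, sp1, se2, sp2):
    """Simultaneous testing: positive on either test counts as positive."""
    net_se = 1 - (1 - se1) * (1 - se2)  # a case is missed only if both tests miss it
    net_sp = sp1 * sp2                  # a non-case must test negative on both
    return net_se, net_sp

def sequential(se1, sp1, se2, sp2):
    """Sequential testing: must test positive on both to count as positive."""
    net_se = se1 * se2                  # a case must be caught by both tests
    net_sp = sp1 + (1 - sp1) * sp2      # the second test can correct a false positive
    return net_se, net_sp

print(parallel(0.80, 0.90, 0.70, 0.95))    # net sensitivity rises, net specificity falls
print(sequential(0.80, 0.90, 0.70, 0.95))  # net sensitivity falls, net specificity rises
```

With these numbers, parallel testing raises net sensitivity to 0.94 while lowering net specificity to 0.855, and sequential testing lowers net sensitivity to 0.56 while raising net specificity to 0.995, matching the two bullets above.<br />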
<br />
===Randomized Controlled Trials (RCT)===<br />
In these procedures, the investigator assigns exposure at random to study participants. The investigator then observes whether there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken to ensure that the follow-up is done in an identical way with both groups. The essence of a good comparison between “treatments” is that the compared groups are as much the same as possible, except for their “treatment.”<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: <br />
<br />
#Hypothesis formulation<br />
#Study participant recruitment based on specific criteria<br />
#Gathering informed consent<br />
#Allocation of eligible and willing participants into random assignment study groups<br />
#Monitoring study groups for outcome under study<br />
#Comparing rates of different outcomes in various groups<br />
<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
<br />
*''External validity'': Generalization of study to larger source population, which is influenced by factors like: <br />
:*Demographic differences between eligible and ineligible subgroups <br />
:*Intervention mirror what will happen in the community or source population<br />
<br />
*''Internal validity'': Ability to reach correct conclusion in study, which is influenced by factors like: <br />
:*Ability of subjects to provide valid and reliable data<br />
:*Expected compliance with a regimen<br />
:*Low probability of dropping out<br />
<br />
====Measures of Association and Effect in RCT====<br />
<br />
Ratio of two measures of disease incidence (relative measures):<br />
<br />
*Risk Ratio (Relative Risk)<br />
*Rate Ratio<br />
<br />
Difference between two measures of disease incidence: <br />
<br />
*Risk difference<br />
*Efficacy<br />
<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
'''Interpretation''': <br />
<br />
*$RR>1$, The risk of $X$ is $RR$ times more likely to occur in group A than in group B<br />
*$RR=1$, Null value (no difference between groups)<br />
*$RR<1$, Either calculate the reduction in risk ratios (100%-$X$%) or invert ($1/RR$) to be interpreted as “less likely” risk<br />
<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
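A minimal Python sketch of the RR and efficacy calculations, using hypothetical trial counts for the 2×2 table above:<br />

```python
# Relative risk (risk ratio) and efficacy for the RCT table above
# (a, b = drug A outcomes; c, d = placebo outcomes). Counts are hypothetical.
a, b, c, d = 15, 85, 30, 70

ci_drug = a / (a + b)        # cumulative incidence on drug A: 0.15
ci_placebo = c / (c + d)     # cumulative incidence on placebo: 0.30

rr = ci_drug / ci_placebo                        # 0.5: half the risk on drug A
efficacy = (ci_placebo - ci_drug) / ci_placebo   # 0.5: 50% of cases prevented
print(round(rr, 2), round(efficacy, 2))  # 0.5 0.5
```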
<br />
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.<br />
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks.<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Populations of exposed and unexposed individuals at risk of developing the outcome are followed over time to compare the development of disease in each group. <br />
*Steps: Establish a study population that reflects the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; follow both groups and compare their outcomes.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
**Prospective (concurrent) and retrospective (non-concurrent) cohort studies, distinguished by when the data are collected<br />
**Retrospective benefits: more cost-effective; good for diseases of long latency<br />
**Prospective benefits: data quality is presumably higher<br />
<br />
Both designs need to guard against ascertainment bias when outcome or exposure status is already known.<br />
<br />
*Measures of Association in Cohort Study:<br />
**Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio<br />
**Difference between two measures of disease incidence: Risk Difference, Rate Difference<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information<br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized<br />
# Like RCT, excellent for studying rare exposures <br />
# Multiple outcomes and sometimes multiple exposures can be studied<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive<br />
# Not effective at capturing rare outcomes, and can be challenging for diseases that take a long time to develop<br />
# Loss to follow-up can be a problem<br />
# Changes over time in criteria and methods can lead to problems with inferences<br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics<br />
<br />
*Situations that favor a cohort study: <br />
# When there is evidence of an association between the exposure and the disease from other studies<br />
# When the exposure is rare but the incidence of disease among the exposed is high<br />
# When time between exposure and development of the disease is relatively short or historical data is available<br />
# When good follow-up can be ensured<br />
<br />
===Case Control Study===<br />
A case-control study compares cases and controls to see which group had greater odds of exposure.<br />
*Measures of Association: Odds Ratio<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}$<br />
<br />
====Interpretation====<br />
The odds of being exposed are OR times higher in the cases than in the controls if OR > 1, and 1/OR times lower in the cases than in the controls if OR < 1; if OR = 1, there is no association (the odds are the same in cases and controls).<br />
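The odds ratio calculation can be sketched in Python with hypothetical case-control counts (cell labels as in the table above):<br />

```python
# Odds ratio for the case-control table above (cells a, b, c, d as labeled).
# Counts are hypothetical.
a, b, c, d = 40, 20, 60, 80

odds_cases = a / c                 # odds of exposure among cases
odds_controls = b / d              # odds of exposure among controls
odds_ratio = (a * d) / (b * c)     # equivalent cross-product form ad/bc
print(round(odds_ratio, 2))  # 2.67: cases had ~2.67x the odds of exposure
```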
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding recall/reporting bias. Ways to reduce recall and reporting bias include: <br />
# Adjusting timing so that the time between the event/illness and the study is as short as possible; use standardized questionnaires that obtain complete information<br />
# Using existing information if/when possible (e.g. medical record)<br />
# Masking participants to study hypothesis<br />
*Conditions under which an OR from a case-control study can approximate the RR ($OR \approx RR$):<br />
# When the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn<br />
# When the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn<br />
# When the disease being studied does not occur frequently<br />
<br />
===Cross-Sectional Studies===<br />
A cross-sectional study is an observational study in which a subject’s exposure and disease status are measured at the same time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, so temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# Good for generating hypotheses<br />
# Easily sets up other analytic designs<br />
# Temporality is not a problem for time invariant exposures (genetic markers)<br />
# Relatively low cost<br />
<br />
*'''Weakness:'''<br />
# Temporality – exposure or disease which happened first<br />
# Prevalent cases may not be the same as incident cases<br />
# Not useful for rare disease<br />
# Subject to selection bias<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\,disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place and time (mixed study). One error that can occur, the ''ecologic fallacy'', is identifying an association based on group-level (ecological) characteristics and ascribing it to individuals when no such association exists at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# Data is relatively easy and/or cheap to obtain.<br />
# Ecological studies are a good place to start.<br />
# Many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weaknesses:''' <br />
#Reliance on group-level data may not correctly represent individual-level associations. <br />
#The ''ecologic fallacy'': an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist at the individual level.<br />
<br />
===Other Risk Estimates===<br />
*''Attributable Risk Estimates of Effect'': If exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.<br />
*''Attributable Risk'' ($AR$): $AR=CI_{Exposed} - CI_{Not\,exposed}$. This is simply the risk difference. The group of interest is the exposed, and $AR$ quantifies the risk of disease in the exposed group that is attributable to the exposure. <br />
*''Attributable Risk Percent'' $(AR\%)$: $ AR\%$ = $\frac{(CI_{Exposed} - CI_{Not exposed})}{CI_{exposed}}$<br />
*''Population Attributable Risk'' ($PAR$): $PAR= CI_{Total} - CI_{Not exposed}$<br />
*''Population Attributable Risk Percent'' $(PAR\%)$: $PAR\%$ = $\frac{(CI_{Total}-CI_{Not exposed})} {CI_{total}}$.<br />
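These attributable-risk measures are simple arithmetic on cumulative incidences. The sketch below uses hypothetical incidence values; the function names are illustrative, not from the text.<br />

```python
def attributable_risk(ci_exposed, ci_unexposed):
    """AR: the risk difference between exposed and unexposed groups."""
    return ci_exposed - ci_unexposed

def attributable_risk_pct(ci_exposed, ci_unexposed):
    """AR%: share of the exposed group's risk attributable to exposure."""
    return attributable_risk(ci_exposed, ci_unexposed) / ci_exposed * 100

def population_attributable_risk(ci_total, ci_unexposed):
    """PAR: excess risk in the whole population due to the exposure."""
    return ci_total - ci_unexposed

def population_attributable_risk_pct(ci_total, ci_unexposed):
    """PAR%: share of the whole population's risk attributable to exposure."""
    return population_attributable_risk(ci_total, ci_unexposed) / ci_total * 100

# Hypothetical cumulative incidences: 0.20 exposed, 0.05 unexposed, 0.11 overall
print(round(attributable_risk(0.20, 0.05), 4))      # 0.15
print(round(attributable_risk_pct(0.20, 0.05), 1))  # 75.0
print(round(population_attributable_risk_pct(0.11, 0.05), 1))
```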
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*''Causes of bias'': Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and outcome; the observed results differ from the true results. <br />
*''Impact of bias'': Makes it appear as if there is an association when there really is none (bias away from the null); masks an association when there really is one (bias toward the null).<br />
*''Selection bias'': Who is selected or retained in a study distorts your estimates of the truth; an example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (chose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*''Information bias'': The quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
*''Confounding bias'': Differences between cases and controls or exposed and unexposed distorts your estimates of the truth. A variable is a confounder if it is a known risk factor for the outcome, it is associated with the exposure but not a result of the exposure. These three conditions are necessary for a variable to be considered as a confounder. <br />
*''Chance'': The luck of draw gets you a study sample that is not representative of the larger population.<br />
*''Strategies to handle confounding'': (1) In study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
*''Concordant pairs'': Both case and control exposed; neither case nor control exposed.<br />
*''Discordant pairs'': Case exposed but control not exposed; control exposed but case not exposed.<br />
*''Matched analysis'': Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}$<br />
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any factors that were matched on; you cannot conduct analyses for matched variables and outcome.<br />
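The matched-pair odds ratio above uses only the discordant cells of the matched table. A minimal sketch with hypothetical pair counts:<br />

```python
def matched_odds_ratio(b, c):
    """Matched-pair OR from discordant pairs only:
    b = pairs with case exposed / control unexposed,
    c = pairs with case unexposed / control exposed."""
    return b / c

# Hypothetical discordant pair counts
print(matched_odds_ratio(30, 10))  # 3.0
```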
*''Randomization'': Random allocation of exposure/”treatment” by the investigator ensures that the two groups (exposed & unexposed) are the same except for the exposure of interest; it controls for both known and unknown confounders because these “3rd variables” should be equally distributed between the groups.<br />
*''Stratification'': Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*''Adjustment'': A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
An example of age-adjustment:<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, through some important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has largely benefited in the past from the development of epidemiological research; however, the association-versus-causation objection is currently raised against observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved: necessity of replication and Mendelian randomization, which might prove to be important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article], studies retrospectively the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about existence of outbreaks?<br />
:a. Cases call health departments directly<br />
:b. Clinicians<br />
:c. Laboratories<br />
:d. All of the above<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. Host<br />
:b. Agent<br />
:c. Vector<br />
:d. Environment<br />
:e. All of the above<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same.<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly.<br />
:c. The prevalence decreases if the duration of disease is increasing.<br />
:d. The prevalence decreases if the duration of disease stays the same.<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002 - 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 death per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:a. True<br />
:b. False<br />
<br />
Sequential testing tends to have higher net specificity than specificity of a single test.<br />
:a. True<br />
:b. False<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they had the flu), so these were not enrolled in the research study. The students who were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of this follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question; what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. Interviewer bias<br />
:b. Recall bias<br />
:c. Loss to follow-up<br />
:d. Non-differential misclassification<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns (especially if they were concerned that sun exposure was a reason they got melanoma). This is an example of:<br />
:a. Interviewer bias<br />
:b. Loss to follow-up<br />
:c. Differential misclassification<br />
:d. Non-differential misclassification<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>
SMHS IntroEpi, revision of 2015-04-27T13:06:58Z by Glenbrau (/* Other Risk Estimates */): https://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi&diff=14893
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred directly from an infected person to a susceptible person, for example via a sneeze, touch or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*A test of $H_{0}:ARR=1$, or a 95% confidence interval, can be used to see whether the estimated ARR interval includes the null value of 1. If the ARR is much greater than 1, people who were exposed are more likely to develop the illness than those who were unexposed.<br />
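The attack rate and attack rate ratio above reduce to simple division. A minimal sketch using hypothetical outbreak counts (the buffet numbers are invented for illustration):<br />

```python
def attack_rate(cases, at_risk):
    """Proportion of people at risk who develop the illness."""
    return cases / at_risk

def attack_rate_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Attack rate in the exposed divided by attack rate in the unexposed."""
    return attack_rate(cases_exposed, n_exposed) / attack_rate(cases_unexposed, n_unexposed)

# Hypothetical buffet outbreak: 45/60 salad eaters ill vs 15/60 non-eaters ill
print(attack_rate(45, 60))                # 0.75
print(attack_rate_ratio(45, 60, 15, 60))  # 3.0
```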
<br />
====Measuring Disease====<br />
To name and calculate two measures of incidence, to describe differences in interpreting these measures, and to understand the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': number of new cases of a disease occurring in the population during a special period of time divided by the number of persons at risk of developing the disease during that period of time. For example: if there are 2000 persons at risk during the year and 20 develop disease over that period. The incidence rate would be 20⁄2000=1%.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
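The three frequency measures above can be sketched as one-line functions. The person-time figure reuses the text's 3 + 5 + 8 person-day example; the case counts are otherwise hypothetical.<br />

```python
def cumulative_incidence(new_cases, at_risk):
    """New cases divided by the population at risk."""
    return new_cases / at_risk

def incidence_rate(new_cases, person_time):
    """New cases divided by the total person-time contributed."""
    return new_cases / person_time

def prevalence(existing_cases, population):
    """Existing cases divided by the population at a specified time."""
    return existing_cases / population

# 20 new cases among 2000 persons at risk in a year (example from the text)
print(cumulative_incidence(20, 2000))  # 0.01
# 2 hypothetical cases over 3 + 5 + 8 = 16 person-days of follow-up
print(incidence_rate(2, 3 + 5 + 8))    # 0.125 cases per person-day
```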
<br />
====Measuring Mortality Rates====<br />
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:<br />
<br />
*All cause mortality rates = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011}{Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of the disease over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations (or one population at different time periods) that have different age distributions; adjusting for age allows the mortality rates to be compared as if both populations had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
<br />
====Age-adjusted mortality rate for each population of interest====<br />
*Indirect age-adjustment: the expected number of deaths can be compared to the number of actual deaths via the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates are unreliable (e.g., when the study population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
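The direct- and indirect-adjustment steps above can be written out in a few lines. This is a minimal illustration with hypothetical two-stratum numbers; the function names and rates are not from the text.<br />

```python
def direct_adjusted_rate(age_specific_rates, standard_pop):
    """Direct adjustment: apply the study population's age-specific rates
    (deaths per 100,000) to a standard population's age structure.
    Returns the age-adjusted rate per 100,000."""
    expected_deaths = sum(r * n for r, n in zip(age_specific_rates, standard_pop)) / 100_000
    return expected_deaths * 100_000 / sum(standard_pop)

def smr(observed_deaths, standard_rates, study_pop):
    """Indirect adjustment: apply the standard population's age-specific
    rates (per 100,000) to the study population to get expected deaths;
    SMR = observed / expected."""
    expected = sum(r * n for r, n in zip(standard_rates, study_pop)) / 100_000
    return observed_deaths / expected

# Hypothetical two-stratum example (all rates per 100,000)
print(direct_adjusted_rate([100, 1000], [600_000, 400_000]))  # 460.0
print(smr(50, [200, 400], [10_000, 5_000]))                   # 1.25
```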
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and provide better outcomes. However, screening programs must be warranted, and there must be a critical point in the disease's natural history that screening can precede. <br />
<br />
=====Predictive Value & Reliability: Clinical Utility of Positive Tests=====<br />
<br />
If a patient is tested positive, the likelihood that they actually have the disease is called '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood they actually do ''not'' have the disease is called '''Negative Predictive Value''' (NPV). PPV and NPV are affected by prevalence of disease, specificity and sensitivity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
<br />
'''PPV interpretation:''' Given a positive screening test result, the probability that the individual actually has the disease is PPV.<br />
<br />
'''NPV interpretation:''' Given a negative screening test result, the probability that the individual actually does not have the disease is NPV.<br />
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
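The PPV/NPV formulas above, together with sensitivity and specificity, can all be computed from the four cells of the screening 2×2 table. A minimal sketch with hypothetical counts:<br />

```python
def screening_metrics(a, b, c, d):
    """Metrics from the screening 2x2 table:
    a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    return {
        "sensitivity": a / (a + c),  # P(test+ | disease)
        "specificity": d / (b + d),  # P(test- | no disease)
        "ppv": a / (a + b),          # P(disease | test+)
        "npv": d / (c + d),          # P(no disease | test-)
    }

# Hypothetical counts
m = screening_metrics(a=45, b=5, c=5, d=95)
print(m["sensitivity"], m["specificity"], m["ppv"], m["npv"])  # 0.9 0.95 0.9 0.95
```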
<br />
===== Factors Influencing Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV (or decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent disease may waste resources; need to present PPV in context of disease prevalence.<br />
<br />
*''Test specificity'' (ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
*''Test sensitivity'' (ability of a test to correctly identify those who do have the disease $=\frac{a}{a+c}$): Higher test sensitivity increases NPV.<br />
<br />
'''Note:''' The cutoff of a disease will influence test sensitivity and specificity: lowering the cutpoint will increase true positive hence increases sensitivity; decreases true negative hence decreases specificity. Similarly, raising the cutpoint will decrease true positives hence decreases sensitivity; increase true negatives hence increases specificity.<br />
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid not reliable'', ''reliable not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability (repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?=====<br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage) are less expensive, less invasive, and less uncomfortable tests. If their results are positive, they must be followed-up with additional testing.<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person can test positive on either test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has own sensitivity and specificity. Utilization of multiple testing can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
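The net-sensitivity/net-specificity rules above can be written out directly, assuming the two tests err independently (a standard simplifying assumption; the function names and test characteristics below are illustrative):<br />

```python
def sequential_net(se1, sp1, se2, sp2):
    """Sequential (2-stage) testing: classified positive only if positive
    on both tests. Assumes the tests err independently."""
    net_sens = se1 * se2                  # must be caught by both tests
    net_spec = 1 - (1 - sp1) * (1 - sp2)  # a negative on either test clears
    return net_sens, net_spec

def simultaneous_net(se1, sp1, se2, sp2):
    """Simultaneous (parallel) testing: classified positive if positive
    on either test; negative requires negative on both."""
    net_sens = 1 - (1 - se1) * (1 - se2)
    net_spec = sp1 * sp2
    return net_sens, net_spec

# Two hypothetical tests, each 80% sensitive and 90% specific
seq_sens, seq_spec = sequential_net(0.8, 0.9, 0.8, 0.9)
sim_sens, sim_spec = simultaneous_net(0.8, 0.9, 0.8, 0.9)
print(round(seq_sens, 2), round(seq_spec, 2))  # 0.64 0.99
print(round(sim_sens, 2), round(sim_spec, 2))  # 0.96 0.81
```

Note how the numbers match the summary above: sequential testing trades sensitivity for specificity, while simultaneous testing does the reverse.<br />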
<br />
===Randomized Controlled Trials (RCT)===<br />
In these procedures, the investigator assigns exposure at random to study participants, then observes whether health outcomes differ between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken to ensure that follow-up is done in an identical way with both groups. The essence of a good comparison between “treatments” is that the compared groups are as much the same as possible, except for their “treatment."<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: <br />
<br />
#Hypothesis formulation<br />
#Study participant recruitment based on specific criteria<br />
#Gathering informed consent<br />
#Allocation of eligible and willing participants into random assignment study groups<br />
#Monitoring study groups for outcome under study<br />
#Comparing rates of different outcomes in various groups<br />
<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
<br />
*''External validity'': Generalization of study to larger source population, which is influenced by factors like: <br />
:*Demographic differences between eligible and ineligible subgroups <br />
:*Whether the intervention mirrors what will happen in the community or source population<br />
<br />
*''Internal validity'': Ability to reach correct conclusion in study, which is influenced by factors like: <br />
:*Ability of subjects to provide valid and reliable data<br />
:*Expected compliance with a regimen<br />
:*Low probability of dropping out<br />
<br />
====Measures of Association and Effect in RCT====<br />
<br />
Ratio of two measures of disease incidence (relative measures):<br />
<br />
*Risk Ratio (Relative Risk)<br />
*Rate Ratio<br />
<br />
Difference between two measures of disease incidence: <br />
<br />
*Risk difference<br />
*Efficacy<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
'''Interpretation''': <br />
<br />
*$RR>1$: The outcome is $RR$ times as likely to occur in the exposed group as in the unexposed group<br />
*$RR=1$: Null value (no difference between groups)<br />
*$RR<1$: Either report the relative risk reduction ($1-RR$, expressed as a percentage) or invert ($1/RR$) to interpret the exposure as protective (“less likely”)<br />
<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
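The relative risk and efficacy formulas above can be checked numerically. A minimal sketch with a hypothetical trial (the counts are invented for illustration):<br />

```python
def relative_risk(a, b, c, d):
    """RR from the RCT 2x2 table: a, b = outcome yes/no on drug A;
    c, d = outcome yes/no on placebo."""
    return (a / (a + b)) / (c / (c + d))

def efficacy(ci_placebo, ci_treatment):
    """Proportional reduction in cumulative incidence due to treatment."""
    return (ci_placebo - ci_treatment) / ci_placebo

# Hypothetical trial: 10/100 events on drug A vs 25/100 on placebo
print(relative_risk(10, 90, 25, 75))   # 0.4
print(round(efficacy(0.25, 0.10), 2))  # 0.6
```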
<br />
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.<br />
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but there is reason to believe that the benefits of the intervention in question outweigh the risks.<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Population of exposed and unexposed individuals at risk of developing outcomes are followed over time to compare the development of disease in each group. <br />
*Steps: Establish the study population: identify a study population that reflects the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; follow both groups and compare their outcomes.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
**Prospective (concurrent) and Retrospective Cohort Studies (non-concurrent) based on when is the data collected<br />
**Retrospective has benefits: more cost effective; good for disease of long latency<br />
**Prospective has benefits: data quality presumably higher<br />
<br />
Both designs need to be cautious of ascertainment bias if the outcome or exposure status is already known.<br />
<br />
*Measures of Association in Cohort Study:<br />
**Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio<br />
**Difference between two measures of disease incidence: Risk Difference, Rate Difference<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information<br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized<br />
# Like RCT, excellent for studying rare exposures <br />
# Multiple outcomes and sometimes multiple exposures can be studied<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive<br />
# Not effective at capturing rare outcomes, and challenging for diseases that take a long time to develop<br />
# Loss to follow-up can be a problem<br />
# Changes over time in criteria and methods can lead to problems with inferences<br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics<br />
<br />
*Situations that favor a Cohort Study: <br />
# When there is evidence of an association between the exposure and the disease from other studies<br />
# When the exposure is rare but the incidence of disease among the exposed is high<br />
# When time between exposure and development of the disease is relatively short or historical data is available<br />
# When good follow-up can be ensured<br />
<br />
===Case Control Study===<br />
A case control study compares individuals with the disease (cases) to individuals without it (controls) to see which group had greater exposure to the risk factor of interest.<br />
*Measures of Association: Odds Ratio<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}$<br />
<br />
====Interpretation====<br />
The odds of exposure are OR times higher in the cases than in the controls (if OR > 1), or 1/OR times lower (if OR < 1); if OR = 1, the odds are the same in cases and controls (no association).<br />
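A sketch of the cross-product odds-ratio calculation above (this snippet and its counts are illustrative, not from the original materials):<br />

```python
def odds_ratio(a, b, c, d):
    # Case-control 2x2 table: a = exposed cases, b = exposed controls,
    # c = unexposed cases, d = unexposed controls; OR = ad / bc
    return (a * d) / (b * c)

# Hypothetical table: 30 exposed cases, 10 exposed controls,
# 20 unexposed cases, 40 unexposed controls
or_hat = odds_ratio(30, 10, 20, 40)  # (30*40)/(10*20) = 6.0
```

Here OR = 6 would be read as: the odds of exposure are 6 times higher in cases than in controls.<br />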
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weaknesses: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate the incidence of disease, only estimate the odds of exposure in cases vs. controls; the number of cases and controls in the study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to avoid recall and report bias include: <br />
# Adjusting timing so that the time between the event/illness and the study is as short as possible; use standardized questionnaires that obtain complete information<br />
# Using existing information if/when possible (e.g. medical record)<br />
# Masking participants to study hypothesis<br />
*Conditions under which an OR from a case-control study approximates the RR ($OR \approx RR$):<br />
# When the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn<br />
# When the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn<br />
# When the disease being studied does not occur frequently<br />
<br />
===Cross-Sectional Studies===<br />
A cross-sectional study is an observational study in which a subject’s exposure and disease status are measured at the same time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, and temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# Good for generating hypotheses<br />
# Easily sets up other analytic designs<br />
# Temporality is not a problem for time invariant exposures (genetic markers)<br />
# Relatively low cost<br />
<br />
*'''Weakness:'''<br />
# Temporality – cannot determine whether the exposure or the disease came first<br />
# Prevalent cases may not be the same as incident cases<br />
# Not useful for rare disease<br />
# Subject to selection bias<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Disease || No Disease <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\,disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data is used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place & time (mixed study). However, an error can occur when an association identified at the group (ecological) level is ascribed to individuals even though no such association exists at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# Data is relatively easy and/or cheap to obtain.<br />
# Ecological studies are a good place to start.<br />
# Many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weaknesses:''' <br />
#Reliance on group-level data may not correctly represent individual-level associations. <br />
#The ecologic fallacy: an association between variables based on group-level characteristics is used to make inferences about individuals when that association does not exist at the individual level.<br />
<br />
===Other Risk Estimates===<br />
*''Attributable Risk Estimates of Effect'': If exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminated the exposure.<br />
*''Attributable Risk'' ($AR$): $AR=CI_{Exposed} - CI_{Not\,exposed}$. This is just the risk difference. The group of interest is the exposed, and the aim is to quantify the risk of disease in the exposed group attributable to the exposure. <br />
*''Attributable Risk Percent'' ($AR\%$): $AR\%=\frac{CI_{Exposed} - CI_{Not\,exposed}}{CI_{Exposed}}$<br />
*''Population Attributable Risk'' ($PAR$): $PAR= CI_{Total} - CI_{Not\,exposed}$<br />
*''Population Attributable Risk Percent'' ($PAR\%$): $PAR\%=\frac{CI_{Total}-CI_{Not\,exposed}}{CI_{Total}}$.<br />
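The four attributable-risk measures above translate directly into code; the snippet below is an illustrative sketch with hypothetical cumulative incidences, not part of the original materials:<br />

```python
def attributable_risk(ci_exposed, ci_unexposed):
    return ci_exposed - ci_unexposed                      # AR (risk difference)

def attributable_risk_pct(ci_exposed, ci_unexposed):
    return 100 * (ci_exposed - ci_unexposed) / ci_exposed  # AR%

def population_attributable_risk(ci_total, ci_unexposed):
    return ci_total - ci_unexposed                         # PAR

def population_attributable_risk_pct(ci_total, ci_unexposed):
    return 100 * (ci_total - ci_unexposed) / ci_total      # PAR%

# Hypothetical cumulative incidences: 0.20 in the exposed,
# 0.05 in the unexposed, 0.10 in the total population
ar = attributable_risk(0.20, 0.05)                        # ~0.15
par_pct = population_attributable_risk_pct(0.10, 0.05)    # ~50%
```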
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: any systematic error in the design, conduct, or analysis of a study that results in a distorted estimate of the relationship between an exposure and an outcome; the observed results differ from the true results. <br />
*Impact of bias: it can make it appear that there is an association when there really is none (bias away from the null), or it can mask an association when there really is one (bias toward the null).<br />
*Selection bias: who is selected or retained in a study distorts the estimate of the truth; an example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (choose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls, or between exposed and unexposed, distort the estimates of the truth. A variable is a confounder if (1) it is a known risk factor for the outcome, (2) it is associated with the exposure, and (3) it is not a result of the exposure. All three conditions are necessary for a variable to be considered a confounder. <br />
* Chance: the luck of draw gets you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any factors that were matched on; you cannot conduct analyses for matched variables and outcome.<br />
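The matched-pairs odds ratio above uses only the discordant pairs; a minimal sketch with hypothetical pair counts (not from the original materials):<br />

```python
def matched_odds_ratio(b, c):
    # Matched case-control analysis uses only discordant pairs:
    # b = pairs with case exposed / control unexposed,
    # c = pairs with case unexposed / control exposed
    return b / c

mor = matched_odds_ratio(40, 10)  # hypothetical counts -> OR = 4.0
```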
*Randomization: Random allocation of exposure/”treatment” by investigator, ensure that the two groups (exposed & unexposed) are the same except for exposure of interest, able to control for both known and unknown confounders because distribution of these “3rd variables” should be equally distributed between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, through several important examples, the classical methodological approach to discussing causality in epidemiology. Coronary heart disease (CHD) prevention has benefited greatly from the development of epidemiological research; however, the association-versus-causation question is now raised for observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research over the past 10 years. Causality of the associations presents some special characteristics when genes are involved, including the necessity of replication and Mendelian randomization, which might prove to be important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article], studies retrospectively the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about existence of outbreaks?<br />
:a. Cases call health departments directly<br />
:b. Clinicians<br />
:c. Laboratories<br />
:d. All of the above<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. Host<br />
:b. Agent<br />
:c. Vector<br />
:d. Environment<br />
:e. All of the above<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same.<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly.<br />
:c. The prevalence decreases if the duration of disease is increasing.<br />
:d. The prevalence decreases if the duration of disease stays the same.<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002 - 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 deaths per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:a. True<br />
:b. False<br />
<br />
Sequential testing tends to have higher net specificity than specificity of a single test.<br />
:a. True<br />
:b. False<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they had the flu), so these were not enrolled in the research study. The students who were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of this follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question; what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. Interviewer bias<br />
:b. Recall bias<br />
:c. Loss to follow-up<br />
:d. Non-differential misclassification<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns (especially if they were concerned that sun exposure was a reason they got melanoma). This is an example of:<br />
:a. Interviewer bias<br />
:b. Loss to follow-up<br />
:c. Differential misclassification<br />
:d. Non-differential misclassification<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred directly from an infected person to another person, for example through touch, a sneeze, or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*$H_{0}:ARR=1$; a 95% confidence interval can be used to see whether the estimated ARR interval includes the null value of 1. If the ARR is much greater than 1, then people who are exposed are more likely to develop the illness than those who are unexposed.<br />
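The attack-rate and attack-rate-ratio formulas above can be sketched as follows (the outbreak counts are hypothetical, not from the original materials):<br />

```python
def attack_rate(n_ill, n_at_risk):
    # Proportion of people at risk who develop the illness
    return n_ill / n_at_risk

def attack_rate_ratio(ar_exposed, ar_unexposed):
    # ARR: attack rate in the exposed over attack rate in the unexposed
    return ar_exposed / ar_unexposed

# Hypothetical outbreak: 50 of 100 salad eaters became ill,
# versus 5 of 100 people who did not eat the salad
arr = attack_rate_ratio(attack_rate(50, 100), attack_rate(5, 100))  # ~10
```

An ARR this far above 1 would support the hypothesis that the exposure is associated with the illness.<br />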
<br />
====Measuring Disease====<br />
The goals of this section are to name and calculate two measures of incidence, to describe differences in interpreting these measures, and to understand the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': the number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2,000 persons at risk during the year and 20 develop the disease over that period, the incidence is 20⁄2000 = 1%.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
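The two incidence measures above differ only in their denominators; as an illustrative sketch (not part of the original materials, with a hypothetical case count for the person-time example):<br />

```python
def cumulative_incidence(new_cases, population_at_risk):
    # Proportion of an at-risk population that develops disease
    return new_cases / population_at_risk

def incidence_rate(new_cases, person_time):
    # New cases per unit of person-time contributed by those followed
    return new_cases / person_time

person_days = 3 + 5 + 8            # subjects followed 3, 5, and 8 days (as above)
rate = incidence_rate(2, person_days)   # hypothetical 2 cases -> 0.125 per person-day
ci = cumulative_incidence(20, 2000)     # the 1% example above
```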
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
<br />
====Measuring Mortality Rates====<br />
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:<br />
<br />
*All cause mortality rates = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011}{Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of it over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations (or one population at different time periods) that have different age distributions; adjusting for age makes it possible to compare the mortality rates of the two populations as if they had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
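The four direct-standardization steps above can be sketched in a few lines; this is an illustrative implementation with hypothetical rates and populations, not part of the original materials:<br />

```python
def direct_age_adjusted_rate(age_specific_rates, standard_pop):
    # Direct standardization: apply the study population's age-specific
    # rates to the standard population's age structure, then divide the
    # expected deaths by the total standard population.
    expected_deaths = sum(r * n for r, n in zip(age_specific_rates, standard_pop))
    return expected_deaths / sum(standard_pop)

# Hypothetical example with two age groups; rates expressed per person
rates = [50e-5, 10e-5]            # 50 and 10 deaths per 100,000
std_pop = [1_000_000, 4_000_000]  # standard population in each age group
adjusted = direct_age_adjusted_rate(rates, std_pop)  # ~18 per 100,000
```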
<br />
====Age-adjusted mortality rate for each population of interest====<br />
*Indirect age-adjustment: the expected number of deaths can be compared to the number of actual deaths using the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates are unreliable (e.g., when the study population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
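The indirect-standardization steps above reduce to a short calculation; the sketch below uses hypothetical rates and population counts and is not part of the original materials:<br />

```python
def smr(observed_deaths, standard_rates, study_pop):
    # Indirect standardization: expected deaths come from applying the
    # standard population's age-specific rates to the study population's
    # age structure; SMR = observed / expected.
    expected = sum(r * n for r, n in zip(standard_rates, study_pop))
    return observed_deaths / expected

# Hypothetical study population with two age groups
ratio = smr(120, [1e-4, 5e-4], [200_000, 100_000])  # expected = 70 -> SMR ~ 1.71
```

An SMR above 1, as here, means more deaths were observed than expected from the standard rates.<br />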
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and achieve better outcomes. However, screening programs must be warranted, and there must be a critical point in the natural history of the disease that screening can precede. <br />
<br />
=====Predictive Value & Reliability: Clinical Utility of Positive Tests=====<br />
<br />
If a patient tests positive, the likelihood that they actually have the disease is called the '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood that they actually do ''not'' have the disease is called the '''Negative Predictive Value''' (NPV). PPV and NPV are affected by the prevalence of disease and by the specificity and sensitivity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
<br />
'''PPV interpretation:''' Given a positive screening test, the likelihood that an individual actually has the disease is the PPV.<br />
<br />
'''NPV interpretation:''' Given a negative screening test, the likelihood that an individual actually does not have the disease is the NPV.<br />
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
<br />
===== Factors Influence Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV (and decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent diseases may waste resources; PPV should be presented in the context of disease prevalence.<br />
<br />
*''Test specificity'' (the ability of a test to correctly identify those who do not have the disease, $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
*''Test sensitivity'' (the ability of a test to correctly identify those who have the disease, $=\frac{a}{a+c}$)<br />
<br />
'''Note:''' The cutoff of a test influences its sensitivity and specificity: lowering the cutpoint increases true positives (raising sensitivity) and decreases true negatives (lowering specificity). Similarly, raising the cutpoint decreases true positives (lowering sensitivity) and increases true negatives (raising specificity).<br />
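The four screening measures defined above follow directly from the 2x2 table; the sketch below (with hypothetical cell counts, not from the original materials) computes them together:<br />

```python
def screening_measures(a, b, c, d):
    # 2x2 screening table: a = true positives, b = false positives,
    # c = false negatives, d = true negatives
    return {
        "sensitivity": a / (a + c),  # detects those with disease
        "specificity": d / (b + d),  # clears those without disease
        "ppv": a / (a + b),          # P(disease | positive test)
        "npv": d / (c + d),          # P(no disease | negative test)
    }

# Hypothetical screening results for 1,000 people
m = screening_measures(90, 30, 10, 870)  # sensitivity 0.90, PPV 0.75
```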
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between those who have the disease and those who do not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid not reliable'', ''reliable not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability (repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?===== <br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage) start with a less expensive, less invasive, and less uncomfortable test; positive results must then be followed up with additional testing.<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person need only test positive on one test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has its own sensitivity and specificity. Multiple testing can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
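For two tests with independent errors, the standard net-sensitivity and net-specificity formulas can be sketched as follows (the Se/Sp values below are hypothetical):<br />

```python
# Simultaneous (parallel): positive on either test counts as positive.
def simultaneous(se1, sp1, se2, sp2):
    net_se = se1 + se2 - se1 * se2   # missed only if both tests miss
    net_sp = sp1 * sp2               # negative only if both tests are negative
    return net_se, net_sp

# Sequential (2-stage): positive only if positive on both tests.
def sequential(se1, sp1, se2, sp2):
    net_se = se1 * se2               # detected only if both tests detect
    net_sp = sp1 + sp2 - sp1 * sp2   # cleared if either test is negative
    return net_se, net_sp

se1, sp1, se2, sp2 = 0.8, 0.9, 0.7, 0.95   # hypothetical test characteristics
print(simultaneous(se1, sp1, se2, sp2))    # net sensitivity rises above either test
print(sequential(se1, sp1, se2, sp2))      # net specificity rises above either test
```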
<br />
===Randomized Controlled Trials (RCT)===<br />
In these procedures, the investigator assigns exposure at random to study participants. The investigator then observes whether there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken to ensure that the follow-up is done in an identical way for both groups. The essence of a good comparison between “treatments” is that the compared groups are as much the same as possible, except for their “treatment.”<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: <br />
<br />
#Hypothesis formulation<br />
#Study participant recruitment based on specific criteria<br />
#Gathering informed consent<br />
#Allocation of eligible and willing participants into random assignment study groups<br />
#Monitoring study groups for outcome under study<br />
#Comparing rates of different outcomes in various groups<br />
<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
<br />
*''External validity'': Generalization of study to larger source population, which is influenced by factors like: <br />
:*Demographic differences between eligible and ineligible subgroups <br />
:*Whether the intervention mirrors what will happen in the community or source population<br />
<br />
*''Internal validity'': Ability to reach correct conclusion in study, which is influenced by factors like: <br />
:*Ability of subjects to provide valid and reliable data<br />
:*Expected compliance with a regimen<br />
:*Low probability of dropping out<br />
<br />
====Measures of Association and Effect in RCT====<br />
<br />
Ratio of two measures of disease incidence (relative measures):<br />
<br />
*Risk Ratio (Relative Risk)<br />
*Rate Ratio<br />
<br />
Difference between two measures of disease incidence: <br />
<br />
*Risk difference<br />
*Efficacy<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
<center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
</center><br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
'''Interpretation''': <br />
<br />
*$RR>1$: The outcome is $RR$ times more likely to occur in the exposed group than in the unexposed group<br />
*$RR=1$: Null value (no difference between groups)<br />
*$RR<1$: Either report the reduction in risk (100%$-X$%) or invert ($1/RR$) and interpret the exposure as making the outcome “less likely”<br />
<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
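The RR and efficacy formulas above can be computed from the RCT 2x2 table in a short sketch (the trial counts below are hypothetical):<br />

```python
# a, b: Drug A group (disease / no disease); c, d: placebo group.
def relative_risk(a, b, c, d):
    """Cumulative incidence in the Drug A group divided by CI in the placebo group."""
    return (a / (a + b)) / (c / (c + d))

def efficacy(a, b, c, d):
    """(CI in placebo - CI in treatment) / CI in placebo."""
    ci_treatment = a / (a + b)
    ci_placebo = c / (c + d)
    return (ci_placebo - ci_treatment) / ci_placebo

a, b, c, d = 10, 90, 30, 70          # hypothetical trial counts
print(relative_risk(a, b, c, d))     # 0.1 / 0.3, about 0.33
print(efficacy(a, b, c, d))          # about 0.67: treatment prevents ~2/3 of cases
```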
<br />
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.<br />
# Legitimate uncertainty exists regarding the effect of interventions on the outcome, but there is reason to believe that the benefits of the intervention in question outweigh the risks.<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Populations of exposed and unexposed individuals at risk of developing the outcome are followed over time to compare the development of disease in each group. <br />
*Steps: Establish a study population that reflects the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; then follow both groups for the outcomes of interest.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
**Prospective (concurrent) and retrospective (non-concurrent) cohort studies, distinguished by when the data are collected<br />
**Retrospective benefits: more cost-effective; good for diseases of long latency<br />
**Prospective benefits: data quality is presumably higher<br />
<br />
Both designs need to be cautious of ascertainment bias if the outcome or exposure status is already known.<br />
<br />
*Measures of Association in Cohort Study:<br />
**Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio<br />
**Difference between two measures of disease incidence: Risk Difference, Rate Difference<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information<br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized<br />
# Like RCT, excellent for studying rare exposures <br />
# Multiple outcomes and sometimes multiple exposures can be studied<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive<br />
# Not effective at capturing rare outcomes, and it can be challenging to study diseases that take a long time to develop<br />
# Loss to follow-up can be a problem<br />
# Changes over time in criteria and methods can lead to problems with inferences<br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics<br />
<br />
*Situations that favor a Cohort Study: <br />
# When there is evidence of an association between the exposure and the disease from other studies<br />
# When the exposure is rare but the incidence of disease among the exposed is high<br />
# When time between exposure and development of the disease is relatively short or historical data is available<br />
# When good follow-up can be ensured<br />
<br />
===Case Control Study===<br />
A case-control study compares cases and controls to see which group had greater exposure to the risk factor of interest.<br />
*Measures of Association: Odds Ratio<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}$<br />
<br />
====Interpretation====<br />
The odds of being exposed are OR times higher in the cases than in the controls (if OR > 1), or 1/OR times lower in the cases than in the controls (if OR < 1); if OR = 1, there is no association (the odds are the same in cases and controls).<br />
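A minimal sketch of the odds-ratio computation above (the case-control counts below are hypothetical):<br />

```python
# Odds ratio from the case-control 2x2 table:
# a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls.
def odds_ratio(a, b, c, d):
    """Odds of exposure in cases divided by odds of exposure in controls (ad/bc)."""
    return (a * d) / (b * c)

print(odds_ratio(60, 40, 40, 160))   # (60*160)/(40*40) = 6.0
```

An OR of 6 would be read as: the odds of exposure are 6 times higher in cases than in controls.<br />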
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: The case-control design is efficient and can evaluate many risk factors for the same disease, so it is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate the incidence of disease, only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in the study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to avoid recall and report bias include: <br />
# Adjusting timing so that the time between the event/illness and the study is as short as possible; use standardized questionnaires that obtain complete information<br />
# Using existing information if/when possible (e.g. medical record)<br />
# Masking participants to study hypothesis<br />
*Conditions under which an OR from a Case-Control Study can approximate the RR ($OR≈RR$):<br />
# When the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn<br />
# When the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn<br />
# When the disease being studied does not occur frequently<br />
<br />
===Cross-Sectional Studies===<br />
A cross-sectional study is an observational study in which a subject’s exposure and disease status are measured at the same time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, and temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# Good for generating hypotheses<br />
# Easily sets up other analytic designs<br />
# Temporality is not a problem for time invariant exposures (genetic markers)<br />
# Relatively low cost<br />
<br />
*'''Weakness:'''<br />
# Temporality – cannot determine whether the exposure or the disease came first<br />
# Prevalent cases may not be the same as incident cases<br />
# Not useful for rare disease<br />
# Subject to selection bias<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Disease || No Disease <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\,disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data is used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place & time (mixed study). However, one error that could occur is when an association is identified based on group level (ecological) characteristics that are ascribed to individuals when such associations do not exist at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# Data is relatively easy and/or cheap to obtain.<br />
# Ecological studies are a good place to start.<br />
# Many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weaknesses:''' <br />
#Reliance on group-level data may not correctly represent individual-level associations. <br />
#The ecologic fallacy occurs when an association between variables based on group characteristics is used to make inferences about individuals, even though that association does not exist at the individual level.<br />
#Ecologic studies are useful mainly for generating hypotheses; they cannot test individual-level associations.<br />
<br />
===Other Risk Estimates===<br />
*Attributable Risk Estimates of Effect – if exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.<br />
*Attributable Risk (AR): $AR=CI_{Exposed} - CI_{Not\,exposed}$. This is just the risk difference. The group of interest is the exposed; AR quantifies the risk of disease in the exposed group that is attributable to the exposure. <br />
*Attributable Risk Percent $(AR\%)$: $AR\%=\frac{CI_{Exposed} - CI_{Not\,exposed}}{CI_{Exposed}}$<br />
*Population Attributable Risk (PAR): $PAR= CI_{Total} - CI_{Not\,exposed}$<br />
*Population Attributable Risk Percent $(PAR\%)$: $PAR\%=\frac{CI_{Total}-CI_{Not\,exposed}}{CI_{Total}}$.<br />
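The four attributable-risk measures above can be sketched directly (the cumulative incidences below are hypothetical):<br />

```python
def ar(ci_exposed, ci_unexposed):
    """Attributable risk: the risk difference."""
    return ci_exposed - ci_unexposed

def ar_pct(ci_exposed, ci_unexposed):
    """Fraction of the risk in the exposed attributable to the exposure."""
    return (ci_exposed - ci_unexposed) / ci_exposed

def par(ci_total, ci_unexposed):
    """Excess risk in the total population due to the exposure."""
    return ci_total - ci_unexposed

def par_pct(ci_total, ci_unexposed):
    """Fraction of the total population's risk attributable to the exposure."""
    return (ci_total - ci_unexposed) / ci_total

ci_exp, ci_unexp, ci_tot = 0.20, 0.05, 0.12   # hypothetical cumulative incidences
print(ar(ci_exp, ci_unexp))       # risk difference, about 0.15
print(ar_pct(ci_exp, ci_unexp))   # about 0.75 of the exposed group's risk
```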
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and outcome; observed results different than true results. <br />
*Impact of bias: it can make it appear as if there is an association when there really is none (bias away from the null), or it can mask an association when there really is one (bias toward the null). <br />
*Selection bias: who is selected or retained in a study distorts the estimate of the truth. An example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (choose groups from the same source population; use lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls or exposed and unexposed distorts your estimates of the truth. A variable is a confounder if it is a known risk factor for the outcome, it is associated with the exposure but not a result of the exposure. These three conditions are necessary for a variable to be considered as a confounder. <br />
* Chance: the luck of draw gets you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (based only on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any factors that were matched on; you cannot conduct analyses for matched variables and outcome.<br />
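The matched-pairs odds ratio above uses only the discordant pairs; a minimal sketch (the pair counts below are hypothetical):<br />

```python
# b = pairs with case exposed / control unexposed
# c = pairs with case unexposed / control exposed
# Concordant pairs (a and d) do not contribute to the matched OR.
def matched_odds_ratio(b, c):
    return b / c

print(matched_odds_ratio(30, 10))   # OR = 3.0
```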
*Randomization: Random allocation of exposure/“treatment” by the investigator ensures that the two groups (exposed & unexposed) are the same except for the exposure of interest; it can control for both known and unknown confounders because the distribution of these “3rd variables” should be equal between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
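Direct age adjustment, as in the example above, applies each stratum's age-specific rate to a standard population; a sketch with hypothetical numbers:<br />

```python
def age_adjusted_rate(rates_per_100k, standard_pop):
    """Expected deaths in the standard population, re-expressed per 100,000."""
    expected = [r / 100_000 * n for r, n in zip(rates_per_100k, standard_pop)]
    return sum(expected) / sum(standard_pop) * 100_000

rates = [15, 8, 12, 40]                                  # hypothetical age-specific rates per 100,000
standard = [1_000_000, 2_000_000, 2_000_000, 1_000_000]  # hypothetical standard population
print(age_adjusted_rate(rates, standard))                # adjusted deaths per 100,000
```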
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, through some important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has largely benefited in the past from the development of epidemiological research; however, the association-versus-causation question is currently raised for observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved: the necessity of replication and Mendelian randomization, which might prove to be important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article], studies retrospectively the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about existence of outbreaks?<br />
:a. Cases call health departments directly<br />
:b. Clinicians<br />
:c. Laboratories<br />
:d. All of the above<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. Host<br />
:b. Agent<br />
:c. Vector<br />
:d. Environment<br />
:e. All of the above<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same.<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly.<br />
:c. The prevalence decreases if the duration of disease is increasing.<br />
:d. The prevalence decreases if the duration of disease stays the same.<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002 - 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 death per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:a. True<br />
:b. False<br />
<br />
Sequential testing tends to have higher net specificity than specificity of a single test.<br />
:a. True<br />
:b. False<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they had the flu), so these were not enrolled in the research study. The students who were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of this follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question; what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. Interviewer bias<br />
:b. Recall bias<br />
:c. Loss to follow-up<br />
:d. Non-differential misclassification<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns (especially if they were concerned that sun exposure was a reason they got melanoma). This is an example of:<br />
:a. Interviewer bias<br />
:b. Loss to follow-up<br />
:c. Differential misclassification<br />
:d. Non-differential misclassification<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred directly from an infected person to a susceptible person, for example by a sneeze, touch, or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*$H_{0}:ARR=1$, and 95% confidence intervals, can be used to see whether estimated ARR interval includes the null value of 1. If ARR is much greater than 1, then people exposed are more likely to develop the illness compared to those who are unexposed.<br />
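The AR and ARR formulas above can be sketched as follows (the outbreak counts below are hypothetical):<br />

```python
def attack_rate(ill, at_risk):
    """Number who became ill divided by the number at risk."""
    return ill / at_risk

def attack_rate_ratio(ill_exposed, n_exposed, ill_unexposed, n_unexposed):
    """Attack rate in the exposed divided by the attack rate in the unexposed."""
    return attack_rate(ill_exposed, n_exposed) / attack_rate(ill_unexposed, n_unexposed)

# Hypothetical buffet outbreak: 40 of 50 salad eaters ill vs 10 of 50 non-eaters.
print(attack_rate_ratio(40, 50, 10, 50))   # ARR = 4.0: salad eaters 4x as likely to be ill
```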
<br />
====Measuring Disease====<br />
To name and calculate two measures of incidence, to describe differences in interpreting these measures, and to understand the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': The number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2,000 persons at risk during the year and 20 develop the disease over that period, the incidence would be $20/2000=1\%$.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
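A minimal sketch of the person-time and incidence-rate calculation, using the three illustrative subjects above:<br />

```python
# Follow-up times (days) for subjects A, B and C, as in the example above.
follow_up_days = [3, 5, 8]
new_cases = 1  # suppose (hypothetically) one subject develops the disease

person_days = sum(follow_up_days)          # total time at risk
incidence_rate = new_cases / person_days   # cases per person-day

print(person_days)       # 16
print(incidence_rate)    # 0.0625
```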
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
<br />
====Measuring Mortality Rates====<br />
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:<br />
<br />
*All cause mortality rates = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011} {Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of it over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations (or one population at different time periods) that have different age distributions; adjusting for age allows the mortality rates of the two populations to be compared as if they had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
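The four direct-adjustment steps above can be sketched as follows (all rates and population counts are illustrative, not from any real dataset):<br />

```python
# Direct age-adjustment sketch with illustrative numbers.
# Step 1: age-specific death rates (per person) in the study population:
age_specific_rates = [15e-5, 8e-5, 40e-5]
# Standard population counts in the same age ranges:
standard_pop = [1_000_000, 2_000_000, 3_000_000]

# Step 2: expected deaths per age group in the standard population.
expected_deaths = [r * n for r, n in zip(age_specific_rates, standard_pop)]
# Steps 3-4: sum expected deaths, divide by total standard population.
adjusted_rate = sum(expected_deaths) / sum(standard_pop)

print(round(adjusted_rate * 100_000, 1))  # age-adjusted rate per 100,000 -> 25.2
```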
<br />
====Age-adjusted mortality rate for each population of interest====<br />
*''Indirect age-adjustment'': The expected number of deaths can be compared to the number of actual deaths using the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates are unreliable (e.g., if the population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
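The indirect-adjustment steps can be sketched as follows, with illustrative rates and counts:<br />

```python
# Indirect age-adjustment (SMR) sketch with illustrative numbers.
# Step 1: the standard population's age-specific rates (per person):
standard_rates = [2e-4, 5e-4, 1e-3]
# Study population counts in the same age groups:
study_pop = [10_000, 20_000, 5_000]
observed_deaths = 25   # hypothetical observed deaths in the study population

# Steps 2-3: expected deaths if study population had the standard rates.
expected_deaths = sum(r * n for r, n in zip(standard_rates, study_pop))
# Step 4: observed over expected.
smr = observed_deaths / expected_deaths

print(round(expected_deaths, 1), round(smr, 2))  # 17.0 1.47 (more deaths than expected)
```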
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, allowing us to detect disease earlier, initiate treatment sooner, and achieve better outcomes. However, screening programs must be warranted, and there must be a critical point in the disease's natural history that screening can precede.<br />
<br />
=====Clinical utility Predictive Value & Reliability: Clinical Utility of Positive Tests=====<br />
<br />
If a patient tests positive, the likelihood that they actually have the disease is called the '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood that they actually do ''not'' have the disease is called the '''Negative Predictive Value''' (NPV). PPV and NPV are affected by the prevalence of disease and the sensitivity and specificity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
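Given the four cell counts of the screening table, sensitivity, specificity, PPV and NPV can all be computed directly; the counts below are hypothetical:<br />

```python
# Hypothetical screening 2x2 cell counts, following the table above:
a, b = 90, 15     # a = true positives, b = false positives
c, d = 10, 885    # c = false negatives, d = true negatives

sensitivity = a / (a + c)   # P(test+ | disease)
specificity = d / (b + d)   # P(test- | no disease)
ppv = a / (a + b)           # P(disease  | test+)
npv = d / (c + d)           # P(no disease | test-)

print(round(sensitivity, 2), round(specificity, 2),
      round(ppv, 2), round(npv, 2))   # 0.9 0.98 0.86 0.99
```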
<br />
'''PPV interpretation:''' Given a positive result on the screening test, the likelihood that the individual actually has the disease is the PPV.<br />
<br />
'''NPV interpretation:''' Given a negative result on the screening test, the likelihood that the individual actually does not have the disease is the NPV.<br />
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
<br />
===== Factors Influence Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV (and decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent disease may waste resources; PPV needs to be presented in the context of disease prevalence.<br />
<br />
*''Test specificity'' (the ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
*''Test sensitivity'' (the ability of a test to correctly identify those who have the disease $=\frac{a}{a+c}$): Higher test sensitivity increases NPV.<br />
<br />
'''Note:''' The cutpoint of a test influences its sensitivity and specificity: lowering the cutpoint increases true positives (increasing sensitivity) and decreases true negatives (decreasing specificity). Similarly, raising the cutpoint decreases true positives (decreasing sensitivity) and increases true negatives (increasing specificity).<br />
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid but not reliable'', ''reliable but not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability (repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?=====<br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage) begin with a less expensive, less invasive, or less uncomfortable test; if its result is positive, it must be followed up with additional testing.<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person need only test positive on one test; to be considered negative, the person must test negative on all tests.<br />
<br />
Each test has its own sensitivity and specificity. Using multiple tests can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
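Assuming the two tests err independently, the net measures can be sketched as follows (the sensitivities and specificities below are illustrative):<br />

```python
# Net sensitivity/specificity for two independent tests (illustrative values).
se1, sp1 = 0.90, 0.80
se2, sp2 = 0.85, 0.90

# Simultaneous (parallel): positive if positive on either test.
net_se_parallel = 1 - (1 - se1) * (1 - se2)    # rises above either test alone
net_sp_parallel = sp1 * sp2                    # falls below either test alone

# Sequential (2-stage): positive only if positive on both tests.
net_se_sequential = se1 * se2                  # falls below either test alone
net_sp_sequential = 1 - (1 - sp1) * (1 - sp2)  # rises above either test alone

print(round(net_se_parallel, 3), round(net_sp_parallel, 3))      # 0.985 0.72
print(round(net_se_sequential, 3), round(net_sp_sequential, 3))  # 0.765 0.98
```

The numbers illustrate the trade-off stated above: parallel testing gains sensitivity at the cost of specificity, and sequential testing does the reverse.<br />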
<br />
===Randomized Controlled Trials (RCT)===<br />
In these studies, the investigator assigns exposure at random to study participants. The investigator then observes whether there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken to ensure that follow-up is done in an identical way in both groups. The essence of a good comparison between “treatments” is that the compared groups are as similar as possible, except for their “treatment.”<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: <br />
<br />
#Hypothesis formulation<br />
#Study participant recruitment based on specific criteria<br />
#Gathering informed consent<br />
#Allocation of eligible and willing participants into random assignment study groups<br />
#Monitoring study groups for outcome under study<br />
#Comparing rates of different outcomes in various groups<br />
<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
<br />
*''External validity'': Generalizability of the study to the larger source population, which is influenced by factors like:<br />
:*Demographic differences between eligible and ineligible subgroups<br />
:*Whether the intervention mirrors what will happen in the community or source population<br />
<br />
*''Internal validity'': Ability to reach a correct conclusion in the study, which is influenced by factors like:<br />
:*Ability of subjects to provide valid and reliable data<br />
:*Expected compliance with a regimen<br />
:*Low probability of dropping out<br />
<br />
====Measures of Association and Effect in RCT====<br />
<br />
Ratio of two measures of disease incidence (relative measures):<br />
<br />
*Risk Ratio (Relative Risk)<br />
*Rate Ratio<br />
<br />
Difference between two measures of disease incidence: <br />
<br />
*Risk difference<br />
*Efficacy<br />
<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
'''Interpretation''': <br />
<br />
*$RR>1$: The outcome is $RR$ times more likely to occur in the exposed group than in the unexposed group<br />
*$RR=1$: Null value (no difference between groups)<br />
*$RR<1$: Either calculate the reduction in risk ($100\%-X\%$) or invert ($1/RR$) and interpret as “less likely”<br />
<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
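The relative risk and efficacy formulas above can be sketched from the 2x2 cell counts; the counts below are hypothetical:<br />

```python
# Hypothetical RCT 2x2 counts: drug A vs. placebo.
a, b = 15, 85    # drug A arm: diseased, not diseased
c, d = 30, 70    # placebo arm: diseased, not diseased

ci_drug = a / (a + b)       # cumulative incidence, treatment arm
ci_placebo = c / (c + d)    # cumulative incidence, placebo arm

relative_risk = ci_drug / ci_placebo
efficacy = (ci_placebo - ci_drug) / ci_placebo

print(round(relative_risk, 2), round(efficacy, 2))  # 0.5 0.5
```

Here $RR=0.5$ would be inverted ($1/0.5=2$) and read as the placebo group being twice as likely to develop the disease, and the efficacy of 0.5 as the drug preventing half the cases expected under placebo.<br />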
<br />
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.<br />
# Legitimate uncertainty exists regarding the effect of the intervention on the outcome, but there are reasons to believe that the benefits of the intervention outweigh the risks.<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Populations of exposed and unexposed individuals at risk of developing the outcome are followed over time to compare the development of disease in each group.<br />
*Steps: Establish the study population: identify a study population that reflects the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; follow and compare the outcomes of the exposed and unexposed groups.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
**Prospective (concurrent) and retrospective (non-concurrent) cohort studies, depending on when the data are collected<br />
**Retrospective has benefits: more cost effective; good for disease of long latency<br />
**Prospective has benefits: data quality presumably higher<br />
<br />
Both designs need to be cautious of ascertainment biases if outcome or exposure status is already known.<br />
<br />
*Measures of Association in Cohort Study:<br />
**Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio<br />
**Difference between two measures of disease incidence: Risk Difference, Rate Difference<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information<br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized<br />
# Like RCT, excellent for studying rare exposures <br />
# Multiple outcomes and sometimes multiple exposures can be studied<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive<br />
# Not effective at capturing rare outcomes, and it can be challenging to study diseases that take a long time to develop<br />
# Loss to follow-up can be a problem<br />
# Changes over time in criteria and methods can lead to problems with inferences<br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics<br />
<br />
*Situations favor a Cohort Study: <br />
# When there is evidence of an association between the exposure and the disease from other studies<br />
# When the exposure is rare but the incidence of disease among the exposed is high<br />
# When the time between exposure and development of the disease is relatively short, or historical data are available<br />
# When good follow-up can be ensured<br />
<br />
===Case Control Study===<br />
A case-control study compares cases (people with the disease) and controls (people without it) to see which group had greater exposure to the risk factor of interest.<br />
*Measures of Association: Odds Ratio<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}$<br />
<br />
====Interpretation====<br />
If $OR>1$, the odds of being exposed are $OR$ times higher in the cases than in the controls; if $OR<1$, the odds are $1/OR$ times lower in the cases than in the controls; if $OR=1$, there is no association (the odds are the same in cases and controls).<br />
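A minimal sketch of the odds-ratio calculation, with hypothetical cell counts:<br />

```python
# Hypothetical case-control 2x2 counts.
a, b = 40, 20    # exposed:   cases, controls
c, d = 60, 180   # unexposed: cases, controls

odds_ratio = (a * d) / (b * c)   # cross-product ratio ad/bc
print(odds_ratio)                # 6.0
```

With these counts the odds of exposure would be 6 times higher among cases than among controls.<br />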
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to avoid recall and report bias include: <br />
# Adjusting timing so that the time between the event/illness and the study is as short as possible; use standardized questionnaires that obtain complete information<br />
# Using existing information if/when possible (e.g. medical record)<br />
# Masking participants to study hypothesis<br />
*Conditions under which an OR from a Case-Control Study can approximate the RR ($OR≈RR$):<br />
# When the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn<br />
# When the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn<br />
# When the disease being studied does not occur frequently<br />
<br />
===Cross-Sectional Studies===<br />
A cross-sectional study is an observational study in which a subject’s exposure and disease status are measured at the same time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, and temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# Good for generating hypotheses<br />
# Easily sets up other analytic designs<br />
# Temporality is not a problem for time invariant exposures (genetic markers)<br />
# Relatively low cost<br />
<br />
*'''Weakness:'''<br />
# Temporality – exposure or disease which happened first<br />
# Prevalent cases may not be the same as incident cases<br />
# Not useful for rare disease<br />
# Subject to selection bias<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\,disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place and time (mixed study). An error can occur when an association identified from group-level (ecologic) characteristics is ascribed to individuals even though no such association exists at the individual level.<br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# data is relatively easy and/or cheap to obtain; <br />
# good place to start; <br />
# many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weakness:''' Reliance on group-level data may not correctly represent individual-level associations. <br />
<br />
*Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.<br />
<br />
*Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.<br />
<br />
===Other Risk Estimates===<br />
*Attributable Risk Estimates of Effect – if exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.<br />
*Attributable Risk (AR): $AR=CI_{Exposed} - CI_{Not\,exposed}$. This is just the risk difference. The group of interest is the exposed group, and AR quantifies the risk of disease in the exposed group that is attributable to the exposure.<br />
*Attributable Risk Percent $(AR\%)$: $AR\%=\frac{CI_{Exposed} - CI_{Not\,exposed}}{CI_{Exposed}}$<br />
*Population Attributable Risk (PAR): $PAR=CI_{Total} - CI_{Not\,exposed}$<br />
*Population Attributable Risk Percent $(PAR\%)$: $PAR\%=\frac{CI_{Total}-CI_{Not\,exposed}}{CI_{Total}}$<br />
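The four attributable-risk measures can be sketched as follows (the cumulative incidences and exposure prevalence below are illustrative):<br />

```python
# Hypothetical cumulative incidences and exposure prevalence.
ci_exposed, ci_unexposed = 0.20, 0.05
p_exposed = 0.40   # proportion of the population that is exposed
ci_total = p_exposed * ci_exposed + (1 - p_exposed) * ci_unexposed

ar = ci_exposed - ci_unexposed        # attributable risk (risk difference)
ar_pct = ar / ci_exposed              # attributable risk percent
par = ci_total - ci_unexposed         # population attributable risk
par_pct = par / ci_total              # population attributable risk percent

print(round(ar, 2), round(ar_pct, 2),
      round(par, 2), round(par_pct, 2))   # 0.15 0.75 0.06 0.55
```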
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and outcome; the observed results differ from the true results.<br />
*Impact of bias: It can make it appear as if there is an association when there really is none (bias away from the null), or mask an association when there really is one (bias toward the null).<br />
*Selection bias: Who is selected into or retained in a study distorts the estimate of the truth. An example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (chose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls, or exposed and unexposed, distort the estimates of the truth. A variable is a confounder if it is a known risk factor for the outcome and is associated with the exposure but is not a result of the exposure. All three conditions are necessary for a variable to be considered a confounder.<br />
* Chance: the luck of draw gets you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any of the factors that were matched on; note that you cannot analyze the association between a matched variable and the outcome.<br />
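A sketch of the matched-pairs odds ratio, using hypothetical discordant-pair counts:<br />

```python
# Hypothetical counts of discordant pairs from a matched case-control study.
b = 30   # pairs where the case is exposed and the control is not
c = 10   # pairs where the control is exposed and the case is not

# Concordant pairs do not contribute to the matched odds ratio.
matched_or = b / c
print(matched_or)   # 3.0
```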
*Randomization: Random allocation of exposure/“treatment” by the investigator ensures that the two groups (exposed and unexposed) are the same except for the exposure of interest, and controls for both known and unknown confounders because the distribution of these “3rd variables” should be equal between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, through some important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has largely benefited in the past from the development of epidemiological research; however, the association-versus-causation question is currently raised for observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved: the necessity of replication and Mendelian randomization, which might prove to be important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article], studies retrospectively the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about existence of outbreaks?<br />
:a. Cases call health departments directly<br />
:b. Clinicians<br />
:c. Laboratories<br />
:d. All of the above<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. Host<br />
:b. Agent<br />
:c. Vector<br />
:d. Environment<br />
:e. All of the above<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same.<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly.<br />
:c. The prevalence decreases if the duration of disease is increasing.<br />
:d. The prevalence decreases if the duration of disease stays the same.<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002 - 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 death per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:a. True<br />
:b. False<br />
<br />
Sequential testing tends to have higher net specificity than specificity of a single test.<br />
:a. True<br />
:b. False<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they had the flu), so these were not enrolled in the research study. The students who were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of this follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question; what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. Interviewer bias<br />
:b. Recall bias<br />
:c. Loss to follow-up<br />
:d. Non-differential misclassification<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns (especially if they were concerned that sun exposure was a reason they got melanoma). This is an example of:<br />
:a. Interviewer bias<br />
:b. Loss to follow-up<br />
:c. Differential misclassification<br />
:d. Non-differential misclassification<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is concerned with the occurrence of disease in human populations and with how that occurrence changes over time. This introduction aims to present the field, explain the basic concepts and methodologies that will be applied later in this section, help students solve and analyze epidemiological problems, and introduce various epidemiological study designs.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred directly from an infected person to a susceptible person, for example by touch, a sneeze, or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*$H_{0}:ARR=1$, and 95% confidence intervals, can be used to see whether estimated ARR interval includes the null value of 1. If ARR is much greater than 1, then people exposed are more likely to develop the illness compared to those who are unexposed.<br />
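As a minimal sketch of the AR and ARR formulas above (in Python, with purely hypothetical outbreak counts):<br />
<br />
```python
# Minimal sketch of the attack-rate formulas above; all counts are hypothetical.

def attack_rate(cases, at_risk):
    """AR = number of people at risk who develop the illness / total at risk."""
    return cases / at_risk

# Hypothetical buffet outbreak: illness among those who did / did not eat the salad
ar_exposed = attack_rate(30, 50)     # 30 of 50 salad eaters became ill -> 0.6
ar_unexposed = attack_rate(10, 100)  # 10 of 100 non-eaters became ill -> 0.1

arr = ar_exposed / ar_unexposed      # ARR ~ 6: exposed about 6x as likely to fall ill
print(ar_exposed, ar_unexposed, arr)
```
<br />
An ARR this far above 1 would support the hypothesis that the exposure is associated with illness.<br />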
<br />
====Measuring Disease====<br />
The goals here are to name and calculate two measures of incidence, to describe differences in interpreting these measures, and to understand the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': The number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2000 persons at risk during the year and 20 develop the disease over that period, the incidence is 20⁄2000=1%.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
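The three measures above can be sketched in Python; the inputs below are illustrative (only the 20⁄2000 example comes from the text):<br />
<br />
```python
# Sketch of the disease-frequency measures above; inputs are illustrative.

def cumulative_incidence(new_cases, population_at_risk):
    """New cases divided by the total population at risk."""
    return new_cases / population_at_risk

def incidence_rate(new_cases, person_time):
    """New cases per unit of person-time contributed by those followed."""
    return new_cases / person_time

def prevalence(existing_cases, population):
    """Existing cases divided by the population at a specified time."""
    return existing_cases / population

# 20 new cases among 2000 persons at risk during the year (example above)
print(cumulative_incidence(20, 2000))   # 0.01, i.e., 1%

# Subjects followed for 3, 5 and 8 days contribute 3 + 5 + 8 = 16 person-days
person_days = 3 + 5 + 8
print(incidence_rate(1, person_days))   # 1 case per 16 person-days = 0.0625

# Hypothetical: 50 existing cases in a population of 10,000 on a given date
print(prevalence(50, 10_000))           # 0.005
```
<br />
Note that cumulative incidence and prevalence are proportions, while the incidence rate is a true rate with person-time in the denominator.<br />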
<br />
====Measuring Mortality Rates====<br />
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:<br />
<br />
*All cause mortality rates = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011} {Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of the disease over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations (or one population at different time periods) that have different age distributions, by asking what the mortality rates would be if both populations had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
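The four steps above can be sketched as follows; the age-specific rates and standard population counts are hypothetical:<br />
<br />
```python
# Sketch of direct age-adjustment following the four steps above.
# Rates and the standard population counts are hypothetical.

# Step 1: age-specific death rates in the study population (deaths per person-year)
age_specific_rates = {"<40": 0.001, "40-64": 0.005, "65+": 0.02}

# Standard population counts for the corresponding age ranges
standard_population = {"<40": 50_000, "40-64": 30_000, "65+": 20_000}

# Steps 2-3: expected deaths per stratum, summed across age groups
expected_deaths = sum(age_specific_rates[g] * standard_population[g]
                      for g in standard_population)

# Step 4: divide total expected deaths by the total standard population
adjusted_rate = expected_deaths / sum(standard_population.values())
print(expected_deaths, adjusted_rate)  # ~600 expected deaths; rate ~0.006/person-year
```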
<br />
====Age-adjusted mortality rate for each population of interest====<br />
*Indirect age-adjustment: The expected number of deaths can be compared to the number of actual deaths via the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates are unreliable (e.g., when the study population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
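A minimal sketch of the indirect-adjustment steps above, with hypothetical rates and counts:<br />
<br />
```python
# Sketch of indirect adjustment (SMR) following the four steps above.
# All counts and rates are hypothetical.

# Step 1: age-specific mortality rates in the standard population
standard_rates = {"<40": 0.001, "40-64": 0.005, "65+": 0.02}

# Step 2: people per age range in the (small) study population
study_population = {"<40": 500, "40-64": 400, "65+": 100}

# Steps 2-3: expected deaths if the study population had the standard rates
expected = sum(standard_rates[g] * study_population[g] for g in study_population)

# Step 4: observed deaths divided by expected deaths
observed = 9            # hypothetical observed death count
smr = observed / expected
print(expected, smr)    # ~4.5 expected deaths; SMR ~2 (more deaths than expected)
```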
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and achieve better outcomes. However, a screening program must be warranted, and there must be a critical point in the disease's natural history that screening allows us to precede. <br />
<br />
=====Predictive Value: Clinical Utility of Positive and Negative Tests=====<br />
<br />
If a patient tests positive, the likelihood that they actually have the disease is called the '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood that they actually do ''not'' have the disease is called the '''Negative Predictive Value''' (NPV). PPV and NPV are affected by the prevalence of disease and by the sensitivity and specificity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
<br />
'''PPV interpretation:''' Given a positive screening test result, the likelihood that an individual actually has the disease is the PPV.<br />
<br />
'''NPV interpretation:''' Given a negative screening test result, the likelihood that an individual actually does not have the disease is the NPV.<br />
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
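The PPV and NPV formulas above can be sketched directly from the 2x2 table; the cell counts below are hypothetical:<br />
<br />
```python
# Sketch: PPV and NPV from the screening 2x2 table above, where
# a = true positives, b = false positives, c = false negatives, d = true negatives.
# The cell counts are hypothetical.

def ppv(a, b):
    """Probability of disease given a positive screening test: a / (a + b)."""
    return a / (a + b)

def npv(c, d):
    """Probability of no disease given a negative screening test: d / (c + d)."""
    return d / (c + d)

a, b, c, d = 90, 60, 10, 840   # hypothetical screening results
print(ppv(a, b))               # 0.6
print(npv(c, d))               # ~0.988
```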
<br />
===== Factors Influence Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV (and decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent disease may waste resources; PPV must be presented in the context of disease prevalence.<br />
<br />
*''Test sensitivity'' (ability of a test to correctly identify those who have the disease $=\frac{a}{a+c}$): Higher test sensitivity increases NPV.<br />
*''Test specificity'' (ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
<br />
'''Note:''' The cutoff used to define disease influences test sensitivity and specificity: lowering the cutpoint increases true positives and hence sensitivity, but decreases true negatives and hence specificity. Similarly, raising the cutpoint decreases true positives and hence sensitivity, but increases true negatives and hence specificity.<br />
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid not reliable'', ''reliable not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability (repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?===== <br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage) start with a less expensive, less invasive, and less uncomfortable test. If its results are positive, it must be followed up with additional testing.<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person can test positive on either test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has its own sensitivity and specificity. Utilizing multiple tests can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
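These net effects can be sketched under the common simplifying assumption that the two tests perform independently; the sensitivities and specificities below are hypothetical:<br />
<br />
```python
# Sketch: net sensitivity and specificity for two-test programs, assuming
# the tests perform independently. The test characteristics are hypothetical.

def simultaneous(se1, sp1, se2, sp2):
    """Parallel testing: positive on either test counts as positive."""
    net_se = 1 - (1 - se1) * (1 - se2)   # net sensitivity rises
    net_sp = sp1 * sp2                   # net specificity falls
    return net_se, net_sp

def sequential(se1, sp1, se2, sp2):
    """Serial testing: positive only if positive on both tests."""
    net_se = se1 * se2                   # net sensitivity falls
    net_sp = 1 - (1 - sp1) * (1 - sp2)   # net specificity rises
    return net_se, net_sp

print(simultaneous(0.8, 0.9, 0.7, 0.95))  # ~ (0.94, 0.855)
print(sequential(0.8, 0.9, 0.7, 0.95))    # ~ (0.56, 0.995)
```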
<br />
===Randomized Controlled Trials (RCT)===<br />
In these procedures, the investigator assigns exposure at random to study participants. The investigator then observes whether there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken to ensure that the follow-up is done in an identical way in both groups. The essence of a good comparison between “treatments” is that the compared groups are as similar as possible, except for their “treatment."<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: <br />
<br />
#Hypothesis formulation<br />
#Study participant recruitment based on specific criteria<br />
#Gathering informed consent<br />
#Allocation of eligible and willing participants into random assignment study groups<br />
#Monitoring study groups for outcome under study<br />
#Comparing rates of different outcomes in various groups<br />
<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
<br />
*''External validity'': Generalization of study to larger source population, which is influenced by factors like: <br />
:*Demographic differences between eligible and ineligible subgroups <br />
:*Whether the intervention mirrors what will happen in the community or source population<br />
<br />
*''Internal validity'': Ability to reach correct conclusion in study, which is influenced by factors like: <br />
:*Ability of subjects to provide valid and reliable data<br />
:*Expected compliance with a regimen<br />
:*Low probability of dropping out<br />
<br />
====Measures of Association and Effect in RCT====<br />
<br />
Ratio of two measures of disease incidence (relative measures):<br />
<br />
*Risk Ratio (Relative Risk)<br />
*Rate Ratio<br />
<br />
Difference between two measures of disease incidence: <br />
<br />
*Risk difference<br />
*Efficacy<br />
<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
'''Interpretation''': <br />
<br />
*$RR>1$, The outcome is $RR$ times as likely to occur in group A as in group B<br />
*$RR=1$, Null value (no difference between groups)<br />
*$RR<1$, Either calculate the reduction in risk ratios (100%-$X$%) or invert ($1/RR$) to be interpreted as “less likely” risk<br />
<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
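As a sketch with hypothetical trial counts, the RR and efficacy formulas above give:<br />
<br />
```python
# Sketch: risk ratio and efficacy from the RCT 2x2 table above, where
# (a, b) are outcomes/non-outcomes on drug A and (c, d) on placebo.
# The counts are hypothetical.

a, b = 20, 180   # drug A arm: 20 of 200 developed the outcome
c, d = 50, 150   # placebo arm: 50 of 200 developed the outcome

ci_drug = a / (a + b)        # cumulative incidence on drug A = 0.10
ci_placebo = c / (c + d)     # cumulative incidence on placebo = 0.25

rr = ci_drug / ci_placebo                        # 0.4: lower risk on drug A
efficacy = (ci_placebo - ci_drug) / ci_placebo   # ~0.6: ~60% risk reduction
print(rr, efficacy)
```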
<br />
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.<br />
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks.<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Population of exposed and unexposed individuals at risk of developing outcomes are followed over time to compare the development of disease in each group. <br />
*Steps: Establish the study population: identify a study population that reflects the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; then follow both groups and compare their outcomes.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
**Prospective (concurrent) and retrospective (non-concurrent) cohort studies, distinguished by when the data are collected<br />
**Retrospective has benefits: more cost effective; good for disease of long latency<br />
**Prospective has benefits: data quality presumably higher<br />
<br />
Both designs need to be cautious of ascertainment biases if outcomes or exposure is known.<br />
<br />
*Measures of Association in Cohort Study:<br />
**Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio<br />
**Difference between two measures of disease incidence: Risk Difference, Rate Difference<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information<br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized<br />
# Like RCT, excellent for studying rare exposures <br />
# Multiple outcomes and sometimes multiple exposures can be studied<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive<br />
# Not effective at capturing rare outcomes, and it can be challenging to study diseases that take a long time to develop<br />
# Loss to follow-up can be a problem<br />
# Changes over time in criteria and methods can lead to problems with inferences<br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics<br />
<br />
*Situations favor a Cohort Study: <br />
# When there is evidence of an association between the exposure and the disease from other studies<br />
# When the exposure is rare but the incidence of disease among the exposed is high<br />
# When time between exposure and development of the disease is relatively short or historical data is available<br />
# When good follow-up can be ensured<br />
<br />
===Case Control Study===<br />
A case-control study compares cases and controls to see which group had greater exposure to the risk factor under study.<br />
*Measures of Association: Odds Ratio<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}$<br />
<br />
====Interpretation====<br />
If OR > 1, the odds of being exposed are OR times higher in the cases than in the controls; if OR < 1, the odds of being exposed are 1/OR times lower in the cases than in the controls; if OR = 1, there is no association (the odds are the same in cases and controls).<br />
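The odds-ratio formula above can be sketched directly from the 2x2 table; the cell counts below are hypothetical:<br />
<br />
```python
# Sketch: odds ratio from the case-control 2x2 table above, where a = exposed
# cases, b = exposed controls, c = unexposed cases, d = unexposed controls.
# The counts are hypothetical.

def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d) = ad / bc."""
    return (a * d) / (b * c)

oratio = odds_ratio(40, 20, 60, 180)   # (40*180) / (20*60) = 6.0
print(oratio)  # cases had 6 times the odds of exposure compared with controls
```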
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to avoid recall and report bias include: <br />
# Adjusting timing so that the time between the event/illness and the study is as short as possible; use standardized questionnaires that obtain complete information<br />
# Using existing information if/when possible (e.g. medical record)<br />
# Masking participants to study hypothesis<br />
*Conditions under which an OR from a case-control study can approximate an RR (OR ≈ RR):<br />
# When the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn<br />
# When the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn<br />
# When the disease being studied does not occur frequently<br />
<br />
===Cross-Sectional Studies===<br />
A cross-sectional study is an observational study in which a subject’s exposure and disease status are measured at the same time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, and temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# good for generating hypotheses;<br />
# easily sets up other analytic designs; <br />
# temporality is not a problem for time invariant exposures (genetic markers); <br />
# relatively low cost.<br />
<br />
*'''Weakness:'''<br />
# temporality – exposure or disease which happened first; <br />
# prevalent cases may not be the same as incident cases; <br />
# not useful for rare disease; <br />
# subject to selection bias.<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\, disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place & time (mixed study). A common pitfall is ascribing an association identified at the group (ecologic) level to individuals when no such association exists at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# data is relatively easy and/or cheap to obtain; <br />
# good place to start; <br />
# many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weakness:''' Reliance on group-level data may not correctly represent individual-level associations. <br />
<br />
*Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.<br />
<br />
*Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.<br />
<br />
===Other Risk Estimates===<br />
*Attributable Risk Estimates of Effect – if exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.<br />
*Attributable Risk (AR): $AR=CI_{Exposed} - CI_{Not\,exposed}$. This is just the risk difference. The group of interest is the exposed, and AR quantifies the risk of disease in the “exposed” group attributable to the exposure. <br />
*Attributable Risk Percent $(AR\%)$: $ AR\%$ = $\frac{(CI_{Exposed} - CI_{Not exposed})}{CI_{exposed}}$<br />
*Population Attributable Risk (PAR): $PAR= CI_{Total} - CI_{Not exposed}$<br />
*Population Attributable Risk Percent $(PAR\%)$: $PAR\%$ = $\frac{(CI_{Total}-CI_{Not exposed})} {CI_{total}}$.<br />
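The four attributable-risk measures above can be sketched as follows; the cumulative incidences (CI) are hypothetical:<br />
<br />
```python
# Sketch of the attributable-risk measures above; the cumulative
# incidences (CI) are hypothetical.

ci_exposed = 0.30     # CI among the exposed
ci_unexposed = 0.10   # CI among the unexposed
ci_total = 0.18       # CI in the total population (depends on exposure prevalence)

ar = ci_exposed - ci_unexposed    # AR (risk difference) ~ 0.20
ar_pct = ar / ci_exposed * 100    # AR% ~ 66.7% of cases among the exposed

par = ci_total - ci_unexposed     # PAR ~ 0.08
par_pct = par / ci_total * 100    # PAR% ~ 44.4% of all cases in the population
print(ar, ar_pct, par, par_pct)
```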
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and outcome; observed results different than true results. <br />
*Impact of bias: It can make it appear as if there is an association when there really is none (bias away from the null) or mask an association when there really is one (bias toward the null).<br />
*Reasons we get the wrong answer include selection bias: who is selected or retained in a study distorts your estimates of the truth. An example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (chose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls, or between exposed and unexposed, distort your estimates of the truth. A variable is a confounder if it is a known risk factor for the outcome and is associated with the exposure but is not a result of the exposure. All three conditions are necessary for a variable to be considered a confounder. <br />
* Chance: the luck of draw gets you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental studies); (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any factors that were matched on; you cannot conduct analyses for matched variables and outcome.<br />
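The matched analysis above can be sketched as follows; the discordant-pair counts are hypothetical:<br />
<br />
```python
# Sketch: matched-pair odds ratio from the table above, using only the
# discordant pairs (b = case exposed / control unexposed; c = the reverse).
# The pair counts are hypothetical.

def matched_odds_ratio(b, c):
    """Concordant pairs (a and d) carry no information; OR = b / c."""
    return b / c

print(matched_odds_ratio(30, 10))   # 3.0: exposure more common among cases
```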
*Randomization: Random allocation of exposure/“treatment” by the investigator ensures that the two groups (exposed and unexposed) are the same except for the exposure of interest, and it controls for both known and unknown confounders because the distribution of these “third variables” should be equal between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, using several important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has largely benefited in the past from the development of epidemiological research; however, the association-versus-causation question is currently raised from observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved, such as the necessity of replication and Mendelian randomization, which might prove to be important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article], studies retrospectively the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about existence of outbreaks?<br />
:a. Cases call health departments directly<br />
:b. Clinicians<br />
:c. Laboratories<br />
:d. All of the above<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. Host<br />
:b. Agent<br />
:c. Vector<br />
:d. Environment<br />
:e. All of the above<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same.<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly.<br />
:c. The prevalence decreases if the duration of disease is increasing.<br />
:d. The prevalence decreases if the duration of disease stays the same.<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002 - 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 death per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:a. True<br />
:b. False<br />
<br />
Sequential testing tends to have higher net specificity than specificity of a single test.<br />
:a. True<br />
:b. False<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they had the flu), so these were not enrolled in the research study. The students who were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of this follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question; what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. Interviewer bias<br />
:b. Recall bias<br />
:c. Loss to follow-up<br />
:d. Non-differential misclassification<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns (especially if they were concerned that sun exposure was a reason they got melanoma). This is an example of:<br />
:a. Interviewer bias<br />
:b. Loss to follow-up<br />
:c. Differential misclassification<br />
:d. Non-differential misclassification<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred directly from an infected person to another person, for example through touch, sexual intercourse, or the droplets of a sneeze <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*$H_{0}:ARR=1$, and 95% confidence intervals, can be used to see whether estimated ARR interval includes the null value of 1. If ARR is much greater than 1, then people exposed are more likely to develop the illness compared to those who are unexposed.<br />
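The attack-rate calculations above can be sketched in a few lines of Python (all counts are hypothetical):<br />

```python
# Hypothetical outbreak data for one exposure (e.g., a suspect food item)
ill_exposed, total_exposed = 30, 60        # ill / at risk among the exposed
ill_unexposed, total_unexposed = 5, 50     # ill / at risk among the unexposed

# Attack rate (AR) = number at risk who became ill / total number at risk
ar_exposed = ill_exposed / total_exposed
ar_unexposed = ill_unexposed / total_unexposed

# Attack rate ratio (ARR): a value well above 1 suggests the exposed
# were more likely to develop the illness than the unexposed
arr = ar_exposed / ar_unexposed
```

Here the exposed attack rate is 0.50, the unexposed attack rate is 0.10, and the ARR is 5.0; a 95% confidence interval excluding the null value of 1 would support an association.<br />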
<br />
====Measuring Disease====<br />
The goals here are to name and calculate two measures of incidence, to describe differences in how these measures are interpreted, and to understand the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': The number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2000 persons at risk during the year and 20 develop the disease over that period, the incidence is 20⁄2000 = 1%.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
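A short Python sketch of these frequency measures, using the worked numbers from the text plus a hypothetical prevalence count:<br />

```python
# Cumulative incidence: 20 new cases among 2000 persons at risk in a year
new_cases, population_at_risk = 20, 2000
cumulative_incidence = new_cases / population_at_risk       # 0.01, i.e., 1%

# Incidence rate uses person-time: subjects followed 3, 5, and 8 days
# contribute 3 + 5 + 8 = 16 person-days of observation
person_days = 3 + 5 + 8
incidence_rate = 1 / person_days    # 1 case per 16 person-days of follow-up

# Point prevalence: existing cases / population at a specified time
# (the 120 existing cases here are hypothetical)
existing_cases, population = 120, 2000
prevalence = existing_cases / population
```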
<br />
====Measuring Mortality Rates====<br />
Here we calculate and interpret all-cause mortality rates, group-specific mortality rates, and cause-specific mortality rates:<br />
<br />
*All cause mortality rates = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011} {Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause <br />
<br />
*''Case fatality'': The proportion of all people diagnosed with a given disease who die of it over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations (or one population at different time periods) that have different age distributions; adjusting for age lets us compare the mortality rates the populations would have if they had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
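The four steps of direct age-adjustment can be sketched as follows (rates and populations are hypothetical):<br />

```python
# Step 1: age-specific mortality rates (per 100,000) for the study population
age_specific_rates = [10, 20, 50]

# A standard population, by the same age groups
standard_population = [1_000_000, 2_000_000, 1_000_000]

# Steps 2-3: expected deaths if the study's rates applied to the standard population
expected_deaths = sum(rate / 100_000 * pop
                      for rate, pop in zip(age_specific_rates, standard_population))

# Step 4: age-adjusted rate = total expected deaths / total standard population
adjusted_rate = expected_deaths / sum(standard_population) * 100_000
```

Because every population is weighted by the same standard age structure, adjusted rates from different populations are directly comparable.<br />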
<br />
Result: An age-adjusted mortality rate for each population of interest<br />
<br />
====Indirect Age-Adjustment====<br />
*Indirect age-adjustment: The expected number of deaths can be compared to the number of actual deaths using the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates are untrustworthy (e.g., when the population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
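The indirect method can be sketched the same way (all numbers hypothetical):<br />

```python
# Steps 1-2: apply the standard population's age-specific rates (per 100,000)
# to the study population's age structure
standard_rates = [5, 15, 60]
study_population = [40_000, 30_000, 10_000]

# Step 3: expected deaths in the study population
expected = sum(rate / 100_000 * pop
               for rate, pop in zip(standard_rates, study_population))

# Step 4: SMR = observed deaths / expected deaths
observed_deaths = 20
smr = observed_deaths / expected   # SMR > 1: more deaths than expected
```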
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and achieve better outcomes. However, screening programs must be warranted, and there must be a critical point in the disease's natural history before which screening can detect the disease. <br />
<br />
=====Predictive Value & Reliability: Clinical Utility of Positive Tests=====<br />
<br />
If a patient is tested positive, the likelihood that they actually have the disease is called '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood they actually do ''not'' have the disease is called '''Negative Predictive Value''' (NPV). PPV and NPV are affected by prevalence of disease, specificity and sensitivity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
<br />
'''PPV interpretation:''' Given a positive screening test result, the likelihood that the individual actually has the disease is the PPV.<br />
<br />
'''NPV interpretation:''' Given a negative screening test result, the likelihood that the individual actually does not have the disease is the NPV.<br />
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
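Using the a/b/c/d layout of the table above, PPV and NPV can be computed directly (cell counts hypothetical):<br />

```python
# 2x2 screening table: a = true positives, b = false positives,
# c = false negatives, d = true negatives
a, b, c, d = 90, 60, 10, 840

ppv = a / (a + b)   # P(disease | positive test)
npv = d / (c + d)   # P(no disease | negative test)
```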
<br />
=====Factors That Influence Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV (and decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent diseases may waste resources; PPV should be presented in the context of disease prevalence.<br />
<br />
*''Test specificity'' (the ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
*''Test sensitivity'' (the ability of a test to correctly identify those who do have the disease $=\frac{a}{a+c}$): Higher test sensitivity increases NPV.<br />
<br />
'''Note:''' The cutpoint of a screening test influences its sensitivity and specificity: lowering the cutpoint increases true positives (increasing sensitivity) and decreases true negatives (decreasing specificity). Similarly, raising the cutpoint decreases true positives (decreasing sensitivity) and increases true negatives (increasing specificity).<br />
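The effect of prevalence on PPV (holding test characteristics fixed) can be illustrated with a small function; the sensitivity and specificity values here are hypothetical:<br />

```python
def ppv(prevalence, sensitivity=0.90, specificity=0.95):
    """PPV for a test applied to a population with the given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

low = ppv(0.01)    # rare disease: most positives are false positives
high = ppv(0.20)   # high-risk population: far better PPV
```

This is why screening programs are most efficient in high-risk populations.<br />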
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid but not reliable'', ''reliable but not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability (repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?=====<br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage) are less expensive, less invasive, and less uncomfortable. If their results are positive, they must be followed up with additional testing.<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person can test positive on either test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has own sensitivity and specificity. Utilization of multiple testing can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
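These combination rules can be sketched for two tests with hypothetical sensitivities and specificities:<br />

```python
se1, sp1 = 0.80, 0.90   # test 1 (hypothetical)
se2, sp2 = 0.85, 0.95   # test 2 (hypothetical)

# Simultaneous (parallel): positive on either test counts as positive
net_se_parallel = 1 - (1 - se1) * (1 - se2)   # net sensitivity rises
net_sp_parallel = sp1 * sp2                   # net specificity falls

# Sequential (serial): must be positive on both tests to count as positive
net_se_serial = se1 * se2                     # net sensitivity falls
net_sp_serial = 1 - (1 - sp1) * (1 - sp2)     # net specificity rises
```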
<br />
===Randomized Controlled Trials (RCT)===<br />
In these procedures, the investigator assigns exposure at random to study participants. The investigator then observes whether there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken to ensure that follow-up is done in an identical way for both groups. The essence of a good comparison between “treatments” is that the compared groups are as similar as possible, except for their “treatment.”<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: <br />
<br />
#Hypothesis formulation<br />
#Study participant recruitment based on specific criteria<br />
#Gathering informed consent<br />
#Allocation of eligible and willing participants into random assignment study groups<br />
#Monitoring study groups for outcome under study<br />
#Comparing rates of different outcomes in various groups<br />
<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
<br />
*''External validity'': Generalization of study to larger source population, which is influenced by factors like: <br />
:*Demographic differences between eligible and ineligible subgroups <br />
:*Whether the intervention mirrors what will happen in the community or source population<br />
<br />
*''Internal validity'': Ability to reach correct conclusion in study, which is influenced by factors like: <br />
:*Ability of subjects to provide valid and reliable data<br />
:*Expected compliance with a regimen<br />
:*Low probability of dropping out<br />
<br />
====Measures of Association and Effect in RCT====<br />
<br />
Ratio of two measures of disease incidence (relative measures):<br />
<br />
*Risk Ratio (Relative Risk)<br />
*Rate Ratio<br />
<br />
Difference between two measures of disease incidence: <br />
<br />
*Risk difference<br />
*Efficacy<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
'''Interpretation''': <br />
<br />
*$RR>1$: The outcome is $RR$ times more likely to occur in the exposed group than in the unexposed group<br />
*$RR=1$: Null value (no difference between groups)<br />
*$RR<1$: Either report the reduction in risk ($100\%-X\%$) or invert ($1/RR$) and interpret as “less likely”<br />
<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
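A sketch of the relative risk and efficacy calculations for a hypothetical trial:<br />

```python
# Hypothetical RCT 2x2 table
a, b = 15, 485   # drug A: 15 cases among 500 participants
c, d = 45, 455   # placebo: 45 cases among 500 participants

ci_drug = a / (a + b)        # cumulative incidence in the treated group
ci_placebo = c / (c + d)     # cumulative incidence in the placebo group

relative_risk = ci_drug / ci_placebo              # RR < 1: drug is protective
efficacy = (ci_placebo - ci_drug) / ci_placebo    # fraction of risk removed
```

Here RR ≈ 0.33 (the treated group has one third the risk of the placebo group) and efficacy ≈ 0.67 (the drug prevents about two thirds of the cases expected under placebo).<br />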
<br />
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.<br />
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks.<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Populations of exposed and unexposed individuals at risk of developing the outcome are followed over time to compare the development of disease in each group. <br />
*Steps: Establish a study population that reflects the base population of interest and includes a range of exposures; identify groups of exposed and unexposed individuals; follow both groups and compare their outcomes.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
**Prospective (concurrent) and Retrospective (non-concurrent) Cohort Studies, distinguished by when the data are collected<br />
**Retrospective benefits: more cost-effective; good for diseases of long latency<br />
**Prospective benefits: data quality is presumably higher<br />
<br />
Both designs must guard against ascertainment bias if the outcome or exposure status is already known.<br />
<br />
*Measures of Association in Cohort Study:<br />
**Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio<br />
**Difference between two measures of disease incidence: Risk Difference, Rate Difference<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information<br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized<br />
# Like RCT, excellent for studying rare exposures <br />
# Multiple outcomes and sometimes multiple exposures can be studied<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive<br />
# Not effective at capturing rare outcomes, and it can be challenging to study diseases that take a long time to develop<br />
# Loss to follow-up can be a problem<br />
# Changes over time in criteria and methods can lead to problems with inferences<br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics<br />
<br />
*Situations that favor a Cohort Study: <br />
# When there is evidence of an association between the exposure and the disease from other studies<br />
# When the exposure is rare but the incidence of disease among the exposed is high<br />
# When time between exposure and development of the disease is relatively short or historical data is available<br />
# When good follow-up can be ensured<br />
<br />
===Case Control Study===<br />
A case-control study compares cases (people with the disease) and controls (people without it) to see which group had greater exposure to the risk factor of interest.<br />
*Measures of Association: Odds Ratio<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}$<br />
<br />
====Interpretation====<br />
The odds of being exposed are OR times higher in the cases than in the controls (if OR > 1), or 1/OR times lower (if OR < 1); if OR = 1 there is no association (the odds are the same in cases and controls).<br />
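The odds ratio is the cross-product of the 2x2 table; a sketch with hypothetical counts:<br />

```python
# Case-control 2x2 table:          Case  Control
a, b = 40, 20                    # exposed
c, d = 60, 180                   # unexposed

odds_ratio = (a / c) / (b / d)   # = (a*d)/(b*c), the cross-product ratio
```

Here the odds of exposure are 6 times higher among cases than among controls.<br />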
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: The case-control design is efficient and can evaluate many risk factors for the same disease, so it is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally.”<br />
**Weaknesses: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate the incidence of disease, but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in the study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to avoid recall and reporting bias include: <br />
# adjusting the timing so that the time between the event/illness and the study is as short as possible;<br />
# using standardized questionnaires that obtain complete information;<br />
# using existing information if/when possible (e.g., medical records); <br />
# masking participants to the study hypothesis<br />
*Conditions under which an OR from a Case-Control Study can approximate an RR (OR ≈ RR):<br />
# when the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn; <br />
# when the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn; <br />
# when the disease being studied does not occur frequently.<br />
<br />
===Cross-Sectional Studies===<br />
A cross-sectional study is an observational study in which a subject’s exposure and disease data are measured at the same time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, and temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# good for generating hypotheses;<br />
# easily sets up other analytic designs; <br />
# temporality is not a problem for time invariant exposures (genetic markers); <br />
# relatively low cost.<br />
<br />
*'''Weakness:'''<br />
# temporality – exposure or disease which happened first; <br />
# prevalent cases may not be the same as incident cases; <br />
# not useful for rare disease; <br />
# subject to selection bias.<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\,disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
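The prevalence ratio has the same form as the risk ratio, but uses prevalent rather than incident cases (counts hypothetical):<br />

```python
# Cross-sectional 2x2 table
a, b = 30, 70    # exposed: 30 prevalent cases among 100
c, d = 10, 90    # unexposed: 10 prevalent cases among 100

prevalence_ratio = (a / (a + b)) / (c / (c + d))   # 0.30 / 0.10 = 3.0
```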
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place & time (mixed study). An error can occur when an association identified from group-level (ecologic) characteristics is ascribed to individuals even though no such association exists at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# data is relatively easy and/or cheap to obtain; <br />
# good place to start; <br />
# many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weakness:''' Reliance on group-level data may not correctly represent individual-level associations. <br />
<br />
*Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.<br />
<br />
*Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.<br />
<br />
===Other Risk Estimates===<br />
*Attributable Risk Estimates of Effect – if exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.<br />
*Attributable Risk (AR): $AR=CI_{Exposed} - CI _{Not\,exposed}$. This is just the risk difference. Group of interest: exposed and aims to quantify the risk of disease in the “exposed” group attributable to the exposure. <br />
*Attributable Risk Percent $(AR\%)$: $AR\%=\frac{CI_{Exposed} - CI_{Not\,exposed}}{CI_{Exposed}}\times 100$<br />
*Population Attributable Risk (PAR): $PAR= CI_{Total} - CI_{Not\,exposed}$<br />
*Population Attributable Risk Percent $(PAR\%)$: $PAR\%=\frac{CI_{Total}-CI_{Not\,exposed}}{CI_{Total}}\times 100$.<br />
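The four attributable-risk measures above can be sketched as small Python functions. The cumulative incidence values below are hypothetical, used only to show the arithmetic.<br />

```python
def attributable_risk(ci_exposed, ci_unexposed):
    """AR: risk difference between exposed and unexposed."""
    return ci_exposed - ci_unexposed

def attributable_risk_percent(ci_exposed, ci_unexposed):
    """AR%: fraction of risk among the exposed attributable to exposure."""
    return 100 * (ci_exposed - ci_unexposed) / ci_exposed

def population_attributable_risk(ci_total, ci_unexposed):
    """PAR: excess risk in the total population attributable to exposure."""
    return ci_total - ci_unexposed

def population_attributable_risk_percent(ci_total, ci_unexposed):
    """PAR%: fraction of total population risk attributable to exposure."""
    return 100 * (ci_total - ci_unexposed) / ci_total

# Hypothetical cumulative incidences: exposed 0.20, unexposed 0.05, total 0.10.
ar = attributable_risk(0.20, 0.05)                        # ≈ 0.15
ar_pct = attributable_risk_percent(0.20, 0.05)            # ≈ 75
par = population_attributable_risk(0.10, 0.05)            # ≈ 0.05
par_pct = population_attributable_risk_percent(0.10, 0.05)  # ≈ 50
```

Here eliminating the exposure would remove roughly 75% of the risk among the exposed and 50% of the risk in the total population.<br />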
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and an outcome; the observed results differ from the true results. <br />
*Impact of bias: It can make it appear as if there is an association when there really is none (bias away from the null), or mask an association when there really is one (bias toward the null).<br />
*Reasons we get the wrong answer include selection bias: who is selected or retained in a study distorts the estimates of the truth. An example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (choose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls or exposed and unexposed distorts your estimates of the truth. A variable is a confounder if it is a known risk factor for the outcome, it is associated with the exposure but not a result of the exposure. These three conditions are necessary for a variable to be considered as a confounder. <br />
* Chance: the luck of the draw gets you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any factors that were matched on; you cannot conduct analyses for matched variables and outcome.<br />
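Since the matched odds ratio uses only the discordant pairs, it reduces to a single division. A minimal sketch with hypothetical pair counts:<br />

```python
def matched_odds_ratio(b, c):
    """Matched-pair odds ratio, based only on the discordant pairs:
    b = pairs with case exposed / control unexposed,
    c = pairs with case unexposed / control exposed."""
    return b / c

# Hypothetical: 30 pairs where only the case was exposed,
# 10 pairs where only the control was exposed.
odds_ratio = matched_odds_ratio(30, 10)  # 3.0
```

Concordant pairs (cells a and d of the matched table) contribute no information to this estimate.<br />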
*Randomization: Random allocation of exposure/”treatment” by the investigator ensures that the two groups (exposed & unexposed) are the same except for the exposure of interest, and controls for both known and unknown confounders because the distribution of these “3rd variables” should be equal between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, from some important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has largely benefited in the past from the development of epidemiological research; however, the opposition between association and causation is currently raised from observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved: necessity of replication and Mendelian randomization, which might prove to be important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article], studies retrospectively the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about existence of outbreaks?<br />
:a. Cases call health departments directly<br />
:b. Clinicians<br />
:c. Laboratories<br />
:d. All of the above<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. Host<br />
:b. Agent<br />
:c. Vector<br />
:d. Environment<br />
:e. All of the above<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same.<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly.<br />
:c. The prevalence decreases if the duration of disease is increasing.<br />
:d. The prevalence decreases if the duration of disease stays the same.<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002 - 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 death per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:a. True<br />
:b. False<br />
<br />
Sequential testing tends to have higher net specificity than specificity of a single test.<br />
:a. True<br />
:b. False<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they had the flu), so these were not enrolled in the research study. The students who were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of this follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question; what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. Interviewer bias<br />
:b. Recall bias<br />
:c. Loss to follow-up<br />
:d. Non-differential misclassification<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns (especially if they were concerned that sun exposure was a reason they got melanoma). This is an example of:<br />
:a. Interviewer bias<br />
:b. Loss to follow-up<br />
:c. Differential misclassification<br />
:d. Non-differential misclassification<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>
Glenbrau, https://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi&diff=14888, SMHS IntroEpi, 2015-04-27T12:53:21Z<p>Glenbrau: /* External and internal validity */</p>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred directly from an infected person to a susceptible person, for example by touch, a sneeze, or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*The null hypothesis $H_{0}:ARR=1$ and a 95% confidence interval can be used to check whether the estimated ARR interval includes the null value of 1. If the ARR is much greater than 1, people who were exposed are more likely to develop the illness than those who were unexposed.<br />
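The attack rate and attack rate ratio defined above can be computed as follows. This is a minimal sketch; the outbreak counts are hypothetical, echoing the buffet example.<br />

```python
def attack_rate(cases, at_risk):
    """Attack rate: number of ill cases among those at risk."""
    return cases / at_risk

def attack_rate_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """ARR: attack rate in the exposed divided by attack rate in the unexposed."""
    return (attack_rate(cases_exposed, n_exposed)
            / attack_rate(cases_unexposed, n_unexposed))

# Hypothetical outbreak: 40 of 80 salad eaters fell ill vs. 10 of 100 non-eaters.
arr = attack_rate_ratio(40, 80, 10, 100)  # ≈ 5.0
```

An ARR of about 5 would suggest salad eaters were roughly five times as likely to become ill, supporting (but not proving) the salad hypothesis.<br />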
<br />
====Measuring Disease====<br />
The goals of this section are to name and calculate two measures of incidence, to describe differences in interpreting these measures, and to understand the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': the number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2000 persons at risk during the year and 20 develop the disease over that period, the incidence would be 20⁄2000=1%.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
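Using the person-time example above, the incidence rate can be sketched in Python (the single case count is hypothetical, added only to complete the calculation):<br />

```python
def incidence_rate(new_cases, person_time):
    """Incidence rate: new cases divided by total person-time at risk."""
    return new_cases / person_time

# Follow-up times from the example: subjects followed 3, 5, and 8 days.
person_days = 3 + 5 + 8                # 16 person-days
rate = incidence_rate(1, person_days)  # 1 case / 16 person-days = 0.0625
```

Unlike cumulative incidence (a proportion), this is a true rate: its denominator is person-time, not a count of people.<br />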
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
<br />
====Measuring Mortality Rates====<br />
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:<br />
<br />
*All cause mortality rates = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011}{Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures the proportion of all deaths occurring in a given place over a given time that are due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of the disease over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations (or one population at different time periods) that have different age distributions. Adjusting for age allows the mortality rates of the two populations to be compared as if both had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
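The four steps above can be sketched as a single Python function. The age-specific rates and standard population counts below are hypothetical, chosen only to illustrate the procedure.<br />

```python
def direct_adjusted_rate(age_specific_rates, standard_pop):
    """Direct age adjustment: apply the study population's age-specific
    rates (per person) to the age structure of a standard population."""
    # Steps 2-3: expected deaths in each age group of the standard population
    expected = sum(rate * n for rate, n in zip(age_specific_rates, standard_pop))
    # Step 4: divide total expected deaths by total standard population
    return expected / sum(standard_pop)

# Hypothetical two age groups: rates 10 and 50 per 100,000,
# standard population of 600,000 and 400,000.
rates = [10 / 100_000, 50 / 100_000]
standard = [600_000, 400_000]
adjusted = direct_adjusted_rate(rates, standard) * 100_000  # ≈ 26 per 100,000
```

The resulting standardized rate can be compared with any other rate adjusted to the same standard population.<br />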
<br />
====Age-adjusted mortality rate for each population of interest====<br />
*Indirect age-adjustment: the expected number of deaths is compared to the number of observed deaths using the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates are unreliable (e.g., when the study population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
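The indirect-adjustment steps can likewise be sketched in Python. All rates, population counts, and the observed death count below are hypothetical.<br />

```python
def expected_deaths(standard_rates, study_pop):
    """Indirect adjustment: apply the standard population's age-specific
    rates (per person) to the study population's age structure."""
    return sum(rate * n for rate, n in zip(standard_rates, study_pop))

def smr(observed, expected):
    """Standardized mortality ratio: observed deaths / expected deaths."""
    return observed / expected

# Hypothetical: standard rates of 5 and 20 per 100,000 applied to a study
# population of 40,000 and 10,000 people in the same two age groups.
expected = expected_deaths([5 / 100_000, 20 / 100_000], [40_000, 10_000])  # ≈ 4
ratio = smr(6, expected)  # ≈ 1.5, i.e. more deaths than expected
```

An SMR above 1 means the study population experienced more deaths than expected from the standard rates; below 1, fewer.<br />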
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and provide better outcomes. However, screening programs must be warranted, and there must be a critical point in the disease's natural history that screening can precede. <br />
<br />
=====Predictive Value & Reliability: Clinical Utility of Tests=====<br />
<br />
If a patient is tested positive, the likelihood that they actually have the disease is called '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood they actually do ''not'' have the disease is called '''Negative Predictive Value''' (NPV). PPV and NPV are affected by prevalence of disease, specificity and sensitivity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
<br />
'''PPV interpretation:''' Given a positive screening test result, the likelihood that an individual actually has the disease is the PPV.<br />
<br />
'''NPV interpretation:''' Given a negative screening test result, the likelihood that an individual does not have the disease is the NPV.<br />
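Sensitivity, specificity, PPV, and NPV all come from the same 2x2 table, so it is convenient to compute them together. A minimal sketch with hypothetical cell counts:<br />

```python
def screening_metrics(a, b, c, d):
    """Screening 2x2 table: a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    return {
        "sensitivity": a / (a + c),  # P(test+ | disease present)
        "specificity": d / (b + d),  # P(test- | disease absent)
        "PPV": a / (a + b),          # P(disease | test+)
        "NPV": d / (c + d),          # P(no disease | test-)
    }

# Hypothetical screening results: 90 TP, 30 FP, 10 FN, 270 TN.
m = screening_metrics(90, 30, 10, 270)
# sensitivity ≈ 0.9, specificity ≈ 0.9, PPV ≈ 0.75, NPV ≈ 0.964
```

Note that PPV and NPV depend on disease prevalence, while sensitivity and specificity are properties of the test itself.<br />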
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
<br />
===== Factors Influence Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV (or decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent disease may waste resources; need to present PPV in context of disease prevalence.<br />
<br />
*''Test specificity'' (ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
*''Test sensitivity'' (ability of a test to correctly identify those who do have the disease $=\frac{a}{a+c}$): Higher test sensitivity increases NPV.<br />
<br />
'''Note:''' The cutoff used to define disease influences test sensitivity and specificity: lowering the cutpoint increases true positives (increasing sensitivity) but decreases true negatives (decreasing specificity). Similarly, raising the cutpoint decreases true positives (decreasing sensitivity) but increases true negatives (increasing specificity).<br />
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid not reliable'', ''reliable not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability (repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?===== <br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage) begin with a less expensive, less invasive, or less uncomfortable test; positive results must be followed up with additional testing.<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person need only test positive on either test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has own sensitivity and specificity. Utilization of multiple testing can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
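Under the common simplifying assumption that the two tests err independently given disease status, the net sensitivity and specificity can be computed from the individual test characteristics. The test characteristics below are hypothetical.<br />

```python
def net_simultaneous(se1, se2, sp1, sp2):
    """Parallel testing: positive if positive on either test
    (assumes tests are independent given disease status)."""
    net_sens = 1 - (1 - se1) * (1 - se2)  # miss only if both tests miss
    net_spec = sp1 * sp2                  # negative only if both tests negative
    return net_sens, net_spec

def net_sequential(se1, se2, sp1, sp2):
    """Sequential testing: positive only if positive on both tests."""
    net_sens = se1 * se2                  # must be detected by both tests
    net_spec = 1 - (1 - sp1) * (1 - sp2)  # false positive only if both err
    return net_sens, net_spec

# Hypothetical tests: sensitivities 0.8 and 0.9, specificities 0.9 and 0.8.
sim = net_simultaneous(0.8, 0.9, 0.9, 0.8)  # ≈ (0.98, 0.72): sens up, spec down
seq = net_sequential(0.8, 0.9, 0.9, 0.8)    # ≈ (0.72, 0.98): sens down, spec up
```

The two results illustrate the trade-off stated above: parallel testing raises net sensitivity at the cost of specificity, and sequential testing does the reverse.<br />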
<br />
===Randomized Controlled Trials (RCT)===<br />
In these procedures, the investigator assigns exposure at random to study participants. The investigator then observes whether there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken to ensure that follow-up is done in an identical way with both groups. The essence of a good comparison between “treatments” is that the compared groups are as similar as possible, except for their “treatment."<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: <br />
<br />
#Hypothesis formulation<br />
#Study participant recruitment based on specific criteria<br />
#Gathering informed consent<br />
#Allocation of eligible and willing participants into random assignment study groups<br />
#Monitoring study groups for outcome under study<br />
#Comparing rates of different outcomes in various groups<br />
<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
<br />
*''External validity'': Generalization of study to larger source population, which is influenced by factors like: <br />
:*Demographic differences between eligible and ineligible subgroups <br />
:*Whether the intervention mirrors what will happen in the community or source population<br />
<br />
*''Internal validity'': Ability to reach correct conclusion in study, which is influenced by factors like: <br />
:*Ability of subjects to provide valid and reliable data<br />
:*Expected compliance with a regimen<br />
:*Low probability of dropping out<br />
<br />
====Measures of Association and Effect in RCT====<br />
<br />
Ratio of two measures of disease incidence (relative measures):<br />
<br />
*Risk Ratio (Relative Risk)<br />
*Rate Ratio<br />
<br />
Difference between two measures of disease incidence: <br />
<br />
*Risk difference<br />
*Efficacy<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
'''Interpretation''': <br />
<br />
*$RR>1$, The risk of $X$ is $RR$ times more likely to occur in group A than in group B<br />
*$RR=1$, Null value (no difference between groups)<br />
*$RR<1$, Either calculate the reduction in risk ratios (100%-$X$%) or invert ($1/RR$) to be interpreted as “less likely” risk<br />
<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
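The relative risk and efficacy formulas above can be sketched from an RCT's 2x2 table. The trial counts below are hypothetical, chosen only to show the arithmetic.<br />

```python
def relative_risk(a, b, c, d):
    """RR from an RCT 2x2 table:
    a, b = drug-arm disease / no disease;
    c, d = placebo-arm disease / no disease."""
    return (a / (a + b)) / (c / (c + d))

def efficacy(ci_placebo, ci_treatment):
    """Efficacy: proportion of placebo-arm risk removed by treatment."""
    return (ci_placebo - ci_treatment) / ci_placebo

# Hypothetical trial: 10/100 drug-arm cases vs. 40/100 placebo-arm cases.
rr = relative_risk(10, 90, 40, 60)  # ≈ 0.25
eff = efficacy(0.4, 0.1)            # ≈ 0.75
```

An RR below 1 (here about 0.25) indicates a protective treatment effect; equivalently, the efficacy of about 75% means roughly three quarters of the placebo-arm risk was prevented.<br />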
<br />
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.<br />
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks.<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Populations of exposed and unexposed individuals at risk of developing the outcome are followed over time to compare the development of disease in each group. <br />
*Steps: Establish a study population that is reflective of the base population of interest and has a range of exposure levels; identify groups of exposed and unexposed individuals; follow both groups and compare their outcomes.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
: Prospective (concurrent) and retrospective (non-concurrent) cohort studies, depending on when the data are collected<br />
: Retrospective benefits: more cost-effective; good for diseases of long latency<br />
: Prospective benefits: data quality is presumably higher<br />
Both designs must guard against ascertainment bias when outcome or exposure status is already known.<br />
<br />
*Measures of Association in Cohort Study:<br />
: Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio<br />
: Difference between two measures of disease incidence: Risk Difference, Rate Difference<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information<br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized<br />
# Like RCT, excellent for studying rare exposures <br />
# Multiple outcomes and sometimes multiple exposures can be studied<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive<br />
# Not effective at capturing rare outcomes and can be challenging for diseases that take a long time to develop<br />
# Loss to follow-up can be a problem<br />
# Changes over time in criteria and methods can lead to problems with inferences<br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics<br />
<br />
*Situations that favor a cohort study: <br />
# When there is evidence of an association between the exposure and the disease from other studies<br />
# When the exposure is rare but the incidence of disease among the exposed is high<br />
# When time between exposure and development of the disease is relatively short or historical data is available<br />
# When good follow-up can be ensured<br />
<br />
===Case Control Study===<br />
A case control study compares people with the disease (cases) to people without it (controls) to see which group had greater exposure to the risk factor of interest.<br />
*Measures of Association: Odds Ratio<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}$<br />
<br />
====Interpretation====<br />
The odds of being exposed are OR times higher in cases than in controls if OR > 1, and 1/OR times lower in cases than in controls if OR < 1; if OR = 1 there is no association (the odds are the same in cases and controls).<br />
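As a minimal sketch, the odds ratio can be computed from the cell counts of the case-control table above; the counts used here are hypothetical.<br />

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from the case-control 2x2 table:
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)

# Hypothetical study: 40 of 100 cases exposed vs 20 of 100 controls exposed.
or_hat = odds_ratio(40, 20, 60, 80)   # (40*80)/(20*60), about 2.67
```

An OR of about 2.67 would be interpreted as the odds of exposure being roughly 2.7 times higher in cases than in controls.<br />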
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to avoid recall and reporting bias include: <br />
# adjusting timing so that the time between the event/illness and the study is as short as possible; <br />
# using standardized questionnaires that obtain complete information; <br />
# using existing information if/when possible (e.g., medical records); <br />
# masking participants to the study hypothesis<br />
*Conditions under which an OR from a case-control study can approximate an RR (OR ≈ RR):<br />
# when the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn; <br />
# when the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn; <br />
# when the disease being studied does not occur frequently.<br />
<br />
===Cross-Sectional Studies===<br />
A cross sectional study is an observational study in which a subject’s exposure and disease data are measured at the same time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, so temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# good for generating hypotheses;<br />
# easily sets up other analytic designs; <br />
# temporality is not a problem for time invariant exposures (genetic markers); <br />
# relatively low cost.<br />
<br />
*'''Weakness:'''<br />
# temporality – exposure or disease which happened first; <br />
# prevalent cases may not be the same as incident cases; <br />
# not useful for rare diseases; <br />
# subject to selection bias.<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\,disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place and time (mixed study). An error can occur when an association identified from group-level (ecologic) characteristics is ascribed to individuals even though no such association exists at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# data is relatively easy and/or cheap to obtain; <br />
# good place to start; <br />
# many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weakness:''' Reliance on group-level data may not correctly represent individual-level associations. <br />
<br />
*The ecologic fallacy occurs when an association between variables based on group characteristics is used to make inferences about individuals even though that association does not exist at the individual level.<br />
<br />
*Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.<br />
<br />
===Other Risk Estimates===<br />
*Attributable Risk Estimates of Effect – if exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.<br />
*Attributable Risk (AR): $AR=CI_{Exposed} - CI _{Not\,exposed}$. This is just the risk difference. Group of interest: exposed and aims to quantify the risk of disease in the “exposed” group attributable to the exposure. <br />
*Attributable Risk Percent $(AR\%)$: $AR\% = \frac{CI_{Exposed} - CI_{Not\,exposed}}{CI_{Exposed}}$<br />
*Population Attributable Risk (PAR): $PAR= CI_{Total} - CI_{Not\,exposed}$<br />
*Population Attributable Risk Percent $(PAR\%)$: $PAR\% = \frac{CI_{Total}-CI_{Not\,exposed}}{CI_{Total}}$.<br />
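The four attributable-risk measures above are simple arithmetic on cumulative incidences; a small sketch with hypothetical incidences follows.<br />

```python
def attributable_risk(ci_exposed, ci_unexposed):
    """AR: excess risk in the exposed group attributable to the exposure."""
    return ci_exposed - ci_unexposed

def attributable_risk_pct(ci_exposed, ci_unexposed):
    """AR%: fraction of the exposed group's risk attributable to the exposure."""
    return (ci_exposed - ci_unexposed) / ci_exposed

def population_attributable_risk(ci_total, ci_unexposed):
    """PAR: excess risk in the whole population attributable to the exposure."""
    return ci_total - ci_unexposed

def population_attributable_risk_pct(ci_total, ci_unexposed):
    """PAR%: fraction of the population's risk attributable to the exposure."""
    return (ci_total - ci_unexposed) / ci_total

# Hypothetical cumulative incidences: 20% in exposed, 5% in unexposed, 10% overall.
ar = attributable_risk(0.20, 0.05)                       # 0.15
ar_pct = attributable_risk_pct(0.20, 0.05)               # 0.75
par = population_attributable_risk(0.10, 0.05)           # 0.05
par_pct = population_attributable_risk_pct(0.10, 0.05)   # 0.5
```

In this hypothetical example, half of the population's risk (PAR% = 50%) could be eliminated by removing the exposure entirely.<br />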
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and outcome; observed results different than true results. <br />
*Impact of bias: it can make it appear as if there is an association when there really is none (bias away from the null), or mask an association when there really is one (bias toward the null). <br />
*Reasons we get the wrong answer include selection bias: who is selected or retained in a study distorts the estimates of the truth. An example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (choose groups from the same source population; use lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls, or between exposed and unexposed, distort your estimates of the truth. A variable is a confounder if (1) it is a known risk factor for the outcome, (2) it is associated with the exposure, and (3) it is not a result of the exposure. All three conditions are necessary for a variable to be considered a confounder. <br />
* Chance: the luck of draw gets you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any of the factors that were matched on; however, you cannot analyze the association between matched variables and the outcome.<br />
*Randomization: Random allocation of exposure/“treatment” by the investigator ensures that the two groups (exposed and unexposed) are the same except for the exposure of interest, and controls for both known and unknown confounders, because these “third variables” should be equally distributed between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, through several important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has benefited greatly from the development of epidemiological research; however, the association-versus-causation question is currently being raised for observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research over the past 10 years. Causality of the associations presents some special characteristics when genes are involved, such as the necessity of replication and Mendelian randomization, which may prove important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article], studies retrospectively the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about existence of outbreaks?<br />
:a. Cases call health departments directly<br />
:b. Clinicians<br />
:c. Laboratories<br />
:d. All of the above<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. Host<br />
:b. Agent<br />
:c. Vector<br />
:d. Environment<br />
:e. All of the above<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same.<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly.<br />
:c. The prevalence decreases if the duration of disease is increasing.<br />
:d. The prevalence decreases if the duration of disease stays the same.<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002 - 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 death per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:a. True<br />
:b. False<br />
<br />
Sequential testing tends to have higher net specificity than specificity of a single test.<br />
:a. True<br />
:b. False<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they had the flu), so these were not enrolled in the research study. The students who were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of this follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question; what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. Interviewer bias<br />
:b. Recall bias<br />
:c. Loss to follow-up<br />
:d. Non-differential misclassification<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns (especially if they were concerned that sun exposure was a reason they got melanoma). This is an example of:<br />
:a. Interviewer bias<br />
:b. Loss to follow-up<br />
:c. Differential misclassification<br />
:d. Non-differential misclassification<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the scientific discipline concerned with the occurrence of disease in human populations and how that occurrence changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological study designs.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred directly from an infected person to another person, for example through a sneeze, touch, or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*The null hypothesis $H_{0}:ARR=1$ can be tested, and a 95% confidence interval can be used to see whether the estimated ARR interval includes the null value of 1. If the ARR is much greater than 1, people who were exposed are more likely to develop the illness than those who were unexposed.<br />
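The attack rate and attack rate ratio formulas above can be sketched directly; the outbreak counts used here are hypothetical, in the spirit of the buffet example.<br />

```python
def attack_rate(n_ill, n_at_risk):
    """Proportion of people at risk who develop the illness."""
    return n_ill / n_at_risk

def attack_rate_ratio(ill_exposed, n_exposed, ill_unexposed, n_unexposed):
    """ARR: attack rate in the exposed divided by attack rate in the unexposed."""
    return attack_rate(ill_exposed, n_exposed) / attack_rate(ill_unexposed, n_unexposed)

# Hypothetical outbreak: 30 of 50 salad eaters fell ill vs 10 of 50 who did not eat it.
arr = attack_rate_ratio(30, 50, 10, 50)   # 0.6 / 0.2, about 3.0
```

An ARR of about 3 (well above the null value of 1) would suggest that those who ate the salad were about three times as likely to become ill.<br />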
<br />
====Measuring Disease====<br />
This section names and calculates two measures of incidence, describes differences in how these measures are interpreted, and explains the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': the number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2,000 persons at risk during the year and 20 develop disease over that period, the cumulative incidence would be 20⁄2000 = 1%.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
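These three frequency measures are simple ratios; a short sketch follows, reusing the worked numbers from the text (the two-case incidence rate figure is hypothetical).<br />

```python
def cumulative_incidence(new_cases, n_at_risk):
    """New cases divided by the population at risk (a proportion, not a true rate)."""
    return new_cases / n_at_risk

def incidence_rate(new_cases, person_time):
    """New cases divided by total person-time at risk (a true rate)."""
    return new_cases / person_time

def prevalence(existing_cases, population):
    """Existing cases divided by the population at a specified time."""
    return existing_cases / population

# From the text: 20 of 2000 persons at risk develop disease in a year.
ci = cumulative_incidence(20, 2000)    # 0.01, i.e. 1%
# From the text: subjects followed for 3, 5 and 8 days contribute 16 person-days;
# suppose (hypothetically) 2 of them developed the disease.
rate = incidence_rate(2, 3 + 5 + 8)    # 0.125 cases per person-day
```

Note the distinction illustrated here: cumulative incidence divides by people, while the incidence rate divides by person-time.<br />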
<br />
====Measuring Mortality Rates====<br />
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:<br />
<br />
*All cause mortality rates = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011} {Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures the proportion of all deaths occurring in a given place over a given time that are due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of it over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations or one population at different time periods with different age distributions by adjusting for age to compare the mortality rates in two populations if they both have the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
<br />
====Age-adjusted mortality rate for each population of interest====<br />
*Indirect age-adjustment: the expected number of deaths can be compared to the observed number of deaths via the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates are unreliable (e.g., when the population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
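The direct and indirect adjustment steps above can be sketched as follows; the two-stratum rates and populations are hypothetical.<br />

```python
def direct_adjusted_rate(study_rates, standard_pop):
    """Direct standardization: apply the study population's age-specific rates
    to the age structure of a standard population, then divide the expected
    deaths by the standard population total."""
    expected = sum(r * n for r, n in zip(study_rates, standard_pop))
    return expected / sum(standard_pop)

def smr(observed_deaths, standard_rates, study_pop):
    """Indirect standardization: observed deaths divided by the deaths expected
    if the standard population's age-specific rates applied to the study population."""
    expected = sum(r * n for r, n in zip(standard_rates, study_pop))
    return observed_deaths / expected

# Hypothetical two-age-stratum example (rates per person-year).
adj = direct_adjusted_rate([0.001, 0.004], [600_000, 400_000])   # 2200 / 1,000,000
ratio = smr(1500, [0.001, 0.004], [500_000, 200_000])            # 1500 observed / 1300 expected
```

Here the SMR exceeds 1, meaning more deaths were observed in the study population than expected from the standard rates.<br />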
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and achieve better outcomes. However, screening programs must be warranted, and there must be a critical point in the disease's natural history before which screening can detect the disease. <br />
<br />
=====Predictive Value &amp; Reliability: Clinical Utility of Positive Tests=====<br />
<br />
If a patient is tested positive, the likelihood that they actually have the disease is called '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood they actually do ''not'' have the disease is called '''Negative Predictive Value''' (NPV). PPV and NPV are affected by prevalence of disease, specificity and sensitivity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
<br />
'''PPV interpretation:''' Given a positive result on the screening test, the likelihood that an individual actually has the disease is the PPV.<br />
<br />
'''NPV interpretation:''' Given a negative result on the screening test, the likelihood that an individual actually does not have the disease is the NPV.<br />
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
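As a minimal sketch, PPV and NPV can be computed directly from the 2x2 cell counts $a$, $b$, $c$, $d$ defined above; the counts used here are hypothetical.

```python
# PPV and NPV from the screening 2x2 table; hypothetical cell counts.
a, b = 80, 70    # true positives, false positives
c, d = 10, 240   # false negatives, true negatives

ppv = a / (a + b)   # P(disease | positive test)
npv = d / (c + d)   # P(no disease | negative test)
```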
<br />
=====Factors Influencing Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV (and decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent diseases may waste resources; PPV should be presented in the context of disease prevalence.<br />
<br />
*''Test sensitivity'' (ability of a test to correctly identify those who have the disease $=\frac{a}{a+c}$)<br />
*''Test specificity'' (ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
<br />
'''Note:''' The cutpoint of a test influences its sensitivity and specificity: lowering the cutpoint increases true positives, and hence increases sensitivity, while decreasing true negatives, and hence decreasing specificity. Similarly, raising the cutpoint decreases true positives (lowering sensitivity) and increases true negatives (raising specificity).<br />
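The cutpoint trade-off described in the note above can be illustrated with a small sketch; the marker values and cutpoints below are hypothetical.

```python
# Sensitivity and specificity at a chosen cutpoint for a continuous marker.
# Marker values and cutpoints below are hypothetical.

diseased = [6.2, 7.1, 7.8, 8.4, 9.0]        # marker values, people with disease
non_diseased = [4.0, 4.8, 5.5, 6.0, 6.5]    # marker values, people without disease

def sens_spec(cutpoint):
    """Call the test 'positive' when the marker value is >= cutpoint."""
    tp = sum(x >= cutpoint for x in diseased)      # true positives
    tn = sum(x < cutpoint for x in non_diseased)   # true negatives
    sensitivity = tp / len(diseased)
    specificity = tn / len(non_diseased)
    return sensitivity, specificity

low_sens, low_spec = sens_spec(6.0)    # lower cutpoint: sensitivity up
high_sens, high_spec = sens_spec(7.0)  # higher cutpoint: specificity up
```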
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid not reliable'', ''reliable not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability (repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?===== <br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage) use a less expensive, less invasive, or less uncomfortable test first. If its result is positive, it must be followed up with additional testing.<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person can test positive on either test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has its own sensitivity and specificity. Using multiple tests can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
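Assuming the two tests are independent, the net measures above can be sketched as follows; the individual sensitivities and specificities are hypothetical.

```python
# Net sensitivity and specificity for two screening tests, assuming the
# tests are independent. The individual Se/Sp values below are hypothetical.
se1, sp1 = 0.80, 0.90
se2, sp2 = 0.85, 0.95

# Simultaneous (parallel): positive on EITHER test counts as positive
net_sens_parallel = se1 + se2 - se1 * se2   # higher than either test alone
net_spec_parallel = sp1 * sp2               # lower than either test alone

# Sequential (2-stage): only positive if positive on BOTH tests
net_sens_sequential = se1 * se2             # lower than either test alone
net_spec_sequential = sp1 + sp2 - sp1 * sp2 # higher than either test alone
```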
<br />
===Randomized Controlled Trials (RCT)===<br />
In these procedures, the investigator assigns exposure at random to study participants. The investigator then observes whether there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken to ensure that follow-up is done in an identical way for both groups. The essence of a good comparison between “treatments” is that the compared groups are as similar as possible, except for their “treatment.”<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: <br />
<br />
#Hypothesis formulation<br />
#Study participant recruitment based on specific criteria<br />
#Gathering informed consent<br />
#Allocation of eligible and willing participants into random assignment study groups<br />
#Monitoring study groups for outcome under study<br />
#Comparing rates of different outcomes in various groups<br />
<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
<br />
*''External validity'': Generalization of study to larger source population. Influenced by factors like: <br />
:*Demographic differences between eligible and ineligible subgroups <br />
:*Whether the intervention mirrors what will happen in the community or source population<br />
<br />
*''Internal validity'': Ability to reach correct conclusion in study. Influenced by factors like: <br />
:*Ability of subjects to provide valid and reliable data<br />
:*Expected compliance with a regimen<br />
:*Low probability of dropping out<br />
<br />
====Measures of Association and Effect in RCT====<br />
<br />
Ratio of two measures of disease incidence (relative measures):<br />
<br />
*Risk Ratio (Relative Risk)<br />
*Rate Ratio<br />
<br />
Difference between two measures of disease incidence: <br />
<br />
*Risk difference<br />
*Efficacy<br />
<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
'''Interpretation''': <br />
<br />
*$RR>1$: The outcome is $RR$ times more likely to occur in group A than in group B<br />
*$RR=1$: Null value (no difference between groups)<br />
*$RR<1$: Either calculate the reduction in risk ($100\%-X\%$) or invert ($1/RR$) and interpret the outcome as “less likely” in group A<br />
<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
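The relative risk and efficacy formulas above can be sketched from the RCT 2x2 table; the cell counts below are hypothetical.

```python
# Relative risk and efficacy from the RCT 2x2 table; hypothetical counts.
a, b = 15, 85   # Drug A group: developed outcome, did not
c, d = 30, 70   # Placebo group: developed outcome, did not

ci_drug = a / (a + b)       # cumulative incidence, Drug A
ci_placebo = c / (c + d)    # cumulative incidence, placebo

relative_risk = ci_drug / ci_placebo
efficacy = (ci_placebo - ci_drug) / ci_placebo  # fraction of risk removed
```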
<br />
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.<br />
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks.<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Populations of exposed and unexposed individuals at risk of developing the outcome are followed over time to compare the development of disease in each group. <br />
*Steps: Establish the study population by identifying one that reflects the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; study the outcomes of the exposed and unexposed groups.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
: Prospective (concurrent) and retrospective (non-concurrent) cohort studies, distinguished by when the data are collected<br />
: Retrospective benefits: more cost-effective; good for diseases of long latency<br />
: Prospective benefits: data quality is presumably higher<br />
Both designs must guard against ascertainment bias when outcomes or exposures are already known.<br />
<br />
*Measures of Association in Cohort Study:<br />
: Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio<br />
: Difference between two measures of disease incidence: Risk Difference, Rate Difference<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information<br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized<br />
# Like RCT, excellent for studying rare exposures <br />
# Multiple outcomes and sometimes multiple exposures can be studied<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive<br />
# Not effective at capturing rare outcomes, and can be challenging for studying diseases that take a long time to develop<br />
# Loss to follow-up can be a problem<br />
# Changes over time in criteria and methods can lead to problems with inferences<br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics<br />
<br />
*Situations that favor a Cohort Study: <br />
# When there is evidence of an association between the exposure and the disease from other studies<br />
# When the exposure is rare but incidence of disease among the exposed is high<br />
# When time between exposure and development of the disease is relatively short or historical data is available<br />
# When good follow-up can be ensured<br />
<br />
===Case Control Study===<br />
A case control study compares cases (people with the disease) and controls (people without it) to see which group had greater exposure to the suspected risk factor.<br />
*Measures of Association: Odds Ratio<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}$<br />
<br />
====Interpretation====<br />
If OR > 1, the odds of being exposed are OR times higher in the cases than in the controls; if OR < 1, the odds are 1/OR times lower in the cases than in the controls; if OR = 1, there is no association (the odds are the same in cases and controls).<br />
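A minimal sketch of the odds-ratio calculation from the 2x2 table above, with hypothetical cell counts:

```python
# Odds ratio from a case-control 2x2 table; hypothetical cell counts.
a, b = 60, 44    # exposed: cases, controls
c, d = 40, 156   # unexposed: cases, controls

odds_ratio = (a * d) / (b * c)   # (a/c) / (b/d) simplifies to ad/bc
```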
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to avoid recall and report bias include: <br />
# adjusting timing so that the time between the event/illness and the study is as short as possible; use standardized questionnaires that obtain complete information;<br />
# using existing information if/when possible (e.g. medical record); <br />
# masking participants to study hypothesis<br />
*Conditions when an OR from a Case-Control Study can approximate a RR OR≈RR:<br />
# when the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn; <br />
# when the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn; <br />
# when the disease being studied does not occur frequently.<br />
<br />
===Cross-Sectional Studies===<br />
A cross-sectional study is an observational study in which a subject’s exposure and disease data are measured at the same time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, so temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# good for generating hypotheses;<br />
# easily sets up other analytic designs; <br />
# temporality is not a problem for time invariant exposures (genetic markers); <br />
# relatively low cost.<br />
<br />
*'''Weakness:'''<br />
# temporality – exposure or disease which happened first; <br />
# prevalent cases may not be the same as incident cases; <br />
# not useful for rare disease; <br />
# subject to selection bias.<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\,disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
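The prevalence ratio above can be sketched the same way as the other 2x2 measures; the cell counts below are hypothetical.

```python
# Prevalence ratio from a cross-sectional 2x2 table; hypothetical counts.
a, b = 25, 75   # exposed: disease, no disease
c, d = 10, 90   # unexposed: disease, no disease

prev_exposed = a / (a + b)
prev_unexposed = c / (c + d)
prevalence_ratio = prev_exposed / prev_unexposed
```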
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place and time (mixed study). An error can occur when an association identified from group-level (ecologic) characteristics is ascribed to individuals even though no such association exists at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# data is relatively easy and/or cheap to obtain; <br />
# good place to start; <br />
# many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weakness:''' Reliance on group-level data may not correctly represent individual-level associations. <br />
<br />
*Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.<br />
<br />
*Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.<br />
<br />
===Other Risk Estimates===<br />
*Attributable Risk Estimates of Effect – if exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.<br />
*Attributable Risk (AR): $AR=CI_{Exposed} - CI_{Not\,exposed}$. This is just the risk difference. The group of interest is the exposed, and the AR quantifies the risk of disease in the exposed group attributable to the exposure. <br />
*Attributable Risk Percent $(AR\%)$: $AR\%=\frac{CI_{Exposed} - CI_{Not\,exposed}}{CI_{Exposed}}$<br />
*Population Attributable Risk (PAR): $PAR= CI_{Total} - CI_{Not\,exposed}$<br />
*Population Attributable Risk Percent $(PAR\%)$: $PAR\%=\frac{CI_{Total}-CI_{Not\,exposed}}{CI_{Total}}$.<br />
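The four attributable-risk measures can be sketched together; the cumulative incidences below are hypothetical.

```python
# AR, AR%, PAR, PAR% from cumulative incidences; the values are hypothetical.
ci_exposed = 0.30     # cumulative incidence among the exposed
ci_unexposed = 0.10   # cumulative incidence among the unexposed
ci_total = 0.18       # cumulative incidence in the whole population

ar = ci_exposed - ci_unexposed    # risk difference
ar_pct = 100 * ar / ci_exposed    # % of exposed-group risk due to exposure
par = ci_total - ci_unexposed     # excess risk in the total population
par_pct = 100 * par / ci_total    # % of total-population risk due to exposure
```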
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and outcome; observed results different than true results. <br />
*Impact of bias: makes it appear as if there is an association when there really is none (bias away from the null); masks an association when there really is one (bias toward the null). <br />
*Reasons we get the wrong answer include selection bias: who is selected or retained in a study distorts your estimates of the truth, for example through differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (chose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls or exposed and unexposed distorts your estimates of the truth. A variable is a confounder if it is a known risk factor for the outcome, it is associated with the exposure but not a result of the exposure. These three conditions are necessary for a variable to be considered as a confounder. <br />
* Chance: the luck of draw gets you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any of the factors that were matched on; note that you cannot analyze the association between a matched variable and the outcome.<br />
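The matched-pair odds ratio uses only the discordant-pair counts $b$ and $c$ from the table above; the counts here are hypothetical.

```python
# Matched-pair odds ratio: only discordant pairs contribute. Counts hypothetical.
b = 30   # pairs where the case was exposed and its matched control was not
c = 12   # pairs where the control was exposed and its matched case was not

matched_or = b / c
```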
*Randomization: Random allocation of exposure/”treatment” by investigator, ensure that the two groups (exposed & unexposed) are the same except for exposure of interest, able to control for both known and unknown confounders because distribution of these “3rd variables” should be equally distributed between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
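A companion sketch of direct age-adjustment, applying each population's own age-specific rates to a shared standard population; all rates and population counts below are hypothetical.

```python
# Direct age-adjustment: apply each group's own age-specific rates to a
# SHARED standard population, making the summary rates comparable.
# All rates and population counts below are hypothetical.

standard_pop = {"young": 6_000_000, "old": 4_000_000}

rates_a = {"young": 0.001, "old": 0.010}   # deaths per person, population A
rates_b = {"young": 0.002, "old": 0.008}   # deaths per person, population B

def adjusted_rate(rates):
    expected = sum(rates[g] * standard_pop[g] for g in standard_pop)
    return expected / sum(standard_pop.values())

adj_a = adjusted_rate(rates_a)
adj_b = adjusted_rate(rates_b)
```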
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, through some important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has largely benefited in the past from the development of epidemiological research; however, the association-versus-causation question is currently raised from observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved, such as the necessity of replication and Mendelian randomization, which might prove to be important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article], studies retrospectively the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about existence of outbreaks?<br />
:a. Cases call health departments directly<br />
:b. Clinicians<br />
:c. Laboratories<br />
:d. All of the above<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. Host<br />
:b. Agent<br />
:c. Vector<br />
:d. Environment<br />
:e. All of the above<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same.<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly.<br />
:c. The prevalence decreases if the duration of disease is increasing.<br />
:d. The prevalence decreases if the duration of disease stays the same.<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002 - 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 death per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:a. True<br />
:b. False<br />
<br />
Sequential testing tends to have higher net specificity than specificity of a single test.<br />
:a. True<br />
:b. False<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they had the flu), so these were not enrolled in the research study. The students who were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of this follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question; what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. Interviewer bias<br />
:b. Recall bias<br />
:c. Loss to follow-up<br />
:d. Non-differential misclassification<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns (especially if they were concerned that sun exposure was a reason they got melanoma). This is an example of:<br />
:a. Interviewer bias<br />
:b. Loss to follow-up<br />
:c. Differential misclassification<br />
:d. Non-differential misclassification<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>Glenbrauhttps://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi&diff=14885SMHS IntroEpi2015-04-16T19:36:57Z<p>Glenbrau: /* Case Control Study */</p>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred by contact from an infected person to contaminated intermediate object such as sneeze, touch or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*$H_{0}:ARR=1$, and 95% confidence intervals, can be used to see whether estimated ARR interval includes the null value of 1. If ARR is much greater than 1, then people exposed are more likely to develop the illness compared to those who are unexposed.<br />
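The AR and ARR definitions above can be sketched in a few lines of Python; the buffet counts below are hypothetical, chosen only to illustrate the arithmetic.<br />

```python
# Attack rate (AR) and attack rate ratio (ARR), using made-up counts from
# a hypothetical buffet outbreak: 50 of 80 salad eaters became ill,
# versus 10 of 70 attendees who did not eat the salad.
def attack_rate(cases, at_risk):
    """AR = people at risk who develop the illness / total people at risk."""
    return cases / at_risk

ar_exposed = attack_rate(50, 80)     # 0.625
ar_unexposed = attack_rate(10, 70)   # ~0.143
arr = ar_exposed / ar_unexposed      # ~4.4: exposed are ~4.4x as likely to fall ill
```

An ARR this far above 1 suggests the exposed group is at substantially higher risk, subject to the confidence-interval check described above.<br />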
<br />
====Measuring Disease====<br />
To name and calculate two measures of incidence, to describe the differences in interpreting these measures, and to understand the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': the number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2,000 persons at risk during the year and 20 develop the disease over that period, the cumulative incidence is $20/2000=1\%$.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
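The three frequency measures above can be computed directly; the counts below are illustrative, with the person-time figures taken from the follow-up example given earlier.<br />

```python
# Illustrative numbers for the disease-frequency measures above.

# Cumulative incidence: 20 new cases among 2,000 persons at risk in a year.
cumulative_incidence = 20 / 2000        # 0.01, i.e., 1%

# Incidence rate: the denominator is person-time. Subjects followed for
# 3, 5, and 8 days contribute 16 person-days in total.
person_days = 3 + 5 + 8                 # 16
incidence_rate = 1 / person_days        # e.g., one new case per 16 person-days

# Point prevalence: 150 existing cases in a population of 10,000 on one day.
prevalence = 150 / 10_000               # 0.015
```

Note that cumulative incidence and prevalence are proportions, while the incidence rate, with its person-time denominator, is a true rate.<br />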
<br />
====Measuring Mortality Rates====<br />
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:<br />
<br />
*All-cause mortality rate = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011} {Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of that disease over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations (or one population at different time periods) that have different age distributions. Adjusting for age lets us compare the mortality rates the two populations would have if they had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
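The four steps above can be sketched as follows; the age-specific rates and standard population below are hypothetical, chosen only to make the arithmetic easy to follow.<br />

```python
# Direct age-adjustment following the four steps above, with hypothetical
# age-specific death rates (per 100,000) and a standard population.
rates_per_100k = [10, 20, 50]                      # age-specific rates (step 1)
standard_pop = [1_000_000, 2_000_000, 1_000_000]   # standard population by age group

# Steps 2-3: expected deaths in each age group of the standard population.
expected = [r * n / 100_000 for r, n in zip(rates_per_100k, standard_pop)]

# Step 4: age-adjusted rate = total expected deaths / total standard population.
adjusted_per_100k = sum(expected) * 100_000 / sum(standard_pop)
print(adjusted_per_100k)   # 25.0 deaths per 100,000
```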
<br />
Result: an age-adjusted mortality rate for each population of interest.<br />
*Indirect age-adjustment: the expected number of deaths is compared to the number of actual deaths via the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates are unreliable (e.g., when the study population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
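A minimal sketch of the indirect-adjustment steps, using hypothetical standard-population rates and a small study population:<br />

```python
# Indirect adjustment / SMR: apply the standard population's age-specific
# rates to a small (hypothetical) study population.
standard_rates = [0.001, 0.004, 0.010]   # standard population's rates (step 1)
study_pop = [500, 300, 200]              # study population, by age group
observed_deaths = 6

# Steps 2-3: deaths expected if the study population had the standard rates.
expected_deaths = sum(r * n for r, n in zip(standard_rates, study_pop))  # ~3.7

# Step 4: SMR = observed / expected.
smr = observed_deaths / expected_deaths  # > 1: more deaths than expected
```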
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and achieve better outcomes. However, a screening program must be warranted: there must be a critical point in the disease's natural history that screening can detect before symptoms appear. <br />
<br />
=====Predictive Value and Reliability: Clinical Utility of Positive Tests=====<br />
<br />
If a patient tests positive, the likelihood that they actually have the disease is called the '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood that they actually do ''not'' have the disease is called the '''Negative Predictive Value''' (NPV). PPV and NPV are affected by the prevalence of disease and the sensitivity and specificity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
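The four cell counts of the table above determine all of the screening measures; the counts below are hypothetical.<br />

```python
# Screening-test measures from the 2x2 table above, with hypothetical counts:
# a = true positives, b = false positives, c = false negatives, d = true negatives.
a, b, c, d = 90, 30, 10, 270

ppv = a / (a + b)             # P(disease | positive test)    = 0.75
npv = d / (c + d)             # P(no disease | negative test)
sensitivity = a / (a + c)     # P(positive test | disease)    = 0.90
specificity = d / (b + d)     # P(negative test | no disease) = 0.90
```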
<br />
'''PPV interpretation:''' Given a positive screening test, the likelihood that the individual actually has the disease is the PPV.<br />
<br />
'''NPV interpretation:''' Given a negative screening test, the likelihood that the individual does not have the disease is the NPV.<br />
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
<br />
===== Factors Influencing Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV (and decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent disease may waste resources; PPV should be presented in the context of disease prevalence.<br />
<br />
*''Test specificity'' (the ability of a test to correctly identify those who do ''not'' have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
*''Test sensitivity'' (the ability of a test to correctly identify those who have the disease $=\frac{a}{a+c}$): Higher test sensitivity increases NPV.<br />
<br />
'''Note:''' The cutoff of a test influences its sensitivity and specificity: lowering the cutpoint increases true positives (raising sensitivity) but decreases true negatives (lowering specificity). Conversely, raising the cutpoint decreases true positives (lowering sensitivity) and increases true negatives (raising specificity).<br />
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid not reliable'', ''reliable not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability (Repeatability) of Tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?===== <br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage) use a less expensive, less invasive, or less uncomfortable test first. If its result is positive, it must be followed up with additional testing.<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person can test positive on either test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has its own sensitivity and specificity. Using multiple tests can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
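The two strategies can be compared numerically under the standard simplifying assumption that the two tests are independent; the 80%/90% figures below are hypothetical.<br />

```python
# Net sensitivity/specificity for two tests, assuming the tests err
# independently of one another (a standard simplification).
def sequential(se1, sp1, se2, sp2):
    """Positive only if positive on BOTH tests (2-stage testing)."""
    return se1 * se2, 1 - (1 - sp1) * (1 - sp2)

def simultaneous(se1, sp1, se2, sp2):
    """Positive if positive on EITHER test (parallel testing)."""
    return 1 - (1 - se1) * (1 - se2), sp1 * sp2

# Two tests, each 80% sensitive and 90% specific:
print(sequential(0.8, 0.9, 0.8, 0.9))    # sensitivity falls, specificity rises
print(simultaneous(0.8, 0.9, 0.8, 0.9))  # sensitivity rises, specificity falls
```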
<br />
===Randomized Controlled Trials (RCT)===<br />
In these procedures, the investigator assigns exposure at random to study participants. The investigator then observes whether there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken to ensure that follow-up is done identically in both groups. The essence of a good comparison between "treatments" is that the compared groups are as similar as possible, except for their "treatment."<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: <br />
<br />
#Hypothesis formulation<br />
#Study participant recruitment based on specific criteria<br />
#Gathering informed consent<br />
#Allocation of eligible and willing participants into random assignment study groups<br />
#Monitoring study groups for outcome under study<br />
#Comparing rates of different outcomes in various groups<br />
<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
<br />
*''External validity'': Generalization of study to larger source population. Influenced by factors like: <br />
:*Demographic differences between eligible and ineligible subgroups <br />
:*Whether the intervention mirrors what will happen in the community or source population<br />
<br />
*''Internal validity'': Ability to reach correct conclusion in study. Influenced by factors like: <br />
:*Ability of subjects to provide valid and reliable data<br />
:*Expected compliance with a regimen<br />
:*Low probability of dropping out<br />
<br />
====Measures of Association and Effect in RCT====<br />
<br />
Ratio of two measures of disease incidence (relative measures):<br />
<br />
*Risk Ratio (Relative Risk)<br />
*Rate Ratio<br />
<br />
Difference between two measures of disease incidence: <br />
<br />
*Risk difference<br />
*Efficacy<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
'''Interpretation''': <br />
<br />
*$RR>1$, The risk of $X$ is $RR$ times more likely to occur in group A than in group B<br />
*$RR=1$, Null value (no difference between groups)<br />
*$RR<1$, Either calculate the reduction in risk ratios (100%-$X$%) or invert ($1/RR$) to be interpreted as “less likely” risk<br />
<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
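Both measures follow directly from the 2x2 table above; the trial counts below are made up for illustration.<br />

```python
# Relative risk and efficacy from the RCT 2x2 table above (hypothetical counts).
a, b = 15, 985    # drug A arm: 15 cases among 1,000 participants
c, d = 30, 970    # placebo arm: 30 cases among 1,000 participants

ci_drug = a / (a + b)       # cumulative incidence on drug A
ci_placebo = c / (c + d)    # cumulative incidence on placebo

relative_risk = ci_drug / ci_placebo            # 0.5: risk halved on drug A
efficacy = (ci_placebo - ci_drug) / ci_placebo  # 0.5: half of cases prevented
```

Here $RR<1$, so by the interpretation rules above we could equivalently report that placebo recipients were $1/RR=2$ times as likely to develop the outcome.<br />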
<br />
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.<br />
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks.<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Populations of exposed and unexposed individuals at risk of developing the outcome are followed over time to compare the development of disease in each group. <br />
*Steps: Establish the study population: identify a study population that reflects the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; then compare the outcomes of the exposed and unexposed groups.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
: Prospective (concurrent) and retrospective (non-concurrent) cohort studies, distinguished by when the data are collected<br />
: Retrospective benefits: more cost-effective; good for diseases with long latency<br />
: Prospective benefits: data quality is presumably higher<br />
Both designs must guard against ascertainment bias if the outcome or exposure is already known.<br />
<br />
*Measures of Association in Cohort Study:<br />
: Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio<br />
: Difference between two measures of disease incidence: Risk Difference, Rate Difference<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information<br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized<br />
# Like RCT, excellent for studying rare exposures <br />
# Multiple outcomes and sometimes multiple exposures can be studied<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive<br />
# Not effective at capturing rare outcomes, and it can be challenging to study diseases that take a long time to develop<br />
# Loss to follow-up can be a problem<br />
# Changes over time in criteria and methods can lead to problems with inferences<br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics<br />
<br />
*Situations favor a Cohort Study: <br />
# When there is evidence of an association between the exposure and the disease from other studies<br />
# When the exposure is rare but incidence of disease among the exposure is high<br />
# When time between exposure and development of the disease is relatively short or historical data is available<br />
# When good follow-up can be ensured<br />
<br />
===Case Control Study===<br />
A case-control study compares cases and controls to see which group had greater exposure to the risk factor of interest.<br />
*Measures of Association: Odds Ratio<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}$<br />
<br />
====Interpretation====<br />
The odds of being exposed are OR times higher in cases than in controls (if OR > 1), or 1/OR times lower in cases than in controls (if OR < 1). If OR = 1, there is no association: the odds of exposure are the same in cases and controls.<br />
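The cross-product formula above takes one line; the case-control counts below are hypothetical.<br />

```python
# Odds ratio from the case-control 2x2 table above (hypothetical counts).
a, b = 40, 20    # exposed:   40 cases, 20 controls
c, d = 60, 180   # unexposed: 60 cases, 180 controls

odds_ratio = (a * d) / (b * c)   # ad/bc = 6.0
# Interpretation: the odds of exposure are 6 times higher among cases
# than among controls.
```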
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to avoid recall and report bias include: <br />
# adjusting timing so that the time between the event/illness and the study is as short as possible; use standardized questionnaires that obtain complete information;<br />
# using existing information if/when possible (e.g. medical record); <br />
# masking participants to study hypothesis<br />
*Conditions under which an OR from a case-control study can approximate the RR ($OR≈RR$):<br />
# when the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn; <br />
# when the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn; <br />
# when the disease being studied does not occur frequently.<br />
<br />
===Cross-Sectional Studies===<br />
A cross-sectional study is an observational study in which a subject’s exposure and disease data are measured at the same time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, and temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# good for generating hypotheses;<br />
# easily sets up other analytic designs; <br />
# temporality is not a problem for time invariant exposures (genetic markers); <br />
# relatively low cost.<br />
<br />
*'''Weakness:'''<br />
# temporality – exposure or disease which happened first; <br />
# prevalent cases may not be the same as incident cases; <br />
# not useful for rare disease; <br />
# subject to selection bias.<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\, disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place and time (mixed study). An error can occur when an association identified from group-level (ecologic) characteristics is ascribed to individuals even though no such association exists at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# data is relatively easy and/or cheap to obtain; <br />
# good place to start; <br />
# many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weakness:''' Reliance on group-level data may not correctly represent individual-level associations. <br />
<br />
*Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.<br />
<br />
*Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.<br />
<br />
===Other Risk Estimates===<br />
*Attributable Risk Estimates of Effect – if exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.<br />
*Attributable Risk (AR): $AR=CI_{Exposed} - CI _{Not\,exposed}$. This is just the risk difference. Group of interest: exposed and aims to quantify the risk of disease in the “exposed” group attributable to the exposure. <br />
*Attributable Risk Percent $(AR\%)$: $ AR\%$ = $\frac{(CI_{Exposed} - CI_{Not exposed})}{CI_{exposed}}$<br />
*Population Attributable Risk (PAR): $PAR= CI_{Total} - CI_{Not exposed}$<br />
*Population Attributable Risk Percent $(PAR\%)$: $PAR\%$ = $\frac{(CI_{Total}-CI_{Not exposed})} {CI_{total}}$.<br />
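The four attributable-risk measures above reduce to simple arithmetic on cumulative incidences; the values below are hypothetical.<br />

```python
# Attributable-risk measures from hypothetical cumulative incidences.
ci_exposed, ci_unexposed, ci_total = 0.30, 0.10, 0.18

ar = ci_exposed - ci_unexposed    # attributable risk (risk difference): 0.20
ar_pct = ar / ci_exposed * 100    # % of risk in the exposed due to exposure
par = ci_total - ci_unexposed     # population attributable risk: 0.08
par_pct = par / ci_total * 100    # % of risk in the population due to exposure
```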
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and outcome; the observed results differ from the true results. <br />
*Impact of bias: It can make it appear as if there is an association when there really is none (bias away from the null), or mask an association when there really is one (bias toward the null).<br />
*Reasons we get the wrong answer: Selection bias: who is selected or retained in a study distorts your estimate of the truth. An example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (chose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls or exposed and unexposed distorts your estimates of the truth. A variable is a confounder if it is a known risk factor for the outcome, it is associated with the exposure but not a result of the exposure. These three conditions are necessary for a variable to be considered as a confounder. <br />
* Chance: the luck of draw gets you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any factors that were matched on; you cannot conduct analyses for matched variables and outcome.<br />
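The matched analysis above uses only the discordant cells of the pair table; the pair counts below are hypothetical.<br />

```python
# Matched case-control analysis: the OR uses only the discordant pairs.
b = 30   # pairs where the case was exposed but the control was not
c = 10   # pairs where the control was exposed but the case was not

matched_odds_ratio = b / c   # 3.0: exposure odds 3x higher among cases
```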
*Randomization: Random allocation of the exposure/"treatment" by the investigator ensures that the two groups (exposed & unexposed) are the same except for the exposure of interest, and controls for both known and unknown confounders because the distribution of these "3rd variables" should be equal between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, through some important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has largely benefited in the past from the development of epidemiological research; however, the association-versus-causation question is currently raised for observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved, notably the necessity of replication and Mendelian randomization, which might prove to be important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article], studies retrospectively the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about existence of outbreaks?<br />
:a. Cases call health departments directly<br />
:b. Clinicians<br />
:c. Laboratories<br />
:d. All of the above<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. Host<br />
:b. Agent<br />
:c. Vector<br />
:d. Environment<br />
:e. All of the above<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same.<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly.<br />
:c. The prevalence decreases if the duration of disease is increasing.<br />
:d. The prevalence decreases if the duration of disease stays the same.<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002 - 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 death per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:a. True<br />
:b. False<br />
<br />
Sequential testing tends to have higher net specificity than specificity of a single test.<br />
:a. True<br />
:b. False<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they had the flu), so these were not enrolled in the research study. The students who were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of this follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question; what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. Interviewer bias<br />
:b. Recall bias<br />
:c. Loss to follow-up<br />
:d. Non-differential misclassification<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns (especially if they were concerned that sun exposure was a reason they got melanoma). This is an example of:<br />
:a. Interviewer bias<br />
:b. Loss to follow-up<br />
:c. Differential misclassification<br />
:d. Non-differential misclassification<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred by contact from an infected person to contaminated intermediate object such as sneeze, touch or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*Under $H_{0}:ARR=1$, a 95% confidence interval can be used to check whether the estimated ARR interval includes the null value of 1. If the ARR is much greater than 1, then people who were exposed are more likely to develop the illness than those who were unexposed.<br />
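As a sketch of the calculation, consider a hypothetical buffet outbreak (all counts invented for illustration) and compute the attack rates and ARR in Python:<br />

```python
# Hypothetical buffet outbreak: did eating the Caesar salad raise the attack rate?
ill_exposed, total_exposed = 30, 60        # ate the salad
ill_unexposed, total_unexposed = 5, 50     # did not eat the salad

ar_exposed = ill_exposed / total_exposed        # attack rate among the exposed
ar_unexposed = ill_unexposed / total_unexposed  # attack rate among the unexposed
arr = ar_exposed / ar_unexposed                 # attack rate ratio

print(f"AR(exposed)={ar_exposed:.2f}, AR(unexposed)={ar_unexposed:.2f}, ARR={arr:.1f}")
```

An ARR this far above 1 suggests salad-eaters were at much higher risk than non-eaters.<br />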
<br />
====Measuring Disease====<br />
The goals of this section are to name and calculate two measures of incidence, to describe differences in interpreting these measures, and to understand the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': the number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2,000 persons at risk during the year and 20 develop the disease over that period, the incidence would be 20/2000 = 1%.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
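The three frequency measures above can be sketched with hypothetical numbers (all counts invented for illustration):<br />

```python
# Cumulative incidence: 20 new cases among 2,000 persons at risk over one year
new_cases, at_risk = 20, 2000
cumulative_incidence = new_cases / at_risk   # new cases / total population at risk

# Incidence rate uses person-time: subjects followed 3, 5, and 8 days, 1 new case
person_days = 3 + 5 + 8
incidence_rate = 1 / person_days             # cases per person-day of follow-up

# Point prevalence: 150 existing cases in a population of 10,000 at one time
prevalence = 150 / 10_000

print(cumulative_incidence, incidence_rate, prevalence)
```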
<br />
====Measuring Mortality Rates====<br />
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:<br />
<br />
*All-cause mortality rate = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011} {Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of it over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations (or one population at different time periods) that have different age distributions; adjusting for age allows the mortality rates to be compared as if both populations had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
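The four steps above can be sketched with a hypothetical two-age-group example (all rates and counts invented for illustration):<br />

```python
# Hypothetical direct age adjustment with two age groups
# Step 1: the study population's age-specific death rates (deaths per person-year)
rates = {"young": 0.002, "old": 0.030}
# The standard population's counts in the same age groups
standard = {"young": 70_000, "old": 30_000}

# Steps 2-3: expected deaths if the study rates applied to the standard population
expected_deaths = sum(rates[g] * standard[g] for g in rates)
# Step 4: divide by the total standard population
adjusted_rate = expected_deaths / sum(standard.values())

print(f"age-adjusted rate = {adjusted_rate:.4f}")
```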
<br />
====Age-adjusted mortality rate for each population of interest====<br />
*Indirect age-adjustment: the expected number of deaths can be compared to the observed number of deaths via the '''standardized mortality ratio (SMR)'''. This is especially useful when the group-specific rates are not trustworthy (e.g., when the population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
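A hypothetical sketch of indirect adjustment and the resulting SMR (all numbers invented for illustration):<br />

```python
# Hypothetical indirect adjustment: compare observed deaths to expected deaths
standard_rates = {"young": 0.002, "old": 0.030}   # rates from the standard population
study_pop = {"young": 5_000, "old": 1_000}        # study population's age-group counts
observed_deaths = 45

# Expected deaths if standard rates applied to the study population
expected = sum(standard_rates[g] * study_pop[g] for g in study_pop)
smr = observed_deaths / expected   # >1 more deaths than expected, <1 fewer

print(f"SMR = {smr:.2f}")
```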
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and provide better outcomes. However, screening programs must be warranted, and there must be a critical point in the disease's natural history that screening can precede. <br />
<br />
=====Predictive Value &amp; Reliability: Clinical Utility of Positive Tests=====<br />
<br />
If a patient tests positive, the likelihood that they actually have the disease is called the '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood that they actually do ''not'' have the disease is called the '''Negative Predictive Value''' (NPV). PPV and NPV are affected by the prevalence of disease and by the specificity and sensitivity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
<br />
'''PPV interpretation:''' Given a positive screening test result, the likelihood that the individual actually has the disease is the PPV.<br />
<br />
'''NPV interpretation:''' Given a negative screening test result, the likelihood that the individual actually does not have the disease is the NPV.<br />
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
<br />
===== Factors Influence Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV (and decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent disease may waste resources; PPV needs to be presented in the context of disease prevalence.<br />
<br />
*''Test specificity'' (the ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
*''Test sensitivity'' (the ability of a test to correctly identify those who have the disease $=\frac{a}{a+c}$): Higher test sensitivity increases NPV.<br />
<br />
'''Note:''' The cutoff used to define disease influences test sensitivity and specificity: lowering the cutpoint increases true positives (increasing sensitivity) and decreases true negatives (decreasing specificity). Similarly, raising the cutpoint decreases true positives (decreasing sensitivity) and increases true negatives (increasing specificity).<br />
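The effect of prevalence on PPV can be illustrated with a short sketch via Bayes' rule; the 90% sensitivity and 95% specificity figures are hypothetical:<br />

```python
# Sketch: PPV as a function of prevalence, via Bayes' rule.
# The 90% sensitivity and 95% specificity figures are hypothetical.
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    true_pos = sensitivity * prevalence               # P(test+ and diseased)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(test+ and healthy)
    return true_pos / (true_pos + false_pos)

for prev in (0.001, 0.01, 0.10):
    print(f"prevalence={prev:.1%}  PPV={ppv(0.90, 0.95, prev):.1%}")
```

As the loop shows, the same test yields a much higher PPV in a high-prevalence population, which is why screening is most efficient in high-risk groups.<br />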
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid but not reliable'', ''reliable but not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability (repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?===== <br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage) are less expensive, less invasive, and less uncomfortable. If their results are positive, they must be followed up with additional testing.<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person can test positive on either test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has its own sensitivity and specificity. Using multiple tests can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
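These net sensitivity/specificity rules can be sketched for two hypothetical tests (the individual sensitivities and specificities are invented for illustration):<br />

```python
# Net sensitivity/specificity for two hypothetical tests used together
se1, sp1 = 0.80, 0.90
se2, sp2 = 0.85, 0.95

# Simultaneous (parallel): positive on either test counts as positive
net_se_parallel = 1 - (1 - se1) * (1 - se2)   # miss only if both tests miss
net_sp_parallel = sp1 * sp2                   # negative requires both negative

# Sequential (2-stage): only positives on test 1 go on to test 2
net_se_serial = se1 * se2                     # positive requires both positive
net_sp_serial = sp1 + (1 - sp1) * sp2         # cleared by either negative result

print(net_se_parallel, net_sp_parallel, net_se_serial, net_sp_serial)
```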
<br />
===Randomized Controlled Trials (RCT)===<br />
In these procedures, the investigator assigns exposure at random to study participants. The investigator then observes if there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken in ensuring that the follow-up is done in an identical way with both groups. The essence of a good comparison between “treatments” is that the compared groups are as much the same as possible, except for their “treatment."<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: <br />
<br />
#Hypothesis formulation<br />
#Study participant recruitment based on specific criteria<br />
#Gathering informed consent<br />
#Allocation of eligible and willing participants into random assignment study groups<br />
#Monitoring study groups for outcome under study<br />
#Comparing rates of different outcomes in various groups<br />
<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
<br />
*''External validity'': Generalization of study to larger source population. Influenced by factors like: <br />
:*Demographic differences between eligible and ineligible subgroups <br />
:*Whether the intervention mirrors what will happen in the community or source population<br />
<br />
*''Internal validity'': Ability to reach correct conclusion in study. Influenced by factors like: <br />
:*Ability of subjects to provide valid and reliable data<br />
:*Expected compliance with a regimen<br />
:*Low probability of dropping out<br />
<br />
====Measures of Association and Effect in RCT====<br />
<br />
Ratio of two measures of disease incidence (relative measures):<br />
<br />
*Risk Ratio (Relative Risk)<br />
*Rate Ratio<br />
<br />
Difference between two measures of disease incidence: <br />
<br />
*Risk difference<br />
*Efficacy<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
'''Interpretation''': <br />
<br />
*$RR>1$: The outcome is $RR$ times more likely to occur in group A than in group B<br />
*$RR=1$: Null value (no difference between groups)<br />
*$RR<1$: Either calculate the reduction in risk (100%-$X$%) or invert ($1/RR$) to interpret the risk as “less likely”<br />
<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
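A sketch of the RR and efficacy calculations for a hypothetical trial (the counts a, b, c, d follow the table above but are invented for illustration):<br />

```python
# Hypothetical trial: Drug A vs placebo (counts a, b, c, d as in the table)
a, b = 15, 185   # Drug A: disease / no disease
c, d = 30, 170   # Placebo: disease / no disease

ci_drug = a / (a + b)      # cumulative incidence on Drug A
ci_placebo = c / (c + d)   # cumulative incidence on placebo

relative_risk = ci_drug / ci_placebo
efficacy = (ci_placebo - ci_drug) / ci_placebo   # fraction of placebo risk removed

print(f"RR={relative_risk:.2f}, efficacy={efficacy:.0%}")
```

An RR below 1 here means the drug group had lower risk; inverting gives "half as likely" to develop the disease.<br />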
<br />
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.<br />
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks.<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
A population of exposed and unexposed individuals at risk of developing the outcome is followed over time to compare the development of disease in each group. <br />
*Steps: Establish the study population by identifying one that reflects the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; then compare the outcomes of the exposed and unexposed groups.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
: Prospective (concurrent) and retrospective (non-concurrent) cohort studies, distinguished by when the data are collected<br />
: Retrospective benefits: more cost-effective; good for diseases of long latency<br />
: Prospective benefits: data quality is presumably higher<br />
Both designs need to be cautious of ascertainment bias if the outcome or exposure is already known.<br />
<br />
*Measures of Association in Cohort Study:<br />
: Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio<br />
: Difference between two measures of disease incidence: Risk Difference, Rate Difference<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information<br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized<br />
# Like RCT, excellent for studying rare exposures <br />
# Multiple outcomes and sometimes multiple exposures can be studied<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive<br />
# Not effective at capturing rare outcomes, and it can be challenging to study diseases that take a long time to develop<br />
# Loss to follow-up can be a problem<br />
# Changes over time in criteria and methods can lead to problems with inferences<br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics<br />
<br />
*Situations favor a Cohort Study: <br />
# When there is evidence of an association between the exposure and the disease from other studies<br />
# When the exposure is rare but the incidence of disease among the exposed is high<br />
# When time between exposure and development of the disease is relatively short or historical data is available<br />
# When good follow-up can be ensured<br />
<br />
===Case Control Study===<br />
A case-control study compares cases and controls to see which group had greater exposure to the suspected risk factor.<br />
*Measures of Association: Odds Ratio.<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposure || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}.$<br />
<br />
====Interpretation====<br />
The odds of being exposed are OR times higher in the cases than in the controls (if OR > 1), or 1/OR times lower in the cases than in the controls (if OR < 1); if OR = 1, there is no association (the odds are the same in cases and controls).<br />
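A minimal sketch of the odds-ratio calculation with hypothetical case-control counts (laid out as a, b, c, d in the table above):<br />

```python
# Hypothetical case-control counts (a, b, c, d as in the 2x2 table above)
a, b = 40, 20   # exposed:   cases / controls
c, d = 60, 80   # unexposed: cases / controls

odds_ratio = (a * d) / (b * c)   # (a/c) / (b/d) simplifies to ad/bc
print(f"OR = {odds_ratio:.2f}")
```

With these invented counts, the odds of exposure are a few times higher among cases than controls.<br />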
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to avoid recall and report bias include: <br />
# adjusting timing so that the time between the event/illness and the study is as short as possible; use standardized questionnaires that obtain complete information;<br />
# using existing information if/when possible (e.g. medical record); <br />
# masking participants to study hypothesis<br />
*Conditions when an OR from a case-control study can approximate a RR ($OR \approx RR$):<br />
# when the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn; <br />
# when the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn; <br />
# when the disease being studied does not occur frequently.<br />
<br />
===Cross-Sectional Studies===<br />
A cross-sectional study is an observational study in which a subject’s exposure and disease data are measured at the same time; prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, and temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# good for generating hypotheses;<br />
# easily sets up other analytic designs; <br />
# temporality is not a problem for time invariant exposures (genetic markers); <br />
# relatively low cost.<br />
<br />
*'''Weakness:'''<br />
# temporality – exposure or disease which happened first; <br />
# prevalent cases may not be the same as incident cases; <br />
# not useful for rare diseases; <br />
# subject to selection bias.<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposure || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\, disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place and time (mixed study). An error can occur when an association identified from group-level (ecologic) characteristics is ascribed to individuals when no such association exists at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# data is relatively easy and/or cheap to obtain; <br />
# good place to start; <br />
# many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weakness:''' Reliance on group-level data may not correctly represent individual-level associations. <br />
<br />
*Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.<br />
<br />
*Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.<br />
<br />
===Other Risk Estimates===<br />
*Attributable Risk Estimates of Effect – if exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.<br />
*Attributable Risk (AR): $AR=CI_{Exposed} - CI _{Not\,exposed}$. This is just the risk difference. The group of interest is the exposed; the AR quantifies the risk of disease in the “exposed” group attributable to the exposure. <br />
*Attributable Risk Percent $(AR\%)$: $ AR\%$ = $\frac{(CI_{Exposed} - CI_{Not exposed})}{CI_{exposed}}$<br />
*Population Attributable Risk (PAR): $PAR= CI_{Total} - CI_{Not exposed}$<br />
*Population Attributable Risk Percent $(PAR\%)$: $PAR\%$ = $\frac{(CI_{Total}-CI_{Not exposed})} {CI_{total}}$.<br />
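These four attributable-risk measures can be sketched with hypothetical cumulative incidences (all values invented for illustration):<br />

```python
# Hypothetical cumulative incidences over the study period
ci_exposed, ci_unexposed = 0.30, 0.10
ci_total = 0.18   # whole population (exposed + unexposed combined)

ar = ci_exposed - ci_unexposed   # attributable risk (risk difference)
ar_pct = ar / ci_exposed         # fraction of the exposed group's risk due to exposure
par = ci_total - ci_unexposed    # population attributable risk
par_pct = par / ci_total         # fraction of the total population's risk due to exposure

print(ar, ar_pct, par, par_pct)
```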
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: Any systematic error in the design, conduct, or analysis of a study that results in a distorted estimate of the relationship between an exposure and an outcome; the observed results differ from the true results. <br />
*Impact of bias: It can make it appear as if there is an association when there really is none (bias away from the null), or mask an association when there really is one (bias toward the null).<br />
*Reasons we get the wrong answer: Selection bias occurs when who is selected or retained in a study distorts the estimates of the truth. An example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (chose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls or exposed and unexposed distorts your estimates of the truth. A variable is a confounder if it is a known risk factor for the outcome, it is associated with the exposure but not a result of the exposure. These three conditions are necessary for a variable to be considered as a confounder. <br />
* Chance: the luck of draw gets you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any factors that were matched on; you cannot conduct analyses for matched variables and outcome.<br />
*Randomization: Random allocation of exposure/“treatment” by the investigator ensures that the two groups (exposed &amp; unexposed) are the same except for the exposure of interest, and controls for both known and unknown confounders because the distribution of these “3rd variables” should be equal between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
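Direct age adjustment, as illustrated above, can be sketched in Python. This is a minimal sketch: the age-specific rates and standard-population sizes below are hypothetical, not taken from any table in this page.

```python
# Direct age adjustment: apply each stratum's age-specific rate to a
# standard population, sum the expected deaths, and divide by the total
# standard population. All numbers below are hypothetical.
rates_per_100k = [15, 8, 12, 40]  # age-specific mortality rates
standard_pop = [1_000_000, 2_000_000, 2_000_000, 1_000_000]

expected = [r * p / 100_000 for r, p in zip(rates_per_100k, standard_pop)]
adjusted_rate = sum(expected) / sum(standard_pop) * 100_000
print(round(adjusted_rate, 1))  # 15.8 (deaths per 100,000)
```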
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, through several important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has largely benefited in the past from the development of epidemiological research; however, the association-versus-causation question is currently raised by observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved: the necessity of replication and Mendelian randomization, which might prove to be important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article] retrospectively studies the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about existence of outbreaks?<br />
:a. Cases call health departments directly<br />
:b. Clinicians<br />
:c. Laboratories<br />
:d. All of the above<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. Host<br />
:b. Agent<br />
:c. Vector<br />
:d. Environment<br />
:e. All of the above<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same.<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly.<br />
:c. The prevalence decreases if the duration of disease is increasing.<br />
:d. The prevalence decreases if the duration of disease stays the same.<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002 - 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 deaths per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:a. True<br />
:b. False<br />
<br />
Sequential testing tends to have higher net specificity than the specificity of a single test.<br />
:a. True<br />
:b. False<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
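As a refresher before answering, the screening-test measures can be computed from a generic 2×2 table as sketched below. The counts here are hypothetical (deliberately not the table above, so the quiz answers are not given away):

```python
# Screening-test measures from a generic 2x2 table (hypothetical counts).
tp, fp = 90, 20  # test positive: condition present / condition absent
fn, tn = 10, 80  # test negative: condition present / condition absent

sensitivity = tp / (tp + fn)  # P(test + | condition +)
specificity = tn / (tn + fp)  # P(test - | condition -)
ppv = tp / (tp + fp)          # positive predictive value: P(condition + | test +)
print(round(sensitivity, 2), round(specificity, 2), round(ppv, 2))  # 0.9 0.8 0.82
```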
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they had the flu), so these were not enrolled in the research study. The students who were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of this follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question; what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. Interviewer bias<br />
:b. Recall bias<br />
:c. Loss to follow-up<br />
:d. Non-differential misclassification<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns (especially if they were concerned that sun exposure was a reason they got melanoma). This is an example of:<br />
:a. Interviewer bias<br />
:b. Loss to follow-up<br />
:c. Differential misclassification<br />
:d. Non-differential misclassification<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>Glenbrauhttps://wiki.socr.umich.edu/index.php?title=SMHS_Estimation&diff=14881SMHS Estimation2015-04-07T17:58:36Z<p>Glenbrau: /* Confidence Intervals (CIs) */</p>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Parameter Estimation ==<br />
<br />
===Overview===<br />
Estimation is an important concept in statistics, with wide application across many areas. It deals with estimating the values of population parameters based on sample data. The parameters describe an underlying physical setting, and their values affect the distribution of the measured data. Two major approaches are commonly used in estimation: <br />
# The probabilistic approach assumes that the measured data is random with probability distribution dependent on the parameters. <br />
# The set-membership approach assumes that the measured data vector belongs to a set which depends on the parameter vector. <br />
<br />
The purpose of estimation is to find an estimator that can be interpreted, that is accurate, and that exhibits some form of optimality. Criteria such as minimum-variance unbiasedness are usually applied to measure estimator optimality, although an optimal estimator does not always exist. Here we present the fundamentals of estimation theory and illustrate how to apply estimation in real studies.<br />
<br />
===Motivation===<br />
To obtain a desired estimator, we first determine a probability distribution, with parameters of interest, based on the data. After deciding on the probabilistic model, we find the theoretically achievable precision available to any estimator based on the model, and then develop an estimator accordingly. There are a variety of methods and criteria for developing and choosing between estimators based on their performance: <br />
#Maximum likelihood estimators<br />
#Bayes estimators<br />
#Method of moments estimators<br />
#Minimum mean square error estimators<br />
#Minimum variance unbiased estimator<br />
#Best linear unbiased estimator, etc. <br />
<br />
Experiments or simulations can also be run to test estimators’ performance.<br />
<br />
===Theory===<br />
An estimate of a population parameter may be expressed in two ways:<br />
*''Point estimate'': a single value of estimate. For example, sample mean is a point estimate of the population mean.<br />
*''Interval estimate'': an interval estimate is defined by two numbers, between which a population parameter is said to lie.<br />
<br />
====Confidence Intervals (CIs)====<br />
CIs describe the uncertainty of a sampling method and contain a ''confidence level'', a ''statistic'' and a ''margin of error''. The statistic and the margin of error define an interval estimate, which represents the precision of the method. A confidence interval is expressed as the sample statistic plus or minus the margin of error.<br />
<br />
The interpretation of a confidence interval at the 95% confidence level is that, under repeated sampling, about 95% of intervals constructed this way would contain the true parameter.<br />
<br />
* ''Confidence level'': The probability part of a confidence interval. It describes the likelihood that a particular sampling method will produce a confidence interval that includes the true population parameter. <br />
<br />
* ''Margin of error'': Range of the values above and below the sample statistic in confidence interval; ''margin of error $=$ critical value $*$ standard deviation of the statistic''<br />
<br />
* ''Critical value'': The central limit theorem states that the sampling distribution of a statistic will be normal or nearly normal, and that the critical value can be expressed as a $t$ score or as a $z$ score provided that ANY of the following conditions apply:<br />
**The population distribution is normal.<br />
**The sampling distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.<br />
**The sampling distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.<br />
**The sample size is greater than 40, without outliers.<br />
<br />
To find the critical value, follow these steps.<br />
*Compute alpha $(\alpha): \alpha = 1 - \left(\frac{confidence\ level}{100}\right)$<br />
*Find the critical probability $(p^*): p^* = 1 -\frac {\alpha} {2}$<br />
*To express the critical value as a $z$ score, find the $z$ score having a cumulative probability equal to the critical probability $(p^*)$.<br />
*To express the critical value as a $t$ score, follow these steps. Find the degrees of freedom (DF): when estimating a mean score or a proportion from a single sample, DF is equal to the sample size minus one. For other applications, the degrees of freedom may be calculated differently. We will describe those computations as they come up.<br />
<br />
: The critical $t$ score $(t^*)$ is the $t$ score having degrees of freedom equal to DF and a cumulative probability equal to the critical probability $(p^*)$.<br />
<br />
: Should you express the critical value as a $t$ score or as a $z$ score? As a practical matter, when the sample size is large (greater than 40), it doesn't make much difference. Both approaches yield similar results. Strictly speaking, when the population standard deviation is unknown or when the sample size is small, the $t$ score is preferred. Nevertheless, many introductory statistics texts use the $z$ score exclusively. <br />
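The critical-value steps above can be sketched in Python. This is a minimal sketch assuming SciPy is available; the sample size $n=20$ is hypothetical:

```python
# Critical values for a 95% confidence level, following the steps above.
from scipy.stats import norm, t

confidence = 0.95
alpha = 1 - confidence   # alpha = 1 - confidence level
p_star = 1 - alpha / 2   # critical probability

z_star = norm.ppf(p_star)      # z score with cumulative probability p*
t_star = t.ppf(p_star, df=19)  # t score for n = 20 (DF = n - 1)

print(round(z_star, 2), round(t_star, 2))  # 1.96 2.09
```

Note how the $t$ critical value exceeds the $z$ value for small samples; the two converge as DF grows.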
<br />
* ''Standard error'': an estimate of the standard deviation of a statistic. When the values of population parameters are unknown, it is valuable to compute the standard error as an unbiased estimate of the standard deviation of a statistic. It is computed from known sample statistics. The table below shows how to compute the standard error for simple random samples, assuming that the population size is at least 10 times larger than the sample size.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|Statistic || Standard error<br />
|-<br />
|Sample mean, $\bar{x}$ || $SE_{\bar{x}}=\frac{s}{\sqrt{n}}$<br />
|-<br />
|Sample proportion, $p$ || $SE_{p}=\sqrt{\frac{p(1-p)}{n}}$<br />
|-<br />
|Difference between means,$\bar{x}_{1} -\bar{x}_{2}$ || $ SE_{\bar{x}_1 -\bar{x}_2} = \sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}$<br />
|-<br />
|Difference between proportions, $\bar{p}_{1} - \bar{p}_{2}$ || $SE_{\bar{p}_{1} - \bar{p}_{2}} = \sqrt{ \frac{p_1 (1-p_1)}{n_1} +\frac{p_{2}(1-p_{2})}{n_{2}}}$<br />
|}<br />
</center><br />
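The first two rows of the table above can be sketched as follows; the sample summaries ($s$, $n$, $p$) are hypothetical:

```python
# Standard errors for a sample mean and a sample proportion.
import math

# SE of a sample mean: s / sqrt(n)
s, n = 15, 100
se_mean = s / math.sqrt(n)

# SE of a sample proportion: sqrt(p(1-p)/n)
p = 0.4
se_prop = math.sqrt(p * (1 - p) / n)

print(se_mean, round(se_prop, 3))  # 1.5 0.049
```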
<br />
* ''Degrees of freedom'': the number of independent pieces of information on which the estimate is based<br />
<br />
: In general, the degrees of freedom for an estimate equal the number of values minus the number of parameters estimated as intermediate steps toward the estimate in question. Suppose we have sampled 20 data points; then our estimate of the variance has 20 – 1 = 19 degrees of freedom, since one parameter (the mean) is estimated along the way.<br />
<br />
====Characteristics of Estimators====<br />
* Bias: refers to whether an estimator tends to either overestimate or underestimate the parameter. We say an estimator is biased if the mean of the sampling distribution of the statistic is not equal to the parameter. For example, $\hat{σ}^{2}=\frac{\sum{(x-\bar{x})^{2}}} {N}$ is a biased estimator of the population variance, whereas the sample variance $s^{2}=\frac{\sum{(x-\bar{x})^{2}}} {N-1}$ is an unbiased estimate of the population variance.<br />
<br />
*Sampling variability: refers to how much the estimate varies from sample to sample. It is usually measured by its standard error: the smaller the standard error, the less the sampling variability. For example, the standard error of the mean is $σ_M=\frac{σ}{\sqrt{N}}$. So the larger the sample size $(N)$, the smaller the standard error of the mean, hence the smaller the sampling variability.<br />
<br />
*Unbiased estimate: if $\delta(X_{1},X_{2},…,X_{n})$ is an unbiased estimate for $g(\theta)$ and $T$ is a complete sufficient statistic for the family of densities, then $\eta (X_{1},X_{2},…,X_{n})=E[\delta(X_{1},X_{2},…,X_{n})|T]$ is also an unbiased estimate of $g(\theta)$. <br />
<br />
*(Uniformly) Minimum-variance unbiased estimator ([http://en.wikipedia.org/wiki/Minimum-variance_unbiased_estimator UMVUE], or MVUE) is an unbiased estimator that has lower variance than any other unbiased estimator for all possible values of the parameter. It may not exist. Consider estimation of $g(\theta)$ based on data $X_{1},X_{2},…,X_{n}$ independent and identically distributed from some member of a family with density $p_\theta, \theta \in \Omega$: an unbiased estimator $\delta(X_{1},X_{2},…,X_{n})$ of $g(\theta)$ is UMVUE if, $\forall \theta \in \Omega$, $var(\delta(X_{1},X_{2},…,X_{n})) \leq var(\tilde{\delta} (X_{1},X_{2},…,X_{n}))$ for any other unbiased estimator $\tilde{\delta}$. <br />
<br />
: $MSE(\delta)=var(\delta)+(bias(\delta))^{2}$. The MVUE minimizes MSE among unbiased estimators. In some cases biased estimators have lower MSE because they have a smaller variance than does any unbiased estimator.<br />
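The bias of the divide-by-$N$ variance estimator discussed above can be demonstrated with a short simulation. This is a sketch assuming NumPy; the sample size and number of replications are arbitrary choices:

```python
# Simulation: the divide-by-N variance estimator is biased downward;
# dividing by N - 1 removes the bias. Samples come from N(0, 1),
# whose true variance is 1.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal((10_000, 5))  # 10,000 samples of size N = 5

biased = samples.var(axis=1, ddof=0).mean()    # divides by N
unbiased = samples.var(axis=1, ddof=1).mean()  # divides by N - 1

# The ddof=1 average lands near the true variance 1; the ddof=0 average
# systematically underestimates it, by roughly the factor (N-1)/N.
print(round(biased, 2), round(unbiased, 2))
```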
<br />
===Applications===<br />
* [[AP_Statistics_Curriculum_2007_Estim_MOM_MLE|This article]] presents the MOM and MLE methods of estimation. It illustrates the MOM method in detailed examples and attaches several exercises for students to practice. MOM, which is short for Method Of Moments, is one of the most commonly used methods to estimate population parameters using observed data from the specific process. The idea is to use the sample data to calculate sample moments and then set these equal to their corresponding population counterparts. Steps: (1) determine the $k$ parameters of interest and the specific distribution for this process; (2) compute the first $k$ (or more) sample moments; (3) set the sample moments equal to the population moments and solve the resulting system of $k$ equations with $k$ unknowns. Let’s look at a simple example as an application of the MOM method.<br />
<br />
: Suppose we want to estimate the true probability of a head when flipping a coin (assume an unfair coin). Suppose we flip the coin 10 times and observe the following outcomes: {H,T,H,H,T,T,T,H,T,T}. With MOM: (1) the parameter of interest is $p=P(H)$, and each flip follows a Bernoulli distribution; (2) $np=E[Y]=4$, so $p=2/5$, where $Y$ is the number of heads in one experiment and follows a Binomial distribution; (3) the estimate of the true probability of flipping a head equals $2/5$. This is a simple example of MOM estimation of a proportion.<br />
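The coin example above amounts to equating the first sample moment (the observed fraction of heads) with the population mean $p$, which can be sketched as:

```python
# MOM estimate of p = P(H) for a Bernoulli process: set the first sample
# moment (fraction of heads) equal to the population mean p.
flips = ["H", "T", "H", "H", "T", "T", "T", "H", "T", "T"]

heads = sum(1 for f in flips if f == "H")
p_hat = heads / len(flips)  # first sample moment = E[X] = p
print(p_hat)  # 0.4
```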
<br />
* [http://onlinestatbook.com/2/estimation/estimation.html This article] presents a fundamental introduction to estimation theory and illustrated on basic concepts and application of estimation. It offers specific examples and exercises on each concept and application and works as a good start of introduction to estimation theory. <br />
<br />
* [http://digital-library.theiet.org/content/journals/10.1049/ip-f-2.1993.0015 This article] proposed an algorithm, the bootstrap filter, for implementing recursive Bayesian filters. The required density of the state vector is represented as a set of random samples, which are updated and propagated by the algorithm. The method presented is not restricted by assumptions of linearity or Gaussian noise and it may be applied to any state transition or measurement model. It presents a simulation example of the bearings only tracking problems and includes schemes for improving the efficiency of the basic algorithm.<br />
<br />
===Software===<br />
*[http://socr.ucla.edu/htmls/SOCR_Distributions.html SOCR Distribution]<br />
*[http://socr.ucla.edu/htmls/SOCR_Experiments.html SOCR Simulations & Experiments] <br />
*[http://socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts]<br />
<br />
===Problems===<br />
* Which of the following statements is true?<br />
: a. When the margin of error is small, the confidence level is high.<br />
: b. When the margin of error is small, the confidence level is low.<br />
: c. A confidence interval is a type of point estimate.<br />
: d. A population mean is an example of a point estimate.<br />
: e. None of the above.<br />
<br />
* Which of the following statements is true?<br />
: a. The standard error is computed solely from sample attributes.<br />
: b. The standard deviation is computed solely from sample attributes.<br />
: c. The standard error is a measure of central tendency.<br />
: d. All of the above.<br />
: e. None of the above.<br />
<br />
* 900 students were randomly selected for a national survey. Among survey participants, the mean grade-point average (GPA) was 2.7, and the standard deviation was 0.4. What is the margin of error, assuming a 95% confidence level?<br />
: a. 0.013<br />
: b. 0.025<br />
: c. 0.500<br />
: d. 1.960<br />
<br />
* Suppose we want to estimate the average weight of an adult male in Dekalb County, Georgia. We draw a random sample of 1,000 men from a population of 1,000,000 men and weigh them. We find that the average man in our sample weighs 180 pounds, and the standard deviation of the sample is 30 pounds. What is the 95% confidence interval?<br />
: a. $180 \pm 1.86$<br />
: b. $180 \pm 3.0$ <br />
: c. $180 \pm 5.88$<br />
: d. $180 \pm 30$<br />
<br />
* Suppose that simple random samples of seniors are selected from two colleges: 15 students from school A and 20 students from school B. On a standardized test, the sample from school A has an average score of 1000 with a standard deviation of 100. The sample from school B has an average score of 950 with a standard deviation of 90. What is the 90% confidence interval for the difference in test scores at the two schools, assuming that test scores came from normal distributions in both schools? (Hint: Since the sample sizes are small, use a t score as the critical value.)<br />
: a. $50 \pm 1.70$<br />
: b. $50 \pm 28.49$<br />
: c. $50 \pm 32.74$<br />
: d. $50 \pm 55.66$<br />
<br />
* You know the population mean for a certain test score. You select 10 people from the population to estimate the standard deviation. How many degrees of freedom does your estimation of the standard deviation have?<br />
: a. 8<br />
: b. 9<br />
: c. 10<br />
: d. 11<br />
<br />
* In the population, a parameter has a value of 10. Based on the means and standard errors of their sampling distributions, which of these statistics estimates this parameter with the least sampling variability?<br />
: a. Mean = 10, SE = 5<br />
: b. Mean = 9, SE = 4<br />
: c. Mean = 11, SE = 2<br />
: d. Mean = 13, SE = 3<br />
<br />
===References===<br />
*[http://wiki.stat.ucla.edu/socr/index.php/Probability_and_statistics_EBook#Method_of_Moments_and_Maximum_Likelihood_Estimation SOCR]<br />
* [http://en.wikipedia.org/wiki/Estimation Estimation Wikipedia]<br />
* [http://onlinestatbook.com/2/estimation/characteristics.html OnlineStatBook: Estimation]<br />
* [http://en.wikipedia.org/wiki/Confidence_interval Confidence Interval Wikipedia]<br />
* [http://en.wikipedia.org/wiki/Minimum-variance_unbiased_estimator UMVUE Wikipedia]<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_Estimation}}</div>Glenbrauhttps://wiki.socr.umich.edu/index.php?title=SMHS_Estimation&diff=14880SMHS Estimation2015-04-07T16:06:28Z<p>Glenbrau: /* Confidence Intervals (CIs) */</p>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Parameter Estimation ==<br />
<br />
===Overview===<br />
Estimation is an important concept in the field of statistics and application of estimation is widely applied in various areas. It deals with estimating values of parameters of the population based on the sample data. And the parameters describe an underlying physical setting and their value would affect the distribution of the measured data. Two major approaches are commonly used in estimation: <br />
# The probabilistic approach assumes that the measured data is random with probability distribution dependent on the parameters. <br />
# The set-membership approach assumes that the measured data vector belongs to a set which depends on the parameter vector. <br />
<br />
The purpose of estimation is to find an estimator that can be interpreted, which is accurate and which exhibits some form of optimality. Indicators like minimum variance unbiased estimator is usually applied to measure estimator optimality, although it is possible that an optimal estimator don’t always exist. Here we present the fundamentals of estimation theory and illustrate how to apply estimation in real studies.<br />
<br />
===Motivation===<br />
To obtain a desired estimator or estimation, we need to first determine a probability distribution with parameters of interest based on the data. After deciding the probabilistic model, we need to find the theoretically achievable precision available to any estimator based on the model and then develop an estimator based on this model. There is a variety of methods and criteria to develop and choose between estimators based on their performance: <br />
#Maximum likelihood estimators<br />
#Bayes estimators<br />
#Method of moments estimators<br />
#Minimum mean square error estimators<br />
#Minimum variance unbiased estimator<br />
#Best linear unbiased estimator, etc. <br />
<br />
Experiment or simulations can also be run to test estimators’ performance.<br />
<br />
===Theory===<br />
An estimate of a population parameter may be expressed in two ways:<br />
*''Point estimate'': a single value of estimate. For example, sample mean is a point estimate of the population mean.<br />
*''Interval estimate'': an interval estimate is defined by two numbers, between which a population parameter is said to lie.<br />
<br />
====Confidence Intervals (CIs)====<br />
CIs describe the uncertainty of a sampling method and contain a ''confidence level'', a ''statistic'' and a ''margin of error''. The statistic and the margin of error define an interval estimate, which represents the precision of the method. Confidence Interval is expressed as sample statistic plus the margin of error.<br />
<br />
The interpretation of a confidence interval at 95% confidence level is that we have 95% confidence that the parameter will fall within the margin of the interval.<br />
<br />
* ''Confidence level'': The probability part of a confidence interval. It describes the likelihood that a particular sampling method will produce a confidence interval that includes the true population parameter. <br />
<br />
* ''Margin of error'': Range of the values above and below the sample statistic in confidence interval; ''margin of error $=$ critical value $*$ standard deviation of the statistic''<br />
<br />
* ''Critical value'': The central limit theorem states that the sampling distribution of a statistic will be normal or nearly normal, and that the critical value can be expressed as a $t$ score or as a $z$ score provided that ANY of the following conditions apply:<br />
**The population distribution is normal.<br />
**The sampling distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.<br />
**The sampling distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40.<br />
**The sample size is greater than 40, without outliers.<br />
<br />
To find the critical value, follow these steps.<br />
*Compute alpha $(\alpha): \alpha = 1 - \left(\frac{confidence\ level}{100}\right)$<br />
*Find the critical probability $(p^*): p^* = 1 -\frac {\alpha} {2}$<br />
*To express the critical value as a $z$ score, find the $z$ score having a cumulative probability equal to the critical probability $(p^*)$.<br />
*To express the critical value as a t score, follow these steps. Find the degree of freedom (DF): when estimating a mean score or a proportion from a single sample, DF is equal to the sample size minus one. For other applications, the degrees of freedom may be calculated differently. We will describe those computations as they come up.<br />
<br />
: The critical $t$ score $(t^*)$ is the $t$ score having degrees of freedom equal to DF and a cumulative probability equal to the critical probability $(p^*)$.<br />
<br />
: Should you express the critical value as a t score or as a $z$ score? As a practical matter, when the sample size is large (greater than 40), it doesn't make much difference. Both approaches yield similar results. Strictly speaking, when the population standard deviation is unknown or when the sample size is small, the $t$ score is preferred. Nevertheless, many introductory statistics texts use the $z$ score exclusively. <br />
<br />
* ''Standard error'': an estimate of the standard deviation of a statistic. When the values of population parameters are unknown, it is valuable to compute the standard error as an unbiased estimate of the standard deviation of a statistic. It is computed form known sample statistic. The table below shows how to compute the standard error for simple random samples assuming that the population size is at least 10 times larger than the sample size.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%"border="1"<br />
|-<br />
|Statistic || Standard error<br />
|-<br />
|Sample mean, $\bar{x}$ || $SE_{\bar{x}}=\frac{s}{\sqrt{n}}$<br />
|-<br />
|Sample proportion, $p$ || $SE_{p}=\sqrt{\frac{p(1-p)}{n}}$<br />
|-<br />
|Difference between means,$\bar{x}_{1} -\bar{x}_{2}$ || $ SE_{\bar{x}_1 -\bar{x}_2} = \sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}$<br />
|-<br />
|Difference between proportions, $\bar{p}_{1} - \bar{p}_{2}$ || $SE_{\bar{p}_{1} - \bar{p}_{2}} = \sqrt{ \frac{p_1 (1-p_1)}{n_1} +\frac{p_{2}(1-p_{2})}{n_{2}}}$<br />
|}<br />
</center><br />
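The four formulas in the table translate directly into code; here is a minimal Python sketch (the function names are ours, not from the source):

```python
from math import sqrt

def se_mean(s, n):                 # sample mean: s / sqrt(n)
    return s / sqrt(n)

def se_proportion(p, n):           # sample proportion: sqrt(p(1 - p) / n)
    return sqrt(p * (1 - p) / n)

def se_diff_means(s1, n1, s2, n2): # difference between means
    return sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def se_diff_proportions(p1, n1, p2, n2):  # difference between proportions
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# e.g. a sample of n = 900 with s = 0.4 gives SE of the mean = 0.4 / 30
print(round(se_mean(0.4, 900), 4))   # 0.0133
```

The last line uses the numbers from the GPA problem in the Problems section; multiplying this standard error by the critical value gives the margin of error.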
<br />
* ''Degrees of freedom'': the number of independent pieces of information on which the estimate is based<br />
<br />
: In general, the degrees of freedom for an estimate equal the number of independent values minus the number of parameters estimated as intermediate steps in computing the estimate. For example, if we have sampled 20 data points, our estimate of the variance (which uses the estimated mean $\bar{x}$) has $20 - 1 = 19$ degrees of freedom.<br />
<br />
====Characteristics of Estimators====<br />
* ''Bias'': refers to whether an estimator tends to either overestimate or underestimate the parameter. An estimator is biased if the mean of its sampling distribution is not equal to the parameter. For example, $\hat{\sigma}^{2}=\frac{\sum_{i=1}^{N}(x_{i}-\bar{x})^{2}}{N}$ is a biased estimator of the population variance, whereas the sample variance $s^{2}=\frac{\sum_{i=1}^{N}(x_{i}-\bar{x})^{2}}{N-1}$ is an unbiased estimate of the population variance.<br />
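The bias of the $N$-divisor variance estimator can be verified exactly by enumerating every equally likely sample from a small population. This is a minimal Python sketch (the toy population {1, 2, 3} and sample size 2 are illustrative assumptions, not from the original text); exact rational arithmetic avoids rounding artifacts:

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # true variance = 2/3

biased, unbiased = [], []
for sample in product(population, repeat=2):      # all 9 equally likely samples of size N = 2
    xbar = Fraction(sum(sample), len(sample))
    ss = sum((x - xbar) ** 2 for x in sample)
    biased.append(ss / 2)     # divide by N
    unbiased.append(ss / 1)   # divide by N - 1

E_biased = sum(biased) / len(biased)        # 1/3: underestimates sigma2
E_unbiased = sum(unbiased) / len(unbiased)  # 2/3: matches sigma2 exactly
print(E_biased, E_unbiased, sigma2)         # 1/3 2/3 2/3
```

The mean of the sampling distribution of the $N-1$ version equals the population variance, while the $N$ version falls short, which is exactly the definition of bias given above.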
<br />
*''Sampling variability'': refers to how much the estimate varies from sample to sample. It is usually measured by the standard error: the smaller the standard error, the smaller the sampling variability. For example, the standard error of the mean is $\sigma_M=\frac{\sigma}{\sqrt{N}}$, so the larger the sample size $(N)$, the smaller the standard error of the mean, and hence the smaller the sampling variability.<br />
<br />
*''Unbiased estimate'': if $\delta(X_{1},X_{2},…,X_{n})$ is an unbiased estimate of $g(\theta)$ and $T$ is a complete sufficient statistic for the family of densities, then $\eta (X_{1},X_{2},…,X_{n})=E[\delta(X_{1},X_{2},…,X_{n})|T]$ is also unbiased for $g(\theta)$ and has variance no larger than that of $\delta$; by the Lehmann-Scheffé theorem it is the UMVUE. <br />
<br />
*(Uniformly) ''Minimum-variance unbiased estimator'' ([http://en.wikipedia.org/wiki/Minimum-variance_unbiased_estimator UMVUE], or MVUE): an unbiased estimator whose variance is no larger than that of any other unbiased estimator for all possible values of the parameter; it may not exist. Consider estimation of $g(\theta)$ based on data $X_{1},X_{2},…,X_{n}$ independent and identically distributed from some member of a family with density $p_\theta, \theta \in \Omega$. An unbiased estimator $\delta(X_{1},X_{2},…,X_{n})$ of $g(\theta)$ is UMVUE if, $\forall \theta \in \Omega$, $var(\delta(X_{1},X_{2},…,X_{n})) \leq var(\tilde{\delta} (X_{1},X_{2},…,X_{n}))$ for any other unbiased estimator $\tilde{\delta}$. <br />
<br />
: $MSE(\delta)=var(\delta)+(bias(\delta))^{2}$. The MVUE minimizes MSE among unbiased estimators. In some cases, a biased estimator can have lower MSE than any unbiased estimator, because its variance is smaller by more than enough to offset the squared bias.<br />
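The decomposition $MSE(\delta)=var(\delta)+(bias(\delta))^{2}$ can be checked exactly on a toy example. The Python sketch below (the population {1, 2, 3} and the choice of estimator are illustrative assumptions) enumerates every sample of size 2 and verifies the identity for the biased ($N$-divisor) variance estimator:

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]
N = 2
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # true variance = 2/3

# Values of the biased (N-divisor) variance estimator over every equally likely sample
deltas = []
for sample in product(population, repeat=N):
    xbar = Fraction(sum(sample), len(sample))
    deltas.append(sum((x - xbar) ** 2 for x in sample) / N)

E_delta = sum(deltas) / len(deltas)                          # 1/3
bias = E_delta - sigma2                                      # -1/3
var = sum((d - E_delta) ** 2 for d in deltas) / len(deltas)  # 5/36
mse = sum((d - sigma2) ** 2 for d in deltas) / len(deltas)   # 1/4

assert mse == var + bias ** 2   # MSE decomposition holds exactly
print(mse, var, bias ** 2)      # 1/4 5/36 1/9
```

Here $5/36 + 1/9 = 9/36 = 1/4$, confirming the decomposition term by term with exact rational arithmetic.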
<br />
===Applications===<br />
* [[AP_Statistics_Curriculum_2007_Estim_MOM_MLE|This article]] presents the MOM and MLE methods of estimation. It illustrates the MOM method with detailed examples and attaches several exercises for students to practice. MOM, short for Method Of Moments, is one of the most commonly used methods for estimating population parameters from observed data. The idea is to use the sample data to calculate sample moments and then set these equal to their corresponding population counterparts. Steps: (1) determine the $k$ parameters of interest and the specific distribution for the process; (2) compute the first $k$ (or more) sample moments; (3) set the sample moments equal to the population moments and solve the resulting system of $k$ equations with $k$ unknowns. Let’s look at a simple example as an application of the MOM method.<br />
<br />
: Suppose we want to estimate the true probability of a head when flipping a coin (assume an unfair coin). We flip the coin 10 times and observe the outcome {H,T,H,H,T,T,T,H,T,T}. With MOM: (1) the parameter of interest is $p=P(H)$, and each flip follows a Bernoulli distribution; (2) $np=E[Y]=4$, so $p=2/5$, where $Y$ is the number of heads in the experiment and follows a Binomial distribution; (3) the estimate of the true probability of flipping a head equals $2/5$. This is a simple example of MOM estimation of a proportion.<br />
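The coin example can be reproduced in a few lines of Python (the encoding of the observed flips is ours, not part of the original text):

```python
flips = ["H", "T", "H", "H", "T", "T", "T", "H", "T", "T"]
x = [1 if f == "H" else 0 for f in flips]   # Bernoulli encoding: head = 1

# First sample moment (the sample mean) set equal to E[X] = p and solved for p
p_hat = sum(x) / len(x)
print(p_hat)   # 0.4, i.e. 2/5
```

Because the first population moment of a Bernoulli variable is $p$ itself, the MOM estimate is simply the sample proportion of heads.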
<br />
* [http://onlinestatbook.com/2/estimation/estimation.html This article] presents a fundamental introduction to estimation theory, illustrating basic concepts and applications of estimation. It offers specific examples and exercises on each concept and application, and serves as a good starting introduction to estimation theory. <br />
<br />
* [http://digital-library.theiet.org/content/journals/10.1049/ip-f-2.1993.0015 This article] proposes an algorithm, the bootstrap filter, for implementing recursive Bayesian filters. The required density of the state vector is represented as a set of random samples, which are updated and propagated by the algorithm. The method is not restricted by assumptions of linearity or Gaussian noise and may be applied to any state transition or measurement model. It presents a simulation example of the bearings-only tracking problem and includes schemes for improving the efficiency of the basic algorithm.<br />
<br />
===Software===<br />
*[http://socr.ucla.edu/htmls/SOCR_Distributions.html SOCR Distribution]<br />
*[http://socr.ucla.edu/htmls/SOCR_Experiments.html SOCR Simulations & Experiments] <br />
*[http://socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts]<br />
<br />
===Problems===<br />
* Which of the following statements is true?<br />
: a. When the margin of error is small, the confidence level is high.<br />
: b. When the margin of error is small, the confidence level is low.<br />
: c. A confidence interval is a type of point estimate.<br />
: d. A population mean is an example of a point estimate.<br />
: e. None of the above.<br />
<br />
* Which of the following statements is true?<br />
: a. The standard error is computed solely from sample attributes.<br />
: b. The standard deviation is computed solely from sample attributes.<br />
: c. The standard error is a measure of central tendency.<br />
: d. All of the above.<br />
: e. None of the above.<br />
<br />
* 900 students were randomly selected for a national survey. Among survey participants, the mean grade-point average (GPA) was 2.7, and the standard deviation was 0.4. What is the margin of error, assuming a 95% confidence level?<br />
: a. 0.013<br />
: b. 0.025<br />
: c. 0.500<br />
: d. 1.960<br />
<br />
* Suppose we want to estimate the average weight of an adult male in Dekalb County, Georgia. We draw a random sample of 1,000 men from a population of 1,000,000 men and weigh them. We find that the average man in our sample weighs 180 pounds, and the standard deviation of the sample is 30 pounds. What is the 95% confidence interval?<br />
: a. $180 \pm 1.86$<br />
: b. $180 \pm 3.0$ <br />
: c. $180 \pm 5.88$<br />
: d. $180 \pm 30$<br />
<br />
* Suppose that simple random samples of seniors are selected from two colleges: 15 students from school A and 20 students from school B. On a standardized test, the sample from school A has an average score of 1000 with a standard deviation of 100. The sample from school B has an average score of 950 with a standard deviation of 90. What is the 90% confidence interval for the difference in test scores at the two schools, assuming that test scores came from normal distributions in both schools? (Hint: Since the sample sizes are small, use a t score as the critical value.)<br />
: a. $50 \pm 1.70$<br />
: b. $50 \pm 28.49$<br />
: c. $50 \pm 32.74$<br />
: d. $50 \pm 55.66$<br />
<br />
* You know the population mean for a certain test score. You select 10 people from the population to estimate the standard deviation. How many degrees of freedom does your estimation of the standard deviation have?<br />
: a. 8<br />
: b. 9<br />
: c. 10<br />
: d. 11<br />
<br />
* In the population, a parameter has a value of 10. Based on the means and standard errors of their sampling distributions, which of these statistics estimates this parameter with the least sampling variability?<br />
: a. Mean = 10, SE = 5<br />
: b. Mean = 9, SE = 4<br />
: c. Mean = 11, SE = 2<br />
: d. Mean = 13, SE = 3<br />
<br />
===References===<br />
*[http://wiki.stat.ucla.edu/socr/index.php/Probability_and_statistics_EBook#Method_of_Moments_and_Maximum_Likelihood_Estimation SOCR]<br />
* [http://en.wikipedia.org/wiki/Estimation Estimation Wikipedia]<br />
* [http://onlinestatbook.com/2/estimation/characteristics.html OnlineStatBook: Estimation]<br />
* [http://en.wikipedia.org/wiki/Confidence_interval Confidence Interval Wikipedia]<br />
* [http://en.wikipedia.org/wiki/Minimum-variance_unbiased_estimator UMVUE Wikipedia]<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_Estimation}}</div>
<hr />
<div>
<br />
* [http://digital-library.theiet.org/content/journals/10.1049/ip-f-2.1993.0015 This article] proposed an algorithm, the bootstrap filter, for implementing recursive Bayesian filters. The required density of the state vector is represented as a set of random samples, which are updated and propagated by the algorithm. The method presented is not restricted by assumptions of linearity or Gaussian noise and it may be applied to any state transition or measurement model. It presents a simulation example of the bearings only tracking problems and includes schemes for improving the efficiency of the basic algorithm.<br />
<br />
===Software===<br />
*[http://socr.ucla.edu/htmls/SOCR_Distributions.html SOCR Distribution]<br />
*[http://socr.ucla.edu/htmls/SOCR_Experiments.html SOCR Simulations & Experiments] <br />
*[http://socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts]<br />
<br />
===Problems===<br />
* Which of the following statements is true.<br />
: a. When the margin of error is small, the confidence level is high.<br />
: b. When the margin of error is small, the confidence level is low.<br />
: c. A confidence interval is a type of point estimate.<br />
: d. A population mean is an example of a point estimate.<br />
: e. None of the above.<br />
<br />
* Which of the following statements is true.<br />
: a. The standard error is computed solely from sample attributes.<br />
: b. The standard deviation is computed solely from sample attributes.<br />
: c. The standard error is a measure of central tendency.<br />
: d. All of the above.<br />
: e. None of the above.<br />
<br />
* 900 students were randomly selected for a national survey. Among survey participants, the mean grade-point average (GPA) was 2.7, and the standard deviation was 0.4. What is the margin of error, assuming a 95% confidence level?<br />
: a. 0.013<br />
: b. 0.025<br />
: c. 0.500<br />
: d. 1.960<br />
<br />
* Suppose we want to estimate the average weight of an adult male in Dekalb County, Georgia. We draw a random sample of 1,000 men from a population of 1,000,000 men and weigh them. We find that the average man in our sample weighs 180 pounds, and the standard deviation of the sample is 30 pounds. What is the 95% confidence interval?<br />
: a. $180 \pm 1.86$<br />
: b. $180 \pm 3.0$ <br />
: c. $180 \pm 5.88$<br />
: d. $180 \pm 30$<br />
<br />
* Suppose that simple random samples of seniors are selected from two colleges: 15 students from school A and 20 students from school B. On a standardized test, the sample from school A has an average score of 1000 with a standard deviation of 100. The sample from school B has an average score of 950 with a standard deviation of 90. What is the 90% confidence interval for the difference in test scores at the two schools, assuming that test scores came from normal distributions in both schools? (Hint: Since the sample sizes are small, use a t score as the critical value.)<br />
: a. 50 + 1.70<br />
: b. 50 + 28.49<br />
: c. 50 + 32.74<br />
: d. 50 + 55.66<br />
<br />
* You know the population mean for a certain test score. You select 10 people from the population to estimate the standard deviation. How many degrees of freedom does your estimation of the standard deviation have?<br />
: a. 8<br />
: b. 9<br />
: c. 10<br />
: d. 11<br />
<br />
* In the population, a parameter has a value of 10. Based on the means and standard errors of their sampling distributions, which of these statistics estimates this parameter with the least sampling variability?<br />
: a. Mean = 10, SE = 5<br />
: b. Mean = 9, SE = 4<br />
: c. Mean = 11, SE = 2<br />
: d. Mean = 13, SE = 3<br />
<br />
===References===<br />
*[http://wiki.stat.ucla.edu/socr/index.php/Probability_and_statistics_EBook#Method_of_Moments_and_Maximum_Likelihood_Estimation SOCR]<br />
* [http://en.wikipedia.org/wiki/Estimation Estimation Wikipedia]<br />
* [http://onlinestatbook.com/2/estimation/characteristics.html OnlineStatBook: Estimation]<br />
* [http://en.wikipedia.org/wiki/Confidence_interval Confidence Interval Wikipedia]<br />
* [http://en.wikipedia.org/wiki/Minimum-variance_unbiased_estimator UMVUE Wikipedia]<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_Estimation}}</div>Glenbrauhttps://wiki.socr.umich.edu/index.php?title=SMHS_Estimation&diff=14877SMHS Estimation2015-04-07T13:55:07Z<p>Glenbrau: /* Overview */</p>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Parameter Estimation ==<br />
<br />
===Overview===<br />
Estimation is a central concept in statistics, with applications across many fields. It deals with estimating the values of population parameters based on sample data. The parameters describe an underlying physical setting, and their values affect the distribution of the measured data. Two major approaches are commonly used in estimation: <br />
# The probabilistic approach assumes that the measured data are random, with a probability distribution that depends on the parameters. <br />
# The set-membership approach assumes that the measured data vector belongs to a set which depends on the parameter vector. <br />
<br />
The purpose of estimation is to find an estimator that can be interpreted, that is accurate, and that exhibits some form of optimality. Criteria such as minimum variance among unbiased estimators are usually applied to measure estimator optimality, although an optimal estimator does not always exist. Here we present the fundamentals of estimation theory and illustrate how to apply estimation in real studies.<br />
<br />
===Motivation===<br />
To obtain a desired estimator, we first determine a probability distribution, with parameters of interest, based on the data. After deciding on the probabilistic model, we find the theoretically achievable precision available to any estimator based on that model and then develop an estimator for it. There are a variety of methods and criteria for developing and choosing between estimators based on their performance: maximum likelihood estimators, Bayes estimators, method of moments estimators, minimum mean square error estimators, minimum-variance unbiased estimators, best linear unbiased estimators, etc. Experiments or simulations can also be run to test estimators’ performance.<br />
<br />
===Theory===<br />
An estimate of a population parameter may be expressed in two ways:<br />
*Point estimate: a single-value estimate. For example, the sample mean is a point estimate of the population mean.<br />
*Interval estimate: an interval estimate is defined by two numbers, between which a population parameter is said to lie.<br />
<br />
====Confidence Intervals (CIs)====<br />
CIs describe the uncertainty of a sampling method and consist of a confidence level, a statistic, and a margin of error. The statistic and the margin of error define an interval estimate, which represents the precision of the method. A confidence interval is expressed as sample statistic ± margin of error.<br />
The interpretation of a confidence interval at the 95% confidence level is that 95% of the intervals constructed by this procedure would contain the true parameter value.<br />
<br />
* Confidence level: the probability part of a confidence interval. It describes the likelihood that a particular sampling method will produce a confidence interval that includes the true population parameter. <br />
<br />
* Margin of error: the range of values above and below the sample statistic in a confidence interval. ''Margin of error = critical value × standard deviation of the statistic''.<br />
<br />
* Critical value: The central limit theorem states that the sampling distribution of a statistic will be normal or nearly normal and the critical value can be expressed as a t score or as a z score, if ANY of the following conditions apply:<br />
**The population distribution is normal;<br />
**The sampling distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less;<br />
**The sampling distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 40;<br />
**The sample size is greater than 40, without outliers.<br />
<br />
To find the critical value, follow these steps.<br />
*Compute alpha $(\alpha): \alpha = 1 - \left(\frac{confidence\ level}{100}\right)$<br />
*Find the critical probability $(p^*): p^* = 1 -\frac {\alpha} {2}$<br />
*To express the critical value as a $z$ score, find the $z$ score having a cumulative probability equal to the critical probability $(p^*)$.<br />
*To express the critical value as a t score, follow these steps. Find the degrees of freedom (DF): when estimating a mean score or a proportion from a single sample, DF is equal to the sample size minus one. For other applications, the degrees of freedom may be calculated differently. We will describe those computations as they come up.<br />
<br />
: The critical t score $(t^*)$ is the t score having degrees of freedom equal to DF and a cumulative probability equal to the critical probability $(p^*)$.<br />
<br />
: Should you express the critical value as a t score or as a z score? As a practical matter, when the sample size is large (greater than 40), it doesn't make much difference. Both approaches yield similar results. Strictly speaking, when the population standard deviation is unknown or when the sample size is small, the t score is preferred. Nevertheless, many introductory statistics texts use the z score exclusively. <br />
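The steps above can be sketched in code. The following is a minimal sketch using only Python's standard library (the function name <code>critical_z</code> is our own): it computes $\alpha$, the critical probability $p^*$, and the critical $z$ score for a given confidence level. A critical $t$ score would additionally require the degrees of freedom and a t-table or a statistics package such as SciPy (<code>scipy.stats.t.ppf</code>).<br />

```python
from statistics import NormalDist

def critical_z(confidence_level):
    """Return the critical z score for a confidence level given in percent."""
    alpha = 1 - confidence_level / 100   # e.g. 0.05 for 95% confidence
    p_star = 1 - alpha / 2               # critical probability, e.g. 0.975
    # z score whose cumulative probability equals the critical probability
    return NormalDist().inv_cdf(p_star)

print(round(critical_z(95), 2))   # 1.96
print(round(critical_z(90), 2))   # 1.64
```

For the large samples discussed above (n > 40), this z score and the corresponding t score are nearly identical, which is why the two approaches yield similar results.<br />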
<br />
* Standard error: an estimate of the standard deviation of a statistic. When the values of population parameters are unknown, it is valuable to compute the standard error as an unbiased estimate of the standard deviation of a statistic. It is computed from known sample statistics. The table below shows how to compute the standard error for simple random samples, assuming that the population size is at least 10 times larger than the sample size.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%"border="1"<br />
|-<br />
|Statistic || Standard error<br />
|-<br />
|Sample mean, $\bar{x}$ || $SE_{\bar{x}}=\frac{s}{\sqrt{n}}$<br />
|-<br />
|Sample proportion, $p$ || $SE_{p}=\sqrt{\frac{p(1-p)}{n}}$<br />
|-<br />
|Difference between means,$\bar{x}_{1} -\bar{x}_{2}$ || $ SE_{\bar{x}_1 -\bar{x}_2} = \sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}$<br />
|-<br />
|Difference between proportions, $\bar{p}_{1} - \bar{p}_{2}$ || $SE_{\bar{p}_{1} - \bar{p}_{2}} = \sqrt{ \frac{p_1 (1-p_1)}{n_1} +\frac{p_{2}(1-p_{2})}{n_{2}}}$<br />
|}<br />
</center><br />
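The formulas in the table translate directly into code. Below is a minimal Python sketch of the four standard-error formulas; the trailing example reuses the adult-weight numbers from the Problems section ($n = 1000$ men, $s = 30$ pounds), and the function names are our own.<br />

```python
import math

def se_mean(s, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def se_diff_means(s1, n1, s2, n2):
    """Standard error of a difference between two sample means."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of a difference between two sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Adult-weight example from the Problems section: s = 30 pounds, n = 1,000
print(round(se_mean(30, 1000), 3))   # 0.949
```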
<br />
* Degrees of freedom: the number of independent pieces of information on which the estimate is based<br />
<br />
: In general, the degrees of freedom for an estimate equal the number of values minus the number of parameters estimated en route to the estimate in question. For example, if we have sampled 20 data points, then our estimate of the variance has 20 – 1 = 19 degrees of freedom, since one parameter (the mean) is estimated along the way.<br />
<br />
====Characteristics of Estimators====<br />
* Bias: refers to whether an estimator tends to either overestimate or underestimate the parameter. We say an estimator is biased if the mean of the sampling distribution of the statistic is not equal to the parameter. For example, $\hat{σ}^{2}=\frac{\sum{(x-\bar{x})^{2}}} {N}$ is a biased estimator of the population variance, while the sample variance $s^{2}=\frac{\sum{(x-\bar{x})^{2}}} {N-1}$ is an unbiased estimate of the population variance.<br />
<br />
*Sampling variability: refers to how much the estimate varies from sample to sample. It is usually measured by the standard error: the smaller the standard error, the less the sampling variability. For example, the standard error of the mean is $σ_M=\frac{σ}{\sqrt{N}}$, so the larger the sample size $(N)$, the smaller the standard error of the mean, and hence the smaller the sampling variability.<br />
<br />
*Unbiased estimate: if $\delta(X_{1},X_{2},…,X_{n})$ is an unbiased estimate of $g(\theta)$ and $T$ is a complete sufficient statistic for the family of densities, then $\eta (X_{1},X_{2},…,X_{n})=E[\delta(X_{1},X_{2},…,X_{n})|T]$ is the (essentially unique) minimum-variance unbiased estimate of $g(\theta)$ (the Lehmann–Scheffé theorem). <br />
<br />
*(Uniformly) Minimum-variance unbiased estimator ([http://en.wikipedia.org/wiki/Minimum-variance_unbiased_estimator UMVUE], or MVUE) is an unbiased estimator whose variance is no larger than that of any other unbiased estimator, for all possible values of the parameter. It may not exist. Consider estimation of $g(\theta)$ based on data $X_{1},X_{2},…,X_{n}$ independent and identically distributed from some member of a family with density $p_\theta, \theta \in \Omega $. An unbiased estimator $\delta(X_{1},X_{2},…,X_{n})$ of $g(\theta)$ is UMVUE if $∀ \theta \in \Omega$, $var(\delta(X_{1},X_{2},…,X_{n})) \leq var(\tilde{\delta} (X_{1},X_{2},…,X_{n}))$ for any other unbiased estimator $\tilde{\delta}$. <br />
<br />
: $MSE(\delta)=var(\delta)+(bias(\delta))^{2}$. The MVUE minimizes MSE among unbiased estimators. In some cases a biased estimator can have lower MSE than any unbiased estimator, because its variance is sufficiently small.<br />
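The bias of the divide-by-$N$ variance estimator can be checked empirically. The simulation below is a hedged illustration (our own construction, not part of the original text): it repeatedly draws small samples from a standard normal population (true variance 1) and averages the biased and unbiased variance estimates. With $n = 5$, the divide-by-$N$ estimator should average about $(n-1)/n = 0.8$.<br />

```python
import random

random.seed(42)  # make the simulation reproducible

def variance_estimates(sample):
    """Return (biased, unbiased) variance estimates for one sample."""
    n = len(sample)
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    return ss / n, ss / (n - 1)        # divide by N vs. divide by N - 1

reps, n = 20_000, 5
biased_avg = unbiased_avg = 0.0
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]   # true variance = 1
    b, u = variance_estimates(sample)
    biased_avg += b / reps
    unbiased_avg += u / reps

# The divide-by-N average lands near 0.8; the divide-by-(N-1) average near 1.0
print(biased_avg)
print(unbiased_avg)
```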
<br />
===Applications===<br />
* [[AP_Statistics_Curriculum_2007_Estim_MOM_MLE|This article]] presents the MOM and MLE methods of estimation. It illustrates the MOM method in detailed examples and attaches several exercises for students to practice. MOM, short for Method Of Moments, is one of the most commonly used methods of estimating population parameters from data observed on a specific process. The idea is to use the sample data to calculate sample moments and then set these equal to their corresponding population counterparts. Steps: (1) determine the $k$ parameters of interest and the specific distribution for this process; (2) compute the first $k$ (or more) sample moments; (3) set the sample moments equal to the population moments and solve the resulting system of $k$ equations with $k$ unknowns. Let’s look at a simple example as an application of the MOM method.<br />
<br />
: Suppose we want to estimate the true probability of heads for an unfair coin. We flip the coin 10 times and observe the following outcomes: {H,T,H,H,T,T,T,H,T,T}. With MOM: (1) the parameter of interest is $p=P(H)$, and each flip follows a Bernoulli distribution; (2) the number of heads $Y$ in $n=10$ flips follows a Binomial distribution with $np=E[Y]$, and equating this to the observed count of 4 heads gives $\hat{p}=4/10=2/5$; (3) the estimate of the true probability of flipping a head is therefore $2/5$. This is a simple example of MOM estimation of a proportion.<br />
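The coin example can be verified in a couple of lines: the MOM estimate here is just the first sample moment (the observed fraction of heads) set equal to the population moment $p$.<br />

```python
# Observed outcomes of 10 flips of an unfair coin (from the example above)
flips = ["H", "T", "H", "H", "T", "T", "T", "H", "T", "T"]

# MOM: set the population moment E[X] = p equal to the sample mean
p_hat = flips.count("H") / len(flips)
print(p_hat)   # 0.4, i.e. 2/5
```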
<br />
* [http://onlinestatbook.com/2/estimation/estimation.html This article] presents a fundamental introduction to estimation theory and illustrates its basic concepts and applications. It offers specific examples and exercises for each concept and application, and serves as a good starting introduction to estimation theory. <br />
<br />
* [http://digital-library.theiet.org/content/journals/10.1049/ip-f-2.1993.0015 This article] proposes an algorithm, the bootstrap filter, for implementing recursive Bayesian filters. The required density of the state vector is represented as a set of random samples, which are updated and propagated by the algorithm. The method presented is not restricted by assumptions of linearity or Gaussian noise and may be applied to any state transition or measurement model. It presents a simulation example of the bearings-only tracking problem and includes schemes for improving the efficiency of the basic algorithm.<br />
<br />
===Software===<br />
*[http://socr.ucla.edu/htmls/SOCR_Distributions.html SOCR Distribution]<br />
*[http://socr.ucla.edu/htmls/SOCR_Experiments.html SOCR Simulations & Experiments] <br />
*[http://socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts]<br />
<br />
===Problems===<br />
* Which of the following statements is true?<br />
: a. When the margin of error is small, the confidence level is high.<br />
: b. When the margin of error is small, the confidence level is low.<br />
: c. A confidence interval is a type of point estimate.<br />
: d. A population mean is an example of a point estimate.<br />
: e. None of the above.<br />
<br />
* Which of the following statements is true?<br />
: a. The standard error is computed solely from sample attributes.<br />
: b. The standard deviation is computed solely from sample attributes.<br />
: c. The standard error is a measure of central tendency.<br />
: d. All of the above.<br />
: e. None of the above.<br />
<br />
* 900 students were randomly selected for a national survey. Among survey participants, the mean grade-point average (GPA) was 2.7, and the standard deviation was 0.4. What is the margin of error, assuming a 95% confidence level?<br />
: a. 0.013<br />
: b. 0.025<br />
: c. 0.500<br />
: d. 1.960<br />
<br />
* Suppose we want to estimate the average weight of an adult male in Dekalb County, Georgia. We draw a random sample of 1,000 men from a population of 1,000,000 men and weigh them. We find that the average man in our sample weighs 180 pounds, and the standard deviation of the sample is 30 pounds. What is the 95% confidence interval?<br />
: a. $180 \pm 1.86$<br />
: b. $180 \pm 3.0$ <br />
: c. $180 \pm 5.88$<br />
: d. $180 \pm 30$<br />
<br />
* Suppose that simple random samples of seniors are selected from two colleges: 15 students from school A and 20 students from school B. On a standardized test, the sample from school A has an average score of 1000 with a standard deviation of 100. The sample from school B has an average score of 950 with a standard deviation of 90. What is the 90% confidence interval for the difference in test scores at the two schools, assuming that test scores came from normal distributions in both schools? (Hint: Since the sample sizes are small, use a t score as the critical value.)<br />
: a. $50 \pm 1.70$<br />
: b. $50 \pm 28.49$<br />
: c. $50 \pm 32.74$<br />
: d. $50 \pm 55.66$<br />
<br />
* You know the population mean for a certain test score. You select 10 people from the population to estimate the standard deviation. How many degrees of freedom does your estimation of the standard deviation have?<br />
: a. 8<br />
: b. 9<br />
: c. 10<br />
: d. 11<br />
<br />
* In the population, a parameter has a value of 10. Based on the means and standard errors of their sampling distributions, which of these statistics estimates this parameter with the least sampling variability?<br />
: a. Mean = 10, SE = 5<br />
: b. Mean = 9, SE = 4<br />
: c. Mean = 11, SE = 2<br />
: d. Mean = 13, SE = 3<br />
<br />
===References===<br />
*[http://wiki.stat.ucla.edu/socr/index.php/Probability_and_statistics_EBook#Method_of_Moments_and_Maximum_Likelihood_Estimation SOCR]<br />
* [http://en.wikipedia.org/wiki/Estimation Estimation Wikipedia]<br />
* [http://onlinestatbook.com/2/estimation/characteristics.html OnlineStatBook: Estimation]<br />
* [http://en.wikipedia.org/wiki/Confidence_interval Confidence Interval Wikipedia]<br />
* [http://en.wikipedia.org/wiki/Minimum-variance_unbiased_estimator UMVUE Wikipedia]<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_Estimation}}</div>Glenbrauhttps://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi&diff=14876SMHS IntroEpi2015-04-07T13:25:10Z<p>Glenbrau: /* Problems */</p>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred directly from an infected person to a susceptible person, for example by touch, a sneeze, or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*The null hypothesis $H_{0}:ARR=1$ and a 95% confidence interval can be used to check whether the estimated ARR interval includes the null value of 1. If the ARR is much greater than 1, then exposed people are more likely to develop the illness than unexposed people.<br />
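The AR and ARR formulas above are straightforward to compute. The sketch below uses made-up buffet-outbreak counts purely for illustration (the 30/60 and 5/50 figures are our own hypothetical numbers, not from the text).<br />

```python
def attack_rate(ill, at_risk):
    """Attack rate: number of people at risk who became ill / total at risk."""
    return ill / at_risk

# Hypothetical outbreak at a buffet (illustrative numbers only)
ar_exposed = attack_rate(30, 60)     # ate the Caesar salad: 30 of 60 became ill
ar_unexposed = attack_rate(5, 50)    # did not eat the salad: 5 of 50 became ill

arr = ar_exposed / ar_unexposed      # attack rate ratio
print(arr)   # 5.0: the exposed were five times as likely to fall ill
```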
<br />
====Measuring Disease====<br />
This section names and calculates two measures of incidence, describes differences in interpreting them, and explains the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': the number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2,000 persons at risk during the year and 20 develop the disease over that period, the incidence is 20⁄2000 = 1%.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person-time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B for 5 days, and subject C for 8 days, then person-days $= 3 + 5 + 8 = 16$.<br />
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
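These frequency measures are all simple ratios, as the short sketch below illustrates. It reuses the 20/2,000 incidence example and the 16 person-days example from the text; the 2 cases assumed for the incidence-rate calculation are our own hypothetical count.<br />

```python
def cumulative_incidence(new_cases, population_at_risk):
    """New cases / total population at risk."""
    return new_cases / population_at_risk

def incidence_rate(new_cases, person_time):
    """New cases / total person-time contributed by the persons followed."""
    return new_cases / person_time

def prevalence(existing_cases, population):
    """Cases in the population at a specified time / persons at that time."""
    return existing_cases / population

# Example from the text: 20 new cases among 2,000 persons at risk
print(cumulative_incidence(20, 2000))   # 0.01, i.e. 1%

# Person-time example: subjects followed 3, 5 and 8 days; assume 2 new cases
person_days = 3 + 5 + 8                 # 16 person-days
print(incidence_rate(2, person_days))   # 0.125 cases per person-day
```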
<br />
====Measuring Mortality Rates====<br />
This section shows how to calculate and interpret all-cause mortality rates, group-specific mortality rates, and cause-specific mortality rates:<br />
<br />
*All cause mortality rates = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011} {Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of it over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations (or one population at different time periods) that have different age distributions. By adjusting for age, the mortality rates can be compared as if both populations had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
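The four steps of direct age-adjustment can be sketched as follows. All population sizes and age-specific rates below are hypothetical numbers chosen for illustration; only the procedure itself comes from the text.<br />

```python
# Direct age-adjustment: apply one population's age-specific rates to a
# common standard population. All numbers below are hypothetical.

standard_pop = {"0-39": 500_000, "40-64": 300_000, "65+": 200_000}

# Step 1: age-specific mortality rates (deaths per 100,000 per year)
age_specific_rates = {"0-39": 100, "40-64": 500, "65+": 3000}

# Steps 2-3: expected deaths if this population had the standard age structure
expected_deaths = sum(age_specific_rates[g] * standard_pop[g]
                      for g in standard_pop) / 100_000

# Step 4: age-adjusted rate per 100,000 = expected deaths / standard population
adjusted_rate = expected_deaths * 100_000 / sum(standard_pop.values())
print(expected_deaths)   # expected deaths in the standard population
print(adjusted_rate)     # age-adjusted deaths per 100,000
```

The resulting standardized rate can be compared to any other rate adjusted to the same standard population.<br />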
<br />
Result: age-adjusted mortality rate for each population of interest<br />
<br />
*''Indirect age-adjustment'': the expected number of deaths can be compared to the number of actual deaths using the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates cannot be trusted (e.g., when the population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
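The indirect-adjustment steps translate into a short calculation. The standard-population rates, study-population sizes, and observed death count below are all hypothetical numbers for illustration; only the SMR procedure comes from the text.<br />

```python
# Indirect age-adjustment: compare observed deaths in a (small) study
# population to deaths expected under a standard population's rates.
# All numbers below are hypothetical.

# Step 1: standard population's age-specific rates (deaths per 100,000 per year)
standard_rates = {"0-39": 100, "40-64": 500, "65+": 3000}

# Study population sizes in the same age ranges
study_pop = {"0-39": 4_000, "40-64": 3_000, "65+": 1_000}

# Steps 2-3: expected number of deaths in the study population
expected = sum(standard_rates[g] * study_pop[g] for g in study_pop) / 100_000

# Step 4: SMR = observed deaths / expected deaths
observed = 60
smr = observed / expected
print(expected)   # expected deaths under the standard rates
print(smr)        # SMR > 1: more deaths than expected
```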
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and achieve better outcomes. However, a screening program must be warranted, and there must be a critical point in the disease's course that screening can precede. <br />
<br />
=====Predictive Value & Reliability: Clinical Utility of Positive Tests=====<br />
<br />
If a patient tests positive, the likelihood that they actually have the disease is called the '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood that they actually do ''not'' have the disease is called the '''Negative Predictive Value''' (NPV). PPV and NPV are affected by the prevalence of disease and by the specificity and sensitivity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
<br />
'''PPV interpretation:''' Given a positive result on the screening test, the likelihood that the individual actually has the disease is the PPV.<br />
<br />
'''NPV interpretation:''' Given a negative result on the screening test, the likelihood that the individual actually does not have the disease is the NPV.<br />
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
<br />
=====Factors Influencing Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases the PPV (and decreases the NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent disease may waste resources; the PPV needs to be presented in the context of disease prevalence.<br />
<br />
*''Test specificity'' (ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
*''Test sensitivity'' (ability of a test to correctly identify those who do have the disease =$\frac{a}{a+c}$): Higher test sensitivity increases NPV.<br />
<br />
'''Note:''' The cutpoint of a test influences its sensitivity and specificity: lowering the cutpoint increases true positives and hence increases sensitivity, but decreases true negatives and hence decreases specificity. Similarly, raising the cutpoint decreases true positives (lowering sensitivity) and increases true negatives (raising specificity).<br />
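The relationships above can be sketched in a short computation. This is a minimal illustration using the same hypothetical 2×2 cell counts that appear in the practice problem later in this page (a=80, b=70, c=10, d=240); the variable names are our own.<br />

```python
# Screening-test metrics from a 2x2 table (hypothetical counts).
a, b = 80, 70   # a: true positives,  b: false positives
c, d = 10, 240  # c: false negatives, d: true negatives

sensitivity = a / (a + c)   # P(test+ | disease)       = 80/90  ~ 0.889
specificity = d / (b + d)   # P(test- | no disease)    = 240/310 ~ 0.774
ppv = a / (a + b)           # P(disease | test+)       = 80/150 ~ 0.533
npv = d / (c + d)           # P(no disease | test-)    = 240/250 = 0.960

print(round(sensitivity, 3), round(specificity, 3),
      round(ppv, 3), round(npv, 3))
```

Note how sensitivity and specificity depend only on their own column of the table, while PPV and NPV mix columns and therefore shift with disease prevalence.<br />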
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid but not reliable'', ''reliable but not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability (repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?===== <br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage) start with a less expensive, less invasive, or less uncomfortable test. If its result is positive, it must be followed up with additional testing.<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person can test positive on either test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has its own sensitivity and specificity. Using multiple tests can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
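Under the standard independence assumption, the net values for the two strategies can be sketched as follows. The test characteristics below are hypothetical, chosen only to illustrate the direction of the changes.<br />

```python
# Net sensitivity/specificity of two-test strategies (hypothetical tests).
se1, sp1 = 0.80, 0.90   # test 1: sensitivity, specificity
se2, sp2 = 0.90, 0.85   # test 2: sensitivity, specificity

# Simultaneous (parallel): positive on either test counts as positive.
net_se_parallel = 1 - (1 - se1) * (1 - se2)   # rises above either test alone
net_sp_parallel = sp1 * sp2                   # falls below either test alone

# Sequential (serial): only first-test positives get the second test,
# and both results must be positive to be called positive overall.
net_se_serial = se1 * se2                     # falls
net_sp_serial = 1 - (1 - sp1) * (1 - sp2)     # rises

print(net_se_parallel, net_sp_parallel, net_se_serial, net_sp_serial)
```

With these numbers, parallel testing raises net sensitivity to 0.98 while serial testing raises net specificity to 0.985, matching the bullet points above.<br />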
<br />
===Randomized Controlled Trials (RCT)===<br />
In these procedures, the investigator assigns exposure at random to study participants. The investigator then observes whether there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken to ensure that follow-up is done in an identical way for both groups. The essence of a good comparison between “treatments” is that the compared groups are as similar as possible, except for their “treatment."<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: <br />
<br />
#Hypothesis formulation<br />
#Study participant recruitment based on specific criteria<br />
#Gathering informed consent<br />
#Allocation of eligible and willing participants into random assignment study groups<br />
#Monitoring study groups for outcome under study<br />
#Comparing rates of different outcomes in various groups<br />
<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
<br />
*''External validity'': Generalization of study to larger source population. Influenced by factors like: <br />
:*Demographic differences between eligible and ineligible subgroups <br />
:*Intervention mirror what will happen in the community or source population<br />
<br />
*''Internal validity'': Ability to reach correct conclusion in study. Influenced by factors like: <br />
:*Ability of subjects to provide valid and reliable data<br />
:*Expected compliance with a regimen<br />
:*Low probability of dropping out<br />
<br />
====Measures of Association and Effect in RCT====<br />
<br />
Ratio of two measures of disease incidence (relative measures):<br />
<br />
*Risk Ratio (Relative Risk)<br />
*Rate Ratio<br />
<br />
Difference between two measures of disease incidence: <br />
<br />
*Risk difference<br />
*Efficacy<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
'''Interpretation''': <br />
<br />
*$RR>1$: The outcome is $RR$ times as likely to occur in the exposed group as in the unexposed group<br />
*$RR=1$: Null value (no difference between groups)<br />
*$RR<1$: Either report the reduction in risk ($100\%-X\%$) or invert ($1/RR$) and interpret the exposure as making the outcome “less likely”<br />
<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
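The RR and efficacy formulas above can be sketched with a small computation. The arm sizes and event counts below are hypothetical, not from the text.<br />

```python
# Relative risk and efficacy from an RCT 2x2 table (hypothetical counts).
a, b = 30, 70    # drug A arm: with outcome, without outcome
c, d = 60, 40    # placebo arm: with outcome, without outcome

ci_drug = a / (a + b)       # cumulative incidence, treatment arm  = 0.30
ci_placebo = c / (c + d)    # cumulative incidence, placebo arm    = 0.60

relative_risk = ci_drug / ci_placebo                 # 0.5
efficacy = (ci_placebo - ci_drug) / ci_placebo       # 0.5

print(relative_risk, efficacy)
```

Here RR = 0.5 means the outcome is half as likely under drug A, and efficacy = 0.5 means the treatment removed half of the placebo-arm risk.<br />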
<br />
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.<br />
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks.<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Population of exposed and unexposed individuals at risk of developing outcomes are followed over time to compare the development of disease in each group. <br />
*Steps: Establish the study population by identifying one that reflects the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; follow both groups and compare their outcomes.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
: Prospective (concurrent) and retrospective (non-concurrent) cohort studies, distinguished by when the data are collected.<br />
: Retrospective has benefits: more cost effective; good for disease of long latency.<br />
: Prospective has benefits: data quality presumably higher.<br />
Both designs need to be cautious of ascertainment bias when outcomes or exposures are already known.<br />
<br />
*Measures of Association in Cohort Study:<br />
: Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio.<br />
: Difference between two measures of disease incidence: Risk Difference, Rate Difference.<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information. <br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized. <br />
# Like RCT, excellent for studying rare exposures. <br />
# Multiple outcomes and sometimes multiple exposures can be studied.<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive; <br />
# Not effective at capturing rare outcomes, and can be challenging for studying diseases that take a long time to develop; <br />
# Loss to follow-up can be a problem; <br />
# Changes over time in criteria and methods can lead to problems with inferences; <br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics.<br />
<br />
*Situations favor a Cohort Study: <br />
# When there is evidence of an association between the exposure and the disease from other studies;<br />
# When the exposure is rare but the incidence of disease among the exposed is high;<br />
# When time between exposure and development of the disease is relatively short or historical data is available;<br />
# When good follow-up can be ensured.<br />
<br />
===Case Control Study===<br />
A case-control study compares cases and controls to see which group had greater exposure to the risk factor of interest.<br />
*Measures of Association: Odds Ratio.<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}.$<br />
<br />
====Interpretation====<br />
The odds of being exposed are OR times higher in the cases than in the controls (if OR > 1), or 1/OR times lower in the cases than in the controls (if OR < 1). If OR = 1, there is no association: the odds of exposure are the same in cases and controls.<br />
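The odds ratio formula above reduces to the cross-product ad/bc, which can be sketched with hypothetical cell counts:<br />

```python
# Odds ratio from a case-control 2x2 table (hypothetical counts).
a, b = 40, 20   # exposed:   cases, controls
c, d = 60, 180  # unexposed: cases, controls

odds_ratio = (a * d) / (b * c)   # (a/c) / (b/d) = ad/bc
print(odds_ratio)                # 6.0
```

With these counts, the odds of exposure are 6 times higher among cases than among controls.<br />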
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to avoid recall and reporting bias include: <br />
# adjusting timing so that the time between the event/illness and the study is as short as possible; <br />
# using standardized questionnaires that obtain complete information;<br />
# using existing information if/when possible (e.g., medical records); <br />
# masking participants to the study hypothesis<br />
*Conditions under which an OR from a case-control study can approximate the RR (OR ≈ RR):<br />
# when the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn; <br />
# when the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn; <br />
# when the disease being studied does not occur frequently.<br />
<br />
===Cross-Sectional Studies===<br />
A cross-sectional study is an observational study in which a subject’s exposure and disease data are measured at the same time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, so temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# good for generating hypotheses;<br />
# easily sets up other analytic designs; <br />
# temporality is not a problem for time invariant exposures (genetic markers); <br />
# relatively low cost.<br />
<br />
*'''Weakness:'''<br />
# temporality – exposure or disease which happened first; <br />
# prevalent cases may not be the same as incident cases; <br />
# not useful for rare disease; <br />
# subject to selection bias.<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\,disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place and time (mixed study). An error can occur when an association identified from group-level (ecologic) characteristics is ascribed to individuals even though no such association exists at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# data is relatively easy and/or cheap to obtain; <br />
# good place to start; <br />
# many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weakness:''' Reliance on group-level data may not correctly represent individual-level associations. <br />
<br />
*Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.<br />
<br />
*Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.<br />
<br />
===Other Risk Estimates===<br />
*Attributable Risk Estimates of Effect – if exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.<br />
*Attributable Risk (AR): $AR=CI_{Exposed} - CI _{Not\,exposed}$. This is just the risk difference. Group of interest: exposed and aims to quantify the risk of disease in the “exposed” group attributable to the exposure. <br />
*Attributable Risk Percent $(AR\%)$: $ AR\%$ = $\frac{(CI_{Exposed} - CI_{Not exposed})}{CI_{exposed}}$<br />
*Population Attributable Risk (PAR): $PAR= CI_{Total} - CI_{Not exposed}$<br />
*Population Attributable Risk Percent $(PAR\%)$: $PAR\%$ = $\frac{(CI_{Total}-CI_{Not exposed})} {CI_{total}}$.<br />
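The four attributable-risk measures above can be sketched together. The cumulative incidences below are hypothetical (the total-population incidence depends on how common the exposure is).<br />

```python
# Attributable-risk measures (hypothetical cumulative incidences, per person).
ci_exposed = 0.30
ci_unexposed = 0.10
ci_total = 0.15   # overall incidence, reflecting exposure prevalence

ar = ci_exposed - ci_unexposed        # risk difference among the exposed
ar_pct = ar / ci_exposed * 100        # % of exposed cases attributable
par = ci_total - ci_unexposed         # excess risk in the whole population
par_pct = par / ci_total * 100        # % of all cases attributable

print(round(ar, 2), round(ar_pct, 1), round(par, 2), round(par_pct, 1))
```

With these numbers, about two-thirds of the disease among the exposed, and one-third of the disease in the whole population, would be eliminated by removing the exposure.<br />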
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and outcome; observed results different than true results. <br />
*Impact of bias: It can make it appear as if there is an association when there really is none (bias away from the null), or mask an association when there really is one (bias toward the null).<br />
*Reasons we get the wrong answer: ''Selection bias'' occurs when who is selected or retained in a study distorts estimates of the truth. An example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (chose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls or exposed and unexposed distorts your estimates of the truth. A variable is a confounder if it is a known risk factor for the outcome, it is associated with the exposure but not a result of the exposure. These three conditions are necessary for a variable to be considered as a confounder. <br />
* Chance: the luck of draw gets you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any factors that were matched on; you cannot conduct analyses for matched variables and outcome.<br />
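The matched analysis above uses only the discordant pairs; a minimal sketch with hypothetical pair counts:<br />

```python
# Matched-pair odds ratio from a case-control matched analysis
# (hypothetical pair counts; layout follows the table above).
a = 25  # concordant: both case and control exposed (uninformative)
b = 30  # discordant: case exposed, control unexposed
c = 10  # discordant: case unexposed, control exposed
d = 35  # concordant: neither exposed (uninformative)

matched_or = b / c   # only discordant pairs contribute
print(matched_or)    # 3.0
```

The concordant pairs (a and d) carry no information about the exposure-outcome association, which is why they drop out of the estimate.<br />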
*Randomization: Random allocation of exposure/”treatment” by investigator, ensure that the two groups (exposed & unexposed) are the same except for exposure of interest, able to control for both known and unknown confounders because distribution of these “3rd variables” should be equally distributed between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, from some important examples, the classical methodological approach for discussing causality in epidemiology. Coronary hear disease (CHD) prevention has largely benefited in the past from the development of epidemiological research, however, the opposition association-causation is currently raised from observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved: necessity of replication, Mendelian randomization, which might prove to be important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article], studies retrospectively the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about existence of outbreaks?<br />
:a. Cases call health departments directly<br />
:b. Clinicians<br />
:c. Laboratories<br />
:d. All of the above<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. Host<br />
:b. Agent<br />
:c. Vector<br />
:d. Environment<br />
:e. All of the above<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same.<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly.<br />
:c. The prevalence decreases if the duration of disease is increasing.<br />
:d. The prevalence decreases if the duration of disease stays the same.<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002 - 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 death per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:a. True<br />
:b. False<br />
<br />
Sequential testing tends to have higher net specificity than specificity of a single test.<br />
:a. True<br />
:b. False<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they had the flu), so these were not enrolled in the research study. The students who were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of this follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question; what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. Interviewer bias<br />
:b. Recall bias<br />
:c. Loss to follow-up<br />
:d. Non-differential misclassification<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns (especially if they were concerned that sun exposure was a reason they got melanoma). This is an example of:<br />
:a. Interviewer bias<br />
:b. Loss to follow-up<br />
:c. Differential misclassification<br />
:d. Non-differential misclassification<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>Glenbrauhttps://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi&diff=14875SMHS IntroEpi2015-04-07T13:01:40Z<p>Glenbrau: /* How do multiple testing improve screening programs? */</p>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred from an infected person to another person through contact such as a sneeze, touch, or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*$H_{0}:ARR=1$; a 95% confidence interval can be used to see whether the estimated ARR interval includes the null value of 1. If the ARR is much greater than 1, people who were exposed are more likely to develop the illness than those who were unexposed.<br />
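The AR and ARR formulas above can be sketched for a hypothetical food-borne outbreak; all counts below are invented for illustration.<br />

```python
# Attack rates and attack rate ratio (hypothetical outbreak counts).
ill_exposed, total_exposed = 40, 60        # ate the Caesar salad
ill_unexposed, total_unexposed = 10, 100   # did not eat it

ar_exposed = ill_exposed / total_exposed        # ~0.667
ar_unexposed = ill_unexposed / total_unexposed  # 0.100

arr = ar_exposed / ar_unexposed
print(round(arr, 2))   # 6.67: exposed ~6.7x as likely to fall ill
```

An ARR this far above 1 would support the hypothesis that the exposed food item is the source of the outbreak (subject to the confidence interval excluding 1).<br />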
<br />
====Measuring Disease====<br />
To name and calculate two measures of incidence, to describe differences in interpreting these measures, and to understand the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': The number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2,000 persons at risk during the year and 20 develop the disease over that period, the incidence is 20/2,000 = 1%.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
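The three frequency measures above differ only in their denominators, which a short Python sketch makes explicit (the counts below are hypothetical except where they repeat the worked examples in the text).<br />

```python
def cumulative_incidence(new_cases, population_at_risk):
    """New cases / total population at risk (a proportion, not a true rate)."""
    return new_cases / population_at_risk

def incidence_rate(new_cases, person_time):
    """New cases / total person-time contributed (a true rate)."""
    return new_cases / person_time

def prevalence(existing_cases, population):
    """Existing cases at a specified time / population at that time."""
    return existing_cases / population

# Worked example from the text: 20 new cases among 2,000 at risk -> 1%
ci = cumulative_incidence(20, 2000)
# Person-time example from the text: subjects followed 3 + 5 + 8 = 16 person-days
rate = incidence_rate(1, 3 + 5 + 8)  # hypothetical single case over that follow-up
print(ci, rate)
```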
<br />
====Measuring Mortality Rates====<br />
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:<br />
<br />
*All-cause mortality rate = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011} {Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of it over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations (or one population at different time periods) that have different age distributions. Adjusting for age lets us compare the mortality rates the two populations would have if they had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
<br />
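The four direct-adjustment steps above can be sketched as follows; the age groups, rates, and standard-population counts are hypothetical.<br />

```python
def direct_age_adjusted_rate(age_specific_rates, standard_population):
    """Steps 1-4: expected deaths in the standard population / its total size."""
    expected_deaths = sum(rate * n for rate, n in
                          zip(age_specific_rates, standard_population))
    return expected_deaths / sum(standard_population)

# Hypothetical age-specific rates for one study population (three age groups)
rates = [0.001, 0.005, 0.020]
# Hypothetical standard population in the same age groups
standard = [50_000, 30_000, 20_000]
print(direct_age_adjusted_rate(rates, standard))
```

Repeating this with each study population's own rates (but the same standard population) yields directly comparable adjusted rates.<br />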
The result of direct adjustment is an age-adjusted mortality rate for each population of interest.<br />
<br />
====Indirect Age-Adjustment====<br />
*''Indirect age-adjustment'': The expected number of deaths is compared to the observed number of deaths using the '''standardized mortality ratio (SMR)'''. This is especially useful when the group-specific rates are unreliable (e.g., because the study population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
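The indirect-adjustment steps can be sketched the same way; every number below is hypothetical.<br />

```python
def smr(standard_rates, study_population, observed_deaths):
    """SMR = observed deaths / deaths expected if the standard rates applied."""
    expected = sum(rate * n for rate, n in zip(standard_rates, study_population))
    return observed_deaths / expected

# Hypothetical standard-population rates and study-population sizes (two age groups)
standard_rates = [0.002, 0.010]
study_pop = [10_000, 5_000]
print(smr(standard_rates, study_pop, observed_deaths=90))  # expected deaths = 70
```

An SMR above 1 means more deaths were observed than the standard rates would predict.<br />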
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, allowing us to detect disease earlier, initiate treatment sooner, and achieve better outcomes. However, screening programs must be warranted: the disease must have a critical point in its natural history that screening can detect before it is reached. <br />
<br />
=====Predictive Value &amp; Reliability: Clinical Utility of Positive Tests=====<br />
<br />
If a patient tests positive, the likelihood that they actually have the disease is called the '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood that they actually do ''not'' have the disease is called the '''Negative Predictive Value''' (NPV). PPV and NPV are affected by the prevalence of the disease and the specificity and sensitivity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center; width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
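The four screening quantities follow mechanically from the a/b/c/d cells of the table above; here is a small Python sketch with hypothetical counts.<br />

```python
def screening_metrics(a, b, c, d):
    """a=true pos., b=false pos., c=false neg., d=true neg. (table above)."""
    return {
        "sensitivity": a / (a + c),  # diseased people the test catches
        "specificity": d / (b + d),  # disease-free people the test clears
        "PPV": a / (a + b),          # diseased among those testing positive
        "NPV": d / (c + d),          # disease-free among those testing negative
    }

# Hypothetical screening results
m = screening_metrics(a=90, b=30, c=10, d=270)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```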
<br />
'''PPV interpretation:''' Given a positive screening test result, the likelihood that an individual actually has the disease is the PPV.<br />
<br />
'''NPV interpretation:''' Given a negative screening test result, the likelihood that an individual does not have the disease is the NPV.<br />
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
<br />
=====Factors Influencing Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV (and decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent diseases may waste resources, and PPV needs to be presented in the context of disease prevalence.<br />
<br />
*''Test specificity'' (ability of a test to correctly identify those who do ''not'' have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
*''Test sensitivity'' (ability of a test to correctly identify those who ''do'' have the disease $=\frac{a}{a+c}$): Higher test sensitivity increases NPV.<br />
<br />
'''Note:''' The cutoff point of a test influences its sensitivity and specificity: lowering the cutpoint increases true positives (raising sensitivity) but decreases true negatives (lowering specificity). Conversely, raising the cutpoint decreases true positives (lowering sensitivity) but increases true negatives (raising specificity).<br />
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid but not reliable'', ''reliable but not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability (repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?=====<br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage) use a less expensive, less invasive, or less uncomfortable test first. If its result is positive, it must be followed up with additional testing.<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person can test positive on either test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has its own sensitivity and specificity. Using multiple tests can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
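The two combination rules above can be made concrete in code. This sketch assumes the two tests err independently given disease status, which is a common simplifying assumption; the sensitivities and specificities used are hypothetical.<br />

```python
def simultaneous(se1, sp1, se2, sp2):
    """Parallel testing: positive on either test counts as positive overall."""
    net_se = 1 - (1 - se1) * (1 - se2)   # a case is missed only if both tests miss
    net_sp = sp1 * sp2                   # a negative must be negative on both
    return net_se, net_sp

def sequential(se1, sp1, se2, sp2):
    """Two-stage testing: positive overall only if positive on both tests."""
    net_se = se1 * se2                   # a case must be caught by both tests
    net_sp = 1 - (1 - sp1) * (1 - sp2)   # a false positive must fool both
    return net_se, net_sp

print(simultaneous(0.80, 0.90, 0.85, 0.95))  # sensitivity rises, specificity falls
print(sequential(0.80, 0.90, 0.85, 0.95))    # sensitivity falls, specificity rises
```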
<br />
===Randomized Controlled Trials (RCT)===<br />
In these studies, the investigator assigns exposure at random to study participants and then observes whether there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken to ensure that follow-up is done in an identical way for both groups. The essence of a good comparison between “treatments” is that the compared groups are as similar as possible, except for their “treatment."<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: <br />
<br />
#Hypothesis formulation<br />
#Study participant recruitment based on specific criteria<br />
#Gathering informed consent<br />
#Allocation of eligible and willing participants into random assignment study groups<br />
#Monitoring study groups for outcome under study<br />
#Comparing rates of different outcomes in various groups<br />
<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
<br />
*''External validity'': Generalization of study to larger source population. Influenced by factors like: <br />
:*Demographic differences between eligible and ineligible subgroups <br />
:*Whether the intervention mirrors what will happen in the community or source population<br />
<br />
*''Internal validity'': Ability to reach correct conclusion in study. Influenced by factors like: <br />
:*Ability of subjects to provide valid and reliable data<br />
:*Expected compliance with a regimen<br />
:*Low probability of dropping out<br />
<br />
====Measures of Association and Effect in RCT====<br />
<br />
Ratio of two measures of disease incidence (relative measures):<br />
<br />
*Risk Ratio (Relative Risk)<br />
*Rate Ratio<br />
<br />
Difference between two measures of disease incidence: <br />
<br />
*Risk difference<br />
*Efficacy<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center; width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
'''Interpretation''': <br />
<br />
*$RR>1$: The outcome is $RR$ times more likely to occur in group A than in group B<br />
*$RR=1$: Null value (no difference between groups)<br />
*$RR<1$: Either report the reduction in risk (100% $-$ $X$%) or invert ($1/RR$) and interpret as “less likely”<br />
<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
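The relative risk and efficacy formulas above map directly onto the a/b/c/d cells of the RCT table; the trial counts below are hypothetical.<br />

```python
def relative_risk(a, b, c, d):
    """RR = CI in the drug A arm / CI in the placebo arm (table above)."""
    ci_treated = a / (a + b)
    ci_placebo = c / (c + d)
    return ci_treated / ci_placebo

def efficacy(a, b, c, d):
    """Efficacy = (CI in placebo - CI in treatment) / CI in placebo."""
    ci_treated = a / (a + b)
    ci_placebo = c / (c + d)
    return (ci_placebo - ci_treated) / ci_placebo

# Hypothetical trial: 20 of 200 ill on drug A, 50 of 200 ill on placebo
print(relative_risk(20, 180, 50, 150))  # RR < 1: drug arm at lower risk
print(efficacy(20, 180, 50, 150))       # fraction of placebo risk removed
```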
<br />
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.<br />
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks.<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Populations of exposed and unexposed individuals at risk of developing the outcome are followed over time to compare the development of disease in each group. <br />
*Steps: Establish a study population that is reflective of the base population of interest and has a distribution of exposure; identify the groups of exposed and unexposed individuals; follow both groups and compare their outcomes.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
: ''Prospective'' (concurrent) and ''retrospective'' (non-concurrent) cohort studies, distinguished by when the data are collected.<br />
: Retrospective benefits: more cost-effective; good for diseases with long latency.<br />
: Prospective benefits: data quality is presumably higher.<br />
Both designs need to be cautious of ascertainment bias if outcome or exposure status is already known.<br />
<br />
*Measures of Association in Cohort Study:<br />
: Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio.<br />
: Difference between two measures of disease incidence: Risk Difference, Rate Difference.<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information. <br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized. <br />
# Like RCT, excellent for studying rare exposures. <br />
# Multiple outcomes and sometimes multiple exposures can be studied.<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive; <br />
# Not effective at capturing rare outcomes, and can be challenging for diseases that take a long time to develop; <br />
# Loss to follow-up can be a problem; <br />
# Changes over time in criteria and methods can lead to problems with inferences; <br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics.<br />
<br />
*Situations that favor a Cohort Study: <br />
# When there is evidence of an association between the exposure and the disease from other studies;<br />
# When the exposure is rare but the incidence of disease among the exposed is high;<br />
# When time between exposure and development of the disease is relatively short or historical data is available;<br />
# When good follow-up can be ensured.<br />
<br />
===Case Control Study===<br />
A case-control study compares people with the disease (cases) and people without it (controls) to see which group had greater exposure to the suspected risk factor.<br />
*Measures of Association: Odds Ratio.<br />
<center><br />
{|class="wikitable" style="text-align:center; width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}.$<br />
<br />
====Interpretation====<br />
*If OR > 1: the odds of being exposed are OR times higher in the cases than in the controls.<br />
*If OR < 1: the odds of being exposed are 1/OR times lower in the cases than in the controls.<br />
*If OR = 1: no association; the odds of exposure are the same in cases and controls.<br />
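The odds ratio is a one-line computation from the table's cells; the counts below are hypothetical.<br />

```python
def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d) = ad / bc, from the case-control table above."""
    return (a * d) / (b * c)

# Hypothetical study: 60 of 100 cases exposed, 40 of 200 controls exposed
print(odds_ratio(a=60, b=40, c=40, d=160))  # odds of exposure 6x higher in cases
```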
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: The case-control design is efficient and can evaluate many risk factors for the same disease, so it is good for diseases about which little is known; it is observational: we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to avoid recall and reporting bias include: <br />
# adjusting timing so that the time between the event/illness and the study is as short as possible;<br />
# using standardized questionnaires that obtain complete information;<br />
# using existing information if/when possible (e.g., medical records); <br />
# masking participants to the study hypothesis<br />
*Conditions under which an OR from a case-control study can approximate an RR ($OR \approx RR$):<br />
# when the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn; <br />
# when the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn; <br />
# when the disease being studied does not occur frequently.<br />
<br />
===Cross-Sectional Studies===<br />
A cross-sectional study is an observational study in which a subject’s exposure and disease data are measured at the same time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, and temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# good for generating hypotheses;<br />
# easily sets up other analytic designs; <br />
# temporality is not a problem for time invariant exposures (genetic markers); <br />
# relatively low cost.<br />
<br />
*'''Weakness:'''<br />
# temporality – exposure or disease which happened first; <br />
# prevalent cases may not be the same as incident cases; <br />
# not useful for rare disease; <br />
# subject to selection bias.<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text-align:center; width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\, disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place and time (mixed study). An error can occur when an association identified from group-level (ecologic) characteristics is ascribed to individuals, even though the association does not exist at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# data is relatively easy and/or cheap to obtain; <br />
# good place to start; <br />
# many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weakness:''' Reliance on group-level data may not correctly represent individual-level associations. <br />
<br />
*Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.<br />
<br />
*Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.<br />
<br />
===Other Risk Estimates===<br />
*Attributable Risk Estimates of Effect: if exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminated the exposure.<br />
*Attributable Risk (AR): $AR=CI_{Exposed} - CI_{Not\,exposed}$. This is just the risk difference. The group of interest is the exposed; AR quantifies the risk of disease in the exposed group that is attributable to the exposure. <br />
*Attributable Risk Percent $(AR\%)$: $AR\%=\frac{CI_{Exposed} - CI_{Not\,exposed}}{CI_{Exposed}}\times 100$<br />
*Population Attributable Risk (PAR): $PAR= CI_{Total} - CI_{Not\,exposed}$<br />
*Population Attributable Risk Percent $(PAR\%)$: $PAR\%=\frac{CI_{Total}-CI_{Not\,exposed}}{CI_{Total}}\times 100$.<br />
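The four attributable-risk measures can be sketched together; the cumulative incidences below are hypothetical.<br />

```python
def ar(ci_exposed, ci_unexposed):
    """Attributable risk: the risk difference."""
    return ci_exposed - ci_unexposed

def ar_percent(ci_exposed, ci_unexposed):
    """Share of the exposed group's risk attributable to the exposure."""
    return 100 * (ci_exposed - ci_unexposed) / ci_exposed

def par(ci_total, ci_unexposed):
    """Excess risk in the whole population attributable to the exposure."""
    return ci_total - ci_unexposed

def par_percent(ci_total, ci_unexposed):
    """Share of the total population's risk attributable to the exposure."""
    return 100 * (ci_total - ci_unexposed) / ci_total

# Hypothetical cumulative incidences: 0.20 exposed, 0.05 unexposed, 0.11 overall
print(ar(0.20, 0.05), ar_percent(0.20, 0.05),
      par(0.11, 0.05), par_percent(0.11, 0.05))
```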
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and an outcome; the observed results differ from the true results. <br />
*Impact of bias: it can make it appear as if there is an association when there really is none (bias away from the null), or mask an association when there really is one (bias toward the null).<br />
*''Selection bias'': how subjects are selected or retained in a study distorts the estimate of the truth. An example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (choose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of the information distorts the estimate of the true association. Examples include surveillance bias, non-differential misclassification (e.g., of hypertension status), reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in the measure, and error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls, or between exposed and unexposed, distort the estimates of the truth. A variable is a confounder if (1) it is a known risk factor for the outcome, (2) it is associated with the exposure, and (3) it is not a result of the exposure. All three conditions are necessary for a variable to be considered a confounder. <br />
* Chance: the luck of the draw gives you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text-align:center; width:25%" border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any factors that were matched on; you cannot conduct analyses for matched variables and outcome.<br />
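The matched-pairs odds ratio uses only the discordant-pair counts b and c from the table above; the pair counts here are hypothetical.<br />

```python
def matched_odds_ratio(b, c):
    """b: case exposed / control unexposed pairs; c: the reverse."""
    return b / c

# Hypothetical matched study: 30 pairs of type b, 10 pairs of type c
print(matched_odds_ratio(b=30, c=10))  # exposure odds 3x higher in cases
```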
*Randomization: Random allocation of exposure/“treatment” by the investigator ensures that the two groups (exposed and unexposed) are the same except for the exposure of interest. It can control for both known and unknown confounders, because the distribution of these “third variables” should be equal between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, through some important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has largely benefited in the past from the development of epidemiological research; however, the association-versus-causation question is currently raised for observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved, such as the necessity of replication and Mendelian randomization, which might prove to be important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article], studies retrospectively the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about existence of outbreaks?<br />
:a. cases call health departments directly<br />
:b. clinicians<br />
:c. laboratories<br />
:d. all of the above<br />
<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. host<br />
:b. agent<br />
:c. vector<br />
:d. environment<br />
:e. all of the above<br />
<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly<br />
:c. The prevalence decreases if the duration of disease is increasing<br />
:d. The prevalence decreases if the duration of disease stays the same<br />
<br />
<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002- 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center; width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 death per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:True<br />
:False<br />
<br />
<br />
<br />
Sequential testing tends to have higher net specificity than specificity of a single test.<br />
:True<br />
:False<br />
<br />
<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center; width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they in fact, had the flu) and thus, were not enrolled in the research study. The students that were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question, what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. interviewer bias<br />
:b. recall bias<br />
:c. loss to follow-up<br />
:d. non-differential misclassification<br />
<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns especially if they were concerned that sun exposure was a reason they got melanoma. This is an example of:<br />
:a. interviewer bias<br />
:b. loss to follow-up<br />
:c. differential misclassification<br />
:d. non-differential misclassification<br />
<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div><br />
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred directly from an infected person to a susceptible person, without an intermediate object, such as through a sneeze, touch or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*$H_{0}:ARR=1$. A 95% confidence interval can be used to check whether the estimated ARR interval includes the null value of 1. If the ARR is much greater than 1, then exposed people are more likely to develop the illness than those who are unexposed.<br />
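<br />
As a minimal sketch (in Python, with hypothetical outbreak counts), the attack rates, their ratio, and an approximate 95% confidence interval on the log scale can be computed from a 2x2 table:<br />

```python
import math

def attack_rate(cases, at_risk):
    """Attack rate: number of people at risk who develop the illness,
    divided by the total number of people at risk."""
    return cases / at_risk

def arr_with_ci(a, b, c, d, z=1.96):
    """Attack rate ratio with an approximate 95% CI on the log scale.

    a = exposed and ill, b = exposed and well,
    c = unexposed and ill, d = unexposed and well.
    """
    ar_exposed = attack_rate(a, a + b)
    ar_unexposed = attack_rate(c, c + d)
    arr = ar_exposed / ar_unexposed
    # Standard error of log(ARR), as for a risk ratio
    se = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
    lo = math.exp(math.log(arr) - z * se)
    hi = math.exp(math.log(arr) + z * se)
    return arr, (lo, hi)

# Hypothetical buffet outbreak: 30/50 salad eaters ill vs. 10/50 non-eaters ill
arr, ci = arr_with_ci(30, 20, 10, 40)
print(f"ARR = {arr:.1f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Here the interval excludes 1, so chance alone is an unlikely explanation for the excess illness among the exposed.<br />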
<br />
====Measuring Disease====<br />
This section aims to name and calculate two measures of incidence, to describe differences in interpreting these measures, and to explain the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': number of new cases of a disease occurring in the population during a special period of time divided by the number of persons at risk of developing the disease during that period of time. For example: if there are 2000 persons at risk during the year and 20 develop disease over that period. The incidence rate would be 20⁄2000=1%.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
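<br />
The measures of disease frequency above can be sketched as simple ratios; the numbers below reuse the examples from this section (20 new cases among 2,000 at risk, and 3 + 5 + 8 = 16 person-days of follow-up):<br />

```python
def cumulative_incidence(new_cases, at_risk):
    """New cases divided by the total population at risk."""
    return new_cases / at_risk

def incidence_rate(new_cases, person_time):
    """New cases divided by total person-time contributed; a true rate."""
    return new_cases / person_time

def prevalence(existing_cases, population):
    """Existing cases divided by the population at a specified time."""
    return existing_cases / population

# 20 new cases among 2000 persons at risk during the year
cum_inc = cumulative_incidence(20, 2000)   # 0.01, i.e. 1%

# Subjects followed for 3, 5, and 8 days contribute 16 person-days;
# one case over that time gives a rate per person-day
rate = incidence_rate(1, 3 + 5 + 8)
```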
<br />
====Measuring Mortality Rates====<br />
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:<br />
<br />
*All cause mortality rates = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011} {Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of it over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations (or one population at different time periods) that have different age distributions, by computing the mortality rates each population would have if both had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
<br />
====Age-adjusted mortality rate for each population of interest====<br />
*Indirect age-adjustment: the expected number of deaths can be compared to the number of actual deaths using the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates are not trustworthy (i.e., if the population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
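<br />
The direct and indirect adjustment steps above can be sketched as follows (Python, with hypothetical age-specific rates and population counts):<br />

```python
def direct_adjusted_rate(study_rates, standard_pops):
    """Direct adjustment: apply the study population's age-specific rates
    to a standard population's age structure, then divide total expected
    deaths by the total standard population."""
    expected_deaths = sum(rate * n for rate, n in zip(study_rates, standard_pops))
    return expected_deaths / sum(standard_pops)

def smr(observed_deaths, standard_rates, study_pops):
    """Indirect adjustment: observed deaths divided by the deaths expected
    if the study population experienced the standard population's rates."""
    expected_deaths = sum(rate * n for rate, n in zip(standard_rates, study_pops))
    return observed_deaths / expected_deaths

# Hypothetical: two age groups with rates per person and population counts
adj_rate = direct_adjusted_rate([0.001, 0.01], [1000, 2000])
ratio = smr(30, [0.001, 0.01], [1000, 2000])  # >1 means more deaths than expected
```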
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and provide better outcomes. However, screening programs must be warranted, and the disease must have a critical point that screening can precede. <br />
<br />
=====Predictive Value & Reliability: Clinical Utility of Positive Tests=====<br />
<br />
If a patient is tested positive, the likelihood that they actually have the disease is called '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood they actually do ''not'' have the disease is called '''Negative Predictive Value''' (NPV). PPV and NPV are affected by prevalence of disease, specificity and sensitivity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
<br />
'''PPV interpretation:''' Given a positive screening test result, the probability that the individual actually has the disease is the PPV.<br />
<br />
'''NPV interpretation:''' Given a negative screening test result, the probability that the individual actually does not have the disease is the NPV.<br />
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
<br />
=====Factors Influencing Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV (and decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent diseases may waste resources; PPV needs to be presented in the context of disease prevalence.<br />
<br />
*''Test specificity'' (ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
*''Test sensitivity'' (ability of a test to correctly identify those who do have the disease $=\frac{a}{a+c}$): Higher test sensitivity increases NPV.<br />
<br />
'''Note:''' The cutoff used to define disease influences test sensitivity and specificity: lowering the cutpoint increases true positives (increasing sensitivity) and decreases true negatives (decreasing specificity). Similarly, raising the cutpoint decreases true positives (decreasing sensitivity) and increases true negatives (increasing specificity).<br />
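<br />
A small sketch of how prevalence, sensitivity, and specificity drive the predictive values (the test characteristics below are hypothetical):<br />

```python
def ppv_npv(sensitivity, specificity, prev):
    """PPV and NPV from test characteristics and disease prevalence,
    via the expected cell proportions of the screening 2x2 table."""
    tp = sensitivity * prev               # true positives
    fp = (1 - specificity) * (1 - prev)   # false positives
    fn = (1 - sensitivity) * prev         # false negatives
    tn = specificity * (1 - prev)         # true negatives
    return tp / (tp + fp), tn / (tn + fn)

# Same hypothetical test (90% sensitive, 95% specific) at two prevalences:
print(ppv_npv(0.90, 0.95, 0.20))  # high-risk population: higher PPV
print(ppv_npv(0.90, 0.95, 0.01))  # low-risk population: much lower PPV
```

This illustrates why screening is most efficient in high-risk populations: with the same test, the PPV falls sharply as prevalence falls.<br />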
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid not reliable'', ''reliable not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability (repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?=====<br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage) are less expensive, less invasive, and less uncomfortable. If their results are positive, they must be followed up with additional testing.<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person can test positive on either test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has own sensitivity and specificity. Utilization of multiple testing can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
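<br />
Assuming the two tests err independently, the net sensitivity and net specificity rules above can be sketched as (the test characteristics below are hypothetical):<br />

```python
def simultaneous(sens1, spec1, sens2, spec2):
    """Parallel testing: positive on either test counts as positive."""
    net_sens = 1 - (1 - sens1) * (1 - sens2)  # a case is missed only if both tests miss
    net_spec = spec1 * spec2                  # a negative requires both tests negative
    return net_sens, net_spec

def sequential(sens1, spec1, sens2, spec2):
    """2-stage testing: the second test only confirms first-test positives."""
    net_sens = sens1 * sens2                  # a case must be caught by both tests
    net_spec = 1 - (1 - spec1) * (1 - spec2)  # a false positive needs both tests wrong
    return net_sens, net_spec

# Hypothetical tests: 80%/90% and 70%/90% (sensitivity/specificity)
print(simultaneous(0.80, 0.90, 0.70, 0.90))  # sensitivity rises, specificity falls
print(sequential(0.80, 0.90, 0.70, 0.90))    # sensitivity falls, specificity rises
```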
<br />
===Randomized Controlled Trials (RCT)===<br />
In these procedures, the investigator assigns exposure at random to study participants. The investigator then observes if there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken in ensuring that the follow-up is done in an identical way with both groups. The essence of a good comparison between “treatments” is that the compared groups are as much the same as possible, except for their “treatment."<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: <br />
<br />
#Hypothesis formulation<br />
#Study participant recruitment based on specific criteria<br />
#Gathering informed consent<br />
#Allocation of eligible and willing participants into random assignment study groups<br />
#Monitoring study groups for outcome under study<br />
#Comparing rates of different outcomes in various groups<br />
<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
<br />
*''External validity'': Generalization of study to larger source population. Influenced by factors like: <br />
:*Demographic differences between eligible and ineligible subgroups <br />
:*Whether the intervention mirrors what will happen in the community or source population<br />
<br />
*''Internal validity'': Ability to reach correct conclusion in study. Influenced by factors like: <br />
:*Ability of subjects to provide valid and reliable data<br />
:*Expected compliance with a regimen<br />
:*Low probability of dropping out<br />
<br />
====Measures of Association and Effect in RCT====<br />
<br />
Ratio of two measures of disease incidence (relative measures):<br />
<br />
*Risk Ratio (Relative Risk)<br />
*Rate Ratio<br />
<br />
Difference between two measures of disease incidence: <br />
<br />
*Risk difference<br />
*Efficacy<br />
<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
'''Interpretation''': <br />
<br />
*$RR>1$: The outcome is $RR$ times as likely to occur in group A as in group B<br />
*$RR=1$: Null value (no difference between groups)<br />
*$RR<1$: Either calculate the reduction in risk (100%-$X$%) or invert ($1/RR$) to interpret the exposure as making the outcome “less likely”<br />
<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
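<br />
The RCT measures above can be sketched from a 2x2 table (the trial counts below are hypothetical):<br />

```python
def rct_measures(a, b, c, d):
    """Risk ratio, risk difference, and efficacy from an RCT 2x2 table.

    a, b = drug-A group with / without disease;
    c, d = placebo group with / without disease.
    """
    ci_drug = a / (a + b)        # cumulative incidence on drug A
    ci_placebo = c / (c + d)     # cumulative incidence on placebo
    rr = ci_drug / ci_placebo
    rd = ci_drug - ci_placebo
    efficacy = (ci_placebo - ci_drug) / ci_placebo
    return rr, rd, efficacy

# Hypothetical trial: 10/100 ill on drug A vs. 25/100 ill on placebo
rr, rd, eff = rct_measures(10, 90, 25, 75)  # RR = 0.4, a 60% efficacy
```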
<br />
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control.<br />
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks.<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Population of exposed and unexposed individuals at risk of developing outcomes are followed over time to compare the development of disease in each group. <br />
*Steps: Establish the study population by identifying one that reflects the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; then follow and compare the outcomes of the exposed and unexposed groups.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
: Prospective (concurrent) and retrospective (non-concurrent) cohort studies are distinguished based on when the data are collected.<br />
: Retrospective has benefits: more cost effective; good for disease of long latency.<br />
: Prospective has benefits: data quality presumably higher.<br />
Both designs need to be cautious of ascertainment biases if outcomes or exposure is known.<br />
<br />
*Measures of Association in Cohort Study:<br />
: Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio.<br />
: Difference between two measures of disease incidence: Risk Difference, Rate Difference.<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information. <br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized. <br />
# Like RCT, excellent for studying rare exposures. <br />
# Multiple outcomes and sometimes multiple exposures can be studied.<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive; <br />
# Not effective at capturing rare outcomes and can be challenging to study disease that take a long time to develop; <br />
# Loss to follow-up can be a problem; <br />
# Changes over time in criteria and methods can lead to problems with inferences; <br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics.<br />
<br />
*Situations favor a Cohort Study: <br />
# When there is evidence of an association between the exposure and the disease from other studies;<br />
# When the exposure is rare but the incidence of disease among the exposed is high;<br />
# When time between exposure and development of the disease is relatively short or historical data is available;<br />
# When good follow-up can be ensured.<br />
<br />
===Case Control Study===<br />
A case-control study compares cases and controls to see which group had greater exposure to the risk factor of interest.<br />
*Measures of Association: Odds Ratio.<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}.$<br />
<br />
====Interpretation====<br />
The odds of being exposed are OR times higher in the cases than in the controls (if OR > 1), or 1/OR times lower in the cases than in the controls (if OR < 1); there is no association when the odds are the same in cases and controls (if OR = 1).<br />
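<br />
A minimal sketch of the odds ratio from a case-control 2x2 table, with an approximate (Woolf) 95% confidence interval; the counts below are hypothetical:<br />

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """OR = ad/bc for a case-control 2x2 table, with a Woolf-type 95% CI.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    oratio = (a * d) / (b * c)
    # Woolf's standard error of log(OR)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    lo = math.exp(math.log(oratio) - z * se)
    hi = math.exp(math.log(oratio) + z * se)
    return oratio, (lo, hi)

# Hypothetical: 40 exposed cases, 60 exposed controls,
#               60 unexposed cases, 140 unexposed controls
or_est, or_ci = odds_ratio(40, 60, 60, 140)
print(f"OR = {or_est:.2f}, 95% CI = ({or_ci[0]:.2f}, {or_ci[1]:.2f})")
```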
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to avoid recall and report bias include: <br />
# adjusting timing so that the time between the event/illness and the study is as short as possible; use standardized questionnaires that obtain complete information;<br />
# using existing information if/when possible (e.g. medical record); <br />
# masking participants to study hypothesis<br />
*Conditions when an OR from a Case-Control Study can approximate a RR (OR≈RR):<br />
# when the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn; <br />
# when the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn; <br />
# when the disease being studied does not occur frequently.<br />
<br />
===Cross-Sectional Studies===<br />
A cross-sectional study is an observational study in which a subject’s exposure and disease data are measured at the same time; prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, and temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# good for generating hypotheses;<br />
# easily sets up other analytic designs; <br />
# temporality is not a problem for time invariant exposures (genetic markers); <br />
# relatively low cost.<br />
<br />
*'''Weakness:'''<br />
# temporality – exposure or disease which happened first; <br />
# prevalent cases may not be the same as incident cases; <br />
# not useful for rare disease; <br />
# subject to selection bias.<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\,disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data is used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place & time (mixed study). An error can occur when an association identified from group-level (ecologic) characteristics is ascribed to individuals even though no such association exists at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# data is relatively easy and/or cheap to obtain; <br />
# good place to start; <br />
# many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weakness:''' Reliance on group-level data may not correctly represent individual-level associations. <br />
<br />
*Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.<br />
<br />
*Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.<br />
<br />
===Other Risk Estimates===<br />
*Attributable Risk Estimates of Effect – if exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.<br />
*Attributable Risk (AR): $AR=CI_{Exposed} - CI _{Not\,exposed}$. This is just the risk difference. Group of interest: exposed and aims to quantify the risk of disease in the “exposed” group attributable to the exposure. <br />
*Attributable Risk Percent $(AR\%)$: $ AR\%$ = $\frac{(CI_{Exposed} - CI_{Not exposed})}{CI_{exposed}}$<br />
*Population Attributable Risk (PAR): $PAR= CI_{Total} - CI_{Not exposed}$<br />
*Population Attributable Risk Percent $(PAR\%)$: $PAR\%$ = $\frac{(CI_{Total}-CI_{Not exposed})} {CI_{total}}$.<br />
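<br />
The attributable risk measures above can be sketched directly from cumulative incidences (the incidence values below are hypothetical):<br />

```python
def attributable_risk(ci_exposed, ci_unexposed, ci_total=None):
    """AR, AR%, and (when a total incidence is given) PAR and PAR%
    from cumulative incidences."""
    ar = ci_exposed - ci_unexposed          # risk difference in the exposed
    result = {"AR": ar, "AR%": ar / ci_exposed}
    if ci_total is not None:
        par = ci_total - ci_unexposed       # excess risk in the whole population
        result["PAR"] = par
        result["PAR%"] = par / ci_total
    return result

# Hypothetical: CI of 0.20 in exposed, 0.05 in unexposed, 0.11 overall
measures = attributable_risk(0.20, 0.05, 0.11)
```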
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and outcome; observed results different than true results. <br />
*Impact of bias: it can make it appear as if there is an association when there really is none (bias away from the null), or mask an association when there really is one (bias toward the null).<br />
*Reasons we get the wrong answer: Selection bias, in which who is selected into or retained in a study distorts the estimate of the truth. An example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (chose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls, or between exposed and unexposed, distort your estimates of the truth. A variable is a confounder if it is a known risk factor for the outcome, it is associated with the exposure, and it is not a result of the exposure. All three conditions are necessary for a variable to be considered a confounder. <br />
* Chance: the luck of draw gets you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text align:center;width:25%"border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any factors that were matched on; you cannot conduct analyses for matched variables and outcome.<br />
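<br />
The matched analysis above uses only the discordant pairs; as a minimal sketch (with hypothetical pair counts):<br />

```python
def matched_odds_ratio(b, c):
    """Matched-pair OR from the discordant pairs of a matched case-control
    table: b = case exposed / control unexposed,
           c = case unexposed / control exposed."""
    return b / c

# Hypothetical: 30 pairs with only the case exposed,
#               10 pairs with only the control exposed
print(matched_odds_ratio(30, 10))  # 3.0
```

Concordant pairs carry no information about the exposure-outcome association, so they drop out of the estimate.<br />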
*Randomization: Random allocation of exposure/”treatment” by investigator, ensure that the two groups (exposed & unexposed) are the same except for exposure of interest, able to control for both known and unknown confounders because distribution of these “3rd variables” should be equally distributed between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, through several important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has largely benefited in the past from the development of epidemiological research; however, the association-versus-causation question is currently being raised from observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved: the necessity of replication and Mendelian randomization, which might prove to be important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article], studies retrospectively the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about the existence of outbreaks?<br />
:a. cases call health departments directly<br />
:b. clinicians<br />
:c. laboratories<br />
:d. all of the above<br />
<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. host<br />
:b. agent<br />
:c. vector<br />
:d. environment<br />
:e. all of the above<br />
<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly<br />
:c. The prevalence decreases if the duration of disease is increasing<br />
:d. The prevalence decreases if the duration of disease stays the same<br />
<br />
<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002- 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 death per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:True<br />
:False<br />
<br />
<br />
<br />
Sequential testing tends to have higher net specificity than the specificity of a single test.<br />
:True<br />
:False<br />
<br />
<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they in fact, had the flu) and thus, were not enrolled in the research study. The students that were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question, what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. interviewer bias<br />
:b. recall bias<br />
:c. loss to follow-up<br />
:d. non-differential misclassification<br />
<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns especially if they were concerned that sun exposure was a reason they got melanoma. This is an example of:<br />
:a. interviewer bias<br />
:b. loss to follow-up<br />
:c. differential misclassification<br />
:d. non-differential misclassification<br />
<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred directly from an infected person to another person, for example through touch, a sneeze, or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*$H_{0}:ARR=1$, and 95% confidence intervals, can be used to see whether estimated ARR interval includes the null value of 1. If ARR is much greater than 1, then people exposed are more likely to develop the illness compared to those who are unexposed.<br />
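As an illustration, the attack rate, attack rate ratio, and an approximate 95% confidence interval for the ARR can be computed as below (a Python sketch; the buffet counts are hypothetical, and the interval uses the common log-scale approximation for a risk ratio):<br />

```python
import math

def attack_rate(cases, at_risk):
    """Attack rate: ill persons divided by persons at risk."""
    return cases / at_risk

def attack_rate_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """ARR = AR(exposed) / AR(unexposed)."""
    return attack_rate(cases_exp, n_exp) / attack_rate(cases_unexp, n_unexp)

def arr_ci95(cases_exp, n_exp, cases_unexp, n_unexp):
    """Approximate 95% CI for the ARR, computed on the log scale."""
    arr = attack_rate_ratio(cases_exp, n_exp, cases_unexp, n_unexp)
    se = math.sqrt(1/cases_exp - 1/n_exp + 1/cases_unexp - 1/n_unexp)
    lo = math.exp(math.log(arr) - 1.96 * se)
    hi = math.exp(math.log(arr) + 1.96 * se)
    return lo, hi

# Hypothetical buffet outbreak: 50 of 80 salad eaters ill vs 10 of 60 non-eaters
arr = attack_rate_ratio(50, 80, 10, 60)   # 0.625 / 0.1667 = 3.75
lo, hi = arr_ci95(50, 80, 10, 60)
```

Since the whole interval lies above the null value of 1, chance alone is an unlikely explanation for the excess illness among the exposed.<br />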
<br />
====Measuring Disease====<br />
To name and calculate two measures of incidence, to describe differences in interpreting these measures, and to understand the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': the number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2,000 persons at risk during the year and 20 develop the disease over that period, the incidence is 20/2000 = 1%.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
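The three frequency measures above differ only in their numerators and denominators, which a short Python sketch makes explicit (all counts below are hypothetical):<br />

```python
def cumulative_incidence(new_cases, population_at_risk):
    """New cases divided by the total population at risk (a proportion)."""
    return new_cases / population_at_risk

def incidence_rate(new_cases, person_time):
    """New cases divided by total person-time at risk (a true rate)."""
    return new_cases / person_time

def prevalence(existing_cases, population):
    """Existing cases divided by the population at a specified time."""
    return existing_cases / population

ci = cumulative_incidence(20, 2000)      # 20 new cases among 2,000 at risk
rate = incidence_rate(2, 3 + 5 + 8)      # 2 cases over 16 person-days
prev = prevalence(50, 2000)              # 50 existing cases at one point in time
```

Note that cumulative incidence and prevalence are dimensionless proportions, while the incidence rate carries units of 1/time (here, per person-day).<br />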
<br />
====Measuring Mortality Rates====<br />
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:<br />
<br />
*All cause mortality rates = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011} {Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of it over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations (or one population at different time periods) that have different age distributions; adjusting for age allows the mortality rates to be compared as if both populations had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
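The four steps of direct age-adjustment can be sketched as follows (Python; the age-specific rates and standard population below are hypothetical):<br />

```python
def direct_adjusted_rate(age_specific_rates, standard_population):
    """Direct age-adjustment: apply each stratum's observed rate to a
    standard population, sum the expected deaths, and divide by the
    standard population's total size. Rates are per person."""
    expected = sum(r * n for r, n in zip(age_specific_rates, standard_population))
    return expected / sum(standard_population)

# Hypothetical strata: rates per 100,000 converted to per person
rates = [30 / 100_000, 15 / 100_000, 8 / 100_000, 50 / 100_000]
standard = [1_000_000, 2_000_000, 2_000_000, 1_000_000]

adjusted = direct_adjusted_rate(rates, standard)   # per person
adjusted_per_100k = adjusted * 100_000             # 1260 deaths / 6,000,000 = 21
```

The resulting standardized rate (here 21 per 100,000) can be compared with a rate standardized to the same standard population for another group.<br />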
<br />
====Age-adjusted mortality rate for each population of interest====<br />
*Indirect age-adjustment: the expected number of deaths can be compared to the number of actual deaths with the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates are unreliable (e.g., when the study population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
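A minimal sketch of indirect adjustment (Python; the rates and counts below are hypothetical):<br />

```python
def smr(observed_deaths, standard_rates, study_population):
    """Standardized mortality ratio: observed deaths divided by the deaths
    expected if the standard population's age-specific rates applied."""
    expected = sum(r * n for r, n in zip(standard_rates, study_population))
    return observed_deaths / expected

# Hypothetical: two age strata, standard rates expressed per person
standard_rates = [10 / 100_000, 40 / 100_000]
study_population = [50_000, 25_000]      # study population by age stratum

value = smr(18, standard_rates, study_population)
# expected deaths = 5 + 10 = 15, so SMR = 18 / 15 = 1.2
```

An SMR of 1.2 means 20% more deaths were observed than would be expected from the standard population's rates.<br />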
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and achieve better outcomes. However, screening programs must be warranted, and there must be a critical point in the disease's natural history before which screening can detect the disease. <br />
<br />
=====Predictive Value & Reliability: Clinical Utility of Positive Tests=====<br />
<br />
If a patient tests positive, the likelihood that they actually have the disease is called the '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood that they actually do ''not'' have the disease is called the '''Negative Predictive Value''' (NPV). PPV and NPV are affected by the prevalence of disease and the specificity and sensitivity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
<br />
'''PPV interpretation:''' Given a positive result on the screening test, the likelihood that an individual actually has the disease is the PPV.<br />
<br />
'''NPV interpretation:''' Given a negative result on the screening test, the likelihood that an individual actually does not have the disease is the NPV.<br />
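From the 2x2 layout above, PPV and NPV are simple row proportions (a Python sketch with hypothetical counts):<br />

```python
def ppv(a, b):
    """Positive predictive value: true positives / all positive tests."""
    return a / (a + b)

def npv(c, d):
    """Negative predictive value: true negatives / all negative tests."""
    return d / (c + d)

# Hypothetical screening 2x2: a=TP, b=FP, c=FN, d=TN
a, b, c, d = 90, 30, 10, 270
positive_pv = ppv(a, b)   # 90 / 120 = 0.75
negative_pv = npv(c, d)   # 270 / 280
```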
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
<br />
===== Factors Influence Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV (and decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent disease may waste resources, so PPV should be presented in the context of disease prevalence.<br />
<br />
*''Test specificity'' (ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
*''Test sensitivity'' (ability of a test to correctly identify those who have the disease $=\frac{a}{a+c}$)<br />
<br />
'''Note:''' The cutpoint of a test influences its sensitivity and specificity: lowering the cutpoint increases true positives (raising sensitivity) but decreases true negatives (lowering specificity). Similarly, raising the cutpoint decreases true positives (lowering sensitivity) but increases true negatives (raising specificity).<br />
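The cutpoint trade-off can be demonstrated numerically; the sketch below classifies a hypothetical continuous measurement as test-positive when it meets a cutpoint, then compares a low and a high cutpoint:<br />

```python
def sens_spec(values_diseased, values_healthy, cutpoint):
    """Classify value >= cutpoint as test-positive; return (sensitivity, specificity)."""
    tp = sum(v >= cutpoint for v in values_diseased)
    tn = sum(v < cutpoint for v in values_healthy)
    return tp / len(values_diseased), tn / len(values_healthy)

# Hypothetical measurements for diseased and healthy individuals
diseased = [120, 135, 150, 160, 180]
healthy = [85, 95, 105, 115, 125]

low = sens_spec(diseased, healthy, 110)    # low cutpoint
high = sens_spec(diseased, healthy, 140)   # high cutpoint
```

Lowering the cutpoint from 140 to 110 raises sensitivity at the cost of specificity, exactly as described above.<br />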
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid not reliable'', ''reliable not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability (repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?===== <br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage) are less expensive, less invasive, and less uncomfortable tests. If their results are positive, they must be followed-up with additional testing.<br />
<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person can test positive on either test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has own sensitivity and specificity. Utilization of multiple testing can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
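Assuming the two tests err independently, the net measures follow from simple probability (a Python sketch with hypothetical test characteristics):<br />

```python
def net_parallel(se1, sp1, se2, sp2):
    """Simultaneous testing: positive on either test counts as positive
    (assumes the two tests err independently)."""
    net_se = 1 - (1 - se1) * (1 - se2)   # miss only if both tests miss
    net_sp = sp1 * sp2                   # negative requires negative on both
    return net_se, net_sp

def net_serial(se1, sp1, se2, sp2):
    """Sequential testing: must be positive on both tests to count as positive."""
    net_se = se1 * se2                   # detect only if both tests detect
    net_sp = 1 - (1 - sp1) * (1 - sp2)   # false positive only if both err
    return net_se, net_sp

se1, sp1, se2, sp2 = 0.80, 0.90, 0.85, 0.95
par = net_parallel(se1, sp1, se2, sp2)
ser = net_serial(se1, sp1, se2, sp2)
```

With these inputs, parallel testing raises net sensitivity to 0.97 while serial testing raises net specificity to 0.995, matching the bullet points above.<br />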
<br />
===Randomized Controlled Trials (RCT)===<br />
In these procedures, the investigator assigns exposure at random to study participants. The investigator then observes whether there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken to ensure that follow-up is done in an identical way for both groups. The essence of a good comparison between “treatments” is that the compared groups are as similar as possible, except for their “treatment."<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: <br />
<br />
#Hypothesis formulation<br />
#Study participant recruitment based on specific criteria<br />
#Gathering informed consent<br />
#Allocation of eligible and willing participants into random assignment study groups<br />
#Monitoring study groups for outcome under study<br />
#Comparing rates of different outcomes in various groups<br />
<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
<br />
*''External validity'': Generalization of study to larger source population. Influenced by factors like: <br />
:*Demographic differences between eligible and ineligible subgroups <br />
:*Whether the intervention mirrors what will happen in the community or source population<br />
<br />
*''Internal validity'': Ability to reach correct conclusion in study. Influenced by factors like: <br />
:*Ability of subjects to provide valid and reliable data<br />
:*Expected compliance with a regimen<br />
:*Low probability of dropping out<br />
<br />
====Measures of Association and Effect in RCT====<br />
Ratio of two measures of disease incidence (relative measures) - Risk Ratio (Relative Risk), Rate Ratio.<br />
Difference between two measures of disease incidence: Risk difference, efficacy.<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
Interpretation: RR>1: the outcome is RR times as likely to occur in the exposed group as in the unexposed group; RR=1: null value (no difference between groups); RR<1: either calculate the reduction in risk (100%-xx%) or invert (1/RR) to interpret the exposure as making the outcome “less likely.”<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
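The RR and efficacy formulas above can be applied to a hypothetical trial 2x2 table as follows (Python sketch):<br />

```python
def risk_ratio(a, b, c, d):
    """RR = CI(exposed) / CI(unexposed) from a 2x2 table
    (a,b = disease/no-disease on drug A; c,d = disease/no-disease on placebo)."""
    return (a / (a + b)) / (c / (c + d))

def efficacy(a, b, c, d):
    """Efficacy = (CI_placebo - CI_treatment) / CI_placebo."""
    ci_treatment = a / (a + b)
    ci_placebo = c / (c + d)
    return (ci_placebo - ci_treatment) / ci_placebo

# Hypothetical trial: 15/200 events on drug A vs 30/200 on placebo
a, b, c, d = 15, 185, 30, 170
rr = risk_ratio(a, b, c, d)    # 0.075 / 0.15 = 0.5
eff = efficacy(a, b, c, d)     # half of the placebo-group risk is prevented
```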
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control; <br />
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks;<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Populations of exposed and unexposed individuals at risk of developing the outcome are followed over time to compare the development of disease in each group. <br />
*Steps: Establish the study population: identify a study population that reflects the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; then compare the outcomes of the exposed and unexposed groups.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
: Prospective (concurrent) and retrospective (non-concurrent) cohort studies, distinguished by when the data are collected.<br />
: Retrospective benefits: more cost-effective; good for diseases of long latency.<br />
: Prospective benefits: data quality is presumably higher.<br />
Both designs need to be cautious of ascertainment biases if outcome or exposure status is known.<br />
<br />
*Measures of Association in Cohort Study:<br />
: Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio.<br />
: Difference between two measures of disease incidence: Risk Difference, Rate Difference.<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information. <br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized. <br />
# Like RCT, excellent for studying rare exposures. <br />
# Multiple outcomes and sometimes multiple exposures can be studied.<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive; <br />
# Not effective at capturing rare outcomes and can be challenging to study disease that take a long time to develop; <br />
# Loss to follow-up can be a problem; <br />
# Changes over time in criteria and methods can lead to problems with inferences; <br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics.<br />
<br />
*Situations favor a Cohort Study: <br />
# When there is evidence of an association between the exposure and the disease from other studies;<br />
# When the exposure is rare but the incidence of disease among the exposed is high;<br />
# When time between exposure and development of the disease is relatively short or historical data is available;<br />
# When good follow-up can be ensured.<br />
<br />
===Case Control Study===<br />
A case-control study compares cases and controls to see which group had greater exposure to the risk factor of interest.<br />
*Measures of Association: Odds Ratio.<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}.$<br />
<br />
====Interpretation====<br />
The odds of being exposed are OR times higher in the cases than in the controls (if OR > 1), or 1/OR times lower in the cases than in the controls (if OR < 1); if OR = 1, there is no association (the odds are the same in cases and controls).<br />
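A small Python sketch of the odds ratio from the 2x2 table above (the counts are hypothetical):<br />

```python
def odds_ratio(a, b, c, d):
    """OR = (a/c)/(b/d) = ad/bc, with a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)

# Hypothetical case-control study: 100 cases, 100 controls
a, b, c, d = 60, 30, 40, 70
estimate = odds_ratio(a, b, c, d)   # 60*70 / (30*40) = 3.5
```

Here the odds of exposure are 3.5 times higher among the cases than among the controls.<br />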
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to avoid recall and report bias include: <br />
# adjusting timing so that the interval between the event/illness and the study is as short as possible, and using standardized questionnaires that obtain complete information;<br />
# using existing information if/when possible (e.g. medical record); <br />
# masking participants to study hypothesis<br />
*Conditions when an OR from a case-control study can approximate a RR (OR≈RR):<br />
# when the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn; <br />
# when the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn; <br />
# when the disease being studied does not occur frequently.<br />
<br />
===Cross-Sectional Studies===<br />
A cross-sectional study is an observational study in which a subject’s exposure and disease data are measured at the same time; prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, and temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# good for generating hypotheses;<br />
# easily sets up other analytic designs; <br />
# temporality is not a problem for time invariant exposures (genetic markers); <br />
# relatively low cost.<br />
<br />
*'''Weakness:'''<br />
# temporality – exposure or disease which happened first; <br />
# prevalent cases may not be the same as incident cases; <br />
# not useful for rare disease; <br />
# subject to selection bias.<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\, disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
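The prevalence ratio is computed exactly like a risk ratio, but from prevalent rather than incident cases (Python sketch; the counts are hypothetical):<br />

```python
def prevalence_ratio(a, b, c, d):
    """PR = prevalence in exposed / prevalence in unexposed,
    with a,b = exposed diseased/non-diseased and c,d = unexposed."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical cross-sectional survey
a, b, c, d = 40, 60, 20, 80
pr = prevalence_ratio(a, b, c, d)   # 0.40 / 0.20 = 2.0
```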
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place & time (mixed study). An error can occur when an association identified from group-level (ecologic) characteristics is ascribed to individuals even though no such association exists at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# data is relatively easy and/or cheap to obtain; <br />
# good place to start; <br />
# many relevant social, occupational, and environmental exposures cannot be measured at the individual level.<br />
<br />
*'''Weakness:''' Reliance on group-level data may not correctly represent individual-level associations. <br />
<br />
*Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.<br />
<br />
*Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.<br />
<br />
===Other Risk Estimates===<br />
*Attributable Risk Estimates of Effect – if exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.<br />
*Attributable Risk (AR): $AR=CI_{Exposed} - CI_{Not\,exposed}$. This is just the risk difference. The group of interest is the exposed; AR quantifies the risk of disease in the exposed group that is attributable to the exposure. <br />
*Attributable Risk Percent $(AR\%)$: $AR\%=\frac{CI_{Exposed} - CI_{Not\,exposed}}{CI_{Exposed}}$<br />
*Population Attributable Risk (PAR): $PAR= CI_{Total} - CI_{Not\,exposed}$<br />
*Population Attributable Risk Percent $(PAR\%)$: $PAR\%=\frac{CI_{Total}-CI_{Not\,exposed}}{CI_{Total}}$.<br />
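The four attributable-risk measures above can be sketched in a few lines of Python; the function name, variable names, and cumulative-incidence values below are hypothetical illustration values:<br />

```python
def attributable_risk_measures(ci_exposed, ci_unexposed, ci_total):
    """Attributable-risk estimates from cumulative incidences (CI)."""
    ar = ci_exposed - ci_unexposed   # AR: excess risk among the exposed
    ar_pct = ar / ci_exposed         # AR%: fraction of exposed risk due to exposure
    par = ci_total - ci_unexposed    # PAR: excess risk in the total population
    par_pct = par / ci_total         # PAR%: fraction of total risk due to exposure
    return ar, ar_pct, par, par_pct

# Hypothetical: CI = 0.20 in exposed, 0.05 in unexposed, 0.11 overall
ar, ar_pct, par, par_pct = attributable_risk_measures(0.20, 0.05, 0.11)
# ar ≈ 0.15, ar_pct ≈ 0.75, par ≈ 0.06, par_pct ≈ 0.55
```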
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: any systematic error in the design, conduct, or analysis of a study that results in a distorted estimate of the relationship between an exposure and an outcome, so that the observed results differ from the true results. <br />
*Impact of bias: it can make it appear as if there is an association when there really is none (bias away from the null), or mask an association when there really is one (bias toward the null).<br />
*Selection bias: the way subjects are selected or retained in a study distorts estimates of the truth; an example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (chose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls or exposed and unexposed distorts your estimates of the truth. A variable is a confounder if it is a known risk factor for the outcome, it is associated with the exposure but not a result of the exposure. These three conditions are necessary for a variable to be considered as a confounder. <br />
* Chance: the luck of the draw gets you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any factors that were matched on; you cannot conduct analyses for matched variables and outcome.<br />
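A one-line Python sketch of the matched odds ratio; the discordant-pair counts used here are hypothetical:<br />

```python
def matched_odds_ratio(b, c):
    """Matched-pairs odds ratio: only the discordant pairs contribute.
    b = pairs with case exposed / control unexposed
    c = pairs with case unexposed / control exposed
    (concordant pairs a and d are ignored)."""
    return b / c

# Hypothetical: 40 pairs with only the case exposed, 20 with only the control exposed
odds = matched_odds_ratio(40, 20)  # -> 2.0
```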
*Randomization: Random allocation of exposure/”treatment” by investigator, ensure that the two groups (exposed & unexposed) are the same except for exposure of interest, able to control for both known and unknown confounders because distribution of these “3rd variables” should be equally distributed between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, through several important examples, the classical methodological approach to discussing causality in epidemiology. Coronary heart disease (CHD) prevention has largely benefited from the development of epidemiological research; however, the association-versus-causation question is currently raised for observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved, such as the necessity of replication and Mendelian randomization, which might prove important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article], studies retrospectively the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about existence of outbreaks?<br />
:a. cases call health departments directly<br />
:b. clinicians<br />
:c. laboratories<br />
:d. all of the above<br />
<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. host<br />
:b. agent<br />
:c. vector<br />
:d. environment<br />
:e. all of the above<br />
<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly<br />
:c. The prevalence decreases if the duration of disease is increasing<br />
:d. The prevalence decreases if the duration of disease stays the same<br />
<br />
<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002- 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 death per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:True<br />
:False<br />
<br />
<br />
<br />
Sequential testing tends to have higher net specificity than specificity of a single test.<br />
:True<br />
:False<br />
<br />
<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they in fact, had the flu) and thus, were not enrolled in the research study. The students that were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question, what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. interviewer bias<br />
:b. recall bias<br />
:c. loss to follow-up<br />
:d. non-differential misclassification<br />
<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns especially if they were concerned that sun exposure was a reason they got melanoma. This is an example of:<br />
:a. interviewer bias<br />
:b. loss to follow-up<br />
:c. differential misclassification<br />
:d. non-differential misclassification<br />
<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred directly from an infected person to a susceptible person, for example through a sneeze, touch, or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*$H_{0}:ARR=1$, and 95% confidence intervals, can be used to see whether estimated ARR interval includes the null value of 1. If ARR is much greater than 1, then people exposed are more likely to develop the illness compared to those who are unexposed.<br />
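The attack rate ratio, together with an approximate 95% confidence interval for checking $H_{0}:ARR=1$, can be sketched in Python. The log-normal (Wald) interval is a standard approximation, and all counts below are hypothetical:<br />

```python
import math

def attack_rate_ratio(ill_exp, n_exp, ill_unexp, n_unexp):
    """ARR with an approximate 95% CI on the log scale (Wald interval)."""
    arr = (ill_exp / n_exp) / (ill_unexp / n_unexp)
    # Standard error of log(ARR) for a ratio of two attack rates
    se_log = math.sqrt(1/ill_exp - 1/n_exp + 1/ill_unexp - 1/n_unexp)
    lo = math.exp(math.log(arr) - 1.96 * se_log)
    hi = math.exp(math.log(arr) + 1.96 * se_log)
    return arr, (lo, hi)

# Hypothetical outbreak: 30/50 ill among salad eaters vs 10/50 among non-eaters
arr, (lo, hi) = attack_rate_ratio(30, 50, 10, 50)
# arr ≈ 3.0; the interval excludes 1, consistent with rejecting H0: ARR = 1
```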
<br />
====Measuring Disease====<br />
This section names and calculates two measures of incidence, describes how their interpretations differ, and explains the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2,000 persons at risk during the year and 20 develop disease over that period, the cumulative incidence is 20⁄2000 = 1%.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
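The three frequency measures above are simple ratios; here is a Python sketch using the worked numbers from the text (function names, and the single case in the person-time example, are hypothetical):<br />

```python
def cumulative_incidence(new_cases, n_at_risk):
    """New cases / persons at risk over a period."""
    return new_cases / n_at_risk

def incidence_rate(new_cases, person_time):
    """New cases / total person-time at risk."""
    return new_cases / person_time

def prevalence(existing_cases, population):
    """Existing cases / population at a specified time."""
    return existing_cases / population

# From the text: 20 of 2,000 at-risk persons develop disease in a year
ci = cumulative_incidence(20, 2000)   # -> 0.01, i.e., 1%
# From the text: subjects followed 3, 5, and 8 days -> 16 person-days;
# assuming (hypothetically) one new case over that time:
rate = incidence_rate(1, 3 + 5 + 8)   # -> 0.0625 cases per person-day
```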
<br />
====Measuring Mortality Rates====<br />
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:<br />
<br />
*All cause mortality rates = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011}{Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of the disease over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations, or one population at different time periods, that have different age distributions. Adjusting for age allows the mortality rates to be compared as if both populations had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
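Steps 1–4 can be sketched in Python; the age-specific rates and standard population below are hypothetical illustration values:<br />

```python
def direct_adjusted_rate(age_specific_rates, standard_pop):
    """Direct age adjustment: apply the study population's age-specific
    rates to a common standard population, then divide total expected
    deaths by the total standard population."""
    expected_deaths = [rate * n for rate, n in zip(age_specific_rates, standard_pop)]
    return sum(expected_deaths) / sum(standard_pop)

# Hypothetical rates per person for four age groups, and a standard population
rates = [50e-5, 20e-5, 10e-5, 80e-5]
std_pop = [1_000_000, 2_000_000, 3_000_000, 4_000_000]
adjusted = direct_adjusted_rate(rates, std_pop)  # ≈ 44 deaths per 100,000
```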
<br />
====Age-adjusted mortality rate for each population of interest====<br />
*Indirect age-adjustment: the expected number of deaths can be compared to the number of observed deaths using the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates are unreliable (e.g., when the study population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
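The indirect-adjustment steps can be sketched similarly; the standard rates, study population, and observed death count below are hypothetical:<br />

```python
def standardized_mortality_ratio(observed, standard_rates, study_pop):
    """SMR = observed deaths / deaths expected if the study population
    experienced the standard population's age-specific rates."""
    expected = sum(rate * n for rate, n in zip(standard_rates, study_pop))
    return observed / expected

std_rates = [1e-4, 2e-4, 5e-4]        # standard rates per person, by age group
study_pop = [10_000, 20_000, 30_000]  # study population, same age groups
ratio = standardized_mortality_ratio(30, std_rates, study_pop)
# expected ≈ 1 + 4 + 15 = 20 deaths, so SMR ≈ 1.5 (more deaths than expected)
```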
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and provide better outcomes. However, screening programs must be warranted, and the disease must have a critical point in its natural history that screening can precede. <br />
<br />
=====Predictive Value &amp; Reliability: Clinical Utility of Positive Tests=====<br />
<br />
If a patient is tested positive, the likelihood that they actually have the disease is called '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood they actually do ''not'' have the disease is called '''Negative Predictive Value''' (NPV). PPV and NPV are affected by prevalence of disease, specificity and sensitivity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
<br />
'''PPV interpretation:''' Given a positive screening test result, the likelihood that the individual actually has the disease is the PPV.<br />
<br />
'''NPV interpretation:''' Given a negative screening test result, the likelihood that the individual actually does not have the disease is the NPV.<br />
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
<br />
===== Factors Influence Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV (or decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent disease may waste resources; need to present PPV in context of disease prevalence.<br />
<br />
*''Test specificity'' (the ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
*''Test sensitivity'' (the ability of a test to correctly identify those who do have the disease $=\frac{a}{a+c}$): Higher test sensitivity increases NPV.<br />
<br />
'''Note:''' The cutoff used to define disease influences test sensitivity and specificity: lowering the cutpoint increases true positives (higher sensitivity) but decreases true negatives (lower specificity); raising the cutpoint decreases true positives (lower sensitivity) but increases true negatives (higher specificity).<br />
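The four screening quantities, and the dependence of PPV on prevalence (via Bayes' rule), can be sketched as follows; the counts and test characteristics used here are hypothetical:<br />

```python
def screening_metrics(a, b, c, d):
    """From a 2x2 screening table: a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    ppv = a / (a + b)
    npv = d / (c + d)
    return sensitivity, specificity, ppv, npv

def ppv_from_prevalence(sens, spec, prev):
    """Bayes' rule: PPV of a test with given sensitivity/specificity
    applied in a population with the given disease prevalence."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

se, sp, ppv, npv = screening_metrics(90, 50, 10, 850)  # hypothetical counts
# Same test, higher prevalence -> higher PPV:
ppv_low = ppv_from_prevalence(0.9, 0.9, 0.01)   # rare disease
ppv_high = ppv_from_prevalence(0.9, 0.9, 0.20)  # high-risk population
```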
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid not reliable'', ''reliable not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability(repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?===== <br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage) are less expensive, less invasive, and less uncomfortable; if their results are positive, they must be followed up with additional testing.<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person can test positive on either test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has own sensitivity and specificity. Utilization of multiple testing can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
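Assuming the two tests err independently, the net sensitivity and net specificity rules above can be sketched in Python (the test characteristics used are hypothetical):<br />

```python
def net_sequential(se1, sp1, se2, sp2):
    """Two-stage testing: called positive only if positive on both tests."""
    net_se = se1 * se2                  # must be detected twice -> sensitivity falls
    net_sp = 1 - (1 - sp1) * (1 - sp2)  # must be a false positive twice -> specificity rises
    return net_se, net_sp

def net_simultaneous(se1, sp1, se2, sp2):
    """Parallel testing: positive on either test counts as positive."""
    net_se = 1 - (1 - se1) * (1 - se2)  # either test can detect -> sensitivity rises
    net_sp = sp1 * sp2                  # must be negative on both -> specificity falls
    return net_se, net_sp

seq = net_sequential(0.9, 0.8, 0.8, 0.9)    # ≈ (0.72, 0.98)
par = net_simultaneous(0.9, 0.8, 0.8, 0.9)  # ≈ (0.98, 0.72)
```

The independence assumption is an idealization; correlated test errors shrink these gains.<br />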
<br />
===Randomized Controlled Trials (RCT)===<br />
In these procedures, the investigator assigns exposure at random to study participants. The investigator then observes whether there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken to ensure that follow-up is done in an identical way for both groups. The essence of a good comparison between "treatments" is that the compared groups are as similar as possible, except for their "treatment."<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: <br />
<br />
#Hypothesis formulation<br />
#Study participant recruitment based on specific criteria<br />
#Gathering informed consent<br />
#Allocation of eligible and willing participants into random assignment study groups<br />
#Monitoring study groups for outcome under study<br />
#Comparing rates of different outcomes in various groups<br />
<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
*External validity: generalization of the study to the larger source population. Influenced by factors such as demographic differences between eligible and ineligible subgroups, and whether the intervention mirrors what will happen in the community or source population.<br />
*Internal validity: the ability to reach correct conclusions within the study. Influenced by factors such as the subjects' ability to provide valid and reliable data, expected compliance with a regimen, and a low probability of dropping out.<br />
<br />
====Measures of Association and Effect in RCT====<br />
Ratio of two measures of disease incidence (relative measures) - Risk Ratio (Relative Risk), Rate Ratio.<br />
Difference between two measures of disease incidence: Risk difference, efficacy.<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
Interpretation: RR>1 means the outcome is RR times as likely to occur in group A as in group B; RR=1 is the null value (no difference between groups); RR<1 can be reported either as a reduction in risk (100% − xx%) or inverted (1/RR) and interpreted as "less likely."<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
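Using the 2x2 layout above, relative risk and efficacy can be sketched in Python; the trial counts below are hypothetical:<br />

```python
def relative_risk(a, b, c, d):
    """Risk ratio: a,b = events/non-events on Drug A; c,d = on placebo."""
    return (a / (a + b)) / (c / (c + d))

def efficacy(a, b, c, d):
    """(CI in placebo - CI in treatment) / CI in placebo."""
    ci_drug = a / (a + b)
    ci_placebo = c / (c + d)
    return (ci_placebo - ci_drug) / ci_placebo

# Hypothetical trial: 10/100 events on Drug A vs 25/100 on placebo
rr = relative_risk(10, 90, 25, 75)  # ≈ 0.4: drug group 60% less likely to get disease
eff = efficacy(10, 90, 25, 75)      # ≈ 0.6, i.e., 60% efficacy
```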
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control; <br />
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks;<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Populations of exposed and unexposed individuals at risk of developing the outcome are followed over time to compare the development of disease in each group. <br />
*Steps: Establish the study population – identify a study population that is reflective of the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; follow both groups and compare their outcomes.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
: Prospective (concurrent) and retrospective (non-concurrent) cohort studies, distinguished by when the data are collected.<br />
: Retrospective benefits: more cost-effective; good for diseases of long latency.<br />
: Prospective benefits: data quality is presumably higher.<br />
Both designs need to be cautious of ascertainment bias if the outcome or exposure is already known.<br />
<br />
*Measures of Association in Cohort Study:<br />
: Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio.<br />
: Difference between two measures of disease incidence: Risk Difference, Rate Difference.<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information. <br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized. <br />
# Like RCT, excellent for studying rare exposures. <br />
# Multiple outcomes and sometimes multiple exposures can be studied.<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive; <br />
# Not effective at capturing rare outcomes, and can be challenging for studying diseases that take a long time to develop; <br />
# Loss to follow-up can be a problem; <br />
# Changes over time in criteria and methods can lead to problems with inferences; <br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics.<br />
<br />
*Situations that favor a Cohort Study: <br />
# When there is evidence of an association between the exposure and the disease from other studies;<br />
# When the exposure is rare but the incidence of disease among the exposed is high;<br />
# When time between exposure and development of the disease is relatively short or historical data is available;<br />
# When good follow-up can be ensured.<br />
<br />
===Case Control Study===<br />
A case-control study compares cases and controls to determine which group had greater exposure to the risk factor of interest.<br />
*Measures of Association: Odds Ratio.<br />
<center><br />
{|class="wikitable" style="text-align:center; width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}.$<br />
<br />
====Interpretation====<br />
If OR > 1, the odds of being exposed are OR times higher in the cases than in the controls; if OR < 1, the odds are 1/OR times lower in the cases than in the controls; if OR = 1, there is no association – the odds are the same in cases and controls.<br />
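As a quick numeric sketch (with hypothetical counts), the cross-product form of the odds ratio:<br />

```python
# Hypothetical case-control 2x2 counts.
a, b = 40, 20    # exposed: cases, controls
c, d = 60, 180   # unexposed: cases, controls

# (a/c)/(b/d) simplifies to the cross-product ratio ad/bc.
odds_ratio = (a * d) / (b * c)   # 6.0: cases had 6x the odds of exposure
```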
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weaknesses: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate the incidence of disease, only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in the study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to reduce recall and reporting bias include: <br />
# adjusting timing so that the interval between the event/illness and the study is as short as possible; <br />
# using standardized questionnaires that obtain complete information;<br />
# using existing information if/when possible (e.g., medical records); <br />
# masking participants to the study hypothesis.<br />
*Conditions under which an OR from a Case-Control Study can approximate an RR (OR≈RR):<br />
# when the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn; <br />
# when the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn; <br />
# when the disease being studied does not occur frequently.<br />
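The rare-disease condition can be checked numerically. With made-up cohort-style counts in which the disease is rare, the OR computed from the same table is very close to the RR:<br />

```python
# Hypothetical 2x2 counts with a rare disease (risk well under 1%).
a, b = 8, 9992    # exposed: diseased, not diseased
c, d = 2, 9998    # unexposed: diseased, not diseased

rr = (a / (a + b)) / (c / (c + d))   # risk ratio: 4.0
odds_ratio = (a * d) / (b * c)       # ~4.002 -- close because b ~ a+b and d ~ c+d
```

When the disease is rare, a+b is dominated by b and c+d by d, so ad/bc approximates the risk ratio.<br />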
<br />
===Cross-Sectional Studies===<br />
A cross-sectional study is an observational study in which a subject’s exposure and disease status are measured at the same time. Only prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, and temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# good for generating hypotheses;<br />
# easily sets up other analytic designs; <br />
# temporality is not a problem for time invariant exposures (genetic markers); <br />
# relatively low cost.<br />
<br />
*'''Weakness:'''<br />
# temporality – cannot determine whether the exposure or the disease came first; <br />
# prevalent cases may not be the same as incident cases; <br />
# not useful for rare disease; <br />
# subject to selection bias.<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text-align:center; width:25%" border="1"<br />
|-<br />
| colspan=2| || Disease || No Disease <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\,disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
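A minimal sketch of the prevalence ratio, using hypothetical counts laid out as in the table above:<br />

```python
# Hypothetical cross-sectional 2x2 counts.
a, b = 50, 50   # exposed: disease, no disease
c, d = 25, 75   # unexposed: disease, no disease

prev_exposed = a / (a + b)     # 0.50
prev_unexposed = c / (c + d)   # 0.25
prevalence_ratio = prev_exposed / prev_unexposed   # 2.0
```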
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place & time (mixed study). An error can occur when an association identified from group-level (ecologic) characteristics is ascribed to individuals even though no such association exists at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# data is relatively easy and/or cheap to obtain; <br />
# good place to start; <br />
# many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weakness:''' Reliance on group-level data may not correctly represent individual-level associations. <br />
<br />
*Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.<br />
<br />
*Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.<br />
<br />
===Other Risk Estimates===<br />
*Attributable Risk Estimates of Effect – if exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.<br />
*Attributable Risk (AR): $AR=CI_{Exposed} - CI _{Not\,exposed}$. This is just the risk difference. Group of interest: exposed and aims to quantify the risk of disease in the “exposed” group attributable to the exposure. <br />
*Attributable Risk Percent $(AR\%)$: $AR\%=\frac{CI_{Exposed} - CI_{Not\,exposed}}{CI_{Exposed}}\times 100$<br />
*Population Attributable Risk (PAR): $PAR= CI_{Total} - CI_{Not\,exposed}$<br />
*Population Attributable Risk Percent $(PAR\%)$: $PAR\%=\frac{CI_{Total}-CI_{Not\,exposed}}{CI_{Total}}\times 100$.<br />
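The four attributable-risk measures can be computed together. The cumulative incidences and exposure prevalence below are hypothetical:<br />

```python
# Hypothetical inputs.
ci_exposed = 0.20      # cumulative incidence among the exposed
ci_unexposed = 0.05    # cumulative incidence among the unexposed
p_exposed = 0.40       # proportion of the total population that is exposed

# Total-population incidence is a weighted average of the two groups.
ci_total = p_exposed * ci_exposed + (1 - p_exposed) * ci_unexposed  # 0.11

ar = ci_exposed - ci_unexposed    # attributable risk (risk difference): 0.15
ar_pct = ar / ci_exposed * 100    # AR%: 75% of exposed-group risk is attributable
par = ci_total - ci_unexposed     # population attributable risk: 0.06
par_pct = par / ci_total * 100    # PAR%: ~54.5% of all cases are attributable
```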
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and outcome; observed results different than true results. <br />
*Impact of bias: it can make it appear as if there is an association when there really is none (bias away from the null), or mask an association when there really is one (bias toward the null). <br />
*Reasons we get the wrong answer: Selection bias – how subjects are selected or retained in a study distorts the estimates of the truth. An example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (choose groups from the same source population; use lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls or exposed and unexposed distorts your estimates of the truth. A variable is a confounder if it is a known risk factor for the outcome, it is associated with the exposure but not a result of the exposure. These three conditions are necessary for a variable to be considered as a confounder. <br />
* Chance: the luck of the draw gets you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text-align:center; width:25%" border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any of the factors that were matched on; note that you cannot analyze the association between a matched variable and the outcome.<br />
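A sketch of the matched (discordant-pair) odds ratio with hypothetical pair counts:<br />

```python
# Hypothetical matched-pair counts (case's exposure vs. control's exposure).
concordant_both = 25      # both case and control exposed -- ignored in the matched OR
b = 30                    # case exposed, control unexposed (discordant)
c = 10                    # case unexposed, control exposed (discordant)
concordant_neither = 35   # neither exposed -- ignored

matched_or = b / c        # 3.0: only the discordant pairs carry information
```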
*Randomization: Random allocation of exposure/“treatment” by the investigator ensures that the two groups (exposed & unexposed) are the same except for the exposure of interest, and controls for both known and unknown confounders because the distribution of these “3rd variables” should be equal between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, drawing on some important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has largely benefited in the past from the development of epidemiological research; however, the question of association versus causation is currently raised for observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved: the necessity of replication and Mendelian randomization, which might prove to be important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article] retrospectively studies the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about existence of outbreaks?<br />
:a. cases call health departments directly<br />
:b. clinicians<br />
:c. laboratories<br />
:d. all of the above<br />
<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. host<br />
:b. agent<br />
:c. vector<br />
:d. environment<br />
:e. all of the above<br />
<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly<br />
:c. The prevalence decreases if the duration of disease is increasing<br />
:d. The prevalence decreases if the duration of disease stays the same<br />
<br />
<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002- 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center; width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 death per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:True<br />
:False<br />
<br />
<br />
<br />
Sequential testing tends to have higher net specificity than specificity of a single test.<br />
:True<br />
:False<br />
<br />
<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center; width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they in fact, had the flu) and thus, were not enrolled in the research study. The students that were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question, what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. interviewer bias<br />
:b. recall bias<br />
:c. loss to follow-up<br />
:d. non-differential misclassification<br />
<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns especially if they were concerned that sun exposure was a reason they got melanoma. This is an example of:<br />
:a. interviewer bias<br />
:b. loss to follow-up<br />
:c. differential misclassification<br />
:d. non-differential misclassification<br />
<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>
Glenbrau
https://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi&diff=14869
SMHS IntroEpi
2015-04-02T16:42:15Z
<p>Glenbrau: /* Randomized Controlled Trials (RCT) */</p>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred directly from an infected person to another person, e.g., via a sneeze, touch, or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*$H_{0}:ARR=1$, and 95% confidence intervals, can be used to see whether the estimated ARR interval includes the null value of 1. If the ARR is much greater than 1, then exposed people are more likely to develop the illness than those who are unexposed.<br />
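A sketch of the attack-rate calculations with hypothetical outbreak counts. The confidence interval below uses the standard large-sample log risk-ratio formula; that choice of formula is an assumption here, since the text does not specify how the interval is built:<br />

```python
import math

# Hypothetical buffet outbreak: illness counts by exposure to the salad.
ill_exp, well_exp = 40, 60      # ate the salad
ill_unexp, well_unexp = 10, 90  # did not eat the salad

ar_exposed = ill_exp / (ill_exp + well_exp)          # attack rate, exposed: 0.40
ar_unexposed = ill_unexp / (ill_unexp + well_unexp)  # attack rate, unexposed: 0.10
arr = ar_exposed / ar_unexposed                      # attack rate ratio: 4.0

# 95% CI on the log scale (assumed formula, not from the text).
se = math.sqrt(1/ill_exp - 1/(ill_exp + well_exp)
               + 1/ill_unexp - 1/(ill_unexp + well_unexp))
ci_low = math.exp(math.log(arr) - 1.96 * se)   # ~2.1
ci_high = math.exp(math.log(arr) + 1.96 * se)  # ~7.5
# The interval excludes 1, so the exposure is associated with illness.
```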
<br />
====Measuring Disease====<br />
To name and calculate two measures of incidence, to describe differences in interpreting these measures, and to understand the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': the number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2,000 persons at risk during the year and 20 develop the disease over that period, the incidence would be 20⁄2000=1%.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
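The distinction between the two incidence measures (a proportion versus a true rate) can be seen with the person-time example above; the single new case below is a hypothetical addition:<br />

```python
# Three subjects followed for 3, 5, and 8 days, as in the person-time example;
# assume (hypothetically) that one of them develops the disease.
follow_up_days = [3, 5, 8]
new_cases = 1

person_days = sum(follow_up_days)                        # 16 person-days at risk
cumulative_incidence = new_cases / len(follow_up_days)   # proportion: 1/3
incidence_rate = new_cases / person_days                 # rate: 0.0625 cases per person-day
```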
<br />
====Measuring Mortality Rates====<br />
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:<br />
<br />
*All cause mortality rates = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011}{Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of it over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations (or one population at different time periods) that have different age distributions; adjusting for age makes it possible to compare the mortality rates of the two populations as if they had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
<br />
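The four steps of direct age-adjustment map to a short calculation. The age-specific rates and standard population below are hypothetical numbers chosen for illustration:<br />

```python
# Step 1: hypothetical age-specific death rates (per 100,000) in the study population.
age_specific_rates = {"<20": 10, "20-39": 20, "40-59": 50, "60+": 200}
# Hypothetical standard population by age group.
standard_pop = {"<20": 2_000_000, "20-39": 3_000_000,
                "40-59": 3_000_000, "60+": 2_000_000}

# Steps 2-3: expected deaths in each stratum of the standard population, summed.
expected_deaths = sum(age_specific_rates[g] * standard_pop[g] / 100_000
                      for g in standard_pop)             # 6300.0
# Step 4: divide by the total standard population (re-expressed per 100,000).
adjusted_rate = expected_deaths * 100_000 / sum(standard_pop.values())  # 63.0
```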
====Age-adjusted mortality rate for each population of interest====<br />
*Indirect age-adjustment: the expected number of deaths can be compared to the number of observed deaths via the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates are not trustworthy (e.g., when the study population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
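The indirect method follows the same pattern with the roles of the two populations reversed; everything below (rates, population sizes, observed deaths) is hypothetical:<br />

```python
# Step 1: age-specific mortality rates of the standard population (deaths per 1,000).
standard_rates = {"young": 2.0, "middle": 5.0, "old": 40.0}
# Age structure of the (small) study population.
study_pop = {"young": 10_000, "middle": 5_000, "old": 2_000}
observed_deaths = 150

# Steps 2-3: expected deaths if the study population experienced the standard rates.
expected_deaths = sum(standard_rates[g] * study_pop[g] / 1_000
                      for g in study_pop)        # 125.0
# Step 4: SMR = observed / expected.
smr = observed_deaths / expected_deaths          # 1.2 -> 20% more deaths than expected
```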
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and provide better outcomes. However, it is critical that screening programs must be warranted, and there must be a critical point that can be preceded by screening. <br />
<br />
=====Predictive Value & Reliability: Clinical Utility of Positive Tests=====<br />
<br />
If a patient tests positive, the likelihood that they actually have the disease is called the '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood that they actually do ''not'' have the disease is called the '''Negative Predictive Value''' (NPV). PPV and NPV are affected by the prevalence of disease and by the specificity and sensitivity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center; width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
<br />
'''PPV interpretation:''' Given a positive screening test result, the likelihood that the individual actually has the disease is the PPV.<br />
<br />
'''NPV interpretation:''' Given a negative screening test result, the likelihood that the individual actually does not have the disease is the NPV.<br />
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
<br />
=====Factors Influencing Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV (and decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent disease may waste resources; PPV should be presented in the context of disease prevalence.<br />
<br />
*''Test specificity'' (ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
*''Test sensitivity'' (ability of a test to correctly identify those who have the disease $=\frac{a}{a+c}$): Higher test sensitivity increases NPV.<br />
<br />
'''Note:''' The cutpoint of a test influences its sensitivity and specificity: lowering the cutpoint increases true positives and hence increases sensitivity, but decreases true negatives and hence decreases specificity. Similarly, raising the cutpoint decreases true positives (decreasing sensitivity) and increases true negatives (increasing specificity).<br />
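As a minimal sketch of these definitions, the four measures can be computed from the 2×2 table; the counts a, b, c, d below are hypothetical illustration values.<br />

```python
# Hypothetical screening-test counts from a 2x2 table
a, b = 80, 70    # true positives, false positives
c, d = 10, 240   # false negatives, true negatives

sensitivity = a / (a + c)  # P(test+ | disease present)
specificity = d / (b + d)  # P(test- | disease absent)
ppv = a / (a + b)          # P(disease present | test+)
npv = d / (c + d)          # P(disease absent | test-)

print(round(sensitivity, 2), round(specificity, 2))  # 0.89 0.77
print(round(ppv, 2), round(npv, 2))                  # 0.53 0.96
```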
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid not reliable'', ''reliable not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability (repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?===== <br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage) use a less expensive, less invasive, or less uncomfortable test first; if its result is positive, it must be followed up with additional testing.<br />
<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person can test positive on either test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has its own sensitivity and specificity. Using multiple tests can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
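The effect of combining two tests can be sketched numerically; the sensitivity and specificity values below are hypothetical:<br />

```python
# Two hypothetical tests, each with sensitivity 0.90 and specificity 0.80
sens, spec = 0.90, 0.80

# Sequential (2-stage): positive only if positive on BOTH tests
seq_net_sens = sens * sens                    # lower than either single test
seq_net_spec = 1 - (1 - spec) * (1 - spec)    # higher than either single test

# Simultaneous (parallel): positive if positive on EITHER test
par_net_sens = 1 - (1 - sens) * (1 - sens)    # higher than either single test
par_net_spec = spec * spec                    # lower than either single test

print(round(seq_net_sens, 2), round(seq_net_spec, 2))  # 0.81 0.96
print(round(par_net_sens, 2), round(par_net_spec, 2))  # 0.99 0.64
```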
<br />
===Randomized Controlled Trials (RCT)===<br />
In these procedures, the investigator assigns exposure at random to study participants. The investigator then observes whether there are any differences in health outcomes between people who were exposed to the factor (i.e., the ''treatment group'') and those who were not (i.e., the ''comparison group''). Special care is taken to ensure that the follow-up is done in an identical way with both groups. The essence of a good comparison between “treatments” is that the compared groups are as much the same as possible, except for their “treatment."<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: hypothesis formulation; study participant recruitment based on specific criteria; gathering informed consent; allocation of eligible and willing participants into random assignment study groups; monitoring study groups for outcome under study; comparing rates of different outcomes in various groups.<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
*External validity: Generalization of study findings to the larger source population. Influenced by factors such as demographic differences between eligible and ineligible subgroups, and whether the intervention mirrors what will happen in the community or source population.<br />
*Internal validity: Ability to reach correct conclusions within the study. Influenced by factors such as the ability of subjects to provide valid and reliable data, expected compliance with a regimen, and low probability of dropping out.<br />
<br />
====Measures of Association and Effect in RCT====<br />
Ratio of two measures of disease incidence (relative measures) - Risk Ratio (Relative Risk), Rate Ratio.<br />
Difference between two measures of disease incidence: Risk difference, efficacy.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
Interpretation: RR>1, the outcome is RR times more likely to occur in group A than in group B; RR=1, null value (no difference between groups); RR<1, either report the reduction in risk (100% − RR×100%) or invert (1/RR) and interpret the exposed group as “less likely” to develop the outcome.<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
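A minimal sketch of these calculations, using hypothetical trial counts (the values of a, b, c, d below are illustrative only):<br />

```python
# Hypothetical RCT counts: drug A row (a, b), placebo row (c, d)
a, b = 20, 80   # drug A: developed disease / did not
c, d = 40, 60   # placebo: developed disease / did not

ci_drug = a / (a + b)      # cumulative incidence in the treated group
ci_placebo = c / (c + d)   # cumulative incidence in the placebo group

relative_risk = ci_drug / ci_placebo
efficacy = (ci_placebo - ci_drug) / ci_placebo  # equals 1 - RR

print(relative_risk)  # 0.5 -> treated half as likely to develop disease
print(efficacy)       # 0.5 -> treatment prevented 50% of expected cases
```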
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control; <br />
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks;<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Populations of exposed and unexposed individuals at risk of developing the outcome are followed over time to compare the development of disease in each group. <br />
*Steps: Establish the study population: identify a study population that reflects the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; follow both groups and compare their outcomes.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
: Prospective (concurrent) and retrospective (non-concurrent) cohort studies, distinguished by when the data are collected.<br />
: Retrospective has benefits: more cost effective; good for disease of long latency.<br />
: Prospective has benefits: data quality presumably higher.<br />
Both designs must guard against ascertainment bias when the outcome or exposure status is already known.<br />
<br />
*Measures of Association in Cohort Study:<br />
: Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio.<br />
: Difference between two measures of disease incidence: Risk Difference, Rate Difference.<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information. <br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized. <br />
# Like RCT, excellent for studying rare exposures. <br />
# Multiple outcomes and sometimes multiple exposures can be studied.<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive; <br />
# Not effective at capturing rare outcomes, and can be challenging for studying diseases that take a long time to develop; <br />
# Loss to follow-up can be a problem; <br />
# Changes over time in criteria and methods can lead to problems with inferences; <br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics.<br />
<br />
*Situations favor a Cohort Study: <br />
# When there is evidence of an association between the exposure and the disease from other studies;<br />
# When the exposure is rare but the incidence of disease among the exposed is high;<br />
# When time between exposure and development of the disease is relatively short or historical data is available;<br />
# When good follow-up can be ensured.<br />
<br />
===Case Control Study===<br />
A case control study compares cases and controls to see which group had greater exposure to the risk factor of interest.<br />
*Measures of Association: Odds Ratio.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}.$<br />
<br />
====Interpretation====<br />
Odds of being exposed are OR times higher in the cases than in the controls (if OR > 1), or 1/OR times lower in the cases than in the controls (if OR < 1); if OR = 1 there is no association – the odds are the same in cases and controls.<br />
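A minimal sketch of the odds ratio calculation, with hypothetical case-control counts:<br />

```python
# Hypothetical case-control counts from a 2x2 table
a, b = 40, 20    # exposed: cases, controls
c, d = 60, 180   # unexposed: cases, controls

# OR = (odds of exposure in cases) / (odds of exposure in controls) = ad / bc
odds_ratio = (a * d) / (b * c)
print(odds_ratio)  # 6.0 -> cases had 6-fold higher odds of exposure
```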
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to avoid recall and reporting bias include: <br />
# adjusting timing so that the time between the event/illness and the study is as short as possible; <br />
# using standardized questionnaires that obtain complete information; <br />
# using existing information if/when possible (e.g. medical records); <br />
# masking participants to the study hypothesis<br />
*Conditions when an OR from a Case-Control Study can approximate a RR (OR≈RR):<br />
# when the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn; <br />
# when the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn; <br />
# when the disease being studied does not occur frequently.<br />
<br />
===Cross-Sectional Studies===<br />
A cross-sectional study is an observational study in which a subject’s exposure and disease data are measured at the same time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, so temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# good for generating hypotheses;<br />
# easily sets up other analytic designs; <br />
# temporality is not a problem for time invariant exposures (genetic markers); <br />
# relatively low cost.<br />
<br />
*'''Weakness:'''<br />
# temporality – exposure or disease which happened first; <br />
# prevalent cases may not be the same as incident cases; <br />
# not useful for rare disease; <br />
# subject to selection bias.<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Disease || No Disease <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\,disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study); by time (time-trend study); by place & time (mixed study). An error can occur when an association identified from group-level (ecologic) characteristics is ascribed to individuals even though no such association exists at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# data is relatively easy and/or cheap to obtain; <br />
# good place to start; <br />
# many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weakness:''' Reliance on group-level data may not correctly represent individual-level associations. <br />
<br />
*Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.<br />
<br />
*Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.<br />
<br />
===Other Risk Estimates===<br />
*Attributable Risk Estimates of Effect – if exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.<br />
*Attributable Risk (AR): $AR=CI_{Exposed} - CI _{Not\,exposed}$. This is just the risk difference. Group of interest: exposed and aims to quantify the risk of disease in the “exposed” group attributable to the exposure. <br />
*Attributable Risk Percent $(AR\%)$: $AR\%=\frac{CI_{Exposed} - CI_{Not\,exposed}}{CI_{Exposed}}$<br />
*Population Attributable Risk (PAR): $PAR= CI_{Total} - CI_{Not\,exposed}$<br />
*Population Attributable Risk Percent $(PAR\%)$: $PAR\%=\frac{CI_{Total}-CI_{Not\,exposed}}{CI_{Total}}$.<br />
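These risk measures can be sketched in code; the cumulative incidences and exposure prevalence below are hypothetical illustration values:<br />

```python
# Hypothetical cumulative incidences and exposure prevalence
ci_exposed = 0.30
ci_unexposed = 0.10
p_exposed = 0.40  # proportion of the total population that is exposed

# CI in the total population is a weighted average of exposed and unexposed
ci_total = p_exposed * ci_exposed + (1 - p_exposed) * ci_unexposed

ar = ci_exposed - ci_unexposed   # attributable risk (risk difference)
ar_pct = ar / ci_exposed         # AR% among the exposed
par = ci_total - ci_unexposed    # population attributable risk
par_pct = par / ci_total         # PAR% in the total population

print(round(ar, 2), round(ar_pct, 2))    # 0.2 0.67
print(round(par, 2), round(par_pct, 2))  # 0.08 0.44
```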
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and outcome; observed results different than true results. <br />
*Impact of bias: makes it appear as if there is an association when there really is none (bias away from the null); masks an association when there really is one (bias toward the null).<br />
*Reasons we get the wrong answer: Selection bias – who is selected or retained in a study distorts your estimates of the truth. An example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (choose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls or exposed and unexposed distorts your estimates of the truth. A variable is a confounder if it is a known risk factor for the outcome, it is associated with the exposure but not a result of the exposure. These three conditions are necessary for a variable to be considered as a confounder. <br />
* Chance: the luck of draw gets you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
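A minimal sketch of the matched analysis, with hypothetical discordant-pair counts:<br />

```python
# Hypothetical matched-pair counts: only discordant pairs enter the OR
b = 25   # pairs where the case was exposed and the control was not
c = 10   # pairs where the control was exposed and the case was not

matched_or = b / c
print(matched_or)  # 2.5 -> exposure more common among cases within pairs
```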
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any factors that were matched on; you cannot conduct analyses for matched variables and outcome.<br />
*Randomization: Random allocation of exposure/”treatment” by the investigator ensures that the two groups (exposed & unexposed) are the same except for the exposure of interest, and controls for both known and unknown confounders because the distribution of these “3rd variables” should be equal between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, from some important examples, the classical methodological approach for discussing causality in epidemiology. Coronary hear disease (CHD) prevention has largely benefited in the past from the development of epidemiological research, however, the opposition association-causation is currently raised from observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved: necessity of replication, Mendelian randomization, which might prove to be important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article], studies retrospectively the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about existence of outbreaks?<br />
:a. cases call health departments directly<br />
:b. clinicians<br />
:c. laboratories<br />
:d. all of the above<br />
<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. host<br />
:b. agent<br />
:c. vector<br />
:d. environment<br />
:e. all of the above<br />
<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly<br />
:c. The prevalence decreases if the duration of disease is increasing<br />
:d. The prevalence decreases if the duration of disease stays the same<br />
<br />
<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002- 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 death per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:True<br />
:False<br />
<br />
<br />
<br />
Sequential testing tends to have higher net specificity than specificity of a single test.<br />
:True<br />
:False<br />
<br />
<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they in fact, had the flu) and thus, were not enrolled in the research study. The students that were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question, what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. interviewer bias<br />
:b. recall bias<br />
:c. loss to follow-up<br />
:d. non-differential misclassification<br />
<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns especially if they were concerned that sun exposure was a reason they got melanoma. This is an example of:<br />
:a. interviewer bias<br />
:b. loss to follow-up<br />
:c. differential misclassification<br />
:d. non-differential misclassification<br />
<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred directly from an infected person to a susceptible person, such as through touch, a sneeze, or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*$H_{0}:ARR=1$, and 95% confidence intervals, can be used to see whether estimated ARR interval includes the null value of 1. If ARR is much greater than 1, then people exposed are more likely to develop the illness compared to those who are unexposed.<br />
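The AR and ARR formulas above can be sketched in a few lines of Python; all outbreak counts used here are hypothetical, chosen only to illustrate the calculation.<br />

```python
# Attack rate (AR) and attack rate ratio (ARR) for a hypothetical
# food-borne outbreak; all counts below are made up for illustration.

def attack_rate(cases, at_risk):
    """Proportion of people at risk who developed the illness."""
    return cases / at_risk

# Hypothetical buffet data: "exposed" = ate the Caesar salad.
ar_exposed = attack_rate(30, 60)    # 30 of 60 salad eaters became ill
ar_unexposed = attack_rate(5, 50)   # 5 of 50 non-eaters became ill

# An ARR well above 1 suggests the exposed are at higher risk.
arr = ar_exposed / ar_unexposed
print(ar_exposed, ar_unexposed, round(arr, 1))
```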
<br />
====Measuring Disease====<br />
This section names and calculates two measures of incidence, describes the differences in interpreting these measures, and clarifies the distinction between a proportion and a true rate.<br />
<br />
*''Incidence'': the number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2,000 persons at risk during the year and 20 develop the disease over that period, the incidence is 20⁄2000 = 1%.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
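The person-time computation can be illustrated directly; the follow-up days match the example above, and the single new case is a hypothetical addition.<br />

```python
# Incidence rate using person-time: each subject contributes the time
# they were followed while at risk.

follow_up_days = {"A": 3, "B": 5, "C": 8}  # person-days, as in the text
new_cases = 1                              # hypothetical: one subject fell ill

person_days = sum(follow_up_days.values())  # 3 + 5 + 8 = 16
incidence_rate = new_cases / person_days    # cases per person-day
print(person_days, incidence_rate)          # 16 0.0625
```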
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
<br />
====Measuring Mortality Rates====<br />
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:<br />
<br />
*All-cause mortality rate = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011} {Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of it over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations (or one population at different time periods) that have different age distributions: by adjusting for age, we can compare the mortality rates the two populations would have if they both had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
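The four steps above can be sketched as follows; the age-specific rates and standard population used here are hypothetical.<br />

```python
# Direct age adjustment: apply this population's age-specific rates to a
# shared standard population, then divide expected deaths by its total.

age_specific_rates = [50, 20, 10, 60]                       # per 100,000
standard_pop = [1_000_000, 2_000_000, 3_000_000, 4_000_000]

# Steps 2-3: expected deaths in each age group, summed across groups.
expected_deaths = sum(rate / 100_000 * n
                      for rate, n in zip(age_specific_rates, standard_pop))

# Step 4: age-adjusted rate per 100,000 standard population.
adjusted_rate = expected_deaths / sum(standard_pop) * 100_000
print(round(expected_deaths), round(adjusted_rate, 1))  # 3600 36.0
```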
<br />
Result: Age-adjusted mortality rate for each population of interest.<br />
<br />
====Indirect Age-Adjustment====<br />
*''Indirect age-adjustment'': the expected number of deaths is compared to the number of actual deaths via the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates cannot be trusted (e.g., if the population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
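The SMR calculation can be sketched the same way; the standard rates, study population, and observed death count below are hypothetical.<br />

```python
# Indirect adjustment: apply the standard population's age-specific rates
# to the study population to get expected deaths, then form the SMR.

standard_rates = [0.001, 0.004, 0.010]  # standard pop age-specific rates
study_pop = [5_000, 3_000, 2_000]       # study population by age group
observed_deaths = 45                    # hypothetical observed count

expected_deaths = sum(r * n for r, n in zip(standard_rates, study_pop))
smr = observed_deaths / expected_deaths  # >1: more deaths than expected
print(round(expected_deaths), round(smr, 2))  # 37 1.22
```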
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and improve outcomes. However, screening programs must be warranted, and the disease must have a critical point that screening can precede. <br />
<br />
=====Predictive Value and Reliability: Clinical Utility of Positive Tests=====<br />
<br />
If a patient tests positive, the likelihood that they actually have the disease is called the '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood that they actually do ''not'' have the disease is called the '''Negative Predictive Value''' (NPV). PPV and NPV are affected by the prevalence of disease and by the specificity and sensitivity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
<br />
'''PPV interpretation:''' Given a positive screening test result, the PPV is the likelihood that the individual actually has the disease.<br />
<br />
'''NPV interpretation:''' Given a negative screening test result, the NPV is the likelihood that the individual actually does not have the disease.<br />
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
<br />
===== Factors Influencing Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV (and decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent disease may waste resources; PPV should be presented in the context of disease prevalence.<br />
<br />
*''Test specificity'' (the ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
*''Test sensitivity'' (the ability of a test to correctly identify those who have the disease $=\frac{a}{a+c}$): Higher test sensitivity increases NPV.<br />
<br />
'''Note:''' The cutoff point of a test influences its sensitivity and specificity: lowering the cutpoint increases true positives (increasing sensitivity) but decreases true negatives (decreasing specificity). Similarly, raising the cutpoint decreases true positives (decreasing sensitivity) but increases true negatives (increasing specificity).<br />
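Sensitivity, specificity, PPV, and NPV all follow from the a/b/c/d cell counts of the screening 2×2 table above; a minimal sketch, with hypothetical cell counts:<br />

```python
# Screening-test measures from 2x2 cell counts:
# a = true positives, b = false positives,
# c = false negatives, d = true negatives.

def screening_measures(a, b, c, d):
    return {
        "sensitivity": a / (a + c),  # diseased correctly detected
        "specificity": d / (b + d),  # non-diseased correctly cleared
        "PPV": a / (a + b),          # P(disease | positive test)
        "NPV": d / (c + d),          # P(no disease | negative test)
    }

# Hypothetical counts for a test applied to 200 people.
m = screening_measures(a=90, b=15, c=10, d=85)
print({k: round(v, 2) for k, v in m.items()})
```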
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid not reliable'', ''reliable not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability (repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?===== <br />
Using multiple tests: <br />
<br />
# ''Sequential tests'' (2-stage) start with a less expensive, less invasive, or less uncomfortable test; if its result is positive, it is followed up with additional testing.<br />
<br />
# ''Simultaneous tests'' (parallel) involve multiple screening tests at the same time. To be considered positive, a person can test positive on either test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has its own sensitivity and specificity. Utilization of multiple testing can improve ''net sensitivity'' (simultaneous testing) or ''net specificity'' (sequential testing). In other words:<br />
<br />
*Sequential testing decreases net sensitivity and increases net specificity<br />
*Simultaneous testing increases net sensitivity and decreases net specificity<br />
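Under the common simplifying assumption that the two tests are independent, the net measures can be computed directly; the sensitivities and specificities used below are hypothetical.<br />

```python
# Net sensitivity/specificity for two tests, assuming independence.
# Simultaneous: positive on either test counts as positive overall.
# Sequential: only those positive on test 1 receive test 2.

def simultaneous(se1, sp1, se2, sp2):
    net_se = 1 - (1 - se1) * (1 - se2)  # missed only if both tests miss
    net_sp = sp1 * sp2                  # must be cleared by both tests
    return net_se, net_sp

def sequential(se1, sp1, se2, sp2):
    net_se = se1 * se2                  # must be flagged by both tests
    net_sp = sp1 + (1 - sp1) * sp2      # cleared by either test
    return net_se, net_sp

print(simultaneous(0.8, 0.9, 0.7, 0.95))  # net sensitivity rises
print(sequential(0.8, 0.9, 0.7, 0.95))    # net specificity rises
```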
<br />
===Randomized Controlled Trials (RCT)===<br />
The investigator assigns exposure at random to study participants, and then observes whether there are differences in health outcomes between people who were (treatment group) and were not (comparison group) exposed to the factor. Special care is taken to ensure that follow-up is done in an identical way in both groups. The essence of a good comparison between “treatments” is that the compared groups are the same except for the “treatment”.<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: hypothesis formulation; study participant recruitment based on specific criteria; gathering informed consent; allocation of eligible and willing participants into random assignment study groups; monitoring study groups for outcome under study; comparing rates of different outcomes in various groups.<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
*External validity: Generalizability of the study to the larger source population. Influenced by factors such as demographic differences between eligible and ineligible subgroups, and whether the intervention mirrors what will happen in the community or source population.<br />
*Internal validity: Ability to reach a correct conclusion within the study. Influenced by factors such as the ability of subjects to provide valid and reliable data, expected compliance with a regimen, and a low probability of dropping out.<br />
<br />
====Measures of Association and Effect in RCT====<br />
*Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio.<br />
*Difference between two measures of disease incidence: Risk Difference, Efficacy.<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
Interpretation: RR>1, the outcome is RR times more likely to occur in group A than in group B; RR=1, null value (no difference between groups); RR<1, either calculate the reduction in risk (100% − xx%) or invert (1/RR) to interpret the result as “less likely”.<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
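The relative risk and efficacy formulas above can be sketched from the trial's 2×2 counts; the counts used here are hypothetical.<br />

```python
# Relative risk and efficacy from trial 2x2 counts:
# a, b = drug-A arm (ill, not ill); c, d = placebo arm (ill, not ill).

def relative_risk(a, b, c, d):
    ci_drug = a / (a + b)     # cumulative incidence, drug A arm
    ci_placebo = c / (c + d)  # cumulative incidence, placebo arm
    return ci_drug / ci_placebo

def efficacy(a, b, c, d):
    ci_drug = a / (a + b)
    ci_placebo = c / (c + d)
    return (ci_placebo - ci_drug) / ci_placebo

# Hypothetical trial: 10/100 ill on drug A vs 30/100 on placebo.
print(round(relative_risk(10, 90, 30, 70), 2))  # 0.33
print(round(efficacy(10, 90, 30, 70), 2))       # 0.67
```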
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control; <br />
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks;<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Populations of exposed and unexposed individuals at risk of developing the outcome are followed over time to compare the development of disease in each group. <br />
*Steps: Establish the study population: identify a study population that is reflective of the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; follow and compare the outcomes of the exposed and unexposed groups.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
: Prospective (concurrent) and Retrospective (non-concurrent) Cohort Studies, distinguished by when the data are collected.<br />
: Retrospective has benefits: more cost effective; good for disease of long latency.<br />
: Prospective has benefits: data quality presumably higher.<br />
Both designs must guard against ascertainment bias if the outcome or exposure status is already known.<br />
<br />
*Measures of Association in Cohort Study:<br />
: Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio.<br />
: Difference between two measures of disease incidence: Risk Difference, Rate Difference.<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information. <br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized. <br />
# Like RCT, excellent for studying rare exposures. <br />
# Multiple outcomes and sometimes multiple exposures can be studied.<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive; <br />
# Not effective at capturing rare outcomes and can be challenging to use for diseases that take a long time to develop; <br />
# Loss to follow-up can be a problem; <br />
# Changes over time in criteria and methods can lead to problems with inferences; <br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics.<br />
<br />
*Situations favor a Cohort Study: <br />
# When there is evidence of an association between the exposure and the disease from other studies;<br />
# When the exposure is rare but the incidence of disease among the exposed is high;<br />
# When time between exposure and development of the disease is relatively short or historical data is available;<br />
# When good follow-up can be ensured.<br />
<br />
===Case Control Study===<br />
A case-control study compares cases and controls to see which group had greater exposure to the risk factor of interest.<br />
*Measures of Association: Odds Ratio.<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}.$<br />
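The cross-product calculation above is a one-liner; the cell counts used below are hypothetical.<br />

```python
# Odds ratio from case-control 2x2 counts: a = exposed cases,
# b = exposed controls, c = unexposed cases, d = unexposed controls.

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)   # (a/c) / (b/d) simplifies to ad/bc

# Hypothetical counts: exposure is far more common among cases.
print(odds_ratio(a=40, b=20, c=60, d=180))  # 6.0
```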
<br />
====Interpretation====<br />
The odds of being exposed are OR times higher in the cases than in the controls (if OR > 1), or 1/OR times lower in the cases than in the controls (if OR < 1); if OR = 1, there is no association – the odds are the same in cases and controls.<br />
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to avoid recall and report bias include: <br />
# adjusting timing so that the time between the event/illness and the study is as short as possible;<br />
# using standardized questionnaires that obtain complete information;<br />
# using existing information if/when possible (e.g. medical record); <br />
# masking participants to study hypothesis<br />
*Conditions when an OR from a Case-Control Study can approximate a RR (OR≈RR):<br />
# when the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn; <br />
# when the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn; <br />
# when the disease being studied does not occur frequently.<br />
<br />
===Cross-Sectional Studies===<br />
A cross-sectional study is an observational study in which a subject’s exposure and disease data are measured at the same time. Prevalent cases of the disease are identified, and exposure prevalence is examined in relation to disease prevalence (there are no incident cases, and temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# good for generating hypotheses;<br />
# easily sets up other analytic designs; <br />
# temporality is not a problem for time invariant exposures (genetic markers); <br />
# relatively low cost.<br />
<br />
*'''Weakness:'''<br />
# temporality – exposure or disease which happened first; <br />
# prevalent cases may not be the same as incident cases; <br />
# not useful for rare disease; <br />
# subject to selection bias.<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\, disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study), by time (time-trend study), or by place and time (mixed study). An error can occur when an association identified from group-level (ecologic) characteristics is ascribed to individuals even though no such association exists at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# data is relatively easy and/or cheap to obtain; <br />
# good place to start; <br />
# many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weakness:''' Reliance on group-level data may not correctly represent individual-level associations. <br />
<br />
*Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.<br />
<br />
*Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.<br />
<br />
===Other Risk Estimates===<br />
*Attributable Risk Estimates of Effect – if exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.<br />
*Attributable Risk (AR): $AR=CI_{Exposed} - CI _{Not\,exposed}$. This is just the risk difference. Group of interest: exposed and aims to quantify the risk of disease in the “exposed” group attributable to the exposure. <br />
*Attributable Risk Percent $(AR\%)$: $AR\%=\frac{CI_{Exposed} - CI_{Not\,exposed}}{CI_{Exposed}}\times 100\%$<br />
*Population Attributable Risk (PAR): $PAR= CI_{Total} - CI_{Not\,exposed}$<br />
*Population Attributable Risk Percent $(PAR\%)$: $PAR\%=\frac{CI_{Total}-CI_{Not\,exposed}} {CI_{Total}}\times 100\%$.<br />
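These four measures can be computed together from the cumulative incidences; the incidence values used below are hypothetical.<br />

```python
# Attributable risk measures from cumulative incidences (CI).
# All incidence values below are hypothetical.
ci_exposed, ci_unexposed, ci_total = 0.20, 0.05, 0.11

ar = ci_exposed - ci_unexposed   # risk difference among the exposed
ar_pct = ar / ci_exposed * 100   # % of exposed risk due to exposure
par = ci_total - ci_unexposed    # excess risk in the whole population
par_pct = par / ci_total * 100   # % of population risk due to exposure

print(round(ar, 2), round(ar_pct), round(par, 2), round(par_pct, 1))
```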
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: Any systematic error in the design, conduct, or analysis of a study that results in a distorted estimate of the relationship between an exposure and an outcome; the observed results differ from the true results. <br />
*Impact of bias: It can make it appear as if there is an association when there really is none (bias away from the null), or it can mask an association when there really is one (bias toward the null).<br />
*Selection bias: The way subjects are selected or retained in a study distorts estimates of the truth; an example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (choose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls or exposed and unexposed distorts your estimates of the truth. A variable is a confounder if it is a known risk factor for the outcome, it is associated with the exposure but not a result of the exposure. These three conditions are necessary for a variable to be considered as a confounder. <br />
* Chance: the luck of draw gets you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
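A minimal sketch of the matched-pairs odds ratio, using hypothetical discordant-pair counts:<br />

```python
# Matched-pairs odds ratio: only discordant pairs contribute.
# b = pairs with case exposed / control unexposed
# c = pairs with case unexposed / control exposed

def matched_odds_ratio(b, c):
    return b / c

# Hypothetical pair counts: 25 vs 10 discordant pairs.
print(matched_odds_ratio(b=25, c=10))  # 2.5
```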
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any factors that were matched on; you cannot conduct analyses for matched variables and outcome.<br />
*Randomization: Random allocation of exposure/“treatment” by the investigator ensures that the two groups (exposed & unexposed) are the same except for the exposure of interest, and controls for both known and unknown confounders because the distribution of these “3rd variables” should be equal between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, through some important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has benefited greatly from the development of epidemiological research; however, the association-versus-causation question is now being raised for observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research over the past 10 years. Causality of the associations presents special characteristics when genes are involved, such as the necessity of replication and Mendelian randomization, which might prove important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article], studies retrospectively the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about existence of outbreaks?<br />
:a. cases call health departments directly<br />
:b. clinicians<br />
:c. laboratories<br />
:d. all of the above<br />
<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. host<br />
:b. agent<br />
:c. vector<br />
:d. environment<br />
:e. all of the above<br />
<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly<br />
:c. The prevalence decreases if the duration of disease is increasing<br />
:d. The prevalence decreases if the duration of disease stays the same<br />
<br />
<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002- 2012.<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 death per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
<br />
<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:True<br />
:False<br />
<br />
<br />
<br />
Sequential testing tends to have higher net specificity than specificity of a single test.<br />
:True<br />
:False<br />
<br />
<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they in fact, had the flu) and thus, were not enrolled in the research study. The students that were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question, what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
<br />
<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean Island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals that did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. interviewer bias<br />
:b. recall bias<br />
:c. loss to follow-up<br />
:d. non-differential misclassification<br />
<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns especially if they were concerned that sun exposure was a reason they got melanoma. This is an example of:<br />
:a. interviewer bias<br />
:b. loss to follow-up<br />
:c. differential misclassification<br />
:d. non-differential misclassification<br />
<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>Glenbrauhttps://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi&diff=14867SMHS IntroEpi2015-04-02T16:31:42Z<p>Glenbrau: /* Reliability(repeatability) of tests */</p>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred directly from an infected person to another person, for example through touch, droplets from a sneeze at close range, or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of pathogen by contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who ate at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack Rate Ratio'' (ARR): $ARR=\frac{Attack\,rate\,in\,those\,exposed} {Attack\,rate\,in\,those\,unexposed}$<br />
<br />
*The null hypothesis $H_{0}:ARR=1$ can be assessed with a 95% confidence interval: check whether the estimated ARR interval includes the null value of 1. If the ARR is much greater than 1, then people who were exposed are more likely to develop the illness than those who were unexposed.<br />
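As a sketch of the calculation, the attack rates and ARR for a buffet outbreak might be computed as follows (all counts below are hypothetical, invented purely for illustration):<br />

```python
# Hypothetical outbreak counts (illustrative only):
ill_exposed, total_exposed = 40, 60      # ate the Caesar salad
ill_unexposed, total_unexposed = 10, 50  # did not eat the salad

# Attack rate: people at risk who develop the illness / total people at risk
ar_exposed = ill_exposed / total_exposed
ar_unexposed = ill_unexposed / total_unexposed

# Attack rate ratio: attack rate in exposed / attack rate in unexposed
arr = ar_exposed / ar_unexposed  # >1 suggests exposure is associated with illness
```

With these made-up counts the ARR is about 3.3, suggesting salad eaters were roughly three times as likely to become ill, pending a confidence interval around the estimate.<br />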
<br />
====Measuring Disease====<br />
This section names and calculates two measures of incidence, describes differences in interpreting these measures, and explains the difference between a proportion and a true rate.<br />
<br />
*''Incidence'': the number of new cases of a disease occurring in the population during a specified period of time, divided by the number of persons at risk of developing the disease during that period. For example, if there are 2,000 persons at risk during the year and 20 develop disease over that period, the incidence is 20/2000 = 1%.<br />
<br />
*''Cumulative incidence'': $ \frac{Number\,of\,new\,cases}{Total\,population\,at\,risk} $<br />
<br />
*''Incidence rate'': $\frac{Number\,of\,new\,cases}{Total\,person-time\,contributed\,by\,the\,persons\,followed}$ <br />
<br />
Person time is a way to measure the amount of time all individuals in a study spend at risk. For example, if subject A is followed for 3 days, subject B is followed for 5 days and C for 8 days then person-days $= 3 + 5 + 8 = 16$.<br />
<br />
*''Prevalence'': $\frac{Number\,of\,cases\,of\,a\,disease\,in\,the\,population\,at\,a\,specified\,time}{Number\,of\,persons\,in\,the\,population\,at\,that\,time}$ <br />
<br />
*The specified time can be a period or a point, so we can measure the prevalence during a short period in January of 2013 or on January 3$^{rd}$, 2013.<br />
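The three frequency measures above can be sketched in code; the person-time figures reuse the 3-, 5-, and 8-day follow-up example, while the case counts are hypothetical:<br />

```python
# Cumulative incidence: new cases / total population at risk
new_cases = 20
at_risk = 2000
cumulative_incidence = new_cases / at_risk  # 0.01, i.e. 1%

# Incidence rate: new cases / total person-time contributed
# (subjects followed for 3, 5, and 8 days -> 16 person-days)
person_days = 3 + 5 + 8
cases_during_followup = 1                             # hypothetical count
incidence_rate = cases_during_followup / person_days  # cases per person-day

# Point prevalence: existing cases / population at a specified time
existing_cases = 50                                   # hypothetical count
population = 2000
prevalence = existing_cases / population
```

Note that cumulative incidence and prevalence are proportions (dimensionless), while the incidence rate is a true rate with units of cases per unit of person-time.<br />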
<br />
====Measuring Mortality Rates====<br />
To calculate and interpret all-cause mortality rates, group-specific mortality rates and cause-specific mortality rates:<br />
<br />
*All-cause mortality rate = $\frac{Number\,of\,deaths\,in\,a\,specified\,time\,period}{Number\,in\,population\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Cause-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,from\,lung\,cancer\,in\,US}{Population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
*Group-specific mortality rate = $\frac{Total\,number\,of\,deaths\,in\,1\,year\,among\,women\,in\,US} {Female\,population\,of\,the\,US\,in\,the\,middle\,of\,the\,year}$<br />
<br />
====Additional Measures of Mortality====<br />
*''Infant mortality'': $\frac{Number\,of\,deaths\,in\,children\,under\,1\,year\,of\,age\,in\,2011}{Number\,of\,live\,births\,in\,2011}$<br />
<br />
*''Proportionate mortality'': Measures proportion of all deaths occurring in a given place over a given time that is due to a given cause <br />
<br />
*''Case fatality'': Of all people diagnosed with a given disease, the proportion who die of it over a certain period<br />
<br />
*''Underlying cause of death''<br />
<br />
====Direct and Indirect Adjustment of Rates====<br />
Direct and indirect adjustment of rates are used to compare two populations (or one population at different time periods) that have different age distributions. Adjusting for age allows the mortality rates in the two populations to be compared as if both had the same age distribution.<br />
<br />
*''Direct age-adjustment'': Expected rate (or standardized rate) can be compared to the crude rate or to any other similarly standardized rate.<br />
<br />
For each population:<br />
<br />
# Calculate age-specific rates<br />
# Multiply age-specific rates by the # of people in corresponding age range in standard population<br />
# Sum expected # of deaths across age groups<br />
# Divide total # of expected deaths by total standard population<br />
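The four direct-adjustment steps above can be sketched as follows; the age-specific rates and standard-population counts are made up for illustration:<br />

```python
# Step 1: age-specific mortality rates in the study population (hypothetical)
rates = {"0-39": 0.001, "40-64": 0.010, "65+": 0.050}

# Standard population counts for the same age bands (hypothetical)
standard_pop = {"0-39": 500_000, "40-64": 300_000, "65+": 200_000}

# Steps 2-3: expected deaths if these rates applied to the standard population
expected_deaths = sum(rates[g] * standard_pop[g] for g in rates)

# Step 4: age-adjusted (standardized) rate
adjusted_rate = expected_deaths / sum(standard_pop.values())
```

Because both populations are standardized against the same reference, their adjusted rates are directly comparable.<br />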
<br />
Result: Age-adjusted mortality rate for each population of interest<br />
*Indirect age-adjustment: The expected number of deaths can be compared to the number of actual deaths with the '''standardized mortality ratio (SMR)'''. It is especially useful when the group-specific rates are unreliable (e.g., if the population is too small).<br />
# Acquire age-specific mortality rates for standard population<br />
# Multiply standard population’s age-specific rates by # of people in age range in study population<br />
# Sum expected # of deaths across age groups in study population<br />
# Divide observed # of deaths by expected # of deaths in study population<br />
<br />
Result: SMR (>1 more than expected, =1 as expected, <1 less than expected)<br />
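The indirect-adjustment steps can be sketched the same way; the standard rates, study-population counts, and observed deaths below are hypothetical:<br />

```python
# Standard population's age-specific mortality rates (hypothetical)
standard_rates = {"0-39": 0.001, "40-64": 0.010, "65+": 0.050}

# Study population's age structure and its observed deaths (hypothetical)
study_pop = {"0-39": 2_000, "40-64": 1_000, "65+": 500}
observed_deaths = 45

# Expected deaths: apply the standard rates to the study population's age bands
expected_deaths = sum(standard_rates[g] * study_pop[g] for g in standard_rates)

# Standardized mortality ratio: observed / expected
smr = observed_deaths / expected_deaths  # >1 means more deaths than expected
```

Here 37 deaths would be expected, so observing 45 gives an SMR of about 1.22, i.e. mortality roughly 22% higher than expected.<br />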
<br />
====Screening====<br />
''Screening'' is the use of testing to sort out apparently well persons (''asymptomatic'') who probably have disease from those who probably do not. It allows us to detect the disease early. Examples of screening include: <br />
<br />
*Fasting blood sugar for diabetes<br />
*Bone densitometry for osteoporosis<br />
*Otoacoustic emissions testing for hearing loss in newborns<br />
<br />
Screening is done during the preclinical phase and is a secondary prevention strategy. It increases lead time, thereby allowing us to detect disease early, initiate treatment sooner, and provide better outcomes. However, a screening program must be warranted: the disease must have a critical point in its natural history that screening can detect before it is reached. <br />
<br />
=====Predictive Value & Reliability: Clinical Utility of Positive Tests=====<br />
<br />
If a patient tests positive, the likelihood that they actually have the disease is called the '''Positive Predictive Value''' (PPV). If a patient tests negative, the likelihood that they actually do ''not'' have the disease is called the '''Negative Predictive Value''' (NPV). PPV and NPV are affected by the prevalence of disease and by the specificity and sensitivity of the test.<br />
<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Screening Test ||Positive|| a (True positives)|| b (False positives)<br />
|-<br />
| Negative || c (False negatives)|| d (True negatives)<br />
|}<br />
$PPV=\frac{a}{a+b},NPV=\frac{d}{c+d}$<br />
</center><br />
<br />
'''PPV interpretation:''' Given a positive screening test result, the likelihood that the individual actually has the disease is the PPV.<br />
<br />
'''NPV interpretation:''' Given a negative screening test result, the likelihood that the individual actually does not have the disease is the NPV.<br />
<br />
* [[SMHS_NonParamInference#McNemar_Test| See the section on McNemar Test]].<br />
<br />
===== Factors Influence Predictive Values=====<br />
<br />
''Disease prevalence'': Increasing disease prevalence increases PPV (and decreases NPV). Screening programs are most productive and efficient in high-risk populations; screening for infrequent disease may waste resources; PPV needs to be presented in the context of disease prevalence.<br />
<br />
*''Test specificity'' (the ability of a test to correctly identify those who do not have the disease $=\frac{d}{b+d}$): Higher test specificity increases PPV.<br />
*''Test sensitivity'' (the ability of a test to correctly identify those who have the disease $=\frac{a}{a+c}$): Higher test sensitivity increases NPV.<br />
<br />
'''Note:''' The cutoff value used to define disease influences test sensitivity and specificity: lowering the cutpoint increases true positives (raising sensitivity) and decreases true negatives (lowering specificity). Similarly, raising the cutpoint decreases true positives (lowering sensitivity) and increases true negatives (raising specificity).<br />
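All four quantities can be computed from the $a, b, c, d$ cells of the screening 2x2 table above; the counts here are hypothetical:<br />

```python
# Hypothetical screening 2x2 cells (a, b, c, d as in the table above)
a, b = 90, 30    # true positives, false positives
c, d = 10, 270   # false negatives, true negatives

sensitivity = a / (a + c)  # fraction of diseased people who test positive
specificity = d / (b + d)  # fraction of disease-free people who test negative
ppv = a / (a + b)          # fraction of positive tests with true disease
npv = d / (c + d)          # fraction of negative tests that are truly disease-free
```

Note that sensitivity and specificity are properties of the test itself, while PPV and NPV also depend on the prevalence of disease in the tested population.<br />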
<br />
=====Validity=====<br />
<br />
''Validity'': The ability of a test to distinguish between who has disease and who does not<br />
<br />
''Reliability'': The ability to replicate results on the same sample if the test is repeated<br />
<br />
The following chart shows the three possible outcomes (from left to right): ''valid but not reliable'', ''reliable but not valid'', and ''valid and reliable''.<br />
<br />
<center><br />
[[Image:SMHS_InNtroEpi_Fig_1_2_3_C.png]]<br />
</center><br />
<br />
=====Reliability(repeatability) of tests=====<br />
<br />
Can the results be replicated if the test is redone? The results may be influenced by three factors:<br />
<br />
*''Intrasubject variation'': Variation within individual subjects<br />
*''Intraobserver variation'': Variation in reading of results by the same reader<br />
*''Interobserver variation'': Variation between those reading results<br />
<br />
=====How does multiple testing improve screening programs?===== <br />
Using multiple tests: <br />
# ''Sequential (two-stage) testing'': administer the less expensive, less invasive, less uncomfortable test first; if the result is positive, follow up with additional testing.<br />
# ''Simultaneous (parallel) testing'': conduct multiple screening tests at the same time; to be considered positive, the person need only test positive on one test; to be considered negative, the person must test negative on all tests. <br />
<br />
Each test has its own sensitivity and specificity. Using multiple tests can improve net sensitivity (simultaneous testing) or net specificity (sequential testing): sequential testing decreases net sensitivity and increases net specificity, while simultaneous testing increases net sensitivity and decreases net specificity.<br />
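Under the common simplifying assumption (assumed here) that the two tests err independently, the net characteristics can be sketched with hypothetical test characteristics:<br />

```python
# Hypothetical individual test characteristics
sens1, spec1 = 0.90, 0.80
sens2, spec2 = 0.80, 0.90

# Simultaneous (parallel): positive on EITHER test counts as positive
net_sens_parallel = 1 - (1 - sens1) * (1 - sens2)  # rises to 0.98
net_spec_parallel = spec1 * spec2                  # falls to 0.72

# Sequential (two-stage): positive only if positive on BOTH tests
net_sens_sequential = sens1 * sens2                # falls to 0.72
net_spec_sequential = spec1 + (1 - spec1) * spec2  # rises to 0.98
```

The symmetry of the two strategies is visible directly in the formulas: parallel testing trades specificity for sensitivity, and sequential testing trades sensitivity for specificity.<br />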
<br />
===Randomized Controlled Trials (RCT)===<br />
The investigator assigns exposure at random to study participants, then observes whether there are differences in health outcomes between people who were (treatment group) and were not (comparison group) exposed to the factor. Special care is taken to ensure that the follow-up is done in an identical way in both groups. The essence of a good comparison between treatments is that the compared groups are the same except for the “treatment”.<br />
<br />
====Steps of a RCT====<br />
RCTs involve the following sequential steps: hypothesis formulation; study participant recruitment based on specific criteria; gathering informed consent; allocation of eligible and willing participants into random assignment study groups; monitoring study groups for outcome under study; comparing rates of different outcomes in various groups.<br />
<center><br />
[[Image:MSHS_IntroEpi_Fig_3_actually2.png |400px]]<br />
</center><br />
<br />
====External and internal validity====<br />
*External validity: Generalization of the study to the larger source population. Influenced by factors like: demographic differences between eligible and ineligible subgroups; whether the intervention mirrors what will happen in the community or source population.<br />
*Internal validity: Ability to reach correct conclusion in study. Influenced by factors like: ability of subjects to provide valid and reliable data; expected compliance with a regimen; low probability of dropping out.<br />
<br />
====Measures of Association and Effect in RCT====<br />
Ratio of two measures of disease incidence (relative measures) - Risk Ratio (Relative Risk), Rate Ratio.<br />
Difference between two measures of disease incidence: Risk difference, efficacy.<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2 rowspan=2| || colspan=2| Disease Status <br />
|-<br />
| Disease|| No Disease<br />
|-<br />
|rowspan=2 |Treatment||Drug A|| a || b <br />
|-<br />
| Placebo || c || d<br />
|-<br />
|}<br />
</center><br />
$Relative\,Risk=\frac{Cumulative\,Incidence\,in\,exposed} {Cumulative\,Incidence\,in\,unexposed}=ratio\,of\,risks=Risk\,Ratio=\frac{a/(a+b)} {c/(c+d)}=\frac{CI_{drugA}}{CI_{placebo}}$<br />
<br />
<center><br />
$Rate\, Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$<br />
</center><br />
<br />
Interpretation: RR>1: the outcome is RR times as likely to occur in the exposed group as in the unexposed group; RR=1: null value (no difference between groups); RR<1: either report the reduction in risk (100%−xx%) or invert (1/RR) and interpret as “less likely”.<br />
<center> $Efficacy=\frac{C.I.\,rate\,in\, placebo-C.I.\,rate\, in\, the\, treatment}{C.I.\,rate\, in\, placebo\, group}$<br />
</center><br />
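The RR and efficacy formulas above can be sketched for a hypothetical trial (counts invented for illustration):<br />

```python
# Hypothetical RCT 2x2: a,b = Drug A (diseased, not); c,d = placebo (diseased, not)
a, b = 15, 85
c, d = 30, 70

ci_drug = a / (a + b)      # cumulative incidence on Drug A
ci_placebo = c / (c + d)   # cumulative incidence on placebo

relative_risk = ci_drug / ci_placebo            # here 0.5: risk halved by the drug
efficacy = (ci_placebo - ci_drug) / ci_placebo  # here 0.5: 50% of cases prevented
```

With an RR below 1, the inverted interpretation applies: participants on placebo were 1/0.5 = 2 times as likely to develop the outcome.<br />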
*Situations that favor the use of RCT:<br />
# Exposure of interest is a modifiable factor over which individuals are willing to relinquish control; <br />
# Legitimate uncertainty exists regarding the effect of interventions on outcome, but reasons exist to believe that the benefits of the intervention in question outweigh the risks;<br />
# Effect of intervention on outcome is of sufficient importance to justify a large study.<br />
<br />
===Cohort Study===<br />
Population of exposed and unexposed individuals at risk of developing outcomes are followed over time to compare the development of disease in each group. <br />
*Steps: Establish the study population: identify a study population that is reflective of the base population of interest and has a distribution of exposure; identify groups of exposed and unexposed individuals; follow both groups and compare the development of outcomes in the exposed and unexposed groups.<br />
[[Image:MSHS_IntroEpi_Fig2_C.png |500px|]]<br />
*Types: <br />
: Prospective (concurrent) and Retrospective (non-concurrent) Cohort Studies, based on when the data are collected.<br />
: Retrospective has benefits: more cost effective; good for disease of long latency.<br />
: Prospective has benefits: data quality presumably higher.<br />
Both designs need to be cautious of ascertainment biases if outcomes or exposure is known.<br />
<br />
*Measures of Association in Cohort Study:<br />
: Ratio of two measures of disease incidence (relative measures): Risk Ratio (Relative Risk), Rate Ratio.<br />
: Difference between two measures of disease incidence: Risk Difference, Rate Difference.<br />
<br />
*Strengths and weakness of Cohort Design:<br />
: Strengths:<br />
# Maintain temporal sequence – can estimate incidence of disease; exposure precedes development of disease; also explore time-varying information. <br />
# Excellent for studying known adverse exposures or those that cannot practically be randomized. <br />
# Like RCT, excellent for studying rare exposures. <br />
# Multiple outcomes and sometimes multiple exposures can be studied.<br />
: Disadvantages: <br />
# Long-term follow-up required and expensive; <br />
# Not effective at capturing rare outcomes and can be challenging to study disease that take a long time to develop; <br />
# Loss to follow-up can be a problem; <br />
# Changes over time in criteria and methods can lead to problems with inferences; <br />
# People self-select exposures so exposed and unexposed may differ with respect to important characteristics.<br />
<br />
*Situations favor a Cohort Study: <br />
# When there is evidence of an association between the exposure and the disease from other studies;<br />
# When the exposure is rare but the incidence of disease among the exposed is high;<br />
# When time between exposure and development of the disease is relatively short or historical data is available;<br />
# When good follow-up can be ensured.<br />
<br />
===Case Control Study===<br />
A case control study compares cases (people with the disease) and controls (people without the disease) to see which group had greater exposure to the risk factor.<br />
*Measures of Association: Odds Ratio.<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Case || Control <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
</center><br />
$Odds\, Ratio=\frac{odds\, of\, a\, case\, being\, exposed}{odds\, of\, a\, control\, being\, exposed}=\frac{(a/c)} {(b/d)}=\frac {ad}{bc}.$<br />
<br />
====Interpretation====<br />
If OR > 1, the odds of being exposed are OR times higher in the cases than in the controls; if OR < 1, the odds are 1/OR times lower in the cases than in the controls; if OR = 1, there is no association (the odds are the same in cases and controls).<br />
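A sketch of the cross-product calculation, with hypothetical case-control counts:<br />

```python
# Hypothetical case-control 2x2 (a, b, c, d as in the table above)
a, b = 60, 40    # exposed: cases, controls
c, d = 40, 160   # unexposed: cases, controls

# (a/c) / (b/d) simplifies to the cross-product ratio ad/bc
odds_ratio = (a * d) / (b * c)
```

An OR of 6 here would mean the odds of exposure are six times higher among cases than among controls.<br />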
<br />
*Strengths and weakness of Case Control Study:<br />
**Strengths: Case Control Study Design is efficient and can evaluate many risk factors for the same disease, so is good for diseases about which little is known; it is observational – we don’t ask people to change their behavior, we just collect information on events that happen “naturally”.<br />
**Weakness: Inefficient for rare exposures; can study only one outcome at a time; cannot calculate incidence of disease but can only estimate the odds of being exposed in cases vs. controls; the number of cases and controls in study is artificial and does not represent the natural distribution of disease in the population.<br />
<br />
*Avoiding Recall / Reporting Bias. Ways to avoid recall and report bias include: <br />
# adjusting timing so that the time between the event/illness and the study is as short as possible; use standardized questionnaires that obtain complete information;<br />
# using existing information if/when possible (e.g. medical record); <br />
# masking participants to study hypothesis<br />
*Conditions under which an OR from a Case-Control Study can approximate a RR (OR≈RR):<br />
# when the cases are representative, with respect to their exposure status, of all people with the disease in the population from which the cases were drawn; <br />
# when the controls are representative, with respect to their exposure status, of all people without the disease in the population from which the cases are drawn; <br />
# when the disease being studied does not occur frequently.<br />
<br />
===Cross-Sectional Studies===<br />
A cross sectional study is an observational study in which a subject’s exposure and disease data are measured at the same time; prevalent cases of the disease are identified; exposure prevalence is examined in relation to disease prevalence (there are no incident cases, and temporality cannot be determined).<br />
<br />
====Strengths and Limitations of Cross-Sectional Studies====<br />
* '''Strengths:'''<br />
# good for generating hypotheses;<br />
# easily sets up other analytic designs; <br />
# temporality is not a problem for time invariant exposures (genetic markers); <br />
# relatively low cost.<br />
<br />
*'''Weakness:'''<br />
# temporality – exposure or disease which happened first; <br />
# prevalent cases may not be the same as incident cases; <br />
# not useful for rare disease; <br />
# subject to selection bias.<br />
<br />
====Measures of Association in Cross Sectional Studies====<br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| colspan=2| || Disease || No Disease <br />
|-<br />
|rowspan=2 |Exposed || Yes || a || b <br />
|-<br />
| No || c ||d<br />
|-<br />
|}<br />
$Prevalence\,Ratio=\frac{Prevalence\,of\,disease\,in\,exposed}{Prevalence\,of\, disease\,in\,unexposed}=\frac{a/(a+b)}{c/(c+d)}$<br />
</center><br />
<br />
===Ecologic Studies===<br />
<br />
An ecological study is an observational study in which group-level data are used for the exposure and/or the outcome. Subjects can be grouped by place (multiple-group study); by time (time-trend study); or by place & time (mixed study). An error can occur when an association identified from group-level (ecologic) characteristics is ascribed to individuals even though no such association exists at the individual level. <br />
<br />
====Strengths and Disadvantages of Ecologic Studies====<br />
*'''Strengths:''' <br />
# data is relatively easy and/or cheap to obtain; <br />
# good place to start; <br />
# many relevant social, occupational and environmental exposures cannot be ascribed to an individual.<br />
<br />
*'''Weakness:''' Reliance on group-level data may not correctly represent individual-level associations. <br />
<br />
*Ecologic fallacy is when an association between variables based on group characteristics is used to make inferences about individuals when that association does not exist.<br />
<br />
*Ecologic studies are useful for generation of new hypotheses because they are relatively easy and low-cost to conduct.<br />
<br />
===Other Risk Estimates===<br />
*Attributable Risk Estimates of Effect – if exposure causes increased risk of disease, then we can estimate how many cases of disease could be eliminated if we completely eliminate the exposure.<br />
*Attributable Risk (AR): $AR=CI_{Exposed} - CI _{Not\,exposed}$. This is just the risk difference. Group of interest: exposed and aims to quantify the risk of disease in the “exposed” group attributable to the exposure. <br />
*Attributable Risk Percent $(AR\%)$: $ AR\%$ = $\frac{(CI_{Exposed} - CI_{Not exposed})}{CI_{exposed}}$<br />
*Population Attributable Risk (PAR): $PAR= CI_{Total} - CI_{Not exposed}$<br />
*Population Attributable Risk Percent $(PAR\%)$: $PAR\%$ = $\frac{(CI_{Total}-CI_{Not exposed})} {CI_{total}}$.<br />
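The four attributable-risk measures above can be sketched with hypothetical cumulative incidences:<br />

```python
# Hypothetical cumulative incidences
ci_exposed = 0.20
ci_unexposed = 0.05
ci_total = 0.11    # overall CI in the whole population

ar = ci_exposed - ci_unexposed   # attributable risk (risk difference)
ar_pct = ar / ci_exposed         # AR%: share of exposed-group risk due to exposure
par = ci_total - ci_unexposed    # population attributable risk
par_pct = par / ci_total         # PAR%: share of total population risk due to exposure
```

With these made-up values, 75% of the risk in the exposed group, but only about 55% of the risk in the total population, would be attributable to the exposure.<br />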
<br />
===Bias===<br />
Bias is a barrier to internal validity.<br />
*Causes of bias: Any systematic error in the design, conduct or analysis of a study that results in a distorted estimate of the relationship between an exposure and outcome; observed results different than true results. <br />
*Impact of bias: it can make it appear as if there is an association when there really is none (bias away from the null), or mask an association when there really is one (bias toward the null).<br />
*Reasons we get the wrong answer: Selection bias: who is selected or retained in a study distorts your estimates of the truth. An example is selection bias due to differential retention in the study.<br />
<br />
*Mechanisms to reduce bias:<br />
**Ensure proper selection of study subjects (chose groups from the same source population; try lists of people that are more inclusive; use methods that result in high recruitment rates).<br />
**Minimize loss-to-follow up: keep participants happy and in touch with study team; review non-respondents to understand characteristics.<br />
*Information bias: the quality of your information distorts your estimate of the true association. Examples include surveillance bias, non-differential misclassification of hypertension, reporting bias and differential misclassification. Sources of measurement error/misclassification: normal variability or imprecision in measure, error due to subconscious or conscious decisions by the participant or investigator.<br />
* Confounding bias: differences between cases and controls or exposed and unexposed distorts your estimates of the truth. A variable is a confounder if it is a known risk factor for the outcome, it is associated with the exposure but not a result of the exposure. These three conditions are necessary for a variable to be considered as a confounder. <br />
* Chance: the luck of draw gets you a study sample that is not representative of the larger population.<br />
*Strategies to handle confounding: (1) in study design – individual matching, group matching, randomization (experimental) studies; (2) in data analysis – stratification, adjustment. Matching in a case-control study: <br />
<center><br />
{|class="wikitable" style="text-align:center;width:25%" border="1"<br />
|-<br />
| || Control Exposed || Control Unexposed <br />
|-<br />
| Case Exposed || a || b <br />
|-<br />
|Case Unexposed || c ||d<br />
|-<br />
|}<br />
</center><br />
<br />
* Concordant pairs: both case and control exposed; neither case nor control exposed.<br />
*Discordant pairs: case exposed but control not exposed; control exposed but case not exposed.<br />
*Matched analysis: Odds ratio (only based on discordant pairs) $Odds\, Ratio =\frac {b} {c}.$<br />
<br />
''Interpretation'': If there is an association between exposure and outcome, it is not due to any factors that were matched on; you cannot analyze the association between a matched variable and the outcome.<br />
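The matched odds ratio uses only the discordant pairs; a sketch with hypothetical pair counts (b and c as in the matched table above):<br />

```python
# Hypothetical discordant pair counts from a matched case-control study
b = 30   # pairs where the case was exposed and the control was unexposed
c = 10   # pairs where the case was unexposed and the control was exposed

matched_or = b / c   # concordant pairs (a and d) do not contribute
```

Here the odds of exposure would be three times higher in cases than in their matched controls.<br />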
*Randomization: Random allocation of exposure/“treatment” by the investigator ensures that the two groups (exposed & unexposed) are the same except for the exposure of interest, and controls for both known and unknown confounders because the distribution of these “3rd variables” should be equal between the groups.<br />
*Stratification: Examine the relationship between exposure and outcome within each stratum of a potential confounding variable; holding the confounding variable constant. <br />
*Adjustment: A statistical technique that can be used to examine what the association between exposure and outcome would be IF the confounder was not associated with the exposure. <br />
<br />
Example of age-adjustment.<br />
<br />
[[Image:MSHS_IntroEpi_Fig4.png]]<br />
<br />
===Applications===<br />
* [http://www.sciencedirect.com/science/article/pii/S1631069107001072 This article] reviews, using some important examples, the classical methodological approach for discussing causality in epidemiology. Coronary heart disease (CHD) prevention has largely benefited in the past from the development of epidemiological research; however, the opposition between association and causation is currently being raised from observational data. The easy identification of DNA polymorphisms has prompted new CHD etiological research in the past 10 years. Causality of the associations presents some special characteristics when genes are involved: the necessity of replication and Mendelian randomization, which might prove to be important in future research.<br />
<br />
* [http://www.sciencedirect.com/science/article/pii/S0020748912004166 This article] retrospectively studies the relationship between surveillance, staffing, and serious adverse events in children on general care postoperative units. The paper investigates these hypotheses: (1) the relationship between patient factors and surveillance would be moderated by staffing (i.e., registered nurse hours per patient per shift), and (2) the relationship between staffing and serious adverse events would be mediated by surveillance.<br />
<br />
===Software===<br />
*[http://www.distributome.org/V3/calc/StudentCalculator.html Student Calculator]<br />
*[http://socr.umich.edu/Applets/Normal_T_Chi2_F_Tables.html Normal T Chi-Squared F Tables]<br />
<br />
===Problems===<br />
<br />
How do we learn about the existence of outbreaks?<br />
:a. cases call health departments directly<br />
:b. clinicians<br />
:c. laboratories<br />
:d. all of the above<br />
<br />
<br />
In the case of obesity, neighborhood access to healthy food stores represents which aspect of the epidemiologic triad?<br />
:a. host<br />
:b. agent<br />
:c. vector<br />
:d. environment<br />
:e. all of the above<br />
<br />
<br />
The Detroit population had 1 million people without lung cancer in 2000, and 700,000 people without lung cancer in 2010. During that time period, 17,000 people were newly diagnosed with lung cancer. What was the incidence rate for lung cancer in Detroit from 2000 to 2010 (expressed per 100,000 person-years)?<br />
:a. 0.002 lung cancer cases per 100,000 person years<br />
:b. 200 lung cancer cases per 100,000 person years<br />
:c. 270 lung cancer cases per 100,000 person years<br />
:d. 243 lung cancer cases per 100,000 person years<br />
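One way to check the arithmetic for the incidence question above (a Python sketch, not part of the original SOCR materials; it approximates person-years with the average disease-free population over the period, a common simplification):

```python
# Disease-free population at the start and end of the period (from the problem).
pop_2000, pop_2010 = 1_000_000, 700_000

# Person-years ~ average disease-free population x length of follow-up.
person_years = (pop_2000 + pop_2010) / 2 * 10   # 8,500,000 person-years

new_cases = 17_000
incidence_per_100k_py = new_cases / person_years * 100_000
print(round(incidence_per_100k_py))  # 200 cases per 100,000 person-years
```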
<br />
<br />
In a fixed population, what happens to the prevalence of a disease when the incidence increases slightly, considering the different duration scenarios below?<br />
:a. The prevalence increases if the duration of disease is increasing or stays the same<br />
:b. The prevalence increases if the duration of disease is decreasing rapidly<br />
:c. The prevalence decreases if the duration of disease is increasing<br />
:d. The prevalence decreases if the duration of disease stays the same<br />
<br />
<br />
<br />
Ann Arbor’s Mortality Rates from Diabetes Mellitus among whites, 2002-2012.<br />
<center><br />
{| class="wikitable" style="text-align:center; width:25%" border="1"<br />
|-<br />
|Age groups (years) ||Age-specific rates (per 100,000)|| Michigan standard population || Expected number of deaths<br />
|-<br />
|<20|| 20 ||2,000,000|| <br />
|-<br />
|20-39|| 10 || 3,000,000 ||<br />
|- <br />
|40-59 ||5 ||1,000,000||<br />
|- <br />
|>60|| 30|| 4,000,000||<br />
|- <br />
|Total || || 10,000,000 ||<br />
|}<br />
</center><br />
<br />
What is the age-adjusted mortality rate from diabetes among whites according to the table above?<br />
:a. 40.2 deaths per 100,000<br />
:b. 19.5 deaths per 100,000<br />
:c. 1.9 deaths per 100,000<br />
:d. 20.4 deaths per 100,000<br />
<br />
<br />
Given the information above, what is the Standardized Mortality Ratio (SMR) if the observed deaths in the white population are 3000?<br />
:a. 1.54<br />
:b. 5.02<br />
:c. 1.69<br />
:d. 0.65<br />
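The direct age-adjustment and SMR calculations behind the two questions above can be verified as follows (a Python sketch, not part of the original SOCR materials; the rates and populations are taken from the table):

```python
# Age-specific rates (per 100,000) and the Michigan standard population,
# from the table above: <20, 20-39, 40-59, >60.
rates_per_100k = [20, 10, 5, 30]
standard_pop = [2_000_000, 3_000_000, 1_000_000, 4_000_000]

# Expected deaths in each stratum = age-specific rate applied to the standard population.
expected = [r / 100_000 * p for r, p in zip(rates_per_100k, standard_pop)]
total_expected = sum(expected)  # 400 + 300 + 50 + 1200 = 1950

# Direct age-adjusted rate = total expected deaths / total standard population.
adjusted_rate_per_100k = total_expected / sum(standard_pop) * 100_000
print(round(adjusted_rate_per_100k, 1))  # 19.5 deaths per 100,000

# Standardized Mortality Ratio = observed deaths / expected deaths.
observed = 3000
smr = observed / total_expected
print(round(smr, 2))  # 1.54
```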
<br />
<br />
<br />
When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.<br />
:True<br />
:False<br />
<br />
<br />
<br />
Sequential testing tends to have higher net specificity than specificity of a single test.<br />
:True<br />
:False<br />
<br />
<br />
<br />
A new screening test has been developed for diabetes. The table below represents the results of the new test compared to the current gold standard. Use this table to answer the following questions:<br />
<center><br />
{| class="wikitable" style="text-align:center; width:25%" border="1"<br />
|-<br />
|colspan=2 rowspan=2| || colspan=2|Gold standard <br />
|-<br />
|Condition Positive||Condition negative<br />
|-<br />
|rowspan=2| Result of New Test|| Test Positive ||80||70<br />
|- <br />
|Test Negative ||10 ||240<br />
|- <br />
|}<br />
</center><br />
<br />
<br />
What is the sensitivity of the new test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
What is the specificity of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
<br />
<br />
What is the positive predictive value of the test?<br />
:a. 77%<br />
:b. 89%<br />
:c. 80%<br />
:d. 53%<br />
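The screening-test questions above all follow from the four cells of the 2x2 table. As an illustrative sketch (in Python, not part of the original SOCR materials):

```python
# Cell counts from the 2x2 table above (new test vs. gold standard).
tp, fp = 80, 70    # test positive row: true positives, false positives
fn, tn = 10, 240   # test negative row: false negatives, true negatives

sensitivity = tp / (tp + fn)  # P(test+ | disease present)
specificity = tn / (tn + fp)  # P(test- | disease absent)
ppv = tp / (tp + fp)          # positive predictive value: P(disease | test+)

print(round(100 * sensitivity))  # 89
print(round(100 * specificity))  # 77
print(round(100 * ppv))          # 53
```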
<br />
<br />
Understanding health behaviors that may protect against infection with the flu in population-dense areas is of great interest to epidemiologists. To determine if proper hand washing may prevent flu transmission, investigators recruited 834 students from a university dormitory to participate in a research study. At baseline, 74 individuals were experiencing flu-like symptoms and tested positive for active antibodies against the flu virus (meaning they, in fact, had the flu) and thus were not enrolled in the research study. The students who were not ill with the flu at baseline were followed for 12 months with no loss to follow-up. Researchers asked students to contact the study team when they exhibited flu-like symptoms so that they could be tested for the flu virus. During the course of follow-up, 379 students were diagnosed with the flu. Of the students enrolled in this study, 60% reported improper hand-washing behaviors. Of the students that were diagnosed with the flu during follow-up, 280 of them reported improper hand-washing.<br />
<br />
:a. What type of study is this?<br />
:b. Why is this type of study adequate for this particular situation?<br />
:c. Imagine that you are the investigator picking the appropriate study design to answer this question, what might you have worried about in picking this design?<br />
:d. What is the best measure of association to test the relationship between hand washing and incident flu? Why?<br />
:e. Calculate and interpret the above measure of association using a 2X2 table.<br />
:f. If proper hand-washing behavior were to be used by the students who exhibited improper hand-washing techniques, how many cases per 1000 would be prevented? Interpret your findings.<br />
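The 2x2 arithmetic for this cohort problem can be sketched as follows (in Python, not part of the original SOCR materials; the counts are derived from the problem statement above):

```python
# Students followed: those ill at baseline were excluded.
enrolled = 834 - 74                 # 760 at risk
exposed = round(0.60 * enrolled)    # improper hand-washing: 456
unexposed = enrolled - exposed      # proper hand-washing: 304

flu_total = 379
flu_exposed = 280                   # flu cases with improper hand-washing
flu_unexposed = flu_total - flu_exposed  # 99

# Cumulative incidence (risk) in each group, and the risk ratio.
risk_exposed = flu_exposed / exposed
risk_unexposed = flu_unexposed / unexposed
risk_ratio = risk_exposed / risk_unexposed
print(round(risk_ratio, 2))  # 1.89
```

Because this is a prospective cohort with complete follow-up, risk (and hence the risk ratio) can be computed directly.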
<br />
<br />
<br />
Chikungunya is a relatively rare viral disease transmitted by mosquitoes. This unpleasant disease is characterized by high fevers, nausea, vomiting, and crippling muscle and joint pain that may last for weeks to years, as well as retinal damage. Chikungunya was recently detected in the Caribbean, prompting local epidemiologists to conduct a study on the Caribbean island of Martinique to better understand local risk factors for Chikungunya. Researchers selected 100 individuals who tested positive for Chikungunya infection, as well as 200 individuals who did not have Chikungunya. Though they looked at multiple risk factors, the epidemiologists focused primarily on individuals’ use or non-use of mosquito repellent. Participants were asked about their repellent use (yes/no) in the 12 months preceding enrollment in the study. In their eventual publication, researchers reported that in total, 142 of the participants reported not using repellent. It was also noted that 31% of the participants who did not have Chikungunya reported no repellent use.<br />
:a. What type of study design was used in this example?<br />
:b. Why is this type of study appropriate for this particular situation?<br />
:c. Given that the participants were asked about their use of repellent in the past, what is a potential limitation of this study? <br />
:d. Set up a 2X2 table to assess the relationship between Chikungunya infection and improper mosquito repellent use.<br />
:e. What is the appropriate measure of association for this study? Explain why.<br />
:f. Calculate and interpret your measure of association.<br />
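In a case-control design like this one, risks cannot be computed directly, so the exposure odds ratio is the usual measure. A sketch of the 2x2 arithmetic (in Python, not part of the original SOCR materials; the cell counts are derived from the problem statement above):

```python
cases, controls = 100, 200
no_repellent_total = 142
no_repellent_controls = round(0.31 * controls)                    # 62
no_repellent_cases = no_repellent_total - no_repellent_controls   # 80

# Standard 2x2 layout: exposure = no repellent use.
a = no_repellent_cases   # exposed cases
b = no_repellent_controls  # exposed controls
c = cases - a            # unexposed cases: 20
d = controls - b         # unexposed controls: 138

odds_ratio = (a * d) / (b * c)  # cross-product ratio
print(round(odds_ratio, 2))  # 8.9
```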
<br />
<br />
A group of epidemiologists at a prestigious university decided to conduct a survey of public health students to investigate the relationship between cramping of the hands and creating 2x2 tables by hand. This survey was administered just once and there was no follow-up of the participants.<br />
:a. What type of study is this?<br />
:b. What type of measure of association is appropriate for this study? Why?<br />
:c. Our epidemiologists found that 75% of study participants who had hand cramping reported excessive 2x2 table making. Are the epidemiologists justified in claiming that this study provides causal evidence that 2x2 table making leads to hand cramping? Why?<br />
<br />
<br />
Parents of children who were born with birth defects may be more likely to remember any drug or exposure that occurred during pregnancy than parents of children born without birth defects. This is an example of what type of bias?<br />
:a. interviewer bias<br />
:b. recall bias<br />
:c. loss to follow-up<br />
:d. non-differential misclassification<br />
<br />
<br />
Using data from the Nurses Health Study, the association between self-reported frequency of sunburns and melanoma was examined. When questioned after the diagnosis of melanoma, some women with melanoma may have exaggerated their frequency of sunburns especially if they were concerned that sun exposure was a reason they got melanoma. This is an example of:<br />
:a. interviewer bias<br />
:b. loss to follow-up<br />
:c. differential misclassification<br />
:d. non-differential misclassification<br />
<br />
<br />
===References===<br />
*[http://en.wikipedia.org/wiki/Epidemiology Epidemiology Wikipedia]<br />
<br />
<br />
<hr><br />
* SOCR Home page: http://www.socr.umich.edu<br />
<br />
{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=SMHS_IntroEpi}}</div>
<hr />
<div>==[[SMHS| Scientific Methods for Health Sciences]] - Introduction to Epidemiology ==<br />
<br />
===Overview===<br />
[http://en.wikipedia.org/wiki/Epidemiology Epidemiology] is the study of the distribution and determinants of disease frequency in human populations. It is the only scientific discipline that is concerned with the occurrence of disease in human populations and how it changes over time. This introduction to epidemiology aims to introduce the field and to explain the basic concepts and methodologies that will be applied later in this context. It also aims to help students solve and analyze epidemiological problems and to introduce students to various epidemiological studies.<br />
<br />
===Motivation===<br />
In this introduction to epidemiology, we will: <br />
*Study the language of epidemiology and identify key sources of data for epidemiological purposes<br />
*Be able to calculate and interpret measures of disease frequency<br />
*Recognize and evaluate epidemiological study designs and their limitations<br />
*Be informed consumers of epidemiological sources of information (e.g., journals, websites, government agencies).<br />
<br />
===Theory===<br />
*Five main goals of epidemiology:<br />
# To identify the cause of disease and its risk factors<br />
# To determine the extent of disease found in the community<br />
# To study the natural history and prognosis of disease<br />
# To evaluate new preventative and therapeutic measures<br />
# To provide a foundation for developing public policy<br />
<br />
*Distinguishing between ''endemic'', ''epidemic'', and ''pandemic'':<br />
#''Endemic'': The habitual presence (or usual occurrence) of a disease within a given geographic area;<br />
#''Epidemic'': The occurrence of a disease clearly in excess of normal expectancy in a given geographic area;<br />
#''Pandemic'': A worldwide epidemic affecting an exceptionally high proportion of the global population.<br />
<br />
*Modes of Disease Transmission<br />
#''Direct contact'': Transmission occurs when the pathogen is transferred directly from an infected person to a susceptible person, for example through touch, sneeze droplets, or sexual intercourse <br />
#''Indirect contact'': Transmission involves the transfer of the pathogen through contact with a contaminated intermediate inanimate object or vector<br />
##''Inanimate (object or vehicle)'': Examples may be toy, food or water<br />
##''Vector-borne (animal or insect)'': Examples include mosquitoes, ticks and mice<br />
<br />
*Attack Rates and Ratios (ARR)<br />
<br />
:Attack rates and ratios use statistics to develop and evaluate hypotheses in an outbreak. This process involves: <br />
<br />
#Starting with the big picture and the big risk factors for disease (e.g., “How many people at the event got ill?”)<br />
#Refining the big picture into smaller questions (e.g., “Did they eat the salad? Chicken? Or ice cream?”)<br />
#Formulating a hypothesis (e.g., “Among those who eat at the buffet, are the people who ate the Caesar salad at greater risk than those who did not?”)<br />
<br />
:''Attack Rates'' (AR): $AR=\frac{Number\,of\,people\,at\,risk\,who\,develop\,a\,certain\, illness} {Total\,number\,of\,people\,at\,risk}$ <br />
:''Attack