Difference between revisions of "SMHS AssociationTests"

Revision as of 16:18, 29 August 2014

Scientific Methods for Health Sciences - Association Tests

Overview

Measuring the association between two quantities is one of the tools researchers apply most often in studies. The term association refers to a possible relationship in which two or more variables vary together according to some pattern. There are many statistical measures of association, including the relative risk, the odds ratio and the absolute risk reduction. In this section, we introduce measures of association used in different types of studies.

Motivation

In many cases, we need to determine whether two quantities are associated with each other, that is, whether two or more variables vary together according to some pattern. There are many statistical tools we can apply to measure the association between variables. How do we decide which measure to use? How do we interpret the test results? What do the test results imply about the association between the variables studied?

Theory

Measures of Association:

(1) relative measures: $Relative\, risk=\frac{Cumulative\, incidence\, in\, exposed}{Cumulative\,incidence\, in \,unexposed}=ratio\,of\,risks =Risk\,Ratio;$ $Rate\,Ratio=\frac{Incidence\,rate\,in\,exposed} {Incidence\,rate\,in\,unexposed}$

(2) difference: $Efficacy=\frac{Cumulative\,incidence\,in\,placebo\,-\,Cumulative\,incidence\,in\,the\,treatment} {Cumulative\,incidence\,in\,placebo\,group}.$
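
As a minimal sketch of the definitions above, the following Python computes the three measures from simple counts. The function names and all of the numbers are hypothetical illustrations and are not data from this section.

```python
def cumulative_incidence(new_cases, persons_at_risk):
    """Proportion of an at-risk group that develops the outcome."""
    return new_cases / persons_at_risk


def relative_risk(ci_exposed, ci_unexposed):
    """Ratio of cumulative incidences (risk ratio)."""
    return ci_exposed / ci_unexposed


def rate_ratio(cases_exposed, time_exposed, cases_unexposed, time_unexposed):
    """Ratio of incidence rates, each computed as cases per unit of person-time."""
    return (cases_exposed / time_exposed) / (cases_unexposed / time_unexposed)


def efficacy(ci_placebo, ci_treatment):
    """Relative reduction in cumulative incidence, treatment versus placebo."""
    return (ci_placebo - ci_treatment) / ci_placebo


# Hypothetical cohort: 30 of 1000 exposed and 10 of 1000 unexposed develop the outcome.
rr = relative_risk(cumulative_incidence(30, 1000), cumulative_incidence(10, 1000))

# Hypothetical person-time: 2000 person-years of follow-up in each group.
irr = rate_ratio(30, 2000.0, 10, 2000.0)

# Hypothetical trial: cumulative incidence of 4% on placebo and 1% on treatment.
eff = efficacy(ci_placebo=0.04, ci_treatment=0.01)

print(f"relative risk = {rr:.2f}, rate ratio = {irr:.2f}, efficacy = {eff:.0%}")
```

A relative risk or rate ratio of 1 corresponds to no association, while an efficacy of 0 means the treatment group fared no better than the placebo group.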

We are going to interpret these measures and draw conclusions about the association between variables through examples from different types of trials.

Chi-square test: a non-parametric test of the statistical significance of the association between two categorical variables. It tests whether the measured factor is associated with membership in one of two samples. For example, in a case-control study the chi-square test assesses whether there is statistical evidence that the measured factor is distributed differently among the cases than among the controls. The test statistic is $\chi_o^2=\sum_{i=1}^{n}\frac{(O_i-E_i)^2}{E_i}\sim\chi_{df}^2$, where $O_i$ is the observed frequency and $E_i$ is the expected frequency under the null hypothesis in cell $i$, $n$ is the number of cells in the table, $df=(\#\,rows-1)(\#\,columns-1)$, and $E_i=\frac{row\,total\times column\,total}{grand\,total}$. The null hypothesis is that there is no association between the exposure group and the disease studied.

Conditions for validity of the $\chi^2$ test are:
- Design conditions
  - for a goodness-of-fit test, it must be reasonable to regard the data as a random sample of categorical observations from a large population.
  - for a contingency table, it must be appropriate to view the data in one of the following ways: as two or more independent random samples, observed with respect to a categorical variable; as one random sample, observed with respect to two categorical variables.
  - for either type of test, the observations within a sample must be independent of one another.
- Sample conditions: the critical values are only valid if each expected frequency is greater than 5

Example of association: a study on the association between a particular gene and the risk of a late-onset disease. The data are summarized in the table below:

Genotype	Cases	Controls	Total
Exposure group	89	51	140
Reference group	119	134	253
Total	208	185	393

$Odds\,ratio=1.97\;(95\%\,CI:\,1.29-3.00)$, $\chi^2=\frac{(89-74.10)^2}{74.10}+\frac{(119-133.90)^2}{133.90}+\frac{(51-65.90)^2}{65.90}+\frac{(134-119.10)^2}{119.10}=9.89\;(p=0.0017)$
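
The arithmetic above can be reproduced with a short sketch that uses only the Python standard library. The function name is an illustrative choice; the statistic is the uncorrected Pearson chi-square, and the p-value uses the fact that, for one degree of freedom, the upper tail of the chi-square distribution equals erfc(sqrt(x/2)).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    `table` is [[a, b], [c, d]]; returns (statistic, p_value, expected).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count in each cell: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

    # A 2x2 table has df = 1, for which P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, expected


# Rows: exposure group, reference group; columns: cases, controls.
observed = [[89, 51], [119, 134]]
stat, p, expected = chi_square_2x2(observed)

# Sample condition from the text: every expected count should exceed 5.
condition_met = all(e > 5 for row in expected for e in row)

print(f"chi-square = {stat:.2f}, p = {p:.4f}, expected counts > 5: {condition_met}")
```

The erfc shortcut only applies to tables with one degree of freedom; larger tables would need a general chi-square tail function from a statistics library.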

Example of non-association: the study of the same gene and disease was repeated in a separate data set with more cases and controls, and it obtained different results (no evidence of association).

Genotype	Cases	Controls	Total
Exposure group	209	97	306
Reference group	400	220	620
Total	609	317	926

$Odds\,ratio=1.19\;(95\%\,CI:\,0.88-1.60)$, $\chi^2=\frac{(209-201.25)^2}{201.25}+\frac{(400-407.75)^2}{407.75}+\frac{(97-104.75)^2}{104.75}+\frac{(220-212.25)^2}{212.25}=1.30\;(p=0.25)$
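
The odds ratios for the two tables can be compared with the sketch below. The text does not say how the confidence intervals were obtained, so the Woolf (log-based) interval used here is an assumption; its limits may differ from the reported ones in the last digit.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] with a Woolf (log-based) CI.

    The Woolf method is an assumed choice; z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Cell order: exposed cases, exposed controls, reference cases, reference controls.
first = odds_ratio_ci(89, 51, 119, 134)
second = odds_ratio_ci(209, 97, 400, 220)

for label, (estimate, lower, upper) in (("first study", first), ("second study", second)):
    includes_one = lower <= 1.0 <= upper
    print(f"{label}: OR = {estimate:.2f}, 95% CI {lower:.2f}-{upper:.2f}, "
          f"interval includes 1: {includes_one}")
```

An interval that excludes 1, as in the first study, agrees with a significant chi-square test; an interval that includes 1, as in the second study, agrees with a non-significant one.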

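In the second study the exposure frequency among cases was lower than in the first (209/609 ≈ 34% versus 89/208 ≈ 43%), while the exposure frequency among controls was similar (97/317 ≈ 31% versus 51/185 ≈ 28%). In the second study we fail to reject the null hypothesis of no association, which suggests that the first result may have been a false positive.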

Fisher’s exact test: a statistical significance test used in the analysis of contingency tables. It is valid for all sample sizes and is especially useful when the sample size is small. Suppose we want to study whether the proportion of dieting is higher among women than among men, and the following data are collected:

	Men	Women	Row Total
Dieting	1	9	10
Non-dieting	11	3	14
Column Total	12	12	24

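Fisher showed that the probability of obtaining any such set of values is given by the hypergeometric distribution, where $a$, $b$, $c$ and $d$ denote the four cell counts read row by row and $n=a+b+c+d$:

$p=\frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}}=\frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}=\frac{\binom{10}{1}\binom{14}{11}}{\binom{24}{12}}=\frac{10!\,14!\,12!\,12!}{1!\,9!\,11!\,3!\,24!}\approx 0.00135$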

This is the exact hypergeometric probability of observing this particular arrangement of the data, assuming the given marginal totals, under the null hypothesis that men and women are equally likely to be dieters. The p-value is obtained by adding to it the probabilities of all arrangements that are at least as extreme; here it remains well below 0.05. The smaller the p-value, the stronger the evidence against the null hypothesis, so we have significant evidence to reject the null hypothesis and conclude that women and men are not equally likely to be dieters.
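
The hypergeometric calculation can be checked with a minimal sketch based on `math.comb` from the Python standard library. The function names are illustrative, and the one-sided p-value assumes the alternative stated in the example (fewer dieters among men), so it sums over tables with the same margins and a top-left cell no larger than the observed one.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of the 2x2 table [[a, b], [c, d]] given its margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def one_sided_p_value(a, b, c, d):
    """Sum the probabilities of all tables with the same margins and top-left cell <= a."""
    total = 0.0
    for k in range(a + 1):
        shift = a - k  # moving `shift` observations out of the top-left cell
        if d - shift < 0:
            continue  # this table is impossible with the given margins
        total += table_probability(k, b + shift, c + shift, d - shift)
    return total


# Rows: dieting, non-dieting; columns: men, women.
p_table = table_probability(1, 9, 11, 3)
p_one_sided = one_sided_p_value(1, 9, 11, 3)

print(f"P(observed table) = {p_table:.5f}, one-sided p-value = {p_one_sided:.5f}")
```

Only one table is more extreme than the observed one here (no dieting men), so the one-sided p-value is only slightly larger than the probability of the observed table.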

Randomized Controlled Trials (RCT): The investigator assigns exposure at random to study participants and then observes whether there are differences in health outcomes between people who were (treatment group) and were not (comparison group) exposed to the factor. Special care is taken to ensure that follow-up is done in an identical way in both groups. The essence of a good comparison between “treatments” is that the compared groups are the same except for the “treatment”.

Steps of an RCT: a hypothesis is formed; study participants are recruited based on specific criteria and their informed consent is sought; eligible and willing participants are randomly allocated to a particular study group; the study groups are monitored for the outcome under study; the rates of the outcome in the various groups are compared.
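
The random allocation step can be illustrated with a small sketch. The participant IDs, group labels and seed below are hypothetical, and the scheme shown (shuffle, then alternate between groups) is only one simple way to allocate at random; it is not presented as the procedure of any particular trial.

```python
import random


def randomize(participant_ids, groups=("treatment", "comparison"), seed=None):
    """Randomly allocate participants to groups of (nearly) equal size."""
    rng = random.Random(seed)  # a seeded generator makes the allocation reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)

    allocation = {group: [] for group in groups}
    for index, participant in enumerate(shuffled):
        # After shuffling, alternate between groups so that sizes stay balanced.
        allocation[groups[index % len(groups)]].append(participant)
    return allocation


# Hypothetical trial with 20 eligible, consenting participants numbered 1 to 20.
allocation = randomize(range(1, 21), seed=2014)

for group, members in allocation.items():
    print(f"{group}: n = {len(members)}, ids = {sorted(members)}")
```

Because each participant's group depends only on the random shuffle, the groups should differ only by chance at baseline, which is what makes the later comparison of outcome rates interpretable.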

SOCR Home page: http://www.socr.umich.edu


