Difference between revisions of "SMHS UbiquitousVariation"
Revision as of 08:55, 7 July 2014
== Scientific Methods for Health Sciences - Ubiquitous Nature of Process Variability ==
IV. HS 850: Fundamentals
Ubiquitous variation
1) Overview: In the real world, variation exists in almost every data set. No matter how tightly the protocol or design controls the environment, virtually any repeated measurement, observation, experiment, trial, or study is bound to generate data that vary because of intrinsic (internal to the system) or extrinsic (ambient environment) effects. The extent to which the values differ from one another is called variation. Variation is an important concept in statistics, and measuring variability is especially important in statistical inference. Measures of variation describe the extent to which data are dispersed or spread out. We will introduce several basic measures of variation and related results commonly used in statistics: range, variance, standard deviation, sum of squares, Chebyshev's theorem and the empirical rule.
2) Motivation: Variation is ubiquitous in data and of central importance in statistics. Consider UCLA's study of Alzheimer's disease, which analyzed data from 31 Mild Cognitive Impairment (MCI) patients and 34 probable Alzheimer's disease (AD) patients. The investigators made every attempt to control as many variables as possible, yet the demographic information collected on the subjects still contained unavoidable variation. The same study also found variation in the MMSE cognitive scores within the same subject. The table below shows the demographic characteristics of the subjects included in this study, using the notation M (male), F (female), W (white), AA (African American) and A (Asian).
Variable | Alzheimer's disease (AD) | MCI | Test | Test statistic | P-value |
Age (years) | 76.2 (8.3), range 52-89 | 73.7 (7.3), range 57-84 | Student's t | $t_0 = 1.284$ | p = 0.21 |
Gender (M:F) | 15:19 | 15:16 | Proportion | $z_0 = -0.345$ | p = 0.733 |
Education (years) | 14.0 (2.1), range 12-19 | 16.23 (2.7), range 12-20 | Wilcoxon rank sum | $w_0 = 773.0$ | p < 0.001 |
Race (W:AA:A) | 29:1:4 | 26:2:3 | $\chi^2$ (df = 2) | $\chi^2 = 1.18$ | p = 0.55 |
MMSE | 20.9 (6.3), range 4-29 | 28.2 (1.6), range 23-30 | Wilcoxon rank sum | $w_0 = 977.5$ | p < 0.001 |
Once we accept that all natural phenomena are inherently variable and that no process is completely deterministic, we need measures of variation that tell us the extent to which the data are dispersed. Suppose, for instance, we flip a coin 50 times and get 15 heads and 35 tails. If the coin is fair, probability theory tells us to expect about 25 heads and 25 tails on average, not exactly 25 of each on every run. So, what happened here? Now suppose 100 students each flip the coin 50 times. What would you expect their results to look like?
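The 100-student thought experiment can be tried directly with a small simulation. This is a sketch in Python, assuming a fair coin (probability 0.5 of heads) and an arbitrary fixed random seed so that runs are reproducible; `simulate_heads` and `prob_at_most` are hypothetical helpers introduced only for this illustration. The second helper computes the exact binomial probability of getting at most a given number of heads in 50 fair flips.

```python
import math
import random

def simulate_heads(num_students=100, flips_per_student=50, p_heads=0.5, seed=1):
    """Return each student's number of heads; p_heads=0.5 models a fair coin."""
    rng = random.Random(seed)  # arbitrary seed, fixed only for reproducibility
    return [
        sum(rng.random() < p_heads for _ in range(flips_per_student))
        for _ in range(num_students)
    ]

def prob_at_most(k, n):
    """Exact P(X <= k) for X ~ Binomial(n, 0.5): at most k heads in n fair flips."""
    return sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n

heads = simulate_heads()
print("fewest heads:", min(heads), "most heads:", max(heads))
print("average heads per student:", sum(heads) / len(heads))
print("students with exactly 25 heads:", heads.count(25))
print("P(15 or fewer heads in 50 fair flips):", prob_at_most(15, 50))
```

The simulated counts cluster around 25 but spread over several heads on either side, and only a minority of students land on exactly 25. A result as extreme as 15 heads is possible with a fair coin but rare: its exact probability is well under 1%.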
3) Theory
Measures of variation:
- 3.1) Range: range is the simplest measure of variation and it is the difference between the largest value and the smallest. Range = Maximum – Minimum.
Suppose Jack's pulse rate varied from 70 to 76, while Tom's varied from 58 to 79. Jack's range is 76 – 70 = 6 and Tom's is 79 – 58 = 21, so by the range measure Tom's pulse rate varies much more than Jack's.
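The range calculation is simple enough to express as a one-line function. This Python sketch uses a hypothetical helper name, `data_range`; since the text only reports each person's lowest and highest pulse reading, the example passes just those two values, which is all the range depends on.

```python
def data_range(values):
    """Range = maximum - minimum; only the two extreme values matter."""
    if not values:
        raise ValueError("range is undefined for an empty data set")
    return max(values) - min(values)

# Only the lowest and highest pulse readings are given in the text.
jack_range = data_range([70, 76])
tom_range = data_range([58, 79])
print(jack_range, tom_range)  # 6 21
```

Because only the extremes enter the formula, a single unusual reading can change the range a lot, which is one reason the measures below use more of the data.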
A similar measure of variation covers (roughly) the middle 50 percent of the data. It is the interquartile range, $IQR = Q_3 - Q_1$, where $Q_1$ and $Q_3$ are the first and third quartiles.
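A minimal sketch of the interquartile range in Python, using the mice counts from the example at the end of this section. It relies on the standard library's `statistics.quantiles` with its default `'exclusive'` method; quartile conventions differ between textbooks and software packages, so other tools may report slightly different $Q_1$ and $Q_3$ for small data sets. The helper name `interquartile_range` is hypothetical.

```python
import statistics

def interquartile_range(values):
    """IQR = Q3 - Q1, using statistics.quantiles' default 'exclusive' method."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

mice = [8, 11, 7, 13, 10, 11, 7, 9]
print(statistics.quantiles(mice, n=4))  # [7.25, 9.5, 11.0]
print(interquartile_range(mice))        # 3.75
```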
- 3.2) Variance: unlike the range, which involves only the largest and smallest values, the variance involves all the data values.
- Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$, where $\mu$ is the population mean of the data and $N$ is the size of the population.
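The population variance formula translates directly into code: compute the mean, square each deviation from it, and divide the total by $N$. This Python sketch uses a hypothetical helper name, `population_variance`, and a small hypothetical data set chosen only for illustration; the standard library's `statistics.pvariance` serves as a cross-check.

```python
import statistics

def population_variance(values):
    """sigma^2 = (1/N) * sum((x_i - mu)^2) over all N values in the population."""
    n = len(values)
    if n == 0:
        raise ValueError("variance is undefined for an empty population")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

population = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, for illustration only
print(population_variance(population))   # 4.0
print(statistics.pvariance(population))  # standard-library cross-check
```

Here the mean is 5, the squared deviations add up to 32, and dividing by $N = 8$ gives 4.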
- 3.3) Standard deviation: the standard deviation is the square root of the variance. Because the variance squares the deviations, its units are the square of the data's units; taking the square root returns the standard deviation to the same units as the original data values.
- Population standard deviation: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$, where $\mu$ is the population mean of the data and $N$ is the size of the population.
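- Sample standard deviation: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, where $\bar{x}$ is the sample mean and $n$ is the sample size. Dividing by $n - 1$ makes $s^2$ an unbiased estimate of the population variance $\sigma^2$; $s$ itself is a slightly biased estimate of $\sigma$.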
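The two standard deviation formulas differ only in the divisor applied to the sum of squared deviations: $N$ for a population, $n - 1$ for a sample. This Python sketch makes that difference explicit; the helper names `sum_of_squares`, `population_sd` and `sample_sd` are hypothetical, the data set is a hypothetical illustration, and `statistics.pstdev` / `statistics.stdev` from the standard library serve as cross-checks. If the data were pulse rates in beats per minute, both results would also be in beats per minute.

```python
import math
import statistics

def sum_of_squares(values):
    """Sum of squared deviations of `values` from their mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def population_sd(values):
    """sigma = sqrt(SS / N): divide by the population size N."""
    return math.sqrt(sum_of_squares(values) / len(values))

def sample_sd(values):
    """s = sqrt(SS / (n - 1)): divide by n - 1 (needs at least two values)."""
    if len(values) < 2:
        raise ValueError("sample standard deviation needs at least two values")
    return math.sqrt(sum_of_squares(values) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, for illustration only
print(population_sd(data))  # 2.0
print(sample_sd(data))      # about 2.138
print(statistics.pstdev(data), statistics.stdev(data))  # standard-library cross-check
```

For the same numbers, the sample version is always a little larger than the population version, because it divides the same sum of squares by a smaller number.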
Consider an example: a biologist counted 8, 11, 7, 13, 10, 11, 7 and 9 contaminated mice in 8 groups. Calculate the sample standard deviation s.
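A worked calculation, treating the eight groups as a sample so that the formula is $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ with $n = 8$ and divisor $n - 1 = 7$:

```latex
\begin{aligned}
\bar{x} &= \frac{8 + 11 + 7 + 13 + 10 + 11 + 7 + 9}{8} = \frac{76}{8} = 9.5 \\
\sum_{i=1}^{8} (x_i - \bar{x})^2 &= (-1.5)^2 + 1.5^2 + (-2.5)^2 + 3.5^2 + 0.5^2 + 1.5^2 + (-2.5)^2 + (-0.5)^2 \\
&= 2.25 + 2.25 + 6.25 + 12.25 + 0.25 + 2.25 + 6.25 + 0.25 = 32 \\
s^2 &= \frac{32}{8 - 1} = \frac{32}{7} \approx 4.571 \\
s &= \sqrt{\frac{32}{7}} \approx 2.14
\end{aligned}
```

So the counts typically vary by about 2.14 contaminated mice around the mean of 9.5 per group. Dividing by $N = 8$ instead, as if the eight groups were the entire population, would give $\sigma = \sqrt{32/8} = 2$.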
- SOCR Home page: http://www.socr.umich.edu