Difference between revisions of "SMHS NonParamInference"
==[[SMHS| Scientific Methods for Health Sciences]] - Non-Parametric Inference == | ==[[SMHS| Scientific Methods for Health Sciences]] - Non-Parametric Inference == | ||
| − | |||
===Overview=== | ===Overview=== | ||
| − | Nonparametric inference | + | Nonparametric inference involves descriptive and inferential statistics that are not based on parametrized families of probability distributions (like the Normal or Poisson distributions). Unlike parametric inference, which assumes the data follows a specific distribution structure defined a priori, nonparametric inference is "distribution-free". |
| − | + | The term "nonparametric" does not mean the models lack parameters entirely. Rather, it implies that the number and nature of the parameters are flexible and determined from the data rather than fixed in advance. In this lecture, we introduce the area of nonparametric inference and illustrate various applications with examples. | |
===Motivation=== | ===Motivation=== | ||
| − | We have discussed | + | We have previously discussed parametric inference, where conclusions are drawn based on assumptions about the population's probability distribution. |
| − | + | * Question: What if these assumptions (e.g., Normality) are violated? What if the variables cannot be categorized into a known parametrized family? | |
| − | + | * Answer: Distribution-free (nonparametric) statistical methods are the solution. | |
| + | ====Motivational Clinical Example==== | ||
| + | Consider 16 studies of septic patients reporting the [[SMHS_OR_RR|relative risk]] of mortality associated with acute renal failure. A relative risk (RR) of 1.0 implies no effect, while <math>RR \ne 1</math> suggests a beneficial or detrimental effect. | ||
| + | The goal is to determine if developing acute renal failure impacts mortality based on this cumulative evidence. | ||
| + | [Image of boxplot showing skewed distribution] | ||
| + | The data is heavily skewed and not bell-shaped, making the traditional [[SMHS_HypothesisTesting#Comparing_the_means_of_two_samples|paired t-test]] inappropriate. | ||
<center> | <center> | ||
{|class="wikitable" style="text-align:center; width:75%" border="1" | {|class="wikitable" style="text-align:center; width:75%" border="1" | ||
|- | |- | ||
| − | |Study||Relative Risk||Sign (Relative Risk -1) | + | |Study||Relative Risk||Sign (Relative Risk - 1) |
|- | |- | ||
|1||0.75||- | |1||0.75||- | ||
| Line 46: | Line 50: | ||
|} | |} | ||
</center> | </center> | ||
| − | |||
===Theory=== | ===Theory=== | ||
| + | ====[[AP_Statistics_Curriculum_2007_NonParam_2MedianPair#The_Sign-Test|The Sign Test]]==== | ||
| + | The [[AP_Statistics_Curriculum_2007_NonParam_2MedianPair#The_Sign-Test|Sign Test]] is the simplest nonparametric alternative to the One-Sample and Paired T-Test. It does not require the data to be normally distributed. It relies solely on the direction (sign) of the difference between an observation and a hypothesized value (usually the median). | ||
| + | '''Concept:''' If there is no effect (Null Hypothesis <math>H_0</math>), positive and negative deviations from the median should be equally likely, similar to a coin toss (<math>P=0.5</math>). | ||
| + | * Application: In our sepsis example, if renal failure had no effect, we would expect about half the relative risks to be <math>>1</math> (<math>+</math>) and half to be <math><1</math> (<math>-</math>). | ||
| + | * Observation: We observe 3 studies with "<math>-</math>" and 13 studies with "<math>+</math>". | ||
| + | * Intuition Check: Is a 13 vs 3 split likely to happen by fair coin flips alone? Intuitively, this difference appears too large to be random variation. | ||
| + | '''Formal Calculations:''' | ||
| + | Let <math>N_{+}</math> be the number of "+" signs. We test at significance level <math>\alpha=0.05</math>. | ||
| + | * <math>H_{0}: Median = 1</math> (Implies <math>P(+) = 0.5</math>) | ||
| + | * <math>H_{a}: Median \ne 1</math> (Implies <math>P(+) \ne 0.5</math>) | ||
| + | Test Statistic: <math>B_{S} = \max(N_{+}, N_{-})</math>. Here, <math>B_S = \max(13, 3) = 13</math>. | ||
| + | We calculate the probability of observing 13 or more successes in 16 trials under the null hypothesis (<math>p=0.5</math>): | ||
| + | <math>P(B_{S} \ge 13 | B_{S} \sim Bin(16,0.5)) = 0.0106</math> | ||
| − | + | : Verify this using the [https://distributome.org/V3/calc/BinomialCalculator.html Distributome Binomial Calculator]. | |
| − | |||
| − | |||
| − | |||
| − | |||
| − | + | '''Critical Note on P-Values:''' The value <math>0.0106</math> is the one-sided probability. Since our hypothesis is two-sided (<math>\ne</math>), the p-value is <math>2 \times 0.0106 = 0.0212</math>. Since <math>0.0212 < 0.05</math>, we reject <math>H_0</math> and conclude that acute renal failure has a significant effect on mortality. | |
| + | '''Example 2: Twin Aggressiveness''' | ||
| + | 12 pairs of identical twins are tested to see if the firstborn is more aggressive. Data is paired naturally by twinship. | ||
| + | Note: One pair has equal scores (Tie). In the Sign Test, ties are typically discarded, reducing <math>n</math>. | ||
<center> | <center> | ||
{|class="wikitable" style="text-align:center; width:75%" border="1" | {|class="wikitable" style="text-align:center; width:75%" border="1" | ||
| Line 61: | Line 76: | ||
|Twin-Index||1st Born||2nd Born||Sign | |Twin-Index||1st Born||2nd Born||Sign | ||
|- | |- | ||
| − | | | + | |...||...||...||... |
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
|- | |- | ||
|6||72||72||0 (Drop) | |6||72||72||0 (Drop) | ||
|- | |- | ||
| − | | | + | |...||...||...||... |
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
|} | |} | ||
</center> | </center> | ||
| − | + | *(See full table in previous section)* | |
| − | + | Using SOCR Sign Test Analysis: | |
| − | + | * P-value = 0.274. | |
| − | + | * Conclusion: We cannot reject the null hypothesis at the 5% level. There is no strong evidence of a birth-order effect. | |
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
====[[AP_Statistics_Curriculum_2007_NonParam_2MedianPair#The_Wilcoxon_Signed_Rank_Test|The Wilcoxon Signed Rank Test]]==== | ====[[AP_Statistics_Curriculum_2007_NonParam_2MedianPair#The_Wilcoxon_Signed_Rank_Test|The Wilcoxon Signed Rank Test]]==== | ||
| − | The [[AP_Statistics_Curriculum_2007_NonParam_2MedianPair#The_Wilcoxon_Signed_Rank_Test|Wilcoxon Signed Rank Test]] | + | The Sign Test ignores the magnitude of differences, utilizing only their direction. The [[AP_Statistics_Curriculum_2007_NonParam_2MedianPair#The_Wilcoxon_Signed_Rank_Test|Wilcoxon Signed Rank Test]] improves power by incorporating the ranks of the absolute differences. |
| − | + | '''Assumptions:''' | |
| − | + | * Data are paired and drawn independently. | |
| + | * The dependent variable is continuous (interval scale). | ||
| + | * The distribution of differences is symmetric (though not necessarily Normal). | ||
| + | '''Motivational Example:''' Central venous oxygen saturation (<math>SvO_{2}</math>) in 10 patients at admission vs. 6 hours later. | ||
| + | * <math>H_{0}</math>: The median difference is zero. | ||
| + | '''Procedure:''' | ||
| + | # Calculate difference <math>D_i = X_i - Y_i</math>. | ||
| + | # Discard pairs where <math>D_i = 0</math>. | ||
| + | # Rank the absolute differences <math>|D_i|</math>. | ||
| + | # Assign the original sign of <math>D_i</math> to the rank. | ||
<center> | <center> | ||
{|class="wikitable" style="text-align:center; width:75%" border="1" | {|class="wikitable" style="text-align:center; width:75%" border="1" | ||
|- | |- | ||
| − | + | ! Patient||Diff (X-Y)||Abs Diff||Rank||Signed Rank | |
|- | |- | ||
| − | | | + | |10||5.5||5.5||4||4 |
|- | |- | ||
| − | | | + | |2||2.4||2.4||1||1 |
|- | |- | ||
| − | | | + | |7||-2.5||2.5||2||-2 |
|- | |- | ||
| − | | | + | |9||-3.5||3.5||3||-3 |
|- | |- | ||
| − | |3|| | + | |3||-5.8||5.8||5||-5 |
|- | |- | ||
| − | |5|| | + | |5||-7.1||7.1||6||-6 |
|- | |- | ||
| − | |6|| | + | |6||-12.2||12.2||7||-7 |
|- | |- | ||
| − | |1|| | + | |1||-13.2||13.2||8||-8 |
|- | |- | ||
| − | |4|| | + | |4||-13.7||13.7||9||-9 |
|- | |- | ||
| − | |8|| | + | |8||-17.7||17.7||10||-10 |
|- | |- | ||
| − | | | + | |'''Sum'''||||||||'''W = -45''' |
|} | |} | ||
</center> | </center> | ||
| − | + | '''Test Statistics & Variance Correction:''' | |
| − | < | + | There are two common ways to report the statistic: |
| − | + | # <math>W_{net}</math> (Sum of signed ranks): In the table above, <math>W_{net} = -45</math>. Expected value <math>E(W_{net}) = 0</math>. Variance <math>Var(W_{net}) = \frac{n(n+1)(2n+1)}{6}</math>. | |
| − | + | # <math>W_{+}</math> (Sum of positive ranks): Often used by software (like R). <math>W_{+} = 4+1 = 5</math>. Expected value <math>E(W_+) = \frac{n(n+1)}{4}</math>. Variance <math>Var(W_+) = \frac{n(n+1)(2n+1)}{24}</math>. | |
| − | </ | + | '''Results Interpretation (from SOCR/R):''' |
| + | : Wilcoxon Statistic (<math>W_+</math>) = 5.000 | ||
| + | : Expected Value (<math>E(W_+)</math>) = 27.500 | ||
| + | : Variance (<math>Var(W_+)</math>) = 96.250 | ||
| + | : Z-Score = <math>\frac{5 - 27.5}{\sqrt{96.25}} \approx -2.293</math> | ||
| + | : Two-Sided P-Value = 0.022 | ||
| + | : Conclusion: Reject <math>H_0</math>. There is a significant difference in oxygen saturation after 6 hours. | ||
| + | '''R Calculations:''' | ||
| + | <pre> | ||
| + | # Modern R Syntax using Data Frame | ||
| + | df_ox <- data.frame( | ||
| + | Admission = c(65.3,59.1,58.2,56,56.1,60.6,37.8,39.7,57.7,33.6), | ||
| + | Later_6h = c(59.8,56.7,60.7,59.5,61.9,67.7,50,52.9,71.4,51.3) | ||
| + | ) | ||
| + | # Perform Test | ||
| + | wilcox.test(df_ox$Admission, df_ox$Later_6h, | ||
| + | paired=TRUE, alternative = "two.sided") | ||
| + | </pre> | ||
| − | + | ====[[AP_Statistics_Curriculum_2007_NonParam_2MedianIndep#The_Wilcoxon-Mann-Whitney_Test|Wilcoxon-Mann-Whitney (WMW) Test]]==== | |
| − | + | Also known as the Mann-Whitney U test, this is the nonparametric analogue to the Independent Samples T-test. It assesses whether two independent samples come from the same distribution. | |
| − | + | '''Motivational Example:''' Soil pH levels at two different locations (Location 1 vs. Location 2). | |
| − | + | * Assumption Check: Histograms show data is not Normal; small sample size (<math>n=9</math>) makes T-test risky. | |
| − | + | * Hypothesis: \(H_0\): The distributions of Location 1 and Location 2 are identical. <math>H_a</math>: The distributions differ (shift in location. | |
| − | + | '''Calculation (U Statistic):''' | |
| − | + | The logic relies on ranking all observations together (pooled). | |
| − | + | # Rank all <math>N = n_1 + n_2</math> observations. | |
| − | + | # Sum the ranks for Group 1 (<math>R_1</math>). | |
| − | * | + | # Calculate <math>U_1 = R_1 - \frac{n_1(n_1+1)}{2}</math>. |
| − | * | + | # Comparison: If <math>H_0</math> is true, the ranks should be intermixed evenly. If <math>H_a</math> is true, one group will have significantly higher ranks. |
| − | + | '''Comparison with T-Test:''' | |
| − | * | + | * WMW: Uses ranks. Robust to outliers and non-normality. Less power than T-test if data is actually Normal (approx 95% efficiency). |
| − | + | * T-Test: Uses raw values. Sensitive to outliers. Highest power for Normal data. | |
| − | + | * Guideline: If data is clearly non-Normal or sample size is small, use WMW. | |
| − | < | + | '''R Calculations:''' |
| − | + | <pre> | |
| − | + | loc1 <- c(8.1,7.89,8,7.85,8.01,7.82,7.99,7.8,7.93) | |
| − | + | loc2 <- c(7.85,7.3,7.73,7.27,7.58,7.27,7.5,7.23,7.41) | |
| − | + | wilcox.test(loc1, loc2, paired=FALSE, alternative = "two.sided") | |
| − | + | # Output: p-value = 0.0009172 (Significant Difference) | |
| − | + | </pre> | |
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
====[[AP_Statistics_Curriculum_2007_NonParam_2PropIndep#General_McNemar_test_of_marginal_homogeneity_for_a_single_category|McNemar Test]]==== | ====[[AP_Statistics_Curriculum_2007_NonParam_2PropIndep#General_McNemar_test_of_marginal_homogeneity_for_a_single_category|McNemar Test]]==== | ||
| − | + | This test analyzes paired nominal data (e.g., Before/After binary outcomes). It specifically tests for marginal homogeneity. | |
| − | + | '''Structure (<math>2 \times 2</math> Contingency Table):''' | |
| − | |||
| − | |||
| − | |||
| − | < | ||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | </ | ||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
<center> | <center> | ||
{|class="wikitable" style="text-align:center; width:40%" border="1" | {|class="wikitable" style="text-align:center; width:40%" border="1" | ||
|- | |- | ||
| − | + | | ||colspan=2|Post-Treatment | |
|- | |- | ||
| − | | | + | |Pre-Treatment||Positive (+)||Negative (-)||Total |
|- | |- | ||
| − | | | + | |Positive (+)||a (Concordant)||b (Discordant)||a+b |
|- | |- | ||
| − | | | + | |Negative (-)||c (Discordant)||d (Concordant)||c+d |
|} | |} | ||
</center> | </center> | ||
| − | + | * Insight: Cells <math>a</math> and <math>d</math> represent subjects who didn't change (Consistent). They provide no information about the ''direction'' of change. | |
| − | < | + | * Focus: We compare the discordant cells <math>b</math> (changed from + to -) and <math>c</math> (changed from - to +). |
| − | + | * Statistic: <math>\chi^2 = \frac{(b-c)^2}{b+c}</math>. Under <math>H_0</math> (no effect), this follows a Chi-square distribution with <math>df=1</math>. | |
| − | + | '''Extension: Collapsing Tables for Specific Categories''' | |
| − | + | Sometimes we have <math>K \times K</math> data (e.g., Evaluator ratings: Poor, Good, Excellent) but are only interested in one category (e.g., "Poor"). | |
| − | + | * We can "collapse" the table into a <math>2 \times 2</math> matrix: "Poor" vs "Not Poor" (Good + Excellent). | |
| − | + | * This allows us to use the standard McNemar test to see if the two evaluators disagree specifically on the classification of "Poor" subjects. Note: To test agreement across *all* categories simultaneously, the Stuart-Maxwell test is required (not detailed here). | |
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | </ | ||
| − | |||
| − | |||
| − | |||
| − | : | ||
| − | :: | ||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
====[[SOCR_EduMaterials_AnalysisActivities_KruskalWallis|Kruskal-Wallis Test]]==== | ====[[SOCR_EduMaterials_AnalysisActivities_KruskalWallis|Kruskal-Wallis Test]]==== | ||
| − | + | The Kruskal-Wallis test is the non-parametric generalization of the One-Way ANOVA. It compares medians across <math>k > 2</math> independent groups. | |
| − | + | '''Hypotheses:''' | |
| − | + | * <math>H_0</math>: All <math>k</math> population distributions are identical. | |
| − | < | + | * <math>H_a</math>: At least one population stochastically dominates another (locations differ). |
| − | { | + | '''Calculations:''' |
| − | + | If there are no ties, the test statistic <math>H</math> (or <math>T</math>) is: | |
| − | + | <math>H = \frac{12}{N(N+1)} \sum_{i=1}^{k}\frac{R_{i}^{2}}{n_{i}} - 3(N+1)</math> | |
| − | + | where <math>R_i</math> is the sum of ranks for group <math>i</math>, and <math>N</math> is the total sample size. | |
| − | + | Correction for Ties: | |
| − | |- | + | When data values are repeated (ties), the statistic must be divided by a correction factor <math>C</math>: |
| − | | | + | <math>C = 1 - \frac{\sum (t^3 - t)}{N^3 - N}</math> |
| − | + | where <math>t</math> is the count of observations in each set of ties. The SOCR and R implementations automatically apply this correction. | |
| − | + | '''Motivational Example:''' Four different teaching methods are applied to students. | |
| − | + | * Data: | |
| − | + | Method 1: 65, 87, 73, 79 | |
| − | + | Method 2: 75, 69, 83, 81 | |
| − | + | Method 3: 59, 78, 67, 62 | |
| − | + | Method 4: 94, 89, 80, 88 | |
| − | </ | + | * Intuition: Method 3 looks lower and Method 4 looks higher. ANOVA might be biased by the outlier "59" in a small sample. Kruskal-Wallis ranks these values, mitigating the outlier's leverage. |
| + | '''R Calculations:''' | ||
| + | <pre> | ||
| + | # Best Practice: Use a Data Frame | ||
| + | df_teaching <- data.frame( | ||
| + | Score = c(65, 87, 73, 79, # Method 1 | ||
| + | 75, 69, 83, 81, # Method 2 | ||
| + | 59, 78, 67, 62, # Method 3 | ||
| + | 94, 89, 80, 88), # Method 4 | ||
| + | Method = factor(rep(1:4, each=4)) | ||
| + | ) | ||
| + | kruskal.test(Score ~ Method, data = df_teaching) | ||
| + | </pre> | ||
| + | ====[[AP_Statistics_Curriculum_2007_NonParam_VarIndep|Fligner-Killeen Test]] (Variance Homogeneity)==== | ||
| + | Parametric tests like ANOVA assume Homogeneity of Variances (Homoscedasticity). The [[AP_Statistics_Curriculum_2007_NonParam_VarIndep|Fligner-Killeen test]] is a robust, nonparametric way to check this assumption across <math>k</math> groups. | ||
| + | * Logic: It tests if the dispersion (spread) of observations around the median is the same for all groups. | ||
| + | * Mechanism: It ranks the absolute differences from the median <math>|X_{ij} - Median_j|</math> and assigns weights based on Normal distribution quantiles. | ||
| + | * Statistic: The test statistic approximates a <math>\chi^2</math> distribution with <math>k-1</math> degrees of freedom. | ||
| + | ===Applications & Problems=== | ||
| + | '''Problem 6.1: Maze Escape Times''' | ||
| + | Rats timed before and after 2 weeks of training. | ||
| + | * Data: Paired (Before/After). | ||
| + | * Issue: Two rats failed to complete (marked "N"). These are effectively "infinite" times, or missing data. If we assume "N" > any measured time, we can still assign a Sign (+ or -). | ||
| + | * Analysis: | ||
| + | Rat 3: N vs 45 (Improved, +) | ||
| + | Rat 10: N vs 50 (Improved, +) | ||
| + | Total "+" signs: 9. Total "-" signs: 1. Total valid <math>n=10</math>. | ||
| + | Check significance using Sign Test table or Binomial calculation for 9/10 successes. | ||
| − | + | '''Problem 6.2: Brain Volume Segmentation''' | |
| − | * | + | Comparing two algorithms for brain segmentation across 57 regions (ROI). |
| + | * Visual Check: | ||
| + | * Question: Are the algorithms consistent? | ||
| + | * Action: Compute differences (<math>Vol_1 - Vol_2</math>). Plot histogram of differences. If symmetric, use Wilcoxon Signed Rank. If skewed, use Sign Test. | ||
| + | * Hint: With <math>n=57</math>, the Power of the Wilcoxon test is quite high. | ||
| − | + | ===References=== | |
| − | : | + | *[https://sda.statisticalcomputing.org/learning use the SOCR SDA app to complete the Non-parametric test Learning Module] |
| − | + | <hr> | |
| − | + | * SOCR Home page: https://socr.umich.edu | |
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | * | ||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
| − | + | {{translate|pageName=https://wiki.socr.umich.edu/index.php?title=SMHS_NonParamInference}} | |
Latest revision as of 17:24, 10 February 2026
Scientific Methods for Health Sciences - Non-Parametric Inference
Overview
Nonparametric inference involves descriptive and inferential statistics that are not based on parametrized families of probability distributions (like the Normal or Poisson distributions). Unlike parametric inference, which assumes that the data follow a specific distribution structure defined a priori, nonparametric inference is "distribution-free".

The term "nonparametric" does not mean the models lack parameters entirely. Rather, it implies that the number and nature of the parameters are flexible and determined from the data rather than fixed in advance. In this lecture, we introduce the area of nonparametric inference and illustrate various applications with examples.
Motivation
We have previously discussed parametric inference, where conclusions are drawn based on assumptions about the population's probability distribution.
- Question: What if these assumptions (e.g., Normality) are violated? What if the variables cannot be categorized into a known parametrized family?
- Answer: Distribution-free (nonparametric) statistical methods are the solution.
Motivational Clinical Example
Consider 16 studies of septic patients reporting the relative risk of mortality associated with acute renal failure. A relative risk (RR) of 1.0 implies no effect, while \(RR \ne 1\) suggests a beneficial or detrimental effect. The goal is to determine whether developing acute renal failure affects mortality, based on this cumulative evidence.

[Image of boxplot showing skewed distribution]

The data are heavily skewed and not bell-shaped, so a traditional one-sample t-test of the mean relative risk against 1 is inappropriate.
| Study | Relative Risk | Sign (Relative Risk - 1) |
| 1 | 0.75 | - |
| 2 | 2.03 | + |
| 3 | 2.29 | + |
| 4 | 2.11 | + |
| 5 | 0.80 | - |
| 6 | 1.50 | + |
| 7 | 0.79 | - |
| 8 | 1.01 | + |
| 9 | 1.23 | + |
| 10 | 1.48 | + |
| 11 | 2.45 | + |
| 12 | 1.02 | + |
| 13 | 1.03 | + |
| 14 | 1.30 | + |
| 15 | 1.54 | + |
| 16 | 1.27 | + |
Theory
The Sign Test
The Sign Test is the simplest nonparametric alternative to the One-Sample and Paired T-Test. It does not require the data to be normally distributed. It relies solely on the direction (sign) of the difference between an observation and a hypothesized value (usually the median).

Concept: If there is no effect (Null Hypothesis \(H_0\)), positive and negative deviations from the median should be equally likely, similar to a coin toss (\(P=0.5\)).
- Application: In our sepsis example, if renal failure had no effect, we would expect about half the relative risks to be \(>1\) (\(+\)) and half to be \(<1\) (\(-\)).
- Observation: We observe 3 studies with "\(-\)" and 13 studies with "\(+\)".
- Intuition Check: Is a 13 vs 3 split likely to happen by fair coin flips alone? Intuitively, this difference appears too large to be random variation.
Formal Calculations: Let \(N_{+}\) be the number of "+" signs. We test at significance level \(\alpha=0.05\).
- \(H_{0}: Median = 1\) (Implies \(P(+) = 0.5\))
- \(H_{a}: Median \ne 1\) (Implies \(P(+) \ne 0.5\))
Test Statistic: \(B_{S} = \max(N_{+}, N_{-})\). Here, \(B_S = \max(13, 3) = 13\). Under the null hypothesis, \(N_{+} \sim Bin(16, 0.5)\), so we calculate the probability of observing 13 or more "+" signs in 16 trials: \[P(N_{+} \ge 13 \mid N_{+} \sim Bin(16,0.5)) = \frac{560 + 120 + 16 + 1}{2^{16}} = \frac{697}{65536} \approx 0.0106\]
- Verify this using the Distributome Binomial Calculator (https://distributome.org/V3/calc/BinomialCalculator.html).
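The tail probability above can also be checked directly in R. The following is a minimal sketch, assuming the 16 relative risks listed in the table of the Motivational Clinical Example; the variable names (rr, n_plus, b_s) are illustrative and are not part of the SOCR materials.

```r
# Relative risks from the 16 sepsis studies (values copied from the table above)
rr <- c(0.75, 2.03, 2.29, 2.11, 0.80, 1.50, 0.79, 1.01,
        1.23, 1.48, 2.45, 1.02, 1.03, 1.30, 1.54, 1.27)

n_plus  <- sum(rr > 1)          # studies with RR above the hypothesized median of 1
n_minus <- sum(rr < 1)          # studies with RR below 1
n       <- n_plus + n_minus     # any ties (RR exactly 1) would be dropped here
b_s     <- max(n_plus, n_minus) # sign test statistic

# One-sided tail P(X >= b_s) for X ~ Bin(n, 0.5)
p_one_sided <- pbinom(b_s - 1, size = n, prob = 0.5, lower.tail = FALSE)

# Two-sided exact binomial test of P(+) = 0.5
p_two_sided <- binom.test(n_plus, n, p = 0.5)$p.value

print(c(n_plus = n_plus, n_minus = n_minus,
        one_sided = p_one_sided, two_sided = p_two_sided))
```

Because the null distribution Bin(16, 0.5) is symmetric, the two-sided value returned by binom.test is twice the one-sided tail.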
Critical Note on P-Values: The value \(0.0106\) is the one-sided probability. Since our hypothesis is two-sided (\(\ne\)), the p-value is \(2 \times 0.0106 \approx 0.021\). Since \(0.021 < 0.05\), we reject \(H_0\) and conclude that the median relative risk differs from 1, i.e., acute renal failure has a significant effect on mortality.

Example 2: Twin Aggressiveness

12 pairs of identical twins are tested to see if the firstborn is more aggressive. The data are paired naturally by twinship. Note: One pair has equal scores (a tie). In the Sign Test, ties are typically discarded, which reduces \(n\) from 12 to 11.
| Twin-Index | 1st Born | 2nd Born | Sign |
| ... | ... | ... | ... |
| 6 | 72 | 72 | 0 (Drop) |
| ... | ... | ... | ... |
(See full table in previous section.)
Using SOCR Sign Test Analysis:
- P-value = 0.274.
- Conclusion: We cannot reject the null hypothesis at the 5% level. There is no strong evidence of a birth-order effect.
The Wilcoxon Signed Rank Test
The Sign Test ignores the magnitude of differences, utilizing only their direction. The Wilcoxon Signed Rank Test improves power by incorporating the ranks of the absolute differences. Assumptions:
- Data are paired and drawn independently.
- The dependent variable is continuous (interval scale).
- The distribution of differences is symmetric (though not necessarily Normal).
Motivational Example: Central venous oxygen saturation (\(SvO_{2}\)) in 10 patients at admission vs. 6 hours later.
- \(H_{0}\): The median difference is zero.
Procedure:
- Calculate difference \(D_i = X_i - Y_i\).
- Discard pairs where \(D_i = 0\).
- Rank the absolute differences \(|D_i|\).
- Assign the original sign of \(D_i\) to the rank.
| Patient | Diff (X-Y) | Abs Diff | Rank | Signed Rank |
|---|---|---|---|---|
| 10 | 5.5 | 5.5 | 4 | 4 |
| 2 | 2.4 | 2.4 | 1 | 1 |
| 7 | -2.5 | 2.5 | 2 | -2 |
| 9 | -3.5 | 3.5 | 3 | -3 |
| 3 | -5.8 | 5.8 | 5 | -5 |
| 5 | -7.1 | 7.1 | 6 | -6 |
| 6 | -12.2 | 12.2 | 7 | -7 |
| 1 | -13.2 | 13.2 | 8 | -8 |
| 4 | -13.7 | 13.7 | 9 | -9 |
| 8 | -17.7 | 17.7 | 10 | -10 |
| Sum | | | | W = -45 |
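The ranking procedure can be reproduced step by step in R. This is a sketch, assuming the admission and 6-hour measurements given in the R Calculations of this section; the rows come out in data order, so the patient labels do not line up with the table, but the ranks and sums are the same. The names d, signed_rank, w_net and w_plus are illustrative.

```r
# SvO2 measurements (same values as in the R Calculations of this section)
admission <- c(65.3, 59.1, 58.2, 56, 56.1, 60.6, 37.8, 39.7, 57.7, 33.6)
later_6h  <- c(59.8, 56.7, 60.7, 59.5, 61.9, 67.7, 50, 52.9, 71.4, 51.3)

d <- admission - later_6h      # Step 1: paired differences D_i = X_i - Y_i
d <- d[d != 0]                 # Step 2: discard zero differences
r <- rank(abs(d))              # Step 3: rank the absolute differences
signed_rank <- sign(d) * r     # Step 4: re-attach the original signs

w_net  <- sum(signed_rank)     # sum of signed ranks
w_plus <- sum(r[d > 0])        # sum of ranks of the positive differences

print(data.frame(Diff = d, AbsDiff = abs(d), Rank = r, SignedRank = signed_rank))
print(c(W_net = w_net, W_plus = w_plus))
```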
Test Statistics & Variance Correction: There are two common ways to report the statistic:
- \(W_{net}\) (Sum of signed ranks): In the table above, \(W_{net} = -45\). Expected value \(E(W_{net}) = 0\). Variance \(Var(W_{net}) = \frac{n(n+1)(2n+1)}{6}\).
- \(W_{+}\) (Sum of positive ranks): Often used by software (like R). \(W_{+} = 4+1 = 5\). Expected value \(E(W_+) = \frac{n(n+1)}{4}\). Variance \(Var(W_+) = \frac{n(n+1)(2n+1)}{24}\).
Results Interpretation (from SOCR/R):
- Wilcoxon Statistic (\(W_+\)) = 5.000
- Expected Value (\(E(W_+)\)) = 27.500
- Variance (\(Var(W_+)\)) = 96.250
- Z-Score = \(\frac{5 - 27.5}{\sqrt{96.25}} \approx -2.293\)
- Two-Sided P-Value = 0.022
- Conclusion: Reject \(H_0\). There is a significant difference in oxygen saturation after 6 hours.
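The reported values follow from the formulas for \(E(W_+)\) and \(Var(W_+)\). A short sketch of the arithmetic, assuming the normal approximation without a continuity correction (variable names are illustrative):

```r
n      <- 10   # number of non-zero differences
w_plus <- 5    # sum of positive ranks from the table above

e_w   <- n * (n + 1) / 4                 # expected value under H0
var_w <- n * (n + 1) * (2 * n + 1) / 24  # variance under H0 (no ties)
z     <- (w_plus - e_w) / sqrt(var_w)    # standardized statistic
p_two_sided <- 2 * pnorm(-abs(z))        # two-sided normal-approximation p-value

print(c(E = e_w, Var = var_w, Z = z, p = p_two_sided))
```

For small samples without ties, R's wilcox.test uses the exact null distribution by default, so its reported p-value can differ slightly from this normal approximation.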
R Calculations:
# Modern R Syntax using Data Frame
df_ox <- data.frame(
  Admission = c(65.3,59.1,58.2,56,56.1,60.6,37.8,39.7,57.7,33.6),
  Later_6h  = c(59.8,56.7,60.7,59.5,61.9,67.7,50,52.9,71.4,51.3)
)
# Perform Test
wilcox.test(df_ox$Admission, df_ox$Later_6h,
            paired = TRUE, alternative = "two.sided")
Wilcoxon-Mann-Whitney (WMW) Test
Also known as the Mann-Whitney U test, this is the nonparametric analogue to the Independent Samples T-test. It assesses whether two independent samples come from the same distribution. Motivational Example: Soil pH levels at two different locations (Location 1 vs. Location 2).
- Assumption Check: Histograms show data is not Normal; small sample size (\(n=9\)) makes T-test risky.
- Hypothesis: \(H_0\): The distributions of Location 1 and Location 2 are identical. \(H_a\): The distributions differ (shift in location).
Calculation (U Statistic): The logic relies on ranking all observations together (pooled).
- Rank all \(N = n_1 + n_2\) observations.
- Sum the ranks for Group 1 (\(R_1\)).
- Calculate \(U_1 = R_1 - \frac{n_1(n_1+1)}{2}\).
- Comparison: If \(H_0\) is true, the ranks should be intermixed evenly. If \(H_a\) is true, one group will have significantly higher ranks.
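The steps above can be sketched in R using the soil pH measurements from the R Calculations of this section. This is an illustrative hand computation with made-up variable names (pooled_ranks, r1, u1, u2); tied values receive average ranks.

```r
# Soil pH at the two locations (same values as in the R Calculations)
loc1 <- c(8.1, 7.89, 8, 7.85, 8.01, 7.82, 7.99, 7.8, 7.93)
loc2 <- c(7.85, 7.3, 7.73, 7.27, 7.58, 7.27, 7.5, 7.23, 7.41)

n1 <- length(loc1)
n2 <- length(loc2)

pooled_ranks <- rank(c(loc1, loc2))    # Step 1: rank all N = n1 + n2 values together
r1 <- sum(pooled_ranks[seq_len(n1)])   # Step 2: rank sum for Group 1
u1 <- r1 - n1 * (n1 + 1) / 2           # Step 3: U statistic for Group 1
u2 <- n1 * n2 - u1                     # complementary statistic for Group 2

print(c(R1 = r1, U1 = u1, U2 = u2))
```

Here \(U_1\) is close to its maximum \(n_1 n_2 = 81\), which reflects that almost every Location 1 value exceeds every Location 2 value.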
Comparison with T-Test:
- WMW: Uses ranks. Robust to outliers and non-normality. Less power than T-test if data is actually Normal (approx 95% efficiency).
- T-Test: Uses raw values. Sensitive to outliers. Highest power for Normal data.
- Guideline: If data is clearly non-Normal or sample size is small, use WMW.
R Calculations:
loc1 <- c(8.1,7.89,8,7.85,8.01,7.82,7.99,7.8,7.93)
loc2 <- c(7.85,7.3,7.73,7.27,7.58,7.27,7.5,7.23,7.41)
wilcox.test(loc1, loc2, paired=FALSE, alternative = "two.sided")
# Output: p-value = 0.0009172 (Significant Difference)
McNemar Test
This test analyzes paired nominal data (e.g., Before/After binary outcomes). It specifically tests for marginal homogeneity. Structure (\(2 \times 2\) Contingency Table):
| | Post-Treatment | | |
| Pre-Treatment | Positive (+) | Negative (-) | Total |
| Positive (+) | a (Concordant) | b (Discordant) | a+b |
| Negative (-) | c (Discordant) | d (Concordant) | c+d |
- Insight: Cells \(a\) and \(d\) represent subjects who didn't change (Consistent). They provide no information about the direction of change.
- Focus: We compare the discordant cells \(b\) (changed from + to -) and \(c\) (changed from - to +).
- Statistic: \(\chi^2 = \frac{(b-c)^2}{b+c}\). Under \(H_0\) (no effect), this approximately follows a Chi-square distribution with \(df=1\).
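As a small numeric illustration of the statistic, the sketch below uses hypothetical cell counts (a = 30, b = 15, c = 5, d = 50), which are not data from this lecture, and cross-checks the hand calculation against mcnemar.test from base R with the continuity correction turned off.

```r
# Hypothetical counts, for illustration only (not from the lecture)
a_count <- 30   # + before, + after (concordant)
b_count <- 15   # + before, - after (discordant)
c_count <- 5    # - before, + after (discordant)
d_count <- 50   # - before, - after (concordant)

# Hand calculation: only the discordant cells enter the statistic
chi_sq  <- (b_count - c_count)^2 / (b_count + c_count)
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

# Cross-check; matrix() fills column by column, so the order is a, c, b, d
tab <- matrix(c(a_count, c_count, b_count, d_count), nrow = 2,
              dimnames = list(Pre = c("+", "-"), Post = c("+", "-")))
mc <- mcnemar.test(tab, correct = FALSE)

print(c(chi_sq = chi_sq, p_value = p_value))
print(mc)
```

Changing a_count or d_count leaves the statistic unchanged, which illustrates why the concordant cells carry no information about the direction of change.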
Extension: Collapsing Tables for Specific Categories

Sometimes we have \(K \times K\) data (e.g., Evaluator ratings: Poor, Good, Excellent) but are only interested in one category (e.g., "Poor").
- We can "collapse" the table into a \(2 \times 2\) matrix: "Poor" vs "Not Poor" (Good + Excellent).
- This allows us to use the standard McNemar test to see if the two evaluators disagree specifically on the classification of "Poor" subjects. Note: To test agreement across all categories simultaneously, the Stuart-Maxwell test is required (not detailed here).
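The collapsing step can be sketched as follows. The \(3 \times 3\) ratings table and the helper collapse_category are hypothetical and introduced only for illustration; they are not part of the SOCR materials.

```r
# Hypothetical 3 x 3 table: rows = evaluator 1, columns = evaluator 2
ratings <- matrix(c(10,  4,  1,
                     6, 20,  5,
                     2,  3, 15),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(Evaluator1 = c("Poor", "Good", "Excellent"),
                                  Evaluator2 = c("Poor", "Good", "Excellent")))

# Illustrative helper: collapse a K x K table to "category" vs "Not category"
collapse_category <- function(tab, category) {
  a  <- tab[category, category]        # both evaluators chose the category
  b  <- sum(tab[category, ]) - a       # only evaluator 1 chose it
  cc <- sum(tab[, category]) - a       # only evaluator 2 chose it
  d  <- sum(tab) - a - b - cc          # neither chose it
  other <- paste("Not", category)
  matrix(c(a, cc, b, d), nrow = 2,
         dimnames = list(Evaluator1 = c(category, other),
                         Evaluator2 = c(category, other)))
}

poor_tab  <- collapse_category(ratings, "Poor")
poor_test <- mcnemar.test(poor_tab, correct = FALSE)

print(poor_tab)
print(poor_test)
```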
Kruskal-Wallis Test
The Kruskal-Wallis test is the non-parametric generalization of the One-Way ANOVA. It compares medians across \(k > 2\) independent groups.

Hypotheses:
- \(H_0\): All \(k\) population distributions are identical.
- \(H_a\): At least one population stochastically dominates another (locations differ).
Calculations: If there are no ties, the test statistic \(H\) (or \(T\)) is \[H = \frac{12}{N(N+1)} \sum_{i=1}^{k}\frac{R_{i}^{2}}{n_{i}} - 3(N+1)\] where \(R_i\) is the sum of ranks for group \(i\), \(n_i\) is the size of group \(i\), and \(N\) is the total sample size. Under \(H_0\), \(H\) approximately follows a \(\chi^2\) distribution with \(k-1\) degrees of freedom.

Correction for Ties: When data values are repeated (ties), the statistic must be divided by a correction factor \(C\): \[C = 1 - \frac{\sum (t^3 - t)}{N^3 - N}\] where \(t\) is the count of observations in each set of ties. The SOCR and R implementations automatically apply this correction.

Motivational Example: Four different teaching methods are applied to students.
- Data:
Method 1: 65, 87, 73, 79
Method 2: 75, 69, 83, 81
Method 3: 59, 78, 67, 62
Method 4: 94, 89, 80, 88
- Intuition: Method 3 looks lower and Method 4 looks higher. In such a small sample, ANOVA can be unduly influenced by extreme values such as the low score of 59. Kruskal-Wallis ranks the values, which limits the leverage of any single observation.
R Calculations:
# Best Practice: Use a Data Frame
df_teaching <- data.frame(
  Score = c(65, 87, 73, 79,   # Method 1
            75, 69, 83, 81,   # Method 2
            59, 78, 67, 62,   # Method 3
            94, 89, 80, 88),  # Method 4
  Method = factor(rep(1:4, each=4))
)
kruskal.test(Score ~ Method, data = df_teaching)
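To connect the kruskal.test call with the formula for \(H\), the statistic can also be computed by hand. This is a sketch using the same teaching-method scores; the variable names (rank_sums, h, tie_factor) are illustrative.

```r
score  <- c(65, 87, 73, 79,   # Method 1
            75, 69, 83, 81,   # Method 2
            59, 78, 67, 62,   # Method 3
            94, 89, 80, 88)   # Method 4
method <- factor(rep(1:4, each = 4))

n_total   <- length(score)
r         <- rank(score)               # pooled ranks over all N observations
rank_sums <- tapply(r, method, sum)    # R_i: rank sum per method
n_i       <- tapply(r, method, length) # n_i: group sizes

# Kruskal-Wallis H (formula without ties)
h <- 12 / (n_total * (n_total + 1)) * sum(rank_sums^2 / n_i) - 3 * (n_total + 1)

# Tie correction factor C (equals 1 here because all scores are distinct)
tie_counts <- as.numeric(table(score))
tie_factor <- 1 - sum(tie_counts^3 - tie_counts) / (n_total^3 - n_total)

h_corrected <- h / tie_factor
p_value <- pchisq(h_corrected, df = nlevels(method) - 1, lower.tail = FALSE)

print(rank_sums)
print(c(H = h_corrected, p_value = p_value))
```

The rank sums are 31, 35, 15 and 55, giving \(H \approx 8.96\) on 3 degrees of freedom and a p-value of about 0.03, so the methods differ at the 5% level.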
Fligner-Killeen Test (Variance Homogeneity)
Parametric tests like ANOVA assume Homogeneity of Variances (Homoscedasticity). The Fligner-Killeen test is a robust, nonparametric way to check this assumption across \(k\) groups.
- Logic: It tests if the dispersion (spread) of observations around the median is the same for all groups.
- Mechanism: It ranks the absolute differences from the median \(|X_{ij} - Median_j|\) and assigns weights based on Normal distribution quantiles.
- Statistic: Under the null hypothesis of equal variances, the test statistic approximately follows a \(\chi^2\) distribution with \(k-1\) degrees of freedom.
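In R, this check is available as fligner.test in the base stats package. The sketch below applies it to the teaching-method scores from the Kruskal-Wallis example, purely as an illustration; with only four observations per group the test has little power, so the result should be read cautiously.

```r
# Teaching-method scores from the Kruskal-Wallis example
df_teaching <- data.frame(
  Score = c(65, 87, 73, 79,   # Method 1
            75, 69, 83, 81,   # Method 2
            59, 78, 67, 62,   # Method 3
            94, 89, 80, 88),  # Method 4
  Method = factor(rep(1:4, each = 4))
)

# H0: the k = 4 groups have equal variances
fk <- fligner.test(Score ~ Method, data = df_teaching)

print(fk)            # median-centered chi-squared statistic, df and p-value
print(fk$parameter)  # degrees of freedom, k - 1
```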
Applications & Problems
Problem 6.1: Maze Escape Times

Rats were timed before and after 2 weeks of training.
- Data: Paired (Before/After).
- Issue: Two rats failed to complete (marked "N"). These are effectively "infinite" times, or missing data. If we assume "N" > any measured time, we can still assign a Sign (+ or -).
- Analysis:
Rat 3: N vs 45 (Improved, +)
Rat 10: N vs 50 (Improved, +)
Total "+" signs: 9. Total "-" signs: 1. Total valid \(n=10\).
Check significance using a Sign Test table or a Binomial calculation for 9/10 successes.
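The Binomial calculation for 9 "+" signs out of 10 can be sketched in R as follows, under the stated assumption that "N" counts as longer than any measured time; the variable names are illustrative.

```r
n_plus  <- 9    # rats that improved (including the two "N" cases)
n_minus <- 1    # rats that got slower
n       <- n_plus + n_minus

# One-sided: P(X >= 9) for X ~ Bin(10, 0.5), i.e. evidence of improvement
p_one_sided <- pbinom(n_plus - 1, size = n, prob = 0.5, lower.tail = FALSE)

# Two-sided exact binomial test of P(+) = 0.5
p_two_sided <- binom.test(n_plus, n, p = 0.5)$p.value

print(c(one_sided = p_one_sided, two_sided = p_two_sided))
```

The one-sided probability is 11/1024 (about 0.011) and the two-sided p-value is 22/1024 (about 0.021), so either way the improvement is significant at the 5% level.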
Problem 6.2: Brain Volume Segmentation

Comparing two algorithms for brain segmentation across 57 regions of interest (ROI).
- Question: Are the algorithms consistent?
- Action: Compute differences (\(Vol_1 - Vol_2\)). Plot histogram of differences. If symmetric, use Wilcoxon Signed Rank. If skewed, use Sign Test.
- Hint: With \(n=57\), the Power of the Wilcoxon test is quite high.
References
- SOCR Home page: https://socr.umich.edu