SMHS NonParamInference

Scientific Methods for Health Sciences - Non-Parametric Inference

Overview

Nonparametric inference involves descriptive and inferential statistics that are not based on parametrized families of probability distributions (like the Normal or Poisson distributions). Unlike parametric inference, which assumes the data follows a specific distribution structure defined a priori, nonparametric inference is "distribution-free". The term "nonparametric" does not mean the models lack parameters entirely. Rather, it implies that the number and nature of the parameters are flexible and determined from the data rather than fixed in advance. In this lecture, we introduce the area of nonparametric inference and illustrate various applications with examples.

Motivation

We have previously discussed parametric inference, where conclusions are drawn based on assumptions about the population's probability distribution.

Question: What if these assumptions (e.g., Normality) are violated? What if the variables cannot be categorized into a known parametrized family?
Answer: Distribution-free (nonparametric) statistical methods are the solution.

Motivational Clinical Example

Consider 16 studies of septic patients reporting the relative risk of mortality associated with acute renal failure. A relative risk (RR) of 1.0 implies no effect, while \(RR \ne 1\) suggests a beneficial or detrimental effect. The goal is to determine whether developing acute renal failure impacts mortality based on this cumulative evidence. The data are heavily skewed and not bell-shaped, making the traditional one-sample t-test inappropriate.

[Figure: boxplot showing the skewed distribution of the relative risks]

Study	Relative Risk	Sign (Relative Risk - 1)
1	0.75	-
2	2.03	+
3	2.29	+
4	2.11	+
5	0.80	-
6	1.50	+
7	0.79	-
8	1.01	+
9	1.23	+
10	1.48	+
11	2.45	+
12	1.02	+
13	1.03	+
14	1.30	+
15	1.54	+
16	1.27	+

Theory

The Sign Test

The Sign Test is the simplest nonparametric alternative to the one-sample and paired t-tests. It does not require the data to be normally distributed. It relies solely on the direction (sign) of the difference between an observation and a hypothesized value (usually the median).

Concept: If there is no effect (Null Hypothesis \(H_0\)), positive and negative deviations from the median should be equally likely, similar to a coin toss (\(P=0.5\)).
Application: In our sepsis example, if renal failure had no effect, we would expect about half the relative risks to be \(>1\) (\(+\)) and half to be \(<1\) (\(-\)).
Observation: We observe 3 studies with "\(-\)" and 13 studies with "\(+\)".
Intuition Check: Is a 13 vs 3 split likely to happen by fair coin flips alone? Intuitively, this difference appears too large to be random variation.

Formal Calculations: Let \(N_{+}\) be the number of "+" signs. We test at significance level \(\alpha=0.05\).

\(H_{0}: Median = 1\) (Implies \(P(+) = 0.5\))
\(H_{a}: Median \ne 1\) (Implies \(P(+) \ne 0.5\))

Test Statistic:
\[B_{S} = \max(N_{+}, N_{-})\]
Here, \(B_S = \max(13, 3) = 13 = N_{+}\). We calculate the probability of observing 13 or more "+" signs in 16 trials under the null hypothesis (\(p=0.5\)):
\[P(N_{+} \ge 13 \mid N_{+} \sim Bin(16,0.5)) = 0.0106\]

Verify this using the Distributome Binomial Calculator.
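The binomial tail probability can also be checked directly in R. The following is a minimal sketch, assuming the 16 relative risks are entered exactly as listed in the table of studies; the variable names are illustrative, and `pbinom` and `binom.test` are base R functions.

```r
# Relative risks from the 16 sepsis studies (copied from the table of studies)
rr <- c(0.75, 2.03, 2.29, 2.11, 0.80, 1.50, 0.79, 1.01,
        1.23, 1.48, 2.45, 1.02, 1.03, 1.30, 1.54, 1.27)

n_plus  <- sum(rr > 1)        # studies with RR above the hypothesized median of 1
n_minus <- sum(rr < 1)        # studies with RR below 1
n       <- n_plus + n_minus   # values exactly equal to 1 would be dropped as ties
b_s     <- max(n_plus, n_minus)

# One-sided tail: P(X >= b_s) for X ~ Bin(n, 0.5)
p_one_sided <- pbinom(b_s - 1, size = n, prob = 0.5, lower.tail = FALSE)

# Two-sided exact binomial test of P(+) = 0.5
p_two_sided <- binom.test(n_plus, n, p = 0.5,
                          alternative = "two.sided")$p.value

round(c(one_sided = p_one_sided, two_sided = p_two_sided), 4)
```

Because the null distribution is symmetric when \(p=0.5\), the two-sided value is simply twice the one-sided tail.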

Critical Note on P-Values: The value \(0.0106\) is the one-sided probability. Since our hypothesis is two-sided (\(\ne\)), the p-value is \(2 \times 0.0106 = 0.0212\). Since \(0.0212 < 0.05\), we reject \(H_0\) and conclude that acute renal failure has a significant effect on mortality.

Example 2: Twin Aggressiveness

12 pairs of identical twins are tested to see if the firstborn is more aggressive. The data are paired naturally by twinship. Note: One pair has equal scores (a tie). In the Sign Test, ties are typically discarded, reducing \(n\).

Twin-Index	1st Born	2nd Born	Sign
...	...	...	...
6	72	72	0 (Drop)
...	...	...	...

(See full table in previous section)

Using SOCR Sign Test Analysis:

P-value = 0.274.
Conclusion: We cannot reject the null hypothesis at the 5% level. There is no strong evidence of a birth-order effect.

The Wilcoxon Signed Rank Test

The Sign Test ignores the magnitude of differences, utilizing only their direction. The Wilcoxon Signed Rank Test improves power by incorporating the ranks of the absolute differences.

Assumptions:

Data are paired and drawn independently.
The dependent variable is continuous (interval scale).
The distribution of differences is symmetric (though not necessarily Normal).

Motivational Example: Central venous oxygen saturation (\(SvO_{2}\)) in 10 patients at admission vs. 6 hours later.

\(H_{0}\): The median difference is zero.

Procedure:

Calculate difference \(D_i = X_i - Y_i\).
Discard pairs where \(D_i = 0\).
Rank the absolute differences \(|D_i|\).
Assign the original sign of \(D_i\) to the rank.

Patient	Diff (X-Y)	Abs Diff	Rank	Signed Rank
10	5.5	5.5	4	4
2	2.4	2.4	1	1
7	-2.5	2.5	2	-2
9	-3.5	3.5	3	-3
3	-5.8	5.8	5	-5
5	-7.1	7.1	6	-6
6	-12.2	12.2	7	-7
1	-13.2	13.2	8	-8
4	-13.7	13.7	9	-9
8	-17.7	17.7	10	-10
Sum				W = -45

Test Statistics & Variance Correction: There are two common ways to report the statistic:

\(W_{net}\) (Sum of signed ranks): In the table above, \(W_{net} = -45\). Expected value \(E(W_{net}) = 0\). Variance \(Var(W_{net}) = \frac{n(n+1)(2n+1)}{6}\).
\(W_{+}\) (Sum of positive ranks): Often used by software (like R). \(W_{+} = 4+1 = 5\). Expected value \(E(W_+) = \frac{n(n+1)}{4}\). Variance \(Var(W_+) = \frac{n(n+1)(2n+1)}{24}\).

Results Interpretation (from SOCR/R):

Wilcoxon Statistic (\(W_+\)) = 5.000

Expected Value (\(E(W_+)\)) = 27.500

Variance (\(Var(W_+)\)) = 96.250

Z-Score = \(\frac{5 - 27.5}{\sqrt{96.25}} \approx -2.293\)

Two-Sided P-Value = 0.022 (normal approximation)

Conclusion: Reject \(H_0\). There is a significant difference in oxygen saturation after 6 hours.
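The reported statistic, expected value, variance, and Z-score can be reproduced step by step. This is a minimal sketch, assuming the admission and 6-hour \(SvO_{2}\) values are those used in this section's R example; the variable names are illustrative.

```r
# SvO2 at admission (x) and 6 hours later (y) for the 10 patients
x <- c(65.3, 59.1, 58.2, 56, 56.1, 60.6, 37.8, 39.7, 57.7, 33.6)
y <- c(59.8, 56.7, 60.7, 59.5, 61.9, 67.7, 50, 52.9, 71.4, 51.3)

d <- x - y
d <- d[d != 0]                 # discard zero differences (none here)
n <- length(d)
r <- rank(abs(d))              # ranks of the absolute differences

w_plus <- sum(r[d > 0])        # sum of positive ranks
w_net  <- sum(sign(d) * r)     # sum of signed ranks

e_w   <- n * (n + 1) / 4                  # E(W+) under H0
var_w <- n * (n + 1) * (2 * n + 1) / 24   # Var(W+) under H0, no ties
z     <- (w_plus - e_w) / sqrt(var_w)
p_two_sided <- 2 * pnorm(-abs(z))         # normal approximation

c(W_plus = w_plus, W_net = w_net, Z = round(z, 3), p = round(p_two_sided, 3))
```

For small samples without ties, `wilcox.test` may report an exact p-value instead, so its output can differ slightly from this normal approximation.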

R Calculations:

# Modern R Syntax using Data Frame
df_ox <- data.frame(
  Admission = c(65.3,59.1,58.2,56,56.1,60.6,37.8,39.7,57.7,33.6),
  Later_6h  = c(59.8,56.7,60.7,59.5,61.9,67.7,50,52.9,71.4,51.3)
)
# Perform Test
wilcox.test(df_ox$Admission, df_ox$Later_6h, 
            paired=TRUE, alternative = "two.sided")

Wilcoxon-Mann-Whitney (WMW) Test

Also known as the Mann-Whitney U test, this is the nonparametric analogue to the independent-samples t-test. It assesses whether two independent samples come from the same distribution.

Motivational Example: Soil pH levels at two different locations (Location 1 vs. Location 2).

Assumption Check: Histograms show the data are not Normal; the small sample size (\(n=9\) per location) makes the t-test risky.
Hypothesis: \(H_0\): The distributions of Location 1 and Location 2 are identical. \(H_a\): The distributions differ (shift in location).

Calculation (U Statistic): The logic relies on ranking all observations together (pooled).

Rank all \(N = n_1 + n_2\) observations.
Sum the ranks for Group 1 (\(R_1\)).
Calculate \(U_1 = R_1 - \frac{n_1(n_1+1)}{2}\).
Comparison: If \(H_0\) is true, the ranks should be intermixed evenly. If \(H_a\) is true, one group will have significantly higher ranks.
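The ranking steps above can be sketched in a few lines of R. This is an illustrative sketch, assuming the soil pH measurements used in this section's R example; `u2` is included only to show that the two U statistics sum to \(n_1 n_2\).

```r
# Soil pH at the two locations
loc1 <- c(8.1, 7.89, 8, 7.85, 8.01, 7.82, 7.99, 7.8, 7.93)
loc2 <- c(7.85, 7.3, 7.73, 7.27, 7.58, 7.27, 7.5, 7.23, 7.41)

n1 <- length(loc1)
n2 <- length(loc2)

pooled_ranks <- rank(c(loc1, loc2))    # tied values receive average ranks
r1 <- sum(pooled_ranks[seq_len(n1)])   # rank sum for Location 1

u1 <- r1 - n1 * (n1 + 1) / 2
u2 <- n1 * n2 - u1                     # the two U statistics sum to n1 * n2

c(R1 = r1, U1 = u1, U2 = u2)
```

The value of `u1` is the statistic that `wilcox.test` labels `W` for unpaired samples. Here nearly all Location 1 ranks sit above the Location 2 ranks, which is the pattern expected under \(H_a\).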

Comparison with T-Test:

WMW: Uses ranks. Robust to outliers and non-normality. Less power than T-test if data is actually Normal (approx 95% efficiency).
T-Test: Uses raw values. Sensitive to outliers. Highest power for Normal data.
Guideline: If data is clearly non-Normal or sample size is small, use WMW.

R Calculations:

loc1 <- c(8.1,7.89,8,7.85,8.01,7.82,7.99,7.8,7.93)
loc2 <- c(7.85,7.3,7.73,7.27,7.58,7.27,7.5,7.23,7.41) 
wilcox.test(loc1, loc2, paired=FALSE, alternative = "two.sided")
# Output: p-value = 0.0009172 (Significant Difference)

McNemar Test

This test analyzes paired nominal data (e.g., Before/After binary outcomes). It specifically tests for marginal homogeneity.

Structure (\(2 \times 2\) Contingency Table):

	Post-Treatment
Pre-Treatment	Positive (+)	Negative (-)	Total
Positive (+)	a (Concordant)	b (Discordant)	a+b
Negative (-)	c (Discordant)	d (Concordant)	c+d

Insight: Cells \(a\) and \(d\) represent subjects who didn't change (Consistent). They provide no information about the direction of change.
Focus: We compare the discordant cells \(b\) (changed from + to -) and \(c\) (changed from - to +).
Statistic:
\[\chi^2 = \frac{(b-c)^2}{b+c}\]
Under \(H_0\) (no effect), this follows a Chi-square distribution with \(df=1\).
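To make the statistic concrete, the following sketch applies it to a made-up table. The counts are hypothetical and chosen only for illustration; `mcnemar.test` is the base R implementation, called here with `correct = FALSE` so that it matches the uncorrected formula.

```r
# Hypothetical paired counts, for illustration only (not from any study)
tab <- matrix(c(20, 15,
                 5, 60),
              nrow = 2, byrow = TRUE,
              dimnames = list(Pre  = c("Positive", "Negative"),
                              Post = c("Positive", "Negative")))

b       <- tab["Positive", "Negative"]   # changed from + to -
c_count <- tab["Negative", "Positive"]   # changed from - to +

chi_sq  <- (b - c_count)^2 / (b + c_count)
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

# Built-in version; correct = FALSE turns off the continuity correction
mc <- mcnemar.test(tab, correct = FALSE)

c(chi_sq = chi_sq, p_value = p_value)
```

Note that the concordant counts (20 and 60 in this hypothetical table) never enter the calculation.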

Extension: Collapsing Tables for Specific Categories

Sometimes we have \(K \times K\) data (e.g., Evaluator ratings: Poor, Good, Excellent) but are only interested in one category (e.g., "Poor").

We can "collapse" the table into a \(2 \times 2\) matrix: "Poor" vs "Not Poor" (Good + Excellent).
This allows us to use the standard McNemar test to see if the two evaluators disagree specifically on the classification of "Poor" subjects.
Note: To test agreement across all categories simultaneously, the Stuart-Maxwell test is required (not detailed here).
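Collapsing a table is a matter of summing the rows and columns that are not of interest. The sketch below uses a hypothetical \(3 \times 3\) ratings table (the counts and the evaluator labels A and B are invented for illustration) and collapses it to "Poor" vs "Not Poor".

```r
# Hypothetical 3 x 3 table (rows: Evaluator A, columns: Evaluator B)
ratings <- matrix(c(12,  6,  2,
                     3, 20,  5,
                     1,  4, 17),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(A = c("Poor", "Good", "Excellent"),
                                  B = c("Poor", "Good", "Excellent")))

target <- "Poor"
others <- setdiff(rownames(ratings), target)

# Collapse to "Poor" vs "Not Poor" by summing the remaining categories
collapsed <- matrix(c(ratings[target, target],      sum(ratings[target, others]),
                      sum(ratings[others, target]), sum(ratings[others, others])),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(A = c("Poor", "Not Poor"),
                                    B = c("Poor", "Not Poor")))

collapsed
mcnemar.test(collapsed, correct = FALSE)
```

Changing `target` to another category name would run the same comparison for that category.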

Kruskal-Wallis Test

The Kruskal-Wallis test is the non-parametric generalization of the One-Way ANOVA. It compares medians across \(k > 2\) independent groups.

Hypotheses:

\(H_0\): All \(k\) population distributions are identical.
\(H_a\): At least one population stochastically dominates another (locations differ).

Calculations: If there are no ties, the test statistic \(H\) (or \(T\)) is
\[H = \frac{12}{N(N+1)} \sum_{i=1}^{k}\frac{R_{i}^{2}}{n_{i}} - 3(N+1)\]
where \(R_i\) is the sum of ranks for group \(i\), \(n_i\) is the size of group \(i\), and \(N\) is the total sample size.

Correction for Ties: When data values are repeated (ties), the statistic must be divided by a correction factor \(C\):
\[C = 1 - \frac{\sum (t^3 - t)}{N^3 - N}\]
where \(t\) is the count of observations in each set of ties. The SOCR and R implementations automatically apply this correction.

Motivational Example: Four different teaching methods are applied to students.

Data:

Method 1: 65, 87, 73, 79
Method 2: 75, 69, 83, 81
Method 3: 59, 78, 67, 62
Method 4: 94, 89, 80, 88

Intuition: Method 3 looks lower and Method 4 looks higher. ANOVA might be biased by the outlier "59" in a small sample. Kruskal-Wallis ranks these values, mitigating the outlier's leverage.

R Calculations:

# Best Practice: Use a Data Frame
df_teaching <- data.frame(
  Score = c(65, 87, 73, 79,           # Method 1
            75, 69, 83, 81,           # Method 2
            59, 78, 67, 62,           # Method 3
            94, 89, 80, 88),          # Method 4
  Method = factor(rep(1:4, each=4))
)
kruskal.test(Score ~ Method, data = df_teaching)
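The formula for \(H\) and the tie correction \(C\) can be verified by hand on the same scores. This is a minimal sketch with illustrative variable names; the p-value uses the usual large-sample \(\chi^2\) approximation with \(k-1\) degrees of freedom.

```r
score  <- c(65, 87, 73, 79,  75, 69, 83, 81,  59, 78, 67, 62,  94, 89, 80, 88)
method <- factor(rep(1:4, each = 4))

N   <- length(score)
r   <- rank(score)                 # pooled ranks (no ties in these data)
R_i <- tapply(r, method, sum)      # rank sum per method
n_i <- tapply(r, method, length)   # group sizes

H <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)

# Tie correction factor C; it equals 1 when no values are repeated
t_counts <- as.vector(table(score))
C <- 1 - sum(t_counts^3 - t_counts) / (N^3 - N)
H_corrected <- H / C

p_value <- pchisq(H_corrected, df = nlevels(method) - 1, lower.tail = FALSE)
c(H = H_corrected, p_value = p_value)
```

The rank sums come out as 31, 35, 15, and 55 for Methods 1 to 4, which matches the intuition that Method 3 scores lowest and Method 4 highest.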

Fligner-Killeen Test (Variance Homogeneity)

Parametric tests like ANOVA assume Homogeneity of Variances (Homoscedasticity). The Fligner-Killeen test is a robust, nonparametric way to check this assumption across \(k\) groups.

Logic: It tests if the dispersion (spread) of observations around the median is the same for all groups.
Mechanism: It ranks the absolute differences from the median \(|X_{ij} - Median_j|\) and assigns weights based on Normal distribution quantiles.
Statistic: The test statistic approximates a \(\chi^2\) distribution with \(k-1\) degrees of freedom.
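As an illustration, base R provides `fligner.test`. The sketch below applies it to the teaching-method scores from the Kruskal-Wallis example; reusing that dataset here is an assumption made for illustration, not an analysis taken from the source.

```r
df_teaching <- data.frame(
  Score  = c(65, 87, 73, 79,  75, 69, 83, 81,  59, 78, 67, 62,  94, 89, 80, 88),
  Method = factor(rep(1:4, each = 4))
)

fk <- fligner.test(Score ~ Method, data = df_teaching)

fk$statistic   # Fligner-Killeen chi-squared statistic
fk$parameter   # degrees of freedom, k - 1
fk$p.value     # a small p-value would suggest unequal spread across groups
```

With only four observations per group the test has little power, so a non-significant result should be read cautiously.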

Applications & Problems

Problem 6.1: Maze Escape Times

Rats timed before and after 2 weeks of training.

Data: Paired (Before/After).
Issue: Two rats failed to complete (marked "N"). These are effectively "infinite" times, or missing data. If we assume "N" > any measured time, we can still assign a Sign (+ or -).
Analysis:

Rat 3: N vs 45 (Improved, +)
Rat 10: N vs 50 (Improved, +)
Total "+" signs: 9. Total "-" signs: 1. Total valid \(n=10\).
Check significance using Sign Test table or Binomial calculation for 9/10 successes.
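The binomial calculation for 9 "+" signs out of 10 can be sketched as follows. The counts are taken from the tally above; whether a one-sided or two-sided p-value is appropriate depends on how the hypothesis is stated, so both are computed.

```r
n_plus  <- 9    # rats that improved after training
n_minus <- 1    # rats that did not
n <- n_plus + n_minus

# One-sided: P(X >= 9) for X ~ Bin(10, 0.5), testing for improvement
p_one_sided <- pbinom(n_plus - 1, size = n, prob = 0.5, lower.tail = FALSE)

# Two-sided exact binomial test
p_two_sided <- binom.test(n_plus, n, p = 0.5)$p.value

round(c(one_sided = p_one_sided, two_sided = p_two_sided), 4)
```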

Problem 6.2: Brain Volume Segmentation

Comparing two algorithms for brain segmentation across 57 regions of interest (ROI).

Visual Check:
Question: Are the algorithms consistent?
Action: Compute differences (\(Vol_1 - Vol_2\)). Plot histogram of differences. If symmetric, use Wilcoxon Signed Rank. If skewed, use Sign Test.
Hint: With \(n=57\), the Power of the Wilcoxon test is quite high.

References

Use the SOCR SDA app to complete the Non-parametric test Learning Module.

SOCR Home page: https://socr.umich.edu
