Difference between revisions of "SMHS ROC"

Revision as of 09:34, 23 February 2026

Scientific Methods for Health Sciences - Receiver Operating Characteristic (ROC) Curve

Overview

The Receiver Operating Characteristic (ROC) curve is a fundamental graphical tool used to evaluate the performance of a binary classifier system as its discrimination threshold varies. It illustrates the diagnostic ability of a classifier by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) across all possible threshold settings.

By visualizing these trade-offs, the ROC curve aids in selecting optimal models and discarding suboptimal ones. The Area Under the Curve (AUC) serves as a single scalar measure of aggregate diagnostic performance.

Motivation

In binary classification tasks—such as diagnosing Disease vs. No Disease—outcomes are often determined by whether a continuous test statistic falls above or below a chosen cutoff. While sensitivity and specificity describe accuracy at a single threshold, classifier performance changes as this threshold shifts.

Key objectives of ROC analysis include:

Visualizing trade-offs: Demonstrating the dynamic relationship between sensitivity (TPR) and specificity (\(1 - \text{FPR}\)).
Assessing accuracy: The closer the curve hugs the top-left corner of the ROC space, the better the test. A curve along the 45° diagonal represents random guessing (no discriminative ability).
Selecting optimal thresholds: The slope of the tangent to the ROC curve at a given point equals the likelihood ratio for a test result at that cutoff, so a threshold can be chosen on the basis of clinical costs and benefits.
Summarizing performance: The AUC quantifies the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one.

Theory

The Confusion Matrix and Core Metrics

A binary classifier produces four possible outcomes based on a decision threshold:

Test Result \ True Condition (Gold Standard)	Disease (Positive)	No Disease (Negative)
Test Positive	True Positive (TP) (Hit)	False Positive (FP) (Type I Error, \(\alpha\))
Test Negative	False Negative (FN) (Type II Error, \(\beta\))	True Negative (TN) (Correct Rejection)

Fundamental metrics derived from this matrix:

Sensitivity (True Positive Rate): \[\text{Sensitivity} = \frac{TP}{TP + FN}\]
Specificity (True Negative Rate): \[\text{Specificity} = \frac{TN}{TN + FP}\]
False Positive Rate: \[FPR = 1 - \text{Specificity} = \frac{FP}{TN + FP}\]
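A small R sketch of these three formulas follows. `confusion_metrics` is a hypothetical helper written for illustration, not a function from any package. The counts plugged in are those of the T4 example at cut-point 7 (TP = 25, FN = 7, FP = 18, TN = 75), taken from the frequency table in the next subsection.

```r
# Hypothetical helper: core ROC metrics from the four confusion-matrix counts
confusion_metrics <- function(tp, fp, fn, tn) {
  sensitivity <- tp / (tp + fn)   # true positive rate
  specificity <- tn / (tn + fp)   # true negative rate
  c(sensitivity = sensitivity,
    specificity = specificity,
    fpr = 1 - specificity)        # equals fp / (tn + fp)
}

# T4 example at cut-point 7 (T4 <= 7 counted as test positive)
m <- confusion_metrics(tp = 25, fp = 18, fn = 7, tn = 75)
round(m, 2)   # sensitivity approx. 0.78, specificity approx. 0.81, fpr approx. 0.19
```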

Constructing the ROC Curve

To construct an ROC curve, sensitivity and specificity are computed for every feasible cutoff value of the diagnostic test.

Example: Hypothyroidism Diagnosis Using T4 Levels

The table below shows the distribution of T4 measurements in hypothyroid (diseased, n = 32) and euthyroid (non-diseased, n = 93) individuals.

Frequency of T4 Levels in Hypothyroid vs. Euthyroid Patients
Group	<1	1–2	2–3	3–4	4–5	5–6	6–7	7–8	8–9	9–10	10–11	>11
Hypothyroid	2	3	1	8	4	4	3	3	1	0	2	1
Euthyroid	0	0	0	0	1	6	11	19	17	20	11	8

We compute performance metrics at different thresholds:

Cut-point = 5 (strict; T4 ≤ 5 → test positive): \(\text{Sensitivity} = 18/32 = 0.56\), \(\text{Specificity} = 92/93 = 0.99\)
Cut-point = 7 (moderate; T4 ≤ 7 → test positive): \(\text{Sensitivity} = 25/32 = 0.78\), \(\text{Specificity} = 75/93 = 0.81\)
Cut-point = 9 (lenient; T4 ≤ 9 → test positive): \(\text{Sensitivity} = 29/32 = 0.91\), \(\text{Specificity} = 39/93 = 0.42\)

Plotting Sensitivity (y-axis) versus FPR = \(1 - \text{Specificity}\) (x-axis) for all thresholds yields the ROC curve.
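The procedure can be sketched in R directly from the frequency table. The sketch assumes, as in the cut-points above, that a cut-point c means "T4 ≤ c is test positive" and that each bin's upper edge is a usable cut-point; the open-ended >11 bin only contributes the (1, 1) end point.

```r
# Counts from the T4 table (bins ordered from lowest to highest T4)
hypo <- c(2, 3, 1, 8, 4, 4, 3, 3, 1, 0, 2, 1)        # hypothyroid (positive)
eu   <- c(0, 0, 0, 0, 1, 6, 11, 19, 17, 20, 11, 8)   # euthyroid (negative)

# Cut-point c = upper edge of bins 1..11; "T4 <= c" is called test positive
cutpoints <- 1:11
tpr <- cumsum(hypo)[cutpoints] / sum(hypo)   # sensitivity at each cut-point
fpr <- cumsum(eu)[cutpoints] / sum(eu)       # 1 - specificity at each cut-point

roc_points <- data.frame(cut = cutpoints, fpr = fpr, tpr = tpr)
print(round(roc_points, 3))

# Add the (0, 0) and (1, 1) end points and draw the empirical ROC curve
plot(c(0, fpr, 1), c(0, tpr, 1), type = "b",
     xlab = "False Positive Rate (1 - Specificity)",
     ylab = "True Positive Rate (Sensitivity)",
     main = "Empirical ROC curve, T4 example")
abline(0, 1, lty = 2, col = "gray")   # chance diagonal
```

Rows 5, 7 and 9 of `roc_points` reproduce the three cut-points worked out by hand above.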

Applications and Interpretation

Area Under the Curve (AUC)

The AUC provides a single summary measure of overall diagnostic accuracy. A commonly used rule-of-thumb scale is:

0.90–1.00: Excellent
0.80–0.90: Good
0.70–0.80: Fair
0.60–0.70: Poor
0.50–0.60: Fail (little better than chance; an AUC of 0.5 corresponds to random guessing)

In the hypothyroidism example, the AUC is 0.86, indicating good discriminative ability.
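The AUC quoted above can be reproduced from the binned T4 table. This sketch (an illustration, not necessarily how the original figure was obtained) applies the trapezoidal rule to the empirical ROC points and checks it against the pairwise-probability reading of the AUC: the chance that a hypothyroid patient has a lower T4 than a euthyroid one, with patients in the same bin counted as one half. Both come out at about 0.855, i.e. 0.86 to two decimals.

```r
hypo <- c(2, 3, 1, 8, 4, 4, 3, 3, 1, 0, 2, 1)        # hypothyroid counts per T4 bin
eu   <- c(0, 0, 0, 0, 1, 6, 11, 19, 17, 20, 11, 8)   # euthyroid counts per T4 bin

# Empirical ROC points, including the (0, 0) and (1, 1) end points
tpr <- c(0, cumsum(hypo) / sum(hypo))
fpr <- c(0, cumsum(eu) / sum(eu))

# Trapezoidal rule over consecutive ROC points
auc_trap <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# Pairwise version: euthyroid patients in strictly higher bins count 1,
# those in the same bin count 1/2
eu_higher <- rev(cumsum(rev(eu))) - eu
auc_pairs <- sum(hypo * (eu_higher + eu / 2)) / (sum(hypo) * sum(eu))

round(c(trapezoid = auc_trap, pairwise = auc_pairs), 3)   # both approx. 0.855
```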

Optimal Threshold Selection and Cost Analysis

While AUC summarizes global performance, clinical decisions require a single operating threshold. The optimal choice depends on the relative costs of false positives (FP) and false negatives (FN).

The slope method identifies the optimal operating point as the point on the ROC curve where the tangent slope equals \[\text{Slope} = \frac{\text{Cost}(FP) \times P(\text{Negative})}{\text{Cost}(FN) \times P(\text{Positive})}.\]

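If missing a disease (FN) is 8× more costly than a false alarm (FP) and both classes are equally prevalent (\(P(\text{Positive}) = P(\text{Negative})\)), the target slope is \(1/8\).
If treatment risks make an FP 2× more costly than an FN, again with equal prevalence, the target slope is 2.
When prevalence is unequal, each of these slopes is multiplied by \(P(\text{Negative})/P(\text{Positive})\).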

This approach balances clinical priorities with statistical performance.
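On an empirical ROC curve, which consists of discrete points, the tangent condition can be applied by choosing the point that maximises \(TPR - m \cdot FPR\), where \(m\) is the target slope: this is the point where a line of slope \(m\), lowered from above, first touches the curve. The sketch below applies that rule to the three T4 cut-points. The helper `optimal_cutoff` and the cost ratios are hypothetical, and the prevalence is assumed to equal the sample proportion 32/125, which need not hold in a clinical population.

```r
# Hypothetical helper: slope-method cut-point on an empirical ROC curve.
# m is the target slope from the cost/prevalence formula; the chosen point
# maximises the intercept TPR - m * FPR of a line with slope m.
optimal_cutoff <- function(cutoffs, tpr, fpr, cost_fp, cost_fn, p_pos) {
  m <- (cost_fp * (1 - p_pos)) / (cost_fn * p_pos)
  cutoffs[which.max(tpr - m * fpr)]
}

# T4 example: the three cut-points worked out earlier
cutoffs <- c(5, 7, 9)
tpr <- c(18, 25, 29) / 32
fpr <- c(1, 18, 54) / 93
p_pos <- 32 / 125   # assumed prevalence: sample proportion of hypothyroid patients

optimal_cutoff(cutoffs, tpr, fpr, cost_fp = 1, cost_fn = 8, p_pos = p_pos)  # 7
optimal_cutoff(cutoffs, tpr, fpr, cost_fp = 2, cost_fn = 1, p_pos = p_pos)  # 5
```

With FN costing 8× more, the rule favours the more sensitive cut-point 7; with FP costing 2× more, it moves to the stricter cut-point 5. A full analysis would scan every available cut-point rather than these three.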

Practical Implementation in R

In R, the `pROC` and `ROCR` packages compute ROC curves and AUCs and produce ROC plots.

Example 1: Basic ROC Analysis for T4 Test

# Install and load required package
if (!require("pROC")) install.packages("pROC")
library(pROC)

# Simulated data based on T4 distribution
response <- c(rep(1, 32), rep(0, 93))  # 1 = Hypothyroid, 0 = Euthyroid

# Simulated T4 values (lower = more likely diseased)
predictor_pos <- c(rep(4, 18), rep(6, 7), rep(8, 4), rep(10, 3))
predictor_neg <- c(rep(4, 1), rep(6, 17), rep(8, 36), rep(10, 39))
predictor <- c(predictor_pos, predictor_neg)

# Build ROC object: 0 = control (euthyroid), 1 = case (hypothyroid);
# direction = ">" because controls have higher T4 values than cases
roc_obj <- roc(response, predictor, levels = c(0, 1), direction = ">")

# Plot ROC curve with AUC
plot(roc_obj, main = "ROC Curve for T4", col = "blue", print.auc = TRUE)

# Identify optimal threshold (Youden index)
coords(roc_obj, "best", ret = c("threshold", "specificity", "sensitivity"))
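As a cross-check that does not depend on pROC, the AUC of Example 1 can be recomputed from the Mann–Whitney (Wilcoxon rank-sum) statistic, using the probabilistic interpretation of the AUC. This is a sketch; it should agree with the value printed by `print.auc = TRUE`, and it comes out slightly below the 0.86 quoted for the finer-binned table because collapsing T4 into four values creates more ties.

```r
# Same simulated T4 values as in Example 1
predictor_pos <- c(rep(4, 18), rep(6, 7), rep(8, 4), rep(10, 3))   # hypothyroid
predictor_neg <- c(rep(4, 1), rep(6, 17), rep(8, 36), rep(10, 39)) # euthyroid

# Mann-Whitney statistic W: number of (euthyroid, hypothyroid) pairs in which
# the euthyroid value is higher, with ties counted as 1/2
w <- suppressWarnings(wilcox.test(predictor_neg, predictor_pos)$statistic)
auc_mw <- unname(w) / (length(predictor_neg) * length(predictor_pos))
round(auc_mw, 3)   # approx. 0.852
```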

Example 2: Comparing Classifiers for Alzheimer’s Disease

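This example compares Random Forest and Logistic Regression classifiers for distinguishing Alzheimer’s disease (AD) from normal controls (NC), using Global Gray Matter Volume (GMV) and demographic features (Age, Sex) from the SOCR neuroimaging data (https://wiki.socr.umich.edu/index.php/SOCR_Data_July2009_ID_NI#Curvedness_Data).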

# Load required libraries, installing any that are missing
for (pkg in c("randomForest", "ROCR", "pROC", "rvest")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}
library(randomForest)
library(ROCR)
library(pROC)
library(rvest)   # read_html(), html_nodes(), html_table()

# Load data: the second HTML table on the SOCR July 2009 neuroimaging data page
wiki_page <- read_html("https://wiki.socr.umich.edu/index.php/SOCR_Data_July2009_ID_NI#Curvedness_Data")
dataset <- as.data.frame(html_table(html_nodes(wiki_page, "table")[[2]]))

# Clean and preprocess
dataset_clean <- subset(dataset, Group %in% c("AD", "NC"))
dataset_clean$Group <- factor(dataset_clean$Group, levels = c("NC", "AD"))
# Convert to binary: 1 for AD (positive class), 0 for NC
dataset_clean$Group_binary <- ifelse(dataset_clean$Group == "AD", 1, 0)

# Ensure predictors are correct type
dataset_clean$GMV <- as.numeric(dataset_clean$GMV)
dataset_clean$Age <- as.numeric(dataset_clean$Age)
dataset_clean$Sex <- as.factor(dataset_clean$Sex)

# Remove NAs
dataset_clean <- na.omit(dataset_clean[, c("Group_binary", "GMV", "Age", "Sex")])

# Check class balance
cat("Class distribution:\n")
print(table(dataset_clean$Group_binary))
cat("\nTotal observations:", nrow(dataset_clean), "\n")

# Set seed for reproducibility
set.seed(123)

# Split data into training and testing (70/30)
train_idx <- sample(1:nrow(dataset_clean), 0.7 * nrow(dataset_clean))
train_data <- dataset_clean[train_idx, ]
test_data <- dataset_clean[-train_idx, ]

# Random Forest model (with all features including Sex)
rf_model <- randomForest(as.factor(Group_binary) ~ GMV + Age + Sex, 
                         data = train_data, 
                         ntree = 500,
                         importance = TRUE)

# Logistic regression model (with all features)
glm_model <- glm(Group_binary ~ GMV + Age + Sex, 
                 data = train_data, 
                 family = binomial(link = "logit"))

# Get predictions on TEST data (not training data)
rf_probs <- predict(rf_model, test_data, type = "prob")[, "1"]  # Probability of class 1 (AD)
glm_probs <- predict(glm_model, test_data, type = "response")

# Check prediction distributions
cat("\nPrediction summary:\n")
cat("RF probabilities range:", round(range(rf_probs, na.rm = TRUE), 3), "\n")
cat("GLM probabilities range:", round(range(glm_probs, na.rm = TRUE), 3), "\n")

# ROC curves and AUCs using the ROCR package
pred_rf <- prediction(rf_probs, test_data$Group_binary)
perf_rf <- performance(pred_rf, "tpr", "fpr")
auc_rf <- as.numeric(performance(pred_rf, "auc")@y.values[[1]])

pred_glm <- prediction(glm_probs, test_data$Group_binary)
perf_glm <- performance(pred_glm, "tpr", "fpr")
auc_glm <- as.numeric(performance(pred_glm, "auc")@y.values[[1]])

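# Same curves with pROC; fixing levels and direction ("<": higher score = AD)
# keeps pROC consistent with ROCR instead of letting it choose a direction
roc_rf <- roc(test_data$Group_binary, rf_probs, levels = c(0, 1), direction = "<")
roc_glm <- roc(test_data$Group_binary, glm_probs, levels = c(0, 1), direction = "<")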

# Comparative plot with BOTH ROCR and pROC approaches
par(mfrow = c(1, 2))

# Plot 1: Using ROCR
plot(perf_rf, col = "blue", lwd = 2, main = "ROC Curves (ROCR Package)")
plot(perf_glm, col = "red", lwd = 2, add = TRUE)
abline(a = 0, b = 1, lty = 2, col = "gray")
legend("bottomright", 
       legend = c(paste("Random Forest (AUC =", round(auc_rf, 3), ")"),
                  paste("Logistic Reg (AUC =", round(auc_glm, 3), ")")),
       col = c("blue", "red"), lwd = 2, cex = 0.8)

# Plot 2: pROC (by default the x-axis shows specificity, running from 1 to 0)
plot(roc_rf, col = "blue", lwd = 2, main = "ROC Curves (pROC Package)")
lines(roc_glm, col = "red", lwd = 2)
legend("bottomright", 
       legend = c(paste("Random Forest (AUC =", round(auc(roc_rf), 3), ")"),
                  paste("Logistic Reg (AUC =", round(auc(roc_glm), 3), ")")),
       col = c("blue", "red"), lwd = 2, cex = 0.8)

# Reset plot layout
par(mfrow = c(1, 1))

# Print model performance metrics
cat("\n=== Model Performance ===\n")
cat("Random Forest AUC (ROCR):", round(auc_rf, 3), "\n")
cat("Logistic Regression AUC (ROCR):", round(auc_glm, 3), "\n")
cat("Random Forest AUC (pROC):", round(auc(roc_rf), 3), "\n")
cat("Logistic Regression AUC (pROC):", round(auc(roc_glm), 3), "\n")

# An AUC below 0.5 on held-out data usually points to a label-coding problem
# (e.g. the wrong class treated as positive) rather than a usable model.
# Do not "fix" it by flipping test-set scores: choosing the score direction
# after seeing the test labels inflates the reported AUC.
if (auc_rf < 0.5) {
  warning("RF AUC < 0.5: check that Group_binary = 1 (AD) is the positive class.")
}
if (auc_glm < 0.5) {
  warning("GLM AUC < 0.5: check that Group_binary = 1 (AD) is the positive class.")
}

# Final comparison plot (ROCR curves with their AUCs)
plot(perf_rf, col = "blue", lwd = 2, 
     main = "ROC Comparison: Random Forest vs Logistic Regression",
     xlab = "False Positive Rate", 
     ylab = "True Positive Rate")
plot(perf_glm, col = "red", lwd = 2, add = TRUE)
abline(a = 0, b = 1, lty = 2, col = "gray")

# Legend
legend("bottomright", 
       legend = c(paste("RF (AUC =", round(auc_rf, 3), ")"),
                  paste("GLM (AUC =", round(auc_glm, 3), ")")),
       col = c("blue", "red"), 
       lwd = 2, 
       cex = 0.75,            # smaller legend text
       bty = "n",             # no box around the legend
       inset = c(0.02, 0.02), # nudge the legend slightly away from the axes
       y.intersp = 0.8)       # tighter vertical spacing between entries
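The AUCs above come from a single 30% test split, so a small difference between RF and GLM may be noise. One possible next step, not part of the original example, is pROC's `roc.test`, which can apply DeLong's test to two correlated ROC curves built on the same cases. The sketch runs it on simulated scores (`score_a` and `score_b` are hypothetical stand-ins for `rf_probs` and `glm_probs`) so that it works without the SOCR download; with the real data, `roc.test(roc_rf, roc_glm, method = "delong")` would be the analogous call.

```r
library(pROC)

# Simulated data (hypothetical): one binary outcome, two classifiers' scores
set.seed(42)
n <- 120
y <- rbinom(n, 1, 0.4)              # 1 = positive class (e.g. AD)
score_a <- y + rnorm(n, sd = 1.0)   # less noisy simulated classifier
score_b <- y + rnorm(n, sd = 1.5)   # noisier simulated classifier

roc_a <- roc(y, score_a, levels = c(0, 1), direction = "<", quiet = TRUE)
roc_b <- roc(y, score_b, levels = c(0, 1), direction = "<", quiet = TRUE)

# Paired comparison: both curves use the same cases, so DeLong's test
# accounts for the correlation between the two AUC estimates
res <- roc.test(roc_a, roc_b, method = "delong")
print(res)
```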

Problems

Problem 6.1: ROC Construction

A study evaluates a biomarker for distinguishing lung cancer subtypes:

Biomarker Range	Type A (Positive)	Type B (Negative)
< 2	3	4
2–4	6	2
4–6	15	7
6–8	7	33
> 8	1	38

Task: Construct the ROC curve using cut-points at 2, 4, 6, and 8 (Type A = positive).
Compute the AUC and interpret whether the test is clinically useful.

Problem 6.2: Clinical Application

True or False: When screening for a serious, treatable disease (e.g., early-stage cancer), it is generally more important to have a test with high specificity than high sensitivity.

Answer: False. For treatable but serious diseases, high sensitivity is prioritized to minimize false negatives (missed cases). Specificity can be improved in follow-up confirmatory testing.

Problem 6.3: Definitions

The Positive Predictive Value (PPV) is calculated as:

(a) True Positives / Total Population
(b) True Negatives / (True Negatives + False Positives)
(c) True Positives / (True Positives + False Positives)
(d) True Negatives / Test Negatives

Answer: (c). PPV is the probability that a person with a positive test truly has the disease.

Problems 6.4–6.6: Performance Metrics

A new diabetes test yields:

	Disease Present	Disease Absent	Total
Test Positive	80	70	150
Test Negative	10	240	250
Total	90	310	400

6.4 Sensitivity: \(80/90 \approx 89\%\)
6.5 Specificity: \(240/310 \approx 77\%\)
6.6 PPV: \(80/150 \approx 53\%\)
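Unlike sensitivity and specificity, PPV depends on disease prevalence. A short R sketch of Bayes' theorem reproduces the 53% of Problem 6.6 from the sensitivity, the specificity and the study prevalence (90/400), then evaluates the same test at a hypothetical 1% prevalence. `ppv` is a helper defined here for illustration.

```r
# PPV from sensitivity, specificity and prevalence (Bayes' theorem)
ppv <- function(sens, spec, prev) {
  sens * prev / (sens * prev + (1 - spec) * (1 - prev))
}

sens <- 80 / 90     # Problem 6.4
spec <- 240 / 310   # Problem 6.5
prev <- 90 / 400    # disease prevalence in the study sample

round(ppv(sens, spec, prev), 2)   # 0.53, matching 80/150
round(ppv(sens, spec, 0.01), 3)   # same test at a hypothetical 1% prevalence: approx. 0.038
```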

References

See the SOCR SDA ROC/AUC Learning Module.
Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. *Radiology*. 1982;143(1):29–36.
Metz CE. Basic principles of ROC analysis. *Seminars in Nuclear Medicine*. 1978;8(4):283–298.
SOCR Data: Global Gray Matter Volume (GMV) Alzheimer’s Dataset
pROC and ROCR R package documentation

SOCR Home page: https://socr.umich.edu
