Difference between revisions of "SMHS ROC"

From SOCR

Revision as of 09:34, 23 February 2026

Scientific Methods for Health Sciences - Receiver Operating Characteristic (ROC) Curve

Overview

The Receiver Operating Characteristic (ROC) curve is a fundamental graphical tool used to evaluate the performance of a binary classifier system as its discrimination threshold varies. It illustrates the diagnostic ability of a classifier by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) across all possible threshold settings.

By visualizing these trade-offs, the ROC curve aids in selecting optimal models and discarding suboptimal ones. The Area Under the Curve (AUC) serves as a single scalar measure of aggregate diagnostic performance.

Motivation

In binary classification tasks—such as diagnosing Disease vs. No Disease—outcomes are often determined by whether a continuous test statistic falls above or below a chosen cutoff. While sensitivity and specificity describe accuracy at a single threshold, classifier performance changes as this threshold shifts.

Key objectives of ROC analysis include:

  • Visualizing trade-offs: Demonstrating the dynamic relationship between sensitivity (TPR) and specificity (\(1 - \text{FPR}\)).
  • Assessing accuracy: The closer the curve hugs the top-left corner of the ROC space, the better the test. A curve along the 45° diagonal represents random guessing (no discriminative ability).
  • Selecting optimal thresholds: The slope of the tangent to the ROC curve at a given point reflects the likelihood ratio, enabling threshold selection based on clinical costs or benefits.
  • Summarizing performance: The AUC quantifies the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one.

Theory

The Confusion Matrix and Core Metrics

A binary classifier produces four possible outcomes based on a decision threshold:

Confusion matrix (columns: true condition by gold standard; rows: test result)
Test Result | Disease (Positive) | No Disease (Negative)
Positive | True Positive (TP) (Hit) | False Positive (FP) (Type I Error, \(\alpha\))
Negative | False Negative (FN) (Type II Error, \(\beta\)) | True Negative (TN) (Correct Rejection)

Fundamental metrics derived from this matrix:

  • Sensitivity (True Positive Rate): \(\text{Sensitivity} = \frac{TP}{TP + FN}\)
  • Specificity (True Negative Rate): \(\text{Specificity} = \frac{TN}{TN + FP}\)
  • False Positive Rate: \(FPR = 1 - \text{Specificity} = \frac{FP}{TN + FP}\)
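The three formulas above can be sketched in a few lines of base R. The counts below are hypothetical and chosen only for illustration, and the helper name `core_metrics` is an assumption rather than part of any package.

```r
# Hypothetical confusion-matrix counts (illustration only, not from a real study)
tp <- 40; fn <- 10; fp <- 5; tn <- 45

# Hypothetical helper: sensitivity, specificity and FPR from the four counts
core_metrics <- function(tp, fp, fn, tn) {
  sensitivity <- tp / (tp + fn)   # TP / (TP + FN)
  specificity <- tn / (tn + fp)   # TN / (TN + FP)
  c(sensitivity = sensitivity,
    specificity = specificity,
    fpr         = fp / (tn + fp)) # equals 1 - specificity
}

metrics <- core_metrics(tp, fp, fn, tn)
print(round(metrics, 3))
```

With these counts, sensitivity is 40/50, specificity is 45/50, and the FPR is 5/50.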

Constructing the ROC Curve

To construct an ROC curve, sensitivity and specificity are computed for every feasible cutoff value of the diagnostic test.

Example: Hypothyroidism Diagnosis Using T4 Levels

The table below shows the distribution of T4 measurements in hypothyroid (diseased) and euthyroid (non-diseased) individuals.

Frequency of T4 Levels in Hypothyroid vs. Euthyroid Patients
Group | <1 | 1–2 | 2–3 | 3–4 | 4–5 | 5–6 | 6–7 | 7–8 | 8–9 | 9–10 | 10–11 | >11 | Total
Hypothyroid | 2 | 3 | 1 | 8 | 4 | 4 | 3 | 3 | 1 | 0 | 2 | 1 | 32
Euthyroid | 0 | 0 | 0 | 0 | 1 | 6 | 11 | 19 | 17 | 20 | 11 | 8 | 93

We compute performance metrics at different thresholds:

  • Cut-point = 5 (strict; T4 ≤ 5 → test positive): \(\text{Sensitivity} = 18/32 = 0.56\), \(\text{Specificity} = 92/93 = 0.99\)
  • Cut-point = 7 (moderate; T4 ≤ 7 → test positive): \(\text{Sensitivity} = 25/32 = 0.78\), \(\text{Specificity} = 75/93 = 0.81\)
  • Cut-point = 9 (lenient; T4 ≤ 9 → test positive): \(\text{Sensitivity} = 29/32 = 0.91\), \(\text{Specificity} = 39/93 = 0.42\)

Plotting Sensitivity (y-axis) versus FPR = \(1 - \text{Specificity}\) (x-axis) for all thresholds yields the ROC curve.
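As a minimal base-R sketch of this construction, the frequencies from the T4 table can be accumulated bin by bin to give one (FPR, Sensitivity) point per cut-point. The variable names are illustrative, and the sketch assumes the bin boundaries 1 through 11 are the only feasible cut-points.

```r
# T4 frequencies copied from the table (12 bins: <1, 1-2, ..., 10-11, >11)
hypo <- c(2, 3, 1, 8, 4, 4, 3, 3, 1, 0, 2, 1)
eu   <- c(0, 0, 0, 0, 1, 6, 11, 19, 17, 20, 11, 8)

# Candidate cut-points are the bin boundaries 1..11; "T4 <= cut-point" is test positive
cutpoint <- 1:11
tp <- cumsum(hypo)[cutpoint]  # hypothyroid patients at or below each cut-point
fp <- cumsum(eu)[cutpoint]    # euthyroid patients at or below each cut-point

roc_points <- data.frame(
  cutpoint    = cutpoint,
  sensitivity = tp / sum(hypo),
  specificity = 1 - fp / sum(eu),
  fpr         = fp / sum(eu)
)

# Rows for the three cut-points worked out in the text (5, 7 and 9)
print(round(roc_points[roc_points$cutpoint %in% c(5, 7, 9), ], 2))

# ROC curve, with the trivial end points (0, 0) and (1, 1) added
plot(c(0, roc_points$fpr, 1), c(0, roc_points$sensitivity, 1), type = "b",
     xlab = "False Positive Rate (1 - Specificity)", ylab = "Sensitivity",
     main = "ROC Curve from the T4 Frequency Table")
abline(a = 0, b = 1, lty = 2, col = "gray")  # chance diagonal
```

The printed rows should reproduce the sensitivity and specificity values listed for cut-points 5, 7, and 9.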

Applications and Interpretation

Area Under the Curve (AUC)

The AUC provides a single summary of overall diagnostic accuracy. A commonly used rough guide for interpreting it is:

  • 0.90–1.00: Excellent
  • 0.80–0.90: Good
  • 0.70–0.80: Fair
  • 0.60–0.70: Poor
  • 0.50–0.60: Fail (no better than chance)

In the hypothyroidism example, the AUC is 0.86, indicating good discriminative ability.
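The 0.86 figure can be checked from the T4 frequency table. The base-R sketch below computes the AUC two ways: by the trapezoidal rule over the empirical ROC points, and as the proportion of (hypothyroid, euthyroid) pairs in which the hypothyroid patient falls in the lower T4 bin, with ties counted as one half. It assumes the binned frequencies are the only information available, so values within a bin are treated as tied.

```r
# T4 frequencies copied from the hypothyroidism table (12 bins, lowest to highest)
hypo <- c(2, 3, 1, 8, 4, 4, 3, 3, 1, 0, 2, 1)
eu   <- c(0, 0, 0, 0, 1, 6, 11, 19, 17, 20, 11, 8)

# 1) Trapezoidal rule over the empirical ROC points, starting at (0, 0)
tpr <- c(0, cumsum(hypo) / sum(hypo))
fpr <- c(0, cumsum(eu) / sum(eu))
auc_trap <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# 2) Pairwise interpretation: P(hypothyroid bin < euthyroid bin) + 0.5 * P(same bin)
bin     <- seq_along(hypo)
n_pairs <- outer(hypo, eu)        # pairs per (hypothyroid bin, euthyroid bin)
lower   <- outer(bin, bin, "<")   # hypothyroid patient in the lower bin
tie     <- outer(bin, bin, "==")  # both patients in the same bin
auc_pairs <- sum(n_pairs * (lower + 0.5 * tie)) / (sum(hypo) * sum(eu))

print(round(c(trapezoid = auc_trap, pairwise = auc_pairs), 3))
```

Both calculations give the same value, which rounds to 0.86.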

Optimal Threshold Selection and Cost Analysis

While AUC summarizes global performance, clinical decisions require a single operating threshold. The optimal choice depends on the relative costs of false positives (FP) and false negatives (FN).

The slope method identifies the optimal operating point as the point on the ROC curve where the tangent slope equals

\[\text{Slope} = \frac{\text{Cost}(FP) \times P(\text{Negative})}{\text{Cost}(FN) \times P(\text{Positive})}\]

Assuming the positive and negative classes are equally likely, so that \(P(\text{Negative}) = P(\text{Positive})\):

  • If missing a disease (FN) is 8× as costly as a false alarm (FP), the target slope is \(1/8\).
  • If treatment risks make an FP 2× as costly as an FN, the target slope is 2.

This approach balances clinical priorities with statistical performance.
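A minimal sketch of the slope method in base R follows. The function names `target_slope` and `pick_cutpoint` are hypothetical, the prevalence of 0.5 is an assumption, and the candidates are limited to the three T4 cut-points worked out earlier. For a discrete set of ROC points, the point "touched" by a line of the target slope is the one that maximizes TPR − slope × FPR.

```r
# Hypothetical helper: target slope from misclassification costs and prevalence
target_slope <- function(cost_fp, cost_fn, prevalence) {
  (cost_fp * (1 - prevalence)) / (cost_fn * prevalence)
}

# ASSUMPTION: prevalence = 0.5, so the slope reduces to the cost ratio
slope_fn_costly <- target_slope(cost_fp = 1, cost_fn = 8, prevalence = 0.5)
slope_fp_costly <- target_slope(cost_fp = 2, cost_fn = 1, prevalence = 0.5)

# Candidate operating points: T4 cut-points 5, 7 and 9 from the worked example
cutpoint <- c(5, 7, 9)
tpr <- c(18, 25, 29) / 32
fpr <- c(1, 18, 54) / 93

# Hypothetical helper: the candidate that maximizes TPR - slope * FPR
pick_cutpoint <- function(slope) cutpoint[which.max(tpr - slope * fpr)]

print(c(slope = slope_fn_costly, cutpoint = pick_cutpoint(slope_fn_costly)))
print(c(slope = slope_fp_costly, cutpoint = pick_cutpoint(slope_fp_costly)))
```

Under these assumptions, costly false negatives push the choice toward the lenient cut-point (9), and costly false positives push it toward the strict one (5). Changing `prevalence` changes the target slope, so the cost ratio alone does not determine the threshold.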

Practical Implementation in R

Modern ROC analysis leverages R packages such as `pROC` and `ROCR` for robust computation and visualization.

Example 1: Basic ROC Analysis for T4 Test

# Install and load required package
if (!require("pROC")) install.packages("pROC")
library(pROC)

# Grouped T4 data reconstructed from the cut-points above:
# the values 4, 6, 8 and 10 stand in for T4 <= 5, 5-7, 7-9 and > 9
response <- c(rep(1, 32), rep(0, 93))  # 1 = Hypothyroid, 0 = Euthyroid

# Stand-in T4 values (lower = more likely diseased)
predictor_pos <- c(rep(4, 18), rep(6, 7), rep(8, 4), rep(10, 3))
predictor_neg <- c(rep(4, 1), rep(6, 17), rep(8, 36), rep(10, 39))
predictor <- c(predictor_pos, predictor_neg)

# Build ROC object: levels = c(control, case); direction = ">" because
# controls (euthyroid) have higher T4 values than cases (hypothyroid)
roc_obj <- roc(response, predictor, levels = c(0, 1), direction = ">")

# Plot ROC curve with AUC
plot(roc_obj, main = "ROC Curve for T4", col = "blue", print.auc = TRUE)

# Identify optimal threshold (Youden index)
coords(roc_obj, "best", ret = c("threshold", "specificity", "sensitivity"))
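By default, the "best" threshold in pROC's `coords()` is the one that maximizes Youden's J statistic, \(J = \text{Sensitivity} + \text{Specificity} - 1\). The base-R sketch below applies that criterion by hand to the three cut-points from the T4 example; it is an illustration of the criterion, not a replacement for the pROC call.

```r
# Sensitivity and specificity at the three T4 cut-points from the worked example
cutpoint    <- c(5, 7, 9)
sensitivity <- c(18, 25, 29) / 32
specificity <- c(92, 75, 39) / 93

# Youden's J = sensitivity + specificity - 1; larger is better
youden_j <- sensitivity + specificity - 1
print(round(data.frame(cutpoint, sensitivity, specificity, youden_j), 3))

# Cut-point with the largest J among these three candidates
best_cutpoint <- cutpoint[which.max(youden_j)]
print(best_cutpoint)
```

Among these three candidates, the moderate cut-point of 7 has the largest J.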

Example 2: Comparing Classifiers for Alzheimer’s Disease

This example compares Random Forest and Logistic Regression models using Global Gray Matter Volume (GMV) and demographic features.

# Load required libraries
if (!require("randomForest")) install.packages("randomForest")
if (!require("ROCR")) install.packages("ROCR")
if (!require("pROC")) install.packages("pROC")
if (!require("caret")) install.packages("caret")
library(randomForest)
library(ROCR)
library(pROC)
library(caret)
if (!require("rvest")) install.packages("rvest")
library(rvest)  # provides read_html(), html_nodes() and html_table()

# Load data
wiki_url <- read_html("https://wiki.socr.umich.edu/index.php/SOCR_Data_July2009_ID_NI#Curvedness_Data")
dataset <- html_table(html_nodes(wiki_url, "table")[[2]])

# Clean and preprocess
dataset_clean <- subset(dataset, Group %in% c("AD", "NC"))
dataset_clean$Group <- factor(dataset_clean$Group, levels = c("NC", "AD"))
# Convert to binary: 1 for AD (positive class), 0 for NC
dataset_clean$Group_binary <- ifelse(dataset_clean$Group == "AD", 1, 0)

# Ensure predictors are correct type
dataset_clean$GMV <- as.numeric(dataset_clean$GMV)
dataset_clean$Age <- as.numeric(dataset_clean$Age)
dataset_clean$Sex <- as.factor(dataset_clean$Sex)

# Remove NAs
dataset_clean <- na.omit(dataset_clean[, c("Group_binary", "GMV", "Age", "Sex")])

# Check class balance
cat("Class distribution:\n")
print(table(dataset_clean$Group_binary))
cat("\nTotal observations:", nrow(dataset_clean), "\n")

# Set seed for reproducibility
set.seed(123)

# Split data into training and testing (70/30)
train_idx <- sample(seq_len(nrow(dataset_clean)), floor(0.7 * nrow(dataset_clean)))
train_data <- dataset_clean[train_idx, ]
test_data <- dataset_clean[-train_idx, ]

# Random Forest model (with all features including Sex)
rf_model <- randomForest(as.factor(Group_binary) ~ GMV + Age + Sex, 
                         data = train_data, 
                         ntree = 500,
                         importance = TRUE)

# Logistic regression model (with all features)
glm_model <- glm(Group_binary ~ GMV + Age + Sex, 
                 data = train_data, 
                 family = binomial(link = "logit"))

# Get predictions on TEST data (not training data)
rf_probs <- predict(rf_model, test_data, type = "prob")[, "1"]  # Probability of class 1 (AD)
glm_probs <- predict(glm_model, test_data, type = "response")

# Check prediction distributions
cat("\nPrediction summary:\n")
cat("RF probabilities range:", round(range(rf_probs, na.rm = TRUE), 3), "\n")
cat("GLM probabilities range:", round(range(glm_probs, na.rm = TRUE), 3), "\n")

# ROC curves and AUC using the ROCR package
pred_rf <- prediction(rf_probs, test_data$Group_binary)
perf_rf <- performance(pred_rf, "tpr", "fpr")
auc_rf <- as.numeric(performance(pred_rf, "auc")@y.values[[1]])

pred_glm <- prediction(glm_probs, test_data$Group_binary)
perf_glm <- performance(pred_glm, "tpr", "fpr")
auc_glm <- as.numeric(performance(pred_glm, "auc")@y.values[[1]])

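# Alternative: the same curves with pROC.
# Fix levels (control = 0, case = 1) and direction so pROC does not
# auto-select whichever orientation gives the larger AUC.
roc_rf <- roc(test_data$Group_binary, rf_probs, levels = c(0, 1), direction = "<")
roc_glm <- roc(test_data$Group_binary, glm_probs, levels = c(0, 1), direction = "<")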

# Comparative plot with BOTH ROCR and pROC approaches
par(mfrow = c(1, 2))

# Plot 1: Using ROCR
plot(perf_rf, col = "blue", lwd = 2, main = "ROC Curves (ROCR Package)")
plot(perf_glm, col = "red", lwd = 2, add = TRUE)
abline(a = 0, b = 1, lty = 2, col = "gray")
legend("bottomright", 
       legend = c(paste("Random Forest (AUC =", round(auc_rf, 3), ")"),
                  paste("Logistic Reg (AUC =", round(auc_glm, 3), ")")),
       col = c("blue", "red"), lwd = 2, cex = 0.8)

# Plot 2: Using pROC (legacy.axes = TRUE puts 1 - specificity on the x-axis, matching the ROCR plot)
plot(roc_rf, col = "blue", lwd = 2, main = "ROC Curves (pROC Package)", legacy.axes = TRUE)
lines(roc_glm, col = "red", lwd = 2)
legend("bottomright", 
       legend = c(paste("Random Forest (AUC =", round(auc(roc_rf), 3), ")"),
                  paste("Logistic Reg (AUC =", round(auc(roc_glm), 3), ")")),
       col = c("blue", "red"), lwd = 2, cex = 0.8)

# Reset plot layout
par(mfrow = c(1, 1))

# Print model performance metrics
cat("\n=== Model Performance ===\n")
cat("Random Forest AUC (ROCR):", round(auc_rf, 3), "\n")
cat("Logistic Regression AUC (ROCR):", round(auc_glm, 3), "\n")
cat("Random Forest AUC (pROC):", round(auc(roc_rf), 3), "\n")
cat("Logistic Regression AUC (pROC):", round(auc(roc_glm), 3), "\n")

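# An AUC below 0.5 can indicate reversed class coding or an unstable estimate
# on a small test set; investigate the cause before reporting results.
# If the scores are inverted, recompute BOTH the ROC curve and the AUC
# so that the final plot and its legend agree.
if (auc_rf < 0.5) {
  cat("\nNote: RF AUC < 0.5. Flipping predictions...\n")
  rf_probs <- 1 - rf_probs
  pred_rf <- prediction(rf_probs, test_data$Group_binary)
  perf_rf <- performance(pred_rf, "tpr", "fpr")
  auc_rf <- as.numeric(performance(pred_rf, "auc")@y.values[[1]])
  cat("Corrected RF AUC:", round(auc_rf, 3), "\n")
}

if (auc_glm < 0.5) {
  cat("Note: GLM AUC < 0.5. Flipping predictions...\n")
  glm_probs <- 1 - glm_probs
  pred_glm <- prediction(glm_probs, test_data$Group_binary)
  perf_glm <- performance(pred_glm, "tpr", "fpr")
  auc_glm <- as.numeric(performance(pred_glm, "auc")@y.values[[1]])
  cat("Corrected GLM AUC:", round(auc_glm, 3), "\n")
}

# Final plot using the ROCR curves and AUCs computed above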
plot(perf_rf, col = "blue", lwd = 2, 
     main = "ROC Comparison: Random Forest vs Logistic Regression",
     xlab = "False Positive Rate", 
     ylab = "True Positive Rate")
plot(perf_glm, col = "red", lwd = 2, add = TRUE)
abline(a = 0, b = 1, lty = 2, col = "gray")

# Legend
legend("bottomright", 
       legend = c(paste("RF (AUC =", round(auc_rf, 3), ")"),
                  paste("GLM (AUC =", round(auc_glm, 3), ")")),
       col = c("blue", "red"), 
       lwd = 2, 
       cex = 0.75,            # Smaller text so the legend does not crowd the curves
       bty = "n",             # No box around the legend
       inset = c(0.02, 0.02), # Nudges the legend slightly away from the axes
       y.intersp = 0.8)       # Tightens the vertical spacing between lines

Problems

Problem 6.1: ROC Construction

A study evaluates a biomarker for distinguishing lung cancer subtypes:

Biomarker Range | Type A (Positive) | Type B (Negative)
< 2 | 3 | 4
2–4 | 6 | 2
4–6 | 15 | 7
6–8 | 7 | 33
> 8 | 1 | 38

  • Task: Construct the ROC curve using cut-points at 2, 4, 6, and 8 (Type A = positive).
  • Compute the AUC and interpret whether the test is clinically useful.
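One way to check a hand-worked answer to Problem 6.1 is the base-R sketch below. It assumes that biomarker values below the cut-point count as test positive, since Type A cases concentrate in the lower ranges; the problem statement does not fix that direction. The names `roc_61` and `auc_61` are illustrative.

```r
# Frequencies from the Problem 6.1 table (bins: <2, 2-4, 4-6, 6-8, >8)
type_a <- c(3, 6, 15, 7, 1)   # Type A = positive
type_b <- c(4, 2, 7, 33, 38)  # Type B = negative

# ASSUMPTION: biomarker below the cut-point is test positive
cutpoint    <- c(2, 4, 6, 8)
sensitivity <- cumsum(type_a)[1:4] / sum(type_a)
fpr         <- cumsum(type_b)[1:4] / sum(type_b)

roc_61 <- data.frame(cutpoint, sensitivity, specificity = 1 - fpr, fpr)
print(round(roc_61, 3))

# Trapezoidal AUC over the ROC points, including (0, 0) and (1, 1)
x <- c(0, fpr, 1)
y <- c(0, sensitivity, 1)
auc_61 <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
print(round(auc_61, 3))
```

If the opposite direction were intended (higher values positive), the same code would give 1 minus this AUC.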

Problem 6.2: Clinical Application

True or False: When screening for a serious, treatable disease (e.g., early-stage cancer), it is generally more important to have a test with high specificity than high sensitivity.

Answer: False. For treatable but serious diseases, high sensitivity is prioritized to minimize false negatives (missed cases). Specificity can be improved in follow-up confirmatory testing.

Problem 6.3: Definitions

The Positive Predictive Value (PPV) is calculated as:

  • (a) True Positives / Total Population
  • (b) True Negatives / (True Negatives + False Positives)
  • (c) True Positives / (True Positives + False Positives)
  • (d) True Negatives / Test Negatives

Answer: (c). PPV is the probability that a person with a positive test truly has the disease.

Problems 6.4–6.6: Performance Metrics

A new diabetes test yields:

Test Result | Disease Present | Disease Absent | Total
Test Positive | 80 | 70 | 150
Test Negative | 10 | 240 | 250
Total | 90 | 310 | 400

  • 6.4 Sensitivity: \(80/90 \approx 89\%\)
  • 6.5 Specificity: \(240/310 \approx 77\%\)
  • 6.6 PPV: \(80/150 \approx 53\%\)
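The three answers can be reproduced in base R from the table counts. The sketch also recomputes PPV from sensitivity, specificity, and prevalence using Bayes' rule, to show how PPV changes with prevalence while sensitivity and specificity stay fixed. The function name `ppv_bayes` and the 5% prevalence are hypothetical choices for illustration.

```r
# Counts from the diabetes test table
tp <- 80; fp <- 70; fn <- 10; tn <- 240

sensitivity <- tp / (tp + fn)  # 80 / 90
specificity <- tn / (tn + fp)  # 240 / 310
ppv         <- tp / (tp + fp)  # 80 / 150
print(round(c(sensitivity = sensitivity, specificity = specificity, ppv = ppv), 3))

# Hypothetical helper: PPV via Bayes' rule for an arbitrary prevalence
ppv_bayes <- function(sens, spec, prev) {
  (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
}

# At the study prevalence (90 / 400) this matches the table-based PPV
prev_study <- (tp + fn) / (tp + fp + fn + tn)
ppv_check  <- ppv_bayes(sensitivity, specificity, prev_study)

# ASSUMPTION: a hypothetical screening population with 5% prevalence
ppv_low <- ppv_bayes(sensitivity, specificity, 0.05)
print(round(c(ppv_study = ppv_check, ppv_at_5_percent = ppv_low), 3))
```

The PPV falls when prevalence falls, even though the test itself is unchanged.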

References

  • See the SOCR SDA ROC/AUC Learning Module.
  • Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. *Radiology*. 1982;143(1):29–36.
  • Metz CE. Basic principles of ROC analysis. *Seminars in Nuclear Medicine*. 1978;8(4):283–298.
  • SOCR Data: Global Gray Matter Volume (GMV) Alzheimer’s Dataset
  • pROC and ROCR R package documentation


