SMHS ROC
Scientific Methods for Health Sciences - Receiver Operating Characteristic (ROC) Curve
Overview
The Receiver Operating Characteristic (ROC) curve is a fundamental graphical tool used to evaluate the performance of a binary classifier system as its discrimination threshold varies. It illustrates the diagnostic ability of a classifier by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) across all possible threshold settings.
By visualizing these trade-offs, the ROC curve aids in selecting optimal models and discarding suboptimal ones. The Area Under the Curve (AUC) serves as a single scalar measure of aggregate diagnostic performance.
Motivation
In binary classification tasks—such as diagnosing Disease vs. No Disease—outcomes are often determined by whether a continuous test statistic falls above or below a chosen cutoff. While sensitivity and specificity describe accuracy at a single threshold, classifier performance changes as this threshold shifts.
Key objectives of ROC analysis include:
- Visualizing trade-offs: Demonstrating the dynamic relationship between sensitivity (TPR) and specificity (\(1 - \text{FPR}\)).
- Assessing accuracy: The closer the curve hugs the top-left corner of the ROC space, the better the test. A curve along the 45° diagonal represents random guessing (no discriminative ability).
- Selecting optimal thresholds: The slope of the tangent to the ROC curve at a given point reflects the likelihood ratio, enabling threshold selection based on clinical costs or benefits.
- Summarizing performance: The AUC quantifies the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one.
Theory
The Confusion Matrix and Core Metrics
A binary classifier produces four possible outcomes based on a decision threshold:
| Test Result (rows) vs. True Condition, Gold Standard (columns) | Disease (Positive) | No Disease (Negative) |
|---|---|---|
| Positive | True Positive (TP) (Hit) | False Positive (FP) (Type I Error, \(\alpha\)) |
| Negative | False Negative (FN) (Type II Error, \(\beta\)) | True Negative (TN) (Correct Rejection) |
Fundamental metrics derived from this matrix:
- Sensitivity (True Positive Rate): \[\text{Sensitivity} = \frac{TP}{TP + FN}\]
- Specificity (True Negative Rate): \[\text{Specificity} = \frac{TN}{TN + FP}\]
- False Positive Rate: \[\text{FPR} = 1 - \text{Specificity} = \frac{FP}{TN + FP}\]
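The three formulas above can be checked with a few lines of base R. This is a minimal sketch: `confusion_metrics` is a hypothetical helper name (not part of any package), and the counts passed to it are made up purely for illustration.

```r
# Sensitivity, specificity, and FPR from the four confusion-matrix counts.
# `confusion_metrics` is a hypothetical helper, not a function from any package.
confusion_metrics <- function(tp, fp, fn, tn) {
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    fpr         = fp / (tn + fp))
}

# Illustrative (made-up) counts: 50 diseased and 100 non-diseased subjects
m <- confusion_metrics(tp = 40, fp = 10, fn = 10, tn = 90)
print(m)  # sensitivity 0.8, specificity 0.9, fpr 0.1
```

Note that `fpr` and `specificity` always sum to 1, which is why the ROC curve can be drawn against either quantity.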
Constructing the ROC Curve
To construct an ROC curve, sensitivity and specificity are computed for every feasible cutoff value of the diagnostic test.
Example: Hypothyroidism Diagnosis Using T4 Levels
The table below shows the distribution of T4 measurements in hypothyroid (diseased) and euthyroid (non-diseased) individuals.
| Group | <1 | 1–2 | 2–3 | 3–4 | 4–5 | 5–6 | 6–7 | 7–8 | 8–9 | 9–10 | 10–11 | >11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hypothyroid | 2 | 3 | 1 | 8 | 4 | 4 | 3 | 3 | 1 | 0 | 2 | 1 |
| Euthyroid | 0 | 0 | 0 | 0 | 1 | 6 | 11 | 19 | 17 | 20 | 11 | 8 |
We compute performance metrics at different thresholds:
- Cut-point = 5 (strict; T4 ≤ 5 → test positive): \(\text{Sensitivity} = 18/32 = 0.56\), \(\text{Specificity} = 92/93 = 0.99\)
- Cut-point = 7 (moderate; T4 ≤ 7 → test positive): \(\text{Sensitivity} = 25/32 = 0.78\), \(\text{Specificity} = 75/93 = 0.81\)
- Cut-point = 9 (lenient; T4 ≤ 9 → test positive): \(\text{Sensitivity} = 29/32 = 0.91\), \(\text{Specificity} = 39/93 = 0.42\)
Plotting Sensitivity (y-axis) versus FPR = \(1 - \text{Specificity}\) (x-axis) for all thresholds yields the ROC curve.
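The construction can be sketched in base R directly from the bin counts in the T4 table. The sketch assumes that each bin's upper edge serves as a cut-point and that a subject tests positive when T4 is at or below the cut-point; the table does not say on which side boundary values fall, so that convention is an assumption. Variable names such as `roc_tab` are illustrative.

```r
# Bin counts from the T4 table (bins: <1, 1-2, ..., 10-11, >11)
hypo <- c(2, 3, 1, 8, 4, 4, 3, 3, 1, 0, 2, 1)      # hypothyroid (diseased)
eu   <- c(0, 0, 0, 0, 1, 6, 11, 19, 17, 20, 11, 8)  # euthyroid (non-diseased)

# Upper edge of each bin as cut-point; Inf means everyone tests positive
cut_point <- c(1:11, Inf)

# Test positive when T4 <= cut-point, so cumulative counts give TP and FP
tp <- cumsum(hypo)
fp <- cumsum(eu)

roc_tab <- data.frame(
  cut_point   = cut_point,
  sensitivity = tp / sum(hypo),
  specificity = 1 - fp / sum(eu),
  fpr         = fp / sum(eu)
)

# Rows for the three cut-points worked out above (5, 7, and 9)
print(round(roc_tab[roc_tab$cut_point %in% c(5, 7, 9), ], 2))

# ROC curve: prepend the (0, 0) corner, where nobody tests positive
plot(c(0, roc_tab$fpr), c(0, roc_tab$sensitivity), type = "b",
     xlab = "False Positive Rate (1 - Specificity)",
     ylab = "Sensitivity (True Positive Rate)",
     main = "ROC curve from binned T4 counts")
abline(a = 0, b = 1, lty = 2)  # chance diagonal
```

Because only binned counts are available, the curve is piecewise linear, with one vertex per cut-point.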
Applications and Interpretation
Area Under the Curve (AUC)
The AUC provides a standardized measure of overall diagnostic accuracy:
- 0.90–1.00: Excellent
- 0.80–0.90: Good
- 0.70–0.80: Fair
- 0.60–0.70: Poor
- 0.50–0.60: Fail (no better than chance)
In the hypothyroidism example, the AUC is 0.86, indicating good discriminative ability.
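As a rough check on this figure, the AUC can be recomputed from the binned T4 counts in two ways: the trapezoidal rule over the ROC points, and the pairwise interpretation (the share of hypothyroid/euthyroid pairs in which the hypothyroid subject falls in the lower T4 bin). The sketch below assumes that two subjects in the same bin count as a tie worth half a pair; with binned data that is the only information available.

```r
# Bin counts from the T4 table (bins: <1, 1-2, ..., 10-11, >11)
hypo <- c(2, 3, 1, 8, 4, 4, 3, 3, 1, 0, 2, 1)      # hypothyroid (diseased)
eu   <- c(0, 0, 0, 0, 1, 6, 11, 19, 17, 20, 11, 8)  # euthyroid (non-diseased)

# ROC points, starting at the (0, 0) corner
tpr <- c(0, cumsum(hypo) / sum(hypo))
fpr <- c(0, cumsum(eu) / sum(eu))

# 1) Trapezoidal rule: sum of strip widths times average strip heights
auc_trap <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# 2) Pairwise interpretation: rows index hypothyroid bins, columns euthyroid bins
bins  <- seq_along(hypo)
lower <- outer(bins, bins, "<")    # hypothyroid subject in a lower bin
tied  <- outer(bins, bins, "==")   # same bin: counted as half a pair
pairs <- outer(hypo, eu)           # number of subject pairs per bin combination
auc_pairs <- sum(pairs * (lower + 0.5 * tied)) / (sum(hypo) * sum(eu))

print(round(c(trapezoid = auc_trap, pairwise = auc_pairs), 3))
```

Both calculations give about 0.855, in line with the reported value of 0.86 after rounding.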
Optimal Threshold Selection and Cost Analysis
While AUC summarizes global performance, clinical decisions require a single operating threshold. The optimal choice depends on the relative costs of false positives (FP) and false negatives (FN).
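The slope method identifies the optimal operating point as the point on the ROC curve where the tangent slope equals
\[\text{Slope} = \frac{\text{Cost}(FP) \times P(\text{Negative})}{\text{Cost}(FN) \times P(\text{Positive})}\]
where \(P(\text{Positive})\) is the disease prevalence and \(P(\text{Negative}) = 1 - P(\text{Positive})\). For example, taking the two classes as equally likely so that the prevalence terms cancel:
- If missing a disease (FN) is 8× as costly as a false alarm (FP), the target slope is \(1/8\).
- If treatment risks make an FP 2× as costly as an FN, the target slope is 2.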
This approach balances clinical priorities with statistical performance.
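On an empirical ROC curve with only a handful of cut-points there is no smooth tangent, so one practical stand-in is to minimize the expected cost per subject, \(\text{Cost}(FN) \times P(\text{Positive}) \times (1 - \text{Sensitivity}) + \text{Cost}(FP) \times P(\text{Negative}) \times \text{FPR}\), over the available cut-points. The sketch below applies this to the binned T4 counts. The cost ratios are hypothetical, the prevalence is taken to be the sample proportion 32/125, and `target_slope` and `best_cut` are illustrative helper names.

```r
# Bin counts from the T4 table (bins: <1, 1-2, ..., 10-11, >11)
hypo <- c(2, 3, 1, 8, 4, 4, 3, 3, 1, 0, 2, 1)      # hypothyroid (diseased)
eu   <- c(0, 0, 0, 0, 1, 6, 11, 19, 17, 20, 11, 8)  # euthyroid (non-diseased)

cut_point <- c(1:11, Inf)            # test positive when T4 <= cut-point
sens <- cumsum(hypo) / sum(hypo)
fpr  <- cumsum(eu) / sum(eu)

# Target tangent slope from the cost/prevalence formula
target_slope <- function(cost_fp, cost_fn, prev) {
  (cost_fp * (1 - prev)) / (cost_fn * prev)
}

# Cut-point with the smallest expected cost per subject
best_cut <- function(cost_fp, cost_fn, prev) {
  expected_cost <- cost_fn * prev * (1 - sens) + cost_fp * (1 - prev) * fpr
  cut_point[which.min(expected_cost)]
}

prev <- sum(hypo) / (sum(hypo) + sum(eu))   # sample prevalence, 32/125

print(target_slope(cost_fp = 1, cost_fn = 8, prev = 0.5))  # 0.125 when classes are equally likely
print(target_slope(cost_fp = 1, cost_fn = 8, prev = prev)) # larger, because disease is rarer here
print(best_cut(cost_fp = 1, cost_fn = 8, prev = prev))     # FN 8x as costly: lenient cut-point (8)
print(best_cut(cost_fp = 2, cost_fn = 1, prev = prev))     # FP 2x as costly: strict cut-point (5)
```

Costly false negatives push the operating point toward higher sensitivity, while costly false positives push it toward higher specificity.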
Practical Implementation in R
Modern ROC analysis leverages R packages such as `pROC` and `ROCR` for robust computation and visualization.
Example 1: Basic ROC Analysis for T4 Test
# Install and load required package
if (!require("pROC")) install.packages("pROC")
library(pROC)
# Data reconstructed from the T4 table, collapsed at cut-points 5, 7, and 9
response <- c(rep(1, 32), rep(0, 93)) # 1 = Hypothyroid, 0 = Euthyroid
# One representative T4 value per interval: 4 (<=5), 6 (5-7), 8 (7-9), 10 (>9); lower = more likely diseased
predictor_pos <- c(rep(4, 18), rep(6, 7), rep(8, 4), rep(10, 3))
predictor_neg <- c(rep(4, 1), rep(6, 17), rep(8, 36), rep(10, 39))
predictor <- c(predictor_pos, predictor_neg)
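# Build ROC object; levels = c(control, case), and direction = ">" states that
# controls have higher predictor values than cases (higher T4 = less likely diseased)
roc_obj <- roc(response, predictor, levels = c(0, 1), direction = ">")
# Plot ROC curve with AUC
plot(roc_obj, main = "ROC Curve for T4", col = "blue", print.auc = TRUE)
# Identify optimal threshold (Youden index: sensitivity + specificity - 1)
coords(roc_obj, "best", best.method = "youden", ret = c("threshold", "specificity", "sensitivity"))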
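The Youden index used to pick the "best" threshold is \(J = \text{Sensitivity} + \text{Specificity} - 1\). As a cross-check that does not depend on `pROC`, the following base R sketch evaluates \(J\) at the three cut-points computed by hand earlier. The collapsed data in Example 1 separate at the same three values, so the package result should agree.

```r
# Sensitivity and specificity at cut-points 5, 7, and 9 (from the T4 example)
cut_point <- c(5, 7, 9)
sens <- c(18, 25, 29) / 32
spec <- c(92, 75, 39) / 93

# Youden index: distance of each ROC point above the chance diagonal
youden <- sens + spec - 1
print(round(setNames(youden, cut_point), 3))

# Cut-point that maximizes the Youden index
best <- cut_point[which.max(youden)]
print(best)  # 7
```

The moderate cut-point wins because it gives up little specificity relative to the sensitivity it gains.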
Example 2: Comparing Classifiers for Alzheimer’s Disease
This example compares Random Forest and Logistic Regression models using Global Gray Matter Volume (GMV) and demographic features.
# Load required libraries
if (!require("randomForest")) install.packages("randomForest")
if (!require("ROCR")) install.packages("ROCR")
library(randomForest)
library(ROCR)
# Scrape the dataset table from the SOCR wiki (rvest re-exports read_html from xml2)
if (!require("rvest")) install.packages("rvest")
library(rvest)
wiki_page <- read_html("https://wiki.socr.umich.edu/index.php/SOCR_Data_July2009_ID_NI#Curvedness_Data")
# Assumes the data of interest is the second table on the page
dataset <- html_table(html_nodes(wiki_page, "table")[[2]])
# Alternative: load the GMV dataset as CSV from the SOCR GitHub repository
# url <- "https://raw.githubusercontent.com/SOCR/SOCR_Data/master/CSV_SOCR_Data/CSV_July2009_ID_NI_Curvedness_Data.csv"
# dataset <- read.csv(url, stringsAsFactors = FALSE)
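# Preprocess: binary classification (AD vs. non-AD); exclude MCI
# Assumes columns named Group, GMV, Age, and Sex; check names(dataset) and unique(dataset$Group) first
dataset_clean <- subset(dataset, Group != "MCI")
# Labels not listed in `levels` become NA, so they must match the data exactly
dataset_clean$Group <- factor(dataset_clean$Group, levels = c("NonAD", "AD"))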
# Ensure GMV, Age are numeric; Sex as factor
dataset_clean$GMV <- as.numeric(dataset_clean$GMV)
dataset_clean$Age <- as.numeric(dataset_clean$Age)
dataset_clean$Sex <- as.factor(dataset_clean$Sex)
# Random Forest model
set.seed(123)
rf_model <- randomForest(Group ~ GMV + Age + Sex, data = dataset_clean, ntree = 100)
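# Out-of-bag (OOB) probabilities: omitting newdata avoids scoring the forest on its own training rows
rf_probs <- predict(rf_model, type = "prob")[, "AD"] # Explicitly reference "AD" column
# Logistic regression model (Sex is omitted here, so the two models use different predictors; its predictions below are in-sample)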
glm_model <- glm(Group ~ GMV + Age, data = dataset_clean, family = binomial)
glm_probs <- predict(glm_model, dataset_clean, type = "response")
# ROC performance
pred_rf <- prediction(rf_probs, dataset_clean$Group)
perf_rf <- performance(pred_rf, "tpr", "fpr")
auc_rf <- performance(pred_rf, "auc")@y.values[[1]]
pred_glm <- prediction(glm_probs, dataset_clean$Group)
perf_glm <- performance(pred_glm, "tpr", "fpr")
auc_glm <- performance(pred_glm, "auc")@y.values[[1]]
# Comparative plot
plot(perf_rf, col = "blue", lwd = 2, main = "ROC Comparison: RF vs GLM")
plot(perf_glm, col = "red", lwd = 2, add = TRUE)
abline(a = 0, b = 1, lty = 2, col = "gray")
legend("bottomright",
       legend = c(sprintf("Random Forest (AUC = %.3f)", auc_rf),
                  sprintf("Logistic Reg (AUC = %.3f)", auc_glm)),
       col = c("blue", "red"), lwd = 2)
Problems
Problem 6.1: ROC Construction
A study evaluates a biomarker for distinguishing lung cancer subtypes:
| Biomarker Range | Type A (Positive) | Type B (Negative) |
|---|---|---|
| < 2 | 3 | 4 |
| 2–4 | 6 | 2 |
| 4–6 | 15 | 7 |
| 6–8 | 7 | 33 |
| > 8 | 1 | 38 |
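- Task: Construct the ROC curve using cut-points at 2, 4, 6, and 8, with Type A as the positive class. Because Type A cases are concentrated at low biomarker values, treat a value below the cut-point as a positive test.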
- Compute the AUC and interpret whether the test is clinically useful.
Problem 6.2: Clinical Application
True or False: When screening for a serious, treatable disease (e.g., early-stage cancer), it is generally more important to have a test with high specificity than high sensitivity.
Answer: False. For treatable but serious diseases, high sensitivity is prioritized to minimize false negatives (missed cases). Specificity can be improved in follow-up confirmatory testing.
Problem 6.3: Definitions
The Positive Predictive Value (PPV) is calculated as:
- (a) True Positives / Total Population
- (b) True Negatives / (True Negatives + False Positives)
- (c) True Positives / (True Positives + False Positives)
- (d) True Negatives / Test Negatives
Answer: (c). PPV is the probability that a person with a positive test truly has the disease.
Problems 6.4–6.6: Performance Metrics
A new diabetes test yields:
| | Disease Present | Disease Absent | Total |
|---|---|---|---|
| Test Positive | 80 | 70 | 150 |
| Test Negative | 10 | 240 | 250 |
| Total | 90 | 310 | 400 |
- 6.4 Sensitivity: \(80/90 \approx 89\%\)
- 6.5 Specificity: \(240/310 \approx 77\%\)
- 6.6 PPV: \(80/150 \approx 53\%\)
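These three answers can be reproduced in base R from the 2×2 table. The sketch also recomputes the PPV from sensitivity, specificity, and prevalence via Bayes' rule, which makes explicit that the PPV depends on how common the disease is in the tested population. Here the prevalence is assumed to be the sample proportion 90/400.

```r
# 2x2 table from Problems 6.4-6.6 (rows: test result, columns: disease status)
tab <- matrix(c(80, 70,
                10, 240),
              nrow = 2, byrow = TRUE,
              dimnames = list(Test    = c("Positive", "Negative"),
                              Disease = c("Present", "Absent")))

sens <- tab["Positive", "Present"] / sum(tab[, "Present"])   # 80 / 90
spec <- tab["Negative", "Absent"]  / sum(tab[, "Absent"])    # 240 / 310
ppv  <- tab["Positive", "Present"] / sum(tab["Positive", ])  # 80 / 150

# Same PPV via Bayes' rule, using the sample prevalence (an assumption)
prev <- sum(tab[, "Present"]) / sum(tab)                     # 90 / 400
ppv_bayes <- sens * prev / (sens * prev + (1 - spec) * (1 - prev))

print(round(100 * c(sensitivity = sens, specificity = spec,
                    ppv = ppv, ppv_bayes = ppv_bayes), 1))
```

The direct and Bayes-rule values of the PPV coincide at about 53.3%.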
References
- Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. *Radiology*. 1982;143(1):29–36.
- Metz CE. Basic principles of ROC analysis. *Seminars in Nuclear Medicine*. 1978;8(4):283–298.
- SOCR Data: Global Gray Matter Volume (GMV) Alzheimer’s Dataset
- pROC and ROCR R package documentation
- SOCR Home page: http://www.socr.umich.edu