Latest revision as of 09:42, 23 February 2026
Scientific Methods for Health Sciences - Receiver Operating Characteristic (ROC) Curve
Overview
The Receiver Operating Characteristic (ROC) curve is a fundamental graphical tool used to evaluate the performance of a binary classifier system as its discrimination threshold varies. It illustrates the diagnostic ability of a classifier by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) across all possible threshold settings.
By visualizing these trade-offs, the ROC curve aids in selecting optimal models and discarding suboptimal ones. The Area Under the Curve (AUC) serves as a single scalar measure of aggregate diagnostic performance.
Motivation
In binary classification tasks—such as diagnosing Disease vs. No Disease—outcomes are often determined by whether a continuous test statistic falls above or below a chosen cutoff. While sensitivity and specificity describe accuracy at a single threshold, classifier performance changes as this threshold shifts.
Key objectives of ROC analysis include:
- Visualizing trade-offs: Demonstrating the dynamic relationship between sensitivity (TPR) and specificity (\(1 - \text{FPR}\)).
- Assessing accuracy: The closer the curve hugs the top-left corner of the ROC space, the better the test. A curve along the 45° diagonal represents random guessing (no discriminative ability).
- Selecting optimal thresholds: The slope of the tangent to the ROC curve at a given point reflects the likelihood ratio, enabling threshold selection based on clinical costs or benefits.
- Summarizing performance: The AUC quantifies the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one.
Theory
The Confusion Matrix and Core Metrics
A binary classifier produces four possible outcomes based on a decision threshold:
| Test Result | True Condition (Gold Standard): Disease (Positive) | True Condition (Gold Standard): No Disease (Negative) |
|---|---|---|
| Positive | True Positive (TP) (Hit) | False Positive (FP) (Type I Error, \(\alpha\)) |
| Negative | False Negative (FN) (Type II Error, \(\beta\)) | True Negative (TN) (Correct Rejection) |
Fundamental metrics derived from this matrix:
- Sensitivity (True Positive Rate): \[\text{Sensitivity} = \frac{TP}{TP + FN}\]
- Specificity (True Negative Rate): \[\text{Specificity} = \frac{TN}{TN + FP}\]
- False Positive Rate: \[FPR = 1 - \text{Specificity} = \frac{FP}{TN + FP}\]
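The following base-R sketch shows one way to go from scores and a decision threshold to the four confusion-matrix cells and the metrics defined above. The scores, true labels, and the threshold of 0.5 are hypothetical values chosen only for illustration; they do not come from any dataset in this section.

```r
# Hypothetical classifier scores (higher = more likely diseased) and true status
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1)
truth  <- c(1,   1,   0,   1,   1,    0,   0,   1,   0,   0)   # 1 = disease

threshold <- 0.5                              # illustrative decision threshold
predicted <- as.integer(scores >= threshold)  # 1 = test positive

# Cross-tabulate the test result against the true condition
confusion <- table(
  Test  = factor(predicted, levels = c(1, 0), labels = c("Positive", "Negative")),
  Truth = factor(truth,     levels = c(1, 0), labels = c("Disease", "No Disease"))
)
print(confusion)

TP <- confusion["Positive", "Disease"]
FP <- confusion["Positive", "No Disease"]
FN <- confusion["Negative", "Disease"]
TN <- confusion["Negative", "No Disease"]

sensitivity <- TP / (TP + FN)   # true positive rate
specificity <- TN / (TN + FP)   # true negative rate
fpr         <- FP / (TN + FP)   # equals 1 - specificity

cat("Sensitivity:", sensitivity, " Specificity:", specificity, " FPR:", fpr, "\n")
```

Changing `threshold` and re-running the block shows how the four cells, and therefore sensitivity and specificity, move in opposite directions.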
Constructing the ROC Curve
To construct an ROC curve, sensitivity and specificity are computed for every feasible cutoff value of the diagnostic test.
Example: Hypothyroidism Diagnosis Using T4 Levels

The table below shows the frequency of T4 measurements in hypothyroid (diseased) and euthyroid (non-diseased) individuals.
| Group | <1 | 1–2 | 2–3 | 3–4 | 4–5 | 5–6 | 6–7 | 7–8 | 8–9 | 9–10 | 10–11 | >11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hypothyroid | 2 | 3 | 1 | 8 | 4 | 4 | 3 | 3 | 1 | 0 | 2 | 1 |
| Euthyroid | 0 | 0 | 0 | 0 | 1 | 6 | 11 | 19 | 17 | 20 | 11 | 8 |
We compute performance metrics at different thresholds:
- Cut-point = 5 (strict), test positive when \(T4 \leq 5\): \(\text{Sensitivity} = 18/32 = 0.56\), \(\text{Specificity} = 92/93 = 0.99\).
- Cut-point = 7 (moderate), test positive when \(T4 \leq 7\): \(\text{Sensitivity} = 25/32 = 0.78\), \(\text{Specificity} = 75/93 = 0.81\).
- Cut-point = 9 (lenient), test positive when \(T4 \leq 9\): \(\text{Sensitivity} = 29/32 = 0.91\), \(\text{Specificity} = 39/93 = 0.42\).
Plotting Sensitivity (y-axis) versus FPR = \(1 - \text{Specificity}\) (x-axis) for all thresholds yields the ROC curve.
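As a minimal sketch, the same calculation can be repeated for every cut-point in the T4 table using cumulative counts. It assumes, consistent with the worked values above, that a measurement falling in a bin such as 4–5 counts as at or below the cut-point 5; the variable names are illustrative.

```r
# Frequencies from the T4 table; bins are <1, 1-2, ..., 10-11, >11
hypothyroid <- c(2, 3, 1, 8, 4, 4, 3, 3, 1, 0, 2, 1)
euthyroid   <- c(0, 0, 0, 0, 1, 6, 11, 19, 17, 20, 11, 8)
cutpoints   <- 1:11   # upper edge of every bin except the open-ended last one

# A T4 value at or below the cut-point counts as a positive test
true_pos  <- cumsum(hypothyroid)[cutpoints]
false_pos <- cumsum(euthyroid)[cutpoints]

sensitivity <- true_pos / sum(hypothyroid)
specificity <- 1 - false_pos / sum(euthyroid)

roc_points <- data.frame(cutpoint    = cutpoints,
                         sensitivity = round(sensitivity, 2),
                         specificity = round(specificity, 2))
print(roc_points)

# ROC curve: FPR on the x-axis, sensitivity on the y-axis,
# anchored at (0, 0) and (1, 1)
plot(c(0, 1 - specificity, 1), c(0, sensitivity, 1), type = "b",
     xlab = "False Positive Rate (1 - Specificity)", ylab = "Sensitivity",
     main = "ROC Curve for T4 (binned data)")
abline(a = 0, b = 1, lty = 2, col = "gray")   # chance line
```

The rows for cut-points 5, 7, and 9 reproduce the three worked cases above.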
Applications and Interpretation
Area Under the Curve (AUC)
The AUC summarizes overall diagnostic accuracy as a single number between 0 and 1. A common rule of thumb for interpreting it:
- 0.90–1.00: Excellent
- 0.80–0.90: Good
- 0.70–0.80: Fair
- 0.60–0.70: Poor
- 0.50–0.60: Fail (little better than chance; 0.50 corresponds to random guessing)
In the hypothyroidism example, the AUC is 0.86, indicating good discriminative ability.
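One way to check this figure is to compute the AUC directly from the binned T4 frequencies. The sketch below uses the trapezoidal rule on the empirical ROC points and, as a cross-check, the rank interpretation of the AUC (the proportion of hypothyroid/euthyroid pairs in which the hypothyroid patient has the lower T4 bin, counting ties as one half). Because it works on binned data, the result is an approximation and should land close to the 0.86 quoted above.

```r
# Frequencies from the T4 table; bins are <1, 1-2, ..., 10-11, >11
hypothyroid <- c(2, 3, 1, 8, 4, 4, 3, 3, 1, 0, 2, 1)
euthyroid   <- c(0, 0, 0, 0, 1, 6, 11, 19, 17, 20, 11, 8)

# Empirical ROC points, starting at (0, 0) and ending at (1, 1)
tpr <- c(0, cumsum(hypothyroid) / sum(hypothyroid))
fpr <- c(0, cumsum(euthyroid)   / sum(euthyroid))

# Trapezoidal rule: width in FPR times average height in TPR
auc_trapezoid <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# Rank interpretation: for each euthyroid bin, count hypothyroid patients in
# strictly lower bins, plus half of those in the same bin (ties)
hypo_below <- c(0, head(cumsum(hypothyroid), -1))
auc_rank   <- sum(euthyroid * (hypo_below + 0.5 * hypothyroid)) /
              (sum(hypothyroid) * sum(euthyroid))

cat("AUC (trapezoidal):", round(auc_trapezoid, 3), "\n")
cat("AUC (pairwise ranking):", round(auc_rank, 3), "\n")
```

The two numbers agree because the trapezoidal area under an empirical ROC curve is algebraically the same quantity as the tie-corrected proportion of correctly ordered pairs.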
Optimal Threshold Selection and Cost Analysis
While AUC summarizes global performance, clinical decisions require a single operating threshold. The optimal choice depends on the relative costs of false positives (FP) and false negatives (FN).
The slope method identifies the optimal operating point as the point on the ROC curve where the tangent slope equals
\[\text{Slope} = \frac{\text{Cost}(FP) \times P(\text{Negative})}{\text{Cost}(FN) \times P(\text{Positive})}.\]
The two cases below assume equal prevalence, \(P(\text{Positive}) = P(\text{Negative})\), so the slope reduces to the cost ratio:
- If missing a disease (FN) is 8× more costly than a false alarm (FP), the target slope is \(1/8\).
- If treatment risks make FP 2× more costly than FN, the target slope is 2.
This approach balances clinical priorities with statistical performance.
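The sketch below illustrates the slope formula and a closely related way to pick a cut-point from a finite set: minimizing the expected misclassification cost, which is the criterion the tangent-slope rule is derived from. The function name `target_slope`, the cost values, and the use of the sample proportion 32/125 as prevalence are assumptions made for illustration, not values given in this section.

```r
# Tangent slope at the cost-optimal operating point (formula from the text)
target_slope <- function(cost_fp, cost_fn, prevalence) {
  (cost_fp * (1 - prevalence)) / (cost_fn * prevalence)
}

# The two cases from the text, under equal prevalence
slope_fn_costly <- target_slope(cost_fp = 1, cost_fn = 8, prevalence = 0.5)
slope_fp_costly <- target_slope(cost_fp = 2, cost_fn = 1, prevalence = 0.5)
cat("FN 8x as costly:", slope_fn_costly, " FP 2x as costly:", slope_fp_costly, "\n")

# Empirical ROC points for the T4 table (positive when T4 <= cut-point)
hypothyroid <- c(2, 3, 1, 8, 4, 4, 3, 3, 1, 0, 2, 1)
euthyroid   <- c(0, 0, 0, 0, 1, 6, 11, 19, 17, 20, 11, 8)
cutpoints   <- 1:11
sensitivity <- cumsum(hypothyroid)[cutpoints] / sum(hypothyroid)
fpr         <- cumsum(euthyroid)[cutpoints]   / sum(euthyroid)

# Hypothetical costs; prevalence taken from the sample (an assumption)
cost_fp    <- 1
cost_fn    <- 8
prevalence <- sum(hypothyroid) / (sum(hypothyroid) + sum(euthyroid))

# Expected cost per patient at each cut-point:
# missed cases weighted by cost_fn, false alarms weighted by cost_fp
expected_cost <- cost_fn * prevalence * (1 - sensitivity) +
                 cost_fp * (1 - prevalence) * fpr

best_cutpoint <- cutpoints[which.min(expected_cost)]
cat("Target slope at sample prevalence:",
    round(target_slope(cost_fp, cost_fn, prevalence), 3), "\n")
cat("Cut-point with the lowest expected cost:", best_cutpoint, "\n")
```

Raising `cost_fn` relative to `cost_fp` lowers the target slope and pushes the selected cut-point toward higher sensitivity; the opposite change favors specificity.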
Practical Implementation in R
Modern ROC analysis leverages R packages such as `pROC` and `ROCR` for robust computation and visualization.
Example 1: Basic ROC Analysis for T4 Test
```r
# Install (if needed) and load the required package
if (!require("pROC")) install.packages("pROC")
library(pROC)

# Simulated data based on the T4 distribution
response <- c(rep(1, 32), rep(0, 93))  # 1 = Hypothyroid, 0 = Euthyroid

# Simulated T4 values (lower = more likely diseased)
predictor_pos <- c(rep(4, 18), rep(6, 7), rep(8, 4), rep(10, 3))
predictor_neg <- c(rep(4, 1), rep(6, 17), rep(8, 36), rep(10, 39))
predictor <- c(predictor_pos, predictor_neg)

# Build ROC object; direction = ">" states that controls (0) have higher
# predictor values than cases (1)
roc_obj <- roc(response, predictor, levels = c(0, 1), direction = ">")

# Plot ROC curve with AUC
plot(roc_obj, main = "ROC Curve for T4", col = "blue", print.auc = TRUE)

# Identify optimal threshold (Youden index)
coords(roc_obj, "best", ret = c("threshold", "specificity", "sensitivity"))
```
Example 2: Comparing Classifiers for Alzheimer’s Disease
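This example compares [Random Forest](https://socr.umich.edu/DSPA2/DSPA2_notes/06_ML_NN_SVM_RF_Class.html) and [Logistic Regression](https://socr.umich.edu/DSPA2/DSPA2_notes/11_FeatureSelection.html#216_Logistic_Transformation) classifiers that separate Alzheimer's disease (AD) from control (NC) subjects using [Global Gray Matter Volume (GMV)](https://wiki.socr.umich.edu/index.php/SOCR_Data_July2009_ID_NI#Curvedness_Data), age, and sex.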
```r
# Load required libraries, installing any that are missing
if (!require("randomForest")) install.packages("randomForest")
if (!require("ROCR")) install.packages("ROCR")
if (!require("pROC")) install.packages("pROC")
if (!require("rvest")) install.packages("rvest")
library(randomForest)
library(ROCR)
library(pROC)
library(xml2); library(rvest)  # read_html(), html_nodes(), html_table()

# Load data: second HTML table on the SOCR data page
wiki_page <- read_html("https://wiki.socr.umich.edu/index.php/SOCR_Data_July2009_ID_NI#Curvedness_Data")
dataset <- html_table(html_nodes(wiki_page, "table")[[2]])

# Clean and preprocess: keep only the AD and NC groups
dataset_clean <- subset(dataset, Group %in% c("AD", "NC"))
dataset_clean$Group <- factor(dataset_clean$Group, levels = c("NC", "AD"))
# Convert to binary: 1 for AD (positive class), 0 for NC
dataset_clean$Group_binary <- ifelse(dataset_clean$Group == "AD", 1, 0)
# Ensure predictors have the correct type
dataset_clean$GMV <- as.numeric(dataset_clean$GMV)
dataset_clean$Age <- as.numeric(dataset_clean$Age)
dataset_clean$Sex <- as.factor(dataset_clean$Sex)
# Keep the modeling columns and remove rows with missing values
dataset_clean <- na.omit(dataset_clean[, c("Group_binary", "GMV", "Age", "Sex")])

# Check class balance
cat("Class distribution:\n")
print(table(dataset_clean$Group_binary))
cat("\nTotal observations:", nrow(dataset_clean), "\n")

# Set seed for reproducibility
set.seed(123)

# Split data into training and testing sets (70/30)
n <- nrow(dataset_clean)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train_data <- dataset_clean[train_idx, ]
test_data <- dataset_clean[-train_idx, ]

# Random Forest model (all features, including Sex)
rf_model <- randomForest(as.factor(Group_binary) ~ GMV + Age + Sex,
                         data = train_data,
                         ntree = 500,
                         importance = TRUE)

# Logistic regression model (same features)
glm_model <- glm(Group_binary ~ GMV + Age + Sex,
                 data = train_data,
                 family = binomial(link = "logit"))

# Get predictions on the TEST data (not the training data)
rf_probs <- predict(rf_model, test_data, type = "prob")[, "1"]  # P(class 1 = AD)
glm_probs <- predict(glm_model, test_data, type = "response")

# Check prediction distributions
cat("\nPrediction summary:\n")
cat("RF probabilities range:", round(range(rf_probs, na.rm = TRUE), 3), "\n")
cat("GLM probabilities range:", round(range(glm_probs, na.rm = TRUE), 3), "\n")

# ROC curves and AUC using the ROCR package
pred_rf <- prediction(rf_probs, test_data$Group_binary)
perf_rf <- performance(pred_rf, "tpr", "fpr")
auc_rf <- as.numeric(performance(pred_rf, "auc")@y.values[[1]])

pred_glm <- prediction(glm_probs, test_data$Group_binary)
perf_glm <- performance(pred_glm, "tpr", "fpr")
auc_glm <- as.numeric(performance(pred_glm, "auc")@y.values[[1]])

# Alternative: pROC. Fixing levels and direction stops roc() from choosing
# the direction automatically, so its AUC is comparable with ROCR's
roc_rf <- roc(test_data$Group_binary, rf_probs, levels = c(0, 1), direction = "<")
roc_glm <- roc(test_data$Group_binary, glm_probs, levels = c(0, 1), direction = "<")

# Side-by-side plots from both packages
par(mfrow = c(1, 2))

# Plot 1: ROCR
plot(perf_rf, col = "blue", lwd = 2, main = "ROC Curves (ROCR Package)")
plot(perf_glm, col = "red", lwd = 2, add = TRUE)
abline(a = 0, b = 1, lty = 2, col = "gray")
legend("bottomright",
       legend = c(paste("Random Forest (AUC =", round(auc_rf, 3), ")"),
                  paste("Logistic Reg (AUC =", round(auc_glm, 3), ")")),
       col = c("blue", "red"), lwd = 2, cex = 0.8)

# Plot 2: pROC
plot(roc_rf, col = "blue", lwd = 2, main = "ROC Curves (pROC Package)")
lines(roc_glm, col = "red", lwd = 2)
legend("bottomright",
       legend = c(paste("Random Forest (AUC =", round(auc(roc_rf), 3), ")"),
                  paste("Logistic Reg (AUC =", round(auc(roc_glm), 3), ")")),
       col = c("blue", "red"), lwd = 2, cex = 0.8)

# Reset plot layout
par(mfrow = c(1, 1))

# Print model performance metrics
cat("\n=== Model Performance ===\n")
cat("Random Forest AUC (ROCR):", round(auc_rf, 3), "\n")
cat("Logistic Regression AUC (ROCR):", round(auc_glm, 3), "\n")
cat("Random Forest AUC (pROC):", round(auc(roc_rf), 3), "\n")
cat("Logistic Regression AUC (pROC):", round(auc(roc_glm), 3), "\n")

# An AUC below 0.5 means the scores rank the classes in reverse order.
# Flipping them is a diagnostic aid only: choosing the direction on the
# test set makes the reported AUC optimistic.
if (auc_rf < 0.5) {
  cat("\nNote: RF AUC < 0.5. Flipping predictions...\n")
  rf_probs <- 1 - rf_probs
  pred_rf <- prediction(rf_probs, test_data$Group_binary)
  perf_rf <- performance(pred_rf, "tpr", "fpr")  # recompute the curve as well
  auc_rf <- as.numeric(performance(pred_rf, "auc")@y.values[[1]])
  cat("Corrected RF AUC:", round(auc_rf, 3), "\n")
}
if (auc_glm < 0.5) {
  cat("Note: GLM AUC < 0.5. Flipping predictions...\n")
  glm_probs <- 1 - glm_probs
  pred_glm <- prediction(glm_probs, test_data$Group_binary)
  perf_glm <- performance(pred_glm, "tpr", "fpr")  # recompute the curve as well
  auc_glm <- as.numeric(performance(pred_glm, "auc")@y.values[[1]])
  cat("Corrected GLM AUC:", round(auc_glm, 3), "\n")
}

# Final comparison plot (curves and AUCs reflect any flip above)
plot(perf_rf, col = "blue", lwd = 2,
     main = "ROC Comparison: Random Forest vs Logistic Regression",
     xlab = "False Positive Rate",
     ylab = "True Positive Rate")
plot(perf_glm, col = "red", lwd = 2, add = TRUE)
abline(a = 0, b = 1, lty = 2, col = "gray")
legend("bottomright",
       legend = c(paste("RF (AUC =", round(auc_rf, 3), ")"),
                  paste("GLM (AUC =", round(auc_glm, 3), ")")),
       col = c("blue", "red"),
       lwd = 2,
       cex = 0.75,             # smaller legend text
       bty = "n",              # no box around the legend
       inset = c(0.02, 0.02),  # nudge the legend away from the axes
       y.intersp = 0.8)        # tighter vertical spacing between entries
```
Problems
Problem 6.1: ROC Construction
A study evaluates a biomarker for distinguishing lung cancer subtypes:
| Biomarker Range | Type A (Positive) | Type B (Negative) |
|---|---|---|
| < 2 | 3 | 4 |
| 2–4 | 6 | 2 |
| 4–6 | 15 | 7 |
| 6–8 | 7 | 33 |
| > 8 | 1 | 38 |
- Task: Construct the ROC curve using cut-points at 2, 4, 6, and 8 (Type A = positive).
- Compute the AUC and interpret whether the test is clinically useful.
Problem 6.2: Clinical Application
True or False: When screening for a serious, treatable disease (e.g., early-stage cancer), it is generally more important to have a test with high specificity than high sensitivity.
Answer: False. For treatable but serious diseases, high sensitivity is prioritized to minimize false negatives (missed cases). Specificity can be improved in follow-up confirmatory testing.
Problem 6.3: Definitions
The Positive Predictive Value (PPV) is calculated as:
- (a) True Positives / Total Population
- (b) True Negatives / (True Negatives + False Positives)
- (c) True Positives / (True Positives + False Positives)
- (d) True Negatives / Test Negatives
Answer: (c). PPV is the probability that a person with a positive test truly has the disease.
Problems 6.4–6.6: Performance Metrics
A new diabetes test yields:
| | Disease Present | Disease Absent | Total |
|---|---|---|---|
| Test Positive | 80 | 70 | 150 |
| Test Negative | 10 | 240 | 250 |
| Total | 90 | 310 | 400 |
- 6.4 Sensitivity: \(80/90 \approx 89\%\)
- 6.5 Specificity: \(240/310 \approx 77\%\)
- 6.6 PPV: \(80/150 \approx 53\%\)
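These three answers can be reproduced from the 2×2 table with a few lines of base R. The sketch below enters the diabetes counts as a matrix (the object and dimension names are illustrative) and reads each metric off the appropriate row or column total.

```r
# Diabetes test counts: rows = test result, columns = true disease status
diabetes <- matrix(c(80, 10,     # Disease Present: test positive, test negative
                     70, 240),   # Disease Absent:  test positive, test negative
                   nrow = 2,
                   dimnames = list(Test    = c("Positive", "Negative"),
                                   Disease = c("Present", "Absent")))
print(addmargins(diabetes))   # adds the row and column totals

# Sensitivity and specificity condition on the true status (column totals)
sensitivity <- diabetes["Positive", "Present"] / sum(diabetes[, "Present"])
specificity <- diabetes["Negative", "Absent"]  / sum(diabetes[, "Absent"])

# PPV conditions on the test result (row total of positive tests)
ppv <- diabetes["Positive", "Present"] / sum(diabetes["Positive", ])

cat(sprintf("Sensitivity: %.0f%%  Specificity: %.0f%%  PPV: %.0f%%\n",
            100 * sensitivity, 100 * specificity, 100 * ppv))
```

Note the different denominators: sensitivity and specificity divide by column totals (true status), whereas PPV divides by a row total (test result), which is why PPV depends on how common the disease is in the tested group.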
References
- See the [SOCR SDA ROC/AUC Learning Module](https://sda.statisticalcomputing.org/learning).
- Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. *Radiology*. 1982;143(1):29–36.
- Metz CE. Basic principles of ROC analysis. *Seminars in Nuclear Medicine*. 1978;8(4):283–298.
- SOCR Data: Global Gray Matter Volume (GMV) Alzheimer’s Dataset
- pROC and ROCR R package documentation
- SOCR Home page: https://socr.umich.edu