Difference between revisions of "SMHS ROC"

From SOCR
(R AUC/ROC Example)
(Example 2: Comparing Classifiers for Alzheimer’s Disease)
 
(14 intermediate revisions by the same user not shown)
==[[SMHS| Scientific Methods for Health Sciences]] - Receiver Operating Characteristic (ROC) Curve ==
+
== [[SMHS|Scientific Methods for Health Sciences]] - Receiver Operating Characteristic (ROC) Curve ==
  
 +
=== Overview ===
 +
The Receiver Operating Characteristic (ROC) curve is a fundamental graphical tool used to evaluate the performance of a binary classifier system as its discrimination threshold varies. It illustrates the diagnostic ability of a classifier by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) across all possible threshold settings.
  
===Overview===
+
By visualizing these trade-offs, the ROC curve aids in selecting optimal models and discarding suboptimal ones. The Area Under the Curve (AUC) serves as a single scalar measure of aggregate diagnostic performance.
The receiver operating characteristic (ROC) curve is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold varies. The ROC curve is created by plotting the fraction of true positives out of the total actual positives against the fraction of false positives out of the total actual negatives at various threshold settings. This section introduces the ROC curve and illustrates applications of the method with examples.
 
  
===Motivation===
+
=== Motivation ===
We have talked about the cases with a binary classification where the outcomes are either absent or present and the test results are positive or negative. We have also previously discussed [[SMHS_PowerSensitivitySpecificity|sensitivity and specificity]] which are associated with the concepts of true positive and true negatives. With ROC curve, we are looking to demonstrate the following aspects:
+
In binary classification tasks—such as diagnosing Disease vs. No Disease—outcomes are often determined by whether a continuous test statistic falls above or below a chosen cutoff. While [[SMHS_PowerSensitivitySpecificity|sensitivity and specificity]] describe accuracy at a single threshold, classifier performance changes as this threshold shifts.
*To show the tradeoff between [[SMHS_PowerSensitivitySpecificity|sensitivity and specificity]];
+
 
*The closer the curve follows the left-hand border and top border of ROC space, the more accurate is the test;
+
Key objectives of ROC analysis include:
*The closer the curve comes to the 45-degree diagonal, the less accurate is the test;
+
* Visualizing trade-offs: Demonstrating the dynamic relationship between sensitivity (TPR) and specificity (<math>1 - \text{FPR}</math>).
*The slope of the tangent line at a cut-point gives the likelihood ratio for the value of the test;
+
* Assessing accuracy: The closer the curve hugs the top-left corner of the ROC space, the better the test. A curve along the 45° diagonal represents random guessing (no discriminative ability).
*The area under the curve is a measure of test accuracy.
+
* Selecting optimal thresholds: The slope of the tangent to the ROC curve at a given point reflects the likelihood ratio, enabling threshold selection based on clinical costs or benefits.
[[SMHS_PowerSensitivitySpecificity#Type_I_Error.2C_Type_II_Error_and_Power| Also, see the SMHS Type I  and Type II error section]].
+
* Summarizing performance: The AUC quantifies the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one.
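The ranking interpretation of the AUC in the last bullet can be checked directly on a small example. The sketch below uses hypothetical scores (not taken from any dataset on this page) and assumes that higher scores indicate the positive class; it compares every positive case with every negative case and reports the fraction of pairs ranked correctly, counting ties as one half.

```r
# Hypothetical classifier scores (higher = more suspicious); illustrative only.
pos_scores <- c(0.9, 0.8, 0.4)        # cases that truly have the condition
neg_scores <- c(0.7, 0.3, 0.2, 0.1)   # cases that truly do not

# Compare every positive with every negative; a tie counts as half a win.
wins <- outer(pos_scores, neg_scores, ">")
ties <- outer(pos_scores, neg_scores, "==")
auc_pairs <- mean(wins + 0.5 * ties)

print(auc_pairs)   # 11 of the 12 pairs are ranked correctly
```

Only one pair is misordered here (the positive score 0.4 falls below the negative score 0.7), so the pairwise estimate is 11/12.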
 +
 
 +
=== Theory ===
 +
==== The Confusion Matrix and Core Metrics ====
 +
A binary classifier produces four possible outcomes based on a decision threshold:
  
 
<center>
 
<center>
{|class="wikitable" style="text-align:center; width:75%" border="1"
+
{|class="wikitable" style="text-align:center; width:90%" border="1"
|-
 
|colspan=2 rowspan=2| ||colspan=2|Actual condition
 
 
|-
 
|-
|Absent ($H_0$ is true)||Present ($H_1$ is true)
+
|rowspan=2| ||rowspan=2| ||colspan=2|True Condition (Gold Standard)
 
|-
 
|-
|rowspan=2|Test Result||Negative (fail to reject $H_0$)||Condition absent + Negative result = True (accurate) Negative (TN, 0.98505)||Condition present + Negative result = False (invalid) Negative (FN, 0.00025)Type II error (β)
+
| Disease (Positive) || No Disease (Negative)
 
|-
 
|-
|Positive||(reject $H_0$) Condition absent + Positive result = False Positive (FP, 0.00995) Type I error (α)||Condition Present + Positive result = True Positive (TP, 0.00475)
+
|rowspan=2|Test Result||Positive|| True Positive (TP) <br> (Hit) || False Positive (FP) <br> (Type I Error, <math>\alpha</math>)
 
|-
 
|-
|Test Interpretation||Power = 1-FN=1-0.00025 = 0.99975||Specificity: TN/(TN+FP) =0.98505/(0.98505+0.00995) = 0.99||Sensitivity: TP/(TP+FN) =0.00475/(0.00475+ 0.00025)= 0.95
+
|Negative|| False Negative (FN) <br> (Type II Error, <math>\beta</math>) || True Negative (TN) <br> (Correct Rejection)
 
|}
 
|}
 
</center>
 
</center>
  
===Theory===
+
Fundamental metrics derived from this matrix:
 +
* Sensitivity (True Positive Rate): 
 +
  <math>\text{Sensitivity} = \frac{TP}{TP + FN}</math>
 +
* Specificity (True Negative Rate): 
 +
  <math>\text{Specificity} = \frac{TN}{TN + FP}</math>
 +
* False Positive Rate: 
 +
  <math>FPR = 1 - \text{Specificity} = \frac{FP}{TN + FP}</math>
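The three formulas above translate directly into code. The following is a minimal sketch; the helper name <code>confusion_rates</code> is hypothetical, and the counts are the T4 ≤ 5 split from the hypothyroidism example in this section (18 true positives, 14 false negatives, 1 false positive, 92 true negatives).

```r
# Hypothetical helper: core rates from the four confusion-matrix counts.
confusion_rates <- function(tp, fn, fp, tn) {
  c(sensitivity = tp / (tp + fn),   # true positive rate
    specificity = tn / (tn + fp),   # true negative rate
    fpr         = fp / (tn + fp))   # false positive rate = 1 - specificity
}

# Counts for the T4 <= 5 cut-point of the hypothyroidism example.
rates <- confusion_rates(tp = 18, fn = 14, fp = 1, tn = 92)
print(round(rates, 2))
```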
  
Review of basic concepts in a binary classification:
+
==== Constructing the ROC Curve ====
<center>
+
To construct an ROC curve, sensitivity and specificity are computed for every feasible cutoff value of the diagnostic test.
{|class="wikitable" style="text-align:center; width:75%" border="1"
 
|-
 
|rowspan=2| ||rowspan=2| ||colspan=2|Disease Status||colspan=2|Metrics
 
|-
 
|Disease||No Disease||Prevalence=$\frac{\sum Condition\, positive}{\sum Total\, population}$ ||
 
|-
 
|rowspan=2|Screening Test||Positive||a (True positives)||b (False positives)||Positive predictive value (PPV)=$\frac{\sum True\, positive}{\sum Test\, positive}$||False discovery rate (FDR)=$\frac{\sum False\, positive}{\sum Test\, positive}$
 
|-
 
|Negative||c (False negatives)||d (True negatives)||False omission rate (FOR)=$\frac{\sum False\, negative} {\sum Test\, negative}$||Negative predictive value (NPV)=$\frac{\sum True\, negative}{\sum Test\, negative}$
 
|-
 
| ||Positive Likelihood Ratio=TPR/FPR||True positive rate (TPR)=$\frac{\sum True\, positive}{\sum Condition\, positive}$||False positive rate (FPR)=$\frac{\sum False\, positive}{\sum Condition\, negative}$||Accuracy (ACC)=$\frac{\sum True\, positive + \sum True\, negative}{\sum Total\, population}$| ||
 
|-
 
| ||Negative Likelihood Ratio=FNR/TNR||False negative rate (FNR)=$\frac{\sum False\, negative}{\sum Condition\, positive}$||True negative rate (TNR)=$\frac{\sum True\, negative}{\sum Condition\, negative}$||True negative rate (TNR)=$\frac{\sum True\, negative}{\sum Condition\, negative}$| ||
 
|}
 
</center>
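The quantities in this table can be computed from the four cell counts a, b, c and d. The sketch below is illustrative only: the function name <code>screening_metrics</code> is hypothetical, and the counts are the T4 ≤ 5 split from the hypothyroidism example in this section (a = 18, b = 1, c = 14, d = 92).

```r
# Hypothetical helper: summary metrics of a 2x2 screening table.
# tp = a (true positives), fp = b (false positives),
# fn = c (false negatives), tn = d (true negatives).
screening_metrics <- function(tp, fp, fn, tn) {
  n   <- tp + fp + fn + tn
  tpr <- tp / (tp + fn)          # sensitivity
  fpr <- fp / (fp + tn)          # 1 - specificity
  list(prevalence = (tp + fn) / n,
       ppv        = tp / (tp + fp),
       npv        = tn / (tn + fn),
       fdr        = fp / (tp + fp),
       false_omission_rate = fn / (tn + fn),
       lr_pos     = tpr / fpr,
       lr_neg     = (1 - tpr) / (1 - fpr),
       accuracy   = (tp + tn) / n)
}

m <- screening_metrics(tp = 18, fp = 1, fn = 14, tn = 92)
print(round(unlist(m), 3))
```

Unlike sensitivity and specificity, the predictive values depend on prevalence, so they should be recomputed whenever the test is applied to a population with a different disease frequency.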
 
  
* '''Caution''': The ROC curve should be interpreted with care, since each individual point on the curve represents an improper scoring rule that is optimized by fitting an inappropriate model. The area under the ROC curve (AUC) is a linear translation of the Wilcoxon-Mann-Whitney rank correlation statistic. Using the ROC curve to identify thresholds may be a low-precision and somewhat arbitrary operation, which may not replicate well from one study to the next. A specific potential problem with over-reliance on the ROC curve is that it tempts analysts to always identify thresholds for binary test-classification even if such hard thresholds do not really exist. The field of decision theory may provide more reliable protocols for segmenting, clustering, or classifying cases into groups or categories.
+
Example: Hypothyroidism Diagnosis Using T4 Levels  
 +
The table below shows the distribution of T4 measurements in hypothyroid (diseased) and euthyroid (non-diseased) individuals.
  
====Introduction of ROC Curve====
 
[[SMHS_PowerSensitivitySpecificity|Sensitivity and specificity]] are both characteristics of a test but they also depend on the definition of what constitutes an abnormal test. Consider a diagnostic medical test where the cut-points would without doubt influence the test results. We can use the hypothyroidism data from the likelihood ratio to illustrate how these two characteristics change depending on the choice of T4 level that defines hypothyroidism. Recall the data where patients with suspected hypothyroidism are reported.
 
 
<center>
 
<center>
{|class="wikitable" style="text-align:center; width:75%" border="1"
+
{|class="wikitable" style="text-align:center; width:80%"
 +
|+ Frequency of T4 Levels in Hypothyroid vs. Euthyroid Patients
 
|-
 
|-
|T4||<1||1-2||2-3||3-4||4-5||5-6||6-7||7-8||8-9||9-10||10-11||11-12||>12
+
! Group !! <1 !! 1–2 !! 2–3 !! 3–4 !! 4–5 !! 5–6 !! 6–7 !! 7–8 !! 8–9 !! 9–10 !! 10–11 !! >11
 
|-
 
|-
|Hypothyroid||2||3||1||8||4||4||3||3||1||0||2||1||0
+
| Hypothyroid || 2 || 3 || 1 || 8 || 4 || 4 || 3 || 3 || 1 || 0 || 2 || 1
 
|-
 
|-
|Euthyroid||0||0||0||0||1||6||11||19||17||20||11||4||4
+
| Euthyroid || 0 || 0 || 0 || 0 || 1 || 6 || 11 || 19 || 17 || 20 || 11 || 8
 
|}
 
|}
 
</center>
 
</center>
  
With the following cut-points, we have the data listed:
+
We compute performance metrics at different thresholds:
<center>
+
 
{|class="wikitable" style="text-align:center; width:75%" border="1"
+
* Cut-point = 5 (strict):
|-
+
 
|T4 value||Hypothyroid||Euthyroid
+
<math>T4 \leq 5 \to \text{ test positive }, </math>
|-
+
 
|5 or less||18||1
+
<math>\text{Sensitivity} = 18/32 = 0.56,</math>
|-
+
|5.1 - 7||7||17
+
<math>\text{Specificity} = 92/93 = 0.99.</math>
|-
+
 
|7.1 - 9||4||36
+
* Cut-point = 7 (moderate):
|-
+
 
|9 or more||3||39
+
<math>\text{Sensitivity} = 25/32 = 0.78,</math>
|-
+
 
|Totals:||32||93
+
<math>\text{Specificity} = 75/93 = 0.81.</math>
|}
+
 
</center>
+
* Cut-point = 9 (lenient): 
 +
 
 +
<math>\text{Sensitivity} = 29/32 = 0.91,</math> 
 +
 
 +
<math>\text{Specificity} = 39/93 = 0.42.</math>
 +
 
 +
Plotting Sensitivity (y-axis) versus FPR = <math>1 - \text{Specificity}</math> (x-axis) for all thresholds yields the ROC curve.
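The full set of ROC points can be obtained from the frequency table with cumulative sums. The sketch below assumes the rule "test positive if T4 ≤ cut-point" and uses the 12-bin counts from the table above; the variable names are illustrative.

```r
# Bin counts from the T4 frequency table (bins <1, 1-2, ..., 10-11, >11).
hypo <- c(2, 3, 1, 8, 4, 4, 3, 3, 1, 0, 2, 1)      # hypothyroid (diseased)
eu   <- c(0, 0, 0, 0, 1, 6, 11, 19, 17, 20, 11, 8) # euthyroid (non-diseased)

# For a cut-point at the upper edge of each bin, everyone at or below the
# cut-point tests positive, so cumulative counts give TP and FP.
roc_points <- data.frame(
  cutpoint = c(1:11, Inf),
  tpr      = cumsum(hypo) / sum(hypo),   # sensitivity
  fpr      = cumsum(eu) / sum(eu)        # 1 - specificity
)

# Rows for cut-points 5, 7 and 9 match the values listed above.
print(round(roc_points[c(5, 7, 9), ], 2))

# ROC curve, starting from the origin (no one classified as positive).
plot(c(0, roc_points$fpr), c(0, roc_points$tpr), type = "o",
     xlab = "False positive rate (1 - specificity)",
     ylab = "True positive rate (sensitivity)",
     main = "ROC curve for T4")
abline(a = 0, b = 1, lty = 2, col = "gray")   # chance line
```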
 +
 
 +
=== Applications and Interpretation ===
 +
==== Area Under the Curve (AUC) ====
 +
The AUC provides a standardized measure of overall diagnostic accuracy:
 +
 
 +
* 0.90–1.00: Excellent 
 +
* 0.80–0.90: Good 
 +
* 0.70–0.80: Fair 
 +
* 0.60–0.70: Poor 
 +
* 0.50–0.60: Fail (no better than chance)
  
 +
In the hypothyroidism example, the AUC is 0.86, indicating good discriminative ability.
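The reported value can be approximated with the trapezoidal rule applied to the binned T4 counts. This is a sketch under the assumption that a positive test means T4 at or below the cut-point; the helper name <code>trapezoid_auc</code> is hypothetical, and the result depends slightly on how the data are binned.

```r
# Bin counts from the T4 frequency table (bins <1, 1-2, ..., 10-11, >11).
hypo <- c(2, 3, 1, 8, 4, 4, 3, 3, 1, 0, 2, 1)
eu   <- c(0, 0, 0, 0, 1, 6, 11, 19, 17, 20, 11, 8)

tpr <- cumsum(hypo) / sum(hypo)
fpr <- cumsum(eu) / sum(eu)

# Hypothetical helper: area under the piecewise-linear ROC curve.
trapezoid_auc <- function(fpr, tpr) {
  ord <- order(fpr, tpr)
  x <- c(0, fpr[ord], 1)          # anchor the curve at (0,0) and (1,1)
  y <- c(0, tpr[ord], 1)
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

auc_t4 <- trapezoid_auc(fpr, tpr)
print(round(auc_t4, 3))   # about 0.855, i.e. 0.86 after rounding
```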
  
Suppose that patients with T4 of 5 or less are considered to be hypothyroid, then the data would be displayed as the following and the sensitivity ($TP/(TP+FN) =18/32$) is 0.56 and specificity ($TN/(TN+FP)=92/93$) is 0.99 in this case.
+
==== Optimal Threshold Selection and Cost Analysis ====
<center>
+
While AUC summarizes global performance, clinical decisions require a single operating threshold. The optimal choice depends on the relative costs of false positives (FP) and false negatives (FN).
{|class="wikitable" style="text-align:center; width:75%" border="1"
 
|-
 
|T4 value||Hypothyroid||Euthyroid
 
|-
 
|5 or less||18||1
 
|-
 
|More than 5||14||92
 
|-
 
|Totals:||32||93
 
|}
 
</center>
 
Now suppose, we decided to be less stringent on the disease and consider the patients with T4 values of 7 or less to be hypothyroid, then the data would be recorded as the following and the sensitivity in this case would be 0.78 and specificity 0.81:
 
<center>
 
{|class="wikitable" style="text-align:center; width:75%" border="1"
 
|-
 
|T4 value||Hypothyroid||Euthyroid
 
|-
 
|7 or less||25||18
 
|-
 
|More than 7||7||75
 
|-
 
|Totals:||32||93
 
|}
 
</center>
 
If we move the cut-point for hypothyroidism a little bit higher, say 9, we would have the sensitivity of 0.91 and specificity 0.42:
 
<center>
 
{|class="wikitable" style="text-align:center; width:75%" border="1"
 
|-
 
|T4 value||Hypothyroid||Euthyroid
 
|-
 
|9 or less||29||54
 
|-
 
|More than 9||3||39
 
|-
 
|Totals:||32||93
 
|}
 
</center>
 
To sum up, we have the pairs of sensitivity and specificity with corresponding cut-points in the following table:
 
<center>
 
{|class="wikitable" style="text-align:center; width:75%" border="1"
 
|-
 
|Cut points||Sensitivity||Specificity
 
|-
 
|5||0.56||0.99
 
|-
 
|7||0.78||0.81
 
|-
 
|9||0.91||0.42
 
|}
 
</center>
 
From the table above, we observe that the sensitivity improves with increasing cut-point T4 value while specificity increases with decreasing cut-point T4 value. That is a tradeoff between sensitivity and specificity. The table above can also be shown as TP and FP.
 
<center>
 
{|class="wikitable" style="text-align:center; width:75%" border="1"
 
|-
 
|Cut points||Rate of True Positives|| Rate of False Positives
 
|-
 
|5||0.56||0.01
 
|-
 
|7||0.78||0.19
 
|-
 
|9||0.91||0.58
 
|}
 
</center>
 
  
Plotting the sensitivity vs. (1-specificity), we get the ROC curve:
+
The slope method identifies the optimal point on the ROC curve where the tangent slope equals:
x<-c(0,0.01,0.19,0.4,0.58,1)
+
<math>\text{Slope} = \frac{\text{Cost}(FP) \times P(\text{Negative})}{\text{Cost}(FN) \times P(\text{Positive})}</math>.
y<-c(0,0.56,0.78,0.81,0.91,1)
 
plot(x,y,type='o',main='ROC curve for T4',xlab='False positive rate (1-specificity)',ylab='True positive rate (sensitivity)')
 
 
install.packages("pROC")
 
library("pROC")
 
HypothyroidResponse <- c(2,3,1,8,4,4,3,3,1,0,2,1,0)
 
PredictorThreshold <- c('poor','good','good','poor','good','poor','poor','poor','good','good','poor','good','good')
 
# Syntax (response, predictor)
 
auc(factor(PredictorThreshold), HypothyroidResponse)
 
# the AUC=P(random positive example > random negative example)
 
 
> Area under the curve: 0.8214
 
  
<center>
+
* If missing a disease (FN) is 8× more costly than a false alarm (FP), and positives and negatives are equally prevalent, the target slope is <math>1/8</math>.
[[File:ROC Fig 1.png|500px]]
+
* If treatment risks make FP 2× more costly than FN, again with equal prevalence, the target slope is 2.
</center>
 
  
*ROC interpretation: The area under the ROC Curve can be used to classify the accuracy of a diagnostic test according to the following academic point system:
+
This approach balances clinical priorities with statistical performance.
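The slope rule can be sketched in code. The example below is illustrative: the function names are hypothetical, the three candidate operating points are the T4 cut-points 5, 7 and 9 from the hypothyroidism example, and equal prevalence is assumed so that the target slopes match the two bullets above. On a discrete set of ROC points, choosing the point with the smallest expected cost is equivalent to finding where a line with the target slope touches the curve.

```r
# Target slope of the ROC tangent for given error costs and prevalence.
target_slope <- function(cost_fp, cost_fn, prevalence) {
  (cost_fp * (1 - prevalence)) / (cost_fn * prevalence)
}

# Expected cost per subject at an operating point (tpr, fpr).
expected_cost <- function(tpr, fpr, cost_fp, cost_fn, prevalence) {
  cost_fn * prevalence * (1 - tpr) + cost_fp * (1 - prevalence) * fpr
}

# Candidate operating points: T4 cut-points 5, 7 and 9.
cutpoint <- c(5, 7, 9)
tpr <- c(18, 25, 29) / 32
fpr <- c(1, 18, 54) / 93

# Hypothetical helper: cut-point with the smallest expected cost.
best_cutpoint <- function(cost_fp, cost_fn, prevalence) {
  cutpoint[which.min(expected_cost(tpr, fpr, cost_fp, cost_fn, prevalence))]
}

print(target_slope(cost_fp = 1, cost_fn = 8, prevalence = 0.5))   # 0.125
print(best_cutpoint(cost_fp = 1, cost_fn = 8, prevalence = 0.5))  # 9
print(target_slope(cost_fp = 2, cost_fn = 1, prevalence = 0.5))   # 2
print(best_cutpoint(cost_fp = 2, cost_fn = 1, prevalence = 0.5))  # 5
```

A shallow target slope pushes the operating point toward the upper right of the ROC curve (more lenient cut-point, higher sensitivity), while a steep slope pushes it toward the lower left (stricter cut-point, higher specificity).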
:: 0.90-1: excellent;
 
:: 0.8-0.9: good;
 
:: 0.7-0.8: fair;
 
:: 0.6-0.7: poor;
 
:: 0.5-0.6: fail.
 
With our example above, the area under the T4 ROC curve is 0.86, which shows that the accuracy of the test is good in separating hypothyroid from euthyroid patients.
 
  
*Computing ROC: The method used to calculate the area under the ROC curve matters. The area measures discrimination, that is, the ability of the test to correctly classify those with and without the disease. Equivalently, it is the proportion of randomly drawn pairs (one diseased and one non-diseased subject) in which the test correctly ranks the diseased subject as more suspicious. Two methods are commonly used to calculate the area under the curve:
+
=== Practical Implementation in R ===
:: non-parametric method based on constructing trapezoids under the curve as approximation of area;
+
Modern ROC analysis leverages R packages such as <code>pROC</code> and <code>ROCR</code> for robust computation and visualization.
:: a parametric method using a [[SMHS_CIs#Maximum_likelihood_estimation_.28MLE.29|MLE]] to fit a smooth curve to data points.
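As one possible illustration of the parametric approach, the sketch below uses the binormal model, which is an assumption made here and not necessarily the model intended above. It supposes that test values are normally distributed in each group and that higher values indicate disease; the group means and variances are estimated by maximum likelihood and the AUC follows in closed form as <math>\Phi\left(\frac{\mu_1-\mu_0}{\sqrt{\sigma_1^2+\sigma_0^2}}\right)</math>. The function name and the data are hypothetical.

```r
# Hypothetical helper: binormal AUC from two samples.
# Assumes normal test values in each group and higher values = positive.
binormal_auc <- function(pos, neg) {
  mu_pos  <- mean(pos)
  mu_neg  <- mean(neg)
  var_pos <- mean((pos - mu_pos)^2)   # MLE variance (divides by n, not n - 1)
  var_neg <- mean((neg - mu_neg)^2)
  # P(positive value > negative value) under the fitted normal model
  pnorm((mu_pos - mu_neg) / sqrt(var_pos + var_neg))
}

# Small illustrative samples (not from any dataset on this page).
pos <- c(1, 2, 3)
neg <- c(0, 1, 2)
auc_hat <- binormal_auc(pos, neg)
print(round(auc_hat, 3))
```

When lower values indicate disease, as with T4, the two arguments can be swapped or the values negated before calling the function.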
 
  
===Applications===
+
==== Example 1: Basic ROC Analysis for T4 Test ====
 +
<pre>
 +
# Install and load required package
 +
if (!require("pROC")) install.packages("pROC")
 +
library(pROC)
  
We can use the [[SOCR_Data_July2009_ID_NI#Curvedness_Data|smaller summary Alzheimer's disease (AD) dataset of Global Gray Matter Volume (GMV) and Cortical Surface Curvedness morphometry measures]] to illustrate the concepts of false-positive, false-negative, sensitivity, specificity and power of a test.
+
# Simulated data based on T4 distribution
 +
response <- c(rep(1, 32), rep(0, 93))  # 1 = Hypothyroid, 0 = Euthyroid
  
Suppose we try to determine the ''best'' threshold of the GMV or the curvedness (a measure of cortical surface smoothness) that can be used as a test for AD. We would combine NC+MCI and explore the effects of varying the GMV threshold on the false-positive and false-negative rates. The table below uses these simple formulas to compute the number of false-positive ($=COUNTIF(GMVCurvednessTable!GMV29:GMV105, "<="\&A2)$, count non-AD cases with GMV$\le$threshold) and false-negative ($=COUNTIF(GMVCurvednessTable!GMV2:GMV28, ">"\&A2)$, count AD cases with GMV>threshold) frequencies for each threshold between $35\times 10^4$ and $75\times 10^4$ (in increments of $20,000$).
+
# Simulated T4 values (lower = more likely diseased)
 +
predictor_pos <- c(rep(4, 18), rep(6, 7), rep(8, 4), rep(10, 3))
 +
predictor_neg <- c(rep(4, 1), rep(6, 17), rep(8, 36), rep(10, 39))
 +
predictor <- c(predictor_pos, predictor_neg)
  
<center>
+
# Build ROC object (higher predictor = less likely diseased)
{| class="wikitable" style="text-align:center;" border="2"
+
roc_obj <- roc(response, predictor, direction = ">")
|-
 
! Threshold||AD_Test_GMV<Threshold_FalsePositives||FalseNegatives
 
|-
 
|350000||0||26
 
|-
 
|370000||0||26
 
|-
 
|390000||1||25
 
|-
 
|410000||3||23
 
|-
 
|430000||5||23
 
|-
 
|450000||7||23
 
|-
 
|470000||11||20
 
|-
 
|490000||23||18
 
|-
 
|510000||39||15
 
|-
 
|530000||47||13
 
|-
 
|550000||56||13
 
|-
 
|570000||64||13
 
|-
 
|590000||68||12
 
|-
 
|610000||70||8
 
|-
 
|630000||71||7
 
|-
 
|650000||75||7
 
|-
 
|670000||76||5
 
|-
 
|690000||76||3
 
|-
 
|710000||77||2
 
|-
 
|730000||77||1
 
|-
 
|750000||77||0
 
|}
 
</center>
 
  
The figure below illustrates graphically the effect of increasing the GMV threshold (binary classification of cases into AD group or NC+MCI group) on the false-positive (red) and false-negative (blue) classification rates.
+
# Plot ROC curve with AUC
<center>
+
plot(roc_obj, main = "ROC Curve for T4", col = "blue", print.auc = TRUE)
[[Image:SMHS_ROC_Fig10.png|600px]]
 
</center>
 
  
We can use a similar R-script to compute the number of:
+
# Identify optimal threshold (Youden index)
* false-positive, counting non-AD cases with GMV ≤ threshold, and
+
coords(roc_obj, "best", ret = c("threshold", "specificity", "sensitivity"))
* false-negative, counting AD cases with GMV > threshold, for each threshold between $35×10^4$ and $75×10^4$ (in increments of $20,000$).
+
</pre>
  
First save the table: [[SOCR_Data_July2009_ID_NI#Curvedness_Data| Alzheimer's disease (AD) dataset of Global Gray Matter Volume (GMV) and Cortical Surface Curvedness morphometry measures]], as CSV file (C:\User\GMV.csv) on your hard-drive.  
+
==== Example 2: Comparing Classifiers for Alzheimer’s Disease ====
 +
This example compares [https://socr.umich.edu/DSPA2/DSPA2_notes/06_ML_NN_SVM_RF_Class.html Random Forest (machine learning decision/prediction)] and [https://socr.umich.edu/DSPA2/DSPA2_notes/11_FeatureSelection.html#216_Logistic_Transformation Logistic Regression models] using [https://wiki.socr.umich.edu/index.php/SOCR_Data_July2009_ID_NI#Curvedness_Data Global Gray Matter Volume (GMV) and demographic features].
  
dataset <- read.csv('C:\\User\\GMV.csv')
+
<pre>
attach(dataset)
+
# Load required libraries
summary(GMV)
+
if (!require("randomForest")) install.packages("randomForest")
nonAD <- subset(dataset,Group!='AD')
+
if (!require("ROCR")) install.packages("ROCR")
threshold <- seq(350000,750000,by=20000)
+
if (!require("pROC")) install.packages("pROC")
calculate <- function(thres)
+
if (!require("caret")) install.packages("caret")
{
+
library(randomForest)
  false.p <- sum(as.numeric(nonAD$GMV<=thres))   # non-AD cases with GMV <= threshold
+
library(ROCR)
  false.n <- sum(as.numeric(dataset$GMV[dataset$Group=='AD']>thres))   # AD cases with GMV > threshold
+
library(pROC)
  cnt <- c(false.p,false.n)
+
library(caret)
  return(cnt)
+
library("XML"); library("xml2"); library("rvest")
}
 
result_FP_FN <- sapply(threshold, calculate)   # 2 x length(threshold) matrix: row 1 = FP, row 2 = FN
 
rownames(result_FP_FN) <- c('False Positive','False Negative')
 
colnames(result_FP_FN) <- c(as.character(threshold))
 
result_FP_FN
 
threshold
 
# result_FP_FN[,4] # 4th column of matrix
 
# result_FP_FN[3,] # 3rd row of matrix
 
# result_FP_FN[2:4,1:5] # rows 2,3,4 of columns 1,2,3,4,5
 
result_FP_FN[1,]
 
result_FP_FN[2,]
 
 
# transpose and print the FP_FN matrix
 
t(result_FP_FN)
 
  
We can dig a bit deeper by plotting the impact of thresholding the global GMV on the false-positive (FP) and false-negative (FN) counts. The R-script below generates this figure, where the thresholds ($\times 10^4$) are used as labels for each point in the FP vs. FN scatter plot. If committing FP errors is $R$ times more expensive than committing FN errors, then the lines of constant total cost in the FP-FN plane have slope $-R$, and the '''best operating point''' is where such a line is tangent to the FP-FN curve. Therefore, if $R=\frac{1}{8}$ (FN errors are 8 times more expensive than FP errors), we should set the threshold to 53 (as the slope of the green line is $\frac{13-14}{48-40}=-\frac{1}{8}$) and if $R=2$, the threshold should be 41 (as the slope of the purple line is $\frac{23-25}{3-2}=-\frac{2}{1}=-2$):
+
# Load data
 +
wiki_url <- read_html("https://wiki.socr.umich.edu/index.php/SOCR_Data_July2009_ID_NI#Curvedness_Data")
 +
dataset <- html_table(html_nodes(wiki_url, "table")[[2]])
  
par(mfrow = c(1,1))
+
# Clean and preprocess
+
dataset_clean <- subset(dataset, Group %in% c("AD", "NC"))
leg.txt <- c("FP vs. FN", "Thresholds", "Line Slope=-1/8", "Line Slope = -2")
+
dataset_clean$Group <- factor(dataset_clean$Group, levels = c("NC", "AD"))
+
# Convert to binary: 1 for AD (positive class), 0 for NC
Threshold <- c(35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75)
+
dataset_clean$Group_binary <- ifelse(dataset_clean$Group == "AD", 1, 0)
FalsePositives <-c(0,0,1,3,5,7,11,23,39,47,56,64,68,70,71,75,76,76,77,77,77)
 
FalseNegatives <- c(26,26,25,23,23,23,20,18,15,13,13,13,12,8,7,7,5,3,2,1,0)
 
plot(FalsePositives, FalseNegatives, type="b", pch=21, col="red",  lty=1,
 
xlab="False-Positives", ylab="FalseNegatives",  
 
main = "Impact of Thresholding the Global GMV on the FP vs. FN")
 
 
linex1 <- c(40, 48)
 
liney1 <- c(14, 13)
 
linez1 <- lm(liney1 ~linex1)
 
abline (linez1, col="green")
 
linex2 <- c(3, 2)
 
liney2 <- c(23, 25)
 
linez2 <- lm(liney2 ~linex2)
 
abline (linez2, col="purple")
 
# lines(linex2, liney2, ol=" purple ",lwd=2.5)
 
 
text(FalsePositives, FalseNegatives, Threshold, cex=0.6, pos=1, col="blue")
 
legend(list(x = 58,y = 29), legend = leg.txt, col=c("red", "blue", "green", "purple"),
 
#col = c(1,3,1,2),
 
lty = 1, merge = TRUE)
 
  
<center>
+
# Ensure predictors are correct type
[[Image:SMHS_ROC_Fig11.png|600px]]
+
dataset_clean$GMV <- as.numeric(dataset_clean$GMV)
</center>
+
dataset_clean$Age <- as.numeric(dataset_clean$Age)
 +
dataset_clean$Sex <- as.factor(dataset_clean$Sex)
  
====R AUC/ROC Example====
+
# Remove NAs
Below is an example using the ROCR R-package to generate the ROC curve and compute the AUC. First download the data table: [[SOCR_Data_July2009_ID_NI#Curvedness_Data|Alzheimer's disease (AD) dataset of Global Gray Matter Volume (GMV) and Cortical Surface Curvedness morphometry measures]], as CSV file (C:\User\GMV.csv) on your hard-drive.
+
dataset_clean <- na.omit(dataset_clean[, c("Group_binary", "GMV", "Age", "Sex")])
  
#install.packages("randomForest")
+
# Check class balance
#library("randomForest")
+
cat("Class distribution:\n")
 +
print(table(dataset_clean$Group_binary))
 +
cat("\nTotal observations:", nrow(dataset_clean), "\n")
  
dataset <- read.csv('C:\\Users\\GMV.csv')
+
# Set seed for reproducibility
# remove the TBV, or any other variable
+
set.seed(123)
# subset(dataset, select=-c(TBV))
 
# train the random forest model
 
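library(randomForest); library(ROCR)   # randomForest() and prediction()/performance()
# ROCR needs a two-class outcome: drop MCI and store Group as a factor
dataset <- dataset[dataset$Group != "MCI", ]
dataset$Group <- factor(dataset$Group)
AD.rf <- randomForest(Group ~ MMSE+CDR+Sex+Age+TBV+GMV+WMV+CSFV+Curvedness, data=dataset, ntree=100, keep.forest=TRUE, importance=TRUE)
 
AD.rf.predict <- predict(AD.rf, dataset, type="prob")[,2]
 
plot(performance(prediction(AD.rf.predict, dataset$Group),"tpr","fpr"),col = "blue")
 
auc_RF <- performance(prediction(AD.rf.predict, dataset$Group),"auc")@y.values[[1]]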
 
legend("topleft",legend=c(paste(
 
  "Random Forest Prediction Model (AUC=",formatC(auc_RF,digits=4,format="f"),
 
    ")",sep="")), col=c("blue"), lty=1)
 
  abline(a=0,b=1,lwd=2,lty=2,col="gray")
 
  
# if you need to remove some cases/rows - example, binarize the classes
+
# Split data into training and testing (70/30)
dataset <- dataset[(dataset$Group != "MCI"),]
+
train_idx <- sample(1:nrow(dataset_clean), 0.7 * nrow(dataset_clean))
dataset$Group <- factor(dataset$Group)
+
train_data <- dataset_clean[train_idx, ]
+
test_data <- dataset_clean[-train_idx, ]
# General Model: fit <- glm(Group~., dataset, family=binomial)
 
fitGLM <- glm(Group~ GMV + WMV, dataset, family=binomial)
 
# predict the linear model fit using the real data
 
train.predictGLM <- predict(fitGLM, newdata = dataset, type="response")    
 
library(ROCR)
 
plot(performance(prediction(train.predictGLM, dataset$Group),"tpr","fpr"),col = "red")
 
auc_value <- performance(prediction(train.predictGLM, dataset$Group),"auc")@y.values[[1]]
 
legend("bottomright",legend=c(paste(
 
  "Logistic Regression Model Fit(AUC=",formatC(auc_value,digits=4,format="f"),
 
    ")",sep="")), col=c("red"), lty=1)
 
  abline(a=0,b=1,lwd=2,lty=2,col="gray")
 
  
 +
# Random Forest model (with all features including Sex)
 +
rf_model <- randomForest(as.factor(Group_binary) ~ GMV + Age + Sex,
 +
                        data = train_data,
 +
                        ntree = 500,
 +
                        importance = TRUE)
  
: Note that in this case we only illustrated the ROC curve construction for the GMV variable, however similar analyses can be done for some of the [[SOCR_Data_July2009_ID_NI#Curvedness_Data|other variables in this dataset (e.g., MMSE, CDR, TBV, GMV, WMV, CSFV, Curvedness)]].
+
# Logistic regression model (with all features)
 +
glm_model <- glm(Group_binary ~ GMV + Age + Sex,  
 +
                data = train_data,  
 +
                family = binomial(link = "logit"))
  
* [http://pubs.rsna.org/doi/abs/10.1148/radiology.143.1.7063747 This article], titled The Meaning and Use of the Area Under a Receiver Operating Characteristic (ROC) Curve, presented a representation and interpretation of the area under a ROC curve obtained by the ‘rating’ method, or by mathematical predictions based on patient characteristics. It showed that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject.
+
# Get predictions on TEST data (not training data)
 +
rf_probs <- predict(rf_model, test_data, type = "prob")[, "1"] # Probability of class 1 (AD)
 +
glm_probs <- predict(glm_model, test_data, type = "response")
  
* [http://www.sciencedirect.com/science/article/pii/S0001299878800142 This article] illustrated practical experimental techniques for measuring ROC curves and discussed the issues of case selection and curve-fitting. It also covered possible generalizations of conventional ROC analysis to account for decision performance in complex diagnostic tasks and showed that ROC analysis relates in a direct and natural way to cost/benefit analysis of diagnostic decision making. The paper developed the concepts of ‘average diagnostic cost’ and ‘average net benefit’ to identify the optimal compromise among various kinds of diagnostic error and suggested ways to use ROC analysis to optimize diagnostic strategies.
+
# Check prediction distributions
 +
cat("\nPrediction summary:\n")
 +
cat("RF probabilities range:", round(range(rf_probs, na.rm = TRUE), 3), "\n")
 +
cat("GLM probabilities range:", round(range(glm_probs, na.rm = TRUE), 3), "\n")
  
===Software===
+
# ROC curves using ROCR package (corrected)
 +
pred_rf <- prediction(rf_probs, test_data$Group_binary)
 +
perf_rf <- performance(pred_rf, "tpr", "fpr")
 +
auc_rf <- as.numeric(performance(pred_rf, "auc")@y.values[[1]])
  
# With the given example in R:
+
pred_glm <- prediction(glm_probs, test_data$Group_binary)
x<-c(0,0.01,0.19,0.58,1)
+
perf_glm <- performance(pred_glm, "tpr", "fpr")
y<-c(0,0.56,0.78,0.91,1)
+
auc_glm <- as.numeric(performance(pred_glm, "auc")@y.values[[1]])
plot(x,y,type='o',main='ROC curve for T4',xlab='False positive rate (1-specificity)',ylab='True positive rate (sensitivity)')
 
  
===Problems===
+
# Alternative: Use pROC for better diagnostics and plotting
 +
roc_rf <- roc(test_data$Group_binary, rf_probs)
 +
roc_glm <- roc(test_data$Group_binary, glm_probs)
  
6.1) Suppose that a new study is conducted on lung cancer and the following data are collected to distinguish between two types of lung cancer (say type a and type b). Construct the ROC curve for this example by varying the cut-points from 2 to 10 in steps of 2 units. Calculate the area under the curve and interpret the accuracy of the test.
+
# Comparative plot with BOTH ROCR and pROC approaches
 +
par(mfrow = c(1, 2))
  
<center>
+
# Plot 1: Using ROCR
{|class="wikitable" style="text-align:center; width:75%" border="1"
+
plot(perf_rf, col = "blue", lwd = 2, main = "ROC Curves (ROCR Package)")
|-
+
plot(perf_glm, col = "red", lwd = 2, add = TRUE)
|measurements||<1||1-2||2-3||3-4||4-5||5-6||6-7||7-8||8-9||9-10||10-11||11-12||>12
+
abline(a = 0, b = 1, lty = 2, col = "gray")
|-
+
legend("bottomright",
|Type a||2||1||4||2||8||7||4||3||0||0||1||2||2
+
      legend = c(paste("Random Forest (AUC =", round(auc_rf, 3), ")"),
|-
+
                  paste("Logistic Reg (AUC =", round(auc_glm, 3), ")")),
|Type b||1||3||0||2||2||5||10||23||18||20||15||8||2
+
      col = c("blue", "red"), lwd = 2, cex = 0.8)
|}
 
</center>
 
6.2) When a serious disease can be treated if it is caught early, it is more important to have a test with high specificity than high sensitivity.
 
  
a. True
+
# Plot 2: Using pROC (often more robust)
 +
plot(roc_rf, col = "blue", lwd = 2, main = "ROC Curves (pROC Package)")
 +
lines(roc_glm, col = "red", lwd = 2)
 +
legend("bottomright",
 +
      legend = c(paste("Random Forest (AUC =", round(auc(roc_rf), 3), ")"),
 +
                  paste("Logistic Reg (AUC =", round(auc(roc_glm), 3), ")")),
 +
      col = c("blue", "red"), lwd = 2, cex = 0.8)
  
b. False
+
# Reset plot layout
 +
par(mfrow = c(1, 1))
  
6.3) The positive predictive value of a test is calculated by dividing the number of:
+
# Print model performance metrics
 +
cat("\n=== Model Performance ===\n")
 +
cat("Random Forest AUC (ROCR):", round(auc_rf, 3), "\n")
 +
cat("Logistic Regression AUC (ROCR):", round(auc_glm, 3), "\n")
 +
cat("Random Forest AUC (pROC):", round(auc(roc_rf), 3), "\n")
 +
cat("Logistic Regression AUC (pROC):", round(auc(roc_glm), 3), "\n")
  
(a) True positives in the population
+
# If AUC is still below 0.5, flip predictions
 +
if (auc_rf < 0.5) {
 +
  cat("\nNote: RF AUC < 0.5. Flipping predictions...\n")
 +
  rf_probs <- 1 - rf_probs
 +
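  pred_rf <- prediction(rf_probs, test_data$Group_binary)
  perf_rf <- performance(pred_rf, "tpr", "fpr")   # refresh the curve so later plots use the flipped scores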
 +
  auc_rf <- as.numeric(performance(pred_rf, "auc")@y.values[[1]])
 +
  cat("Corrected RF AUC:", round(auc_rf, 3), "\n")
 +
}
  
(b) True negatives in the population
+
if (auc_glm < 0.5) {
 +
  cat("Note: GLM AUC < 0.5. Flipping predictions...\n")
 +
  glm_probs <- 1 - glm_probs
 +
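  pred_glm <- prediction(glm_probs, test_data$Group_binary)
  perf_glm <- performance(pred_glm, "tpr", "fpr")   # refresh the curve so later plots use the flipped scores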
 +
  auc_glm <- as.numeric(performance(pred_glm, "auc")@y.values[[1]])
 +
  cat("Corrected GLM AUC:", round(auc_glm, 3), "\n")
 +
}
  
(c) People who test positive
+
# Final improved plot with corrected AUCs
 +
plot(perf_rf, col = "blue", lwd = 2,
 +
    main = "ROC Comparison: Random Forest vs Logistic Regression",
 +
    xlab = "False Positive Rate",
 +
    ylab = "True Positive Rate")
 +
plot(perf_glm, col = "red", lwd = 2, add = TRUE)
 +
abline(a = 0, b = 1, lty = 2, col = "gray")
  
(d) People who test negative
+
# Fixed Legend
 +
legend("bottomright",
 +
      legend = c(paste("RF (AUC =", round(auc_rf, 3), ")"),
 +
                  paste("GLM (AUC =", round(auc_glm, 3), ")")),
 +
      col = c("blue", "red"),
 +
      lwd = 2,
 +
      cex = 0.75,          # Reduced size (0.9 was likely too big)
 +
      bty = "n",          # "n" removes the box entirely for a cleaner look
 +
      inset = c(0.02, 0.02), # Nudges the legend slightly away from the axes
 +
      y.intersp = 0.8)     # Tightens the vertical spacing between lines
 +
</pre>
  
=== Problems ===
==== Problem 6.1: ROC Construction ====
A study evaluates a biomarker for distinguishing lung cancer subtypes:

<center>
{|class="wikitable" style="text-align:center; width:50%"
|-
! Biomarker Range !! Type A (Positive) !! Type B (Negative)
|-
| < 2 || 3 || 4
|-
| 2–4 || 6 || 2
|-
| 4–6 || 15 || 7
|-
| 6–8 || 7 || 33
|-
| > 8 || 1 || 38
|}
</center>

* Task: Construct the ROC curve using cut-points at 2, 4, 6, and 8 (Type A = positive).
* Compute the AUC and interpret whether the test is clinically useful.

==== Problem 6.2: Clinical Application ====
True or False: When screening for a serious, treatable disease (e.g., early-stage cancer), it is generally more important to have a test with high specificity than high sensitivity.

Answer: False. For treatable but serious diseases, high sensitivity is prioritized to minimize false negatives (missed cases). Specificity can be improved in follow-up confirmatory testing.

==== Problem 6.3: Definitions ====
The Positive Predictive Value (PPV) is calculated as:
* (a) True Positives / Total Population
* (b) True Negatives / (True Negatives + False Positives)
* (c) True Positives / (True Positives + False Positives)
* (d) True Negatives / Test Negatives

Answer: (c). PPV is the probability that a person with a positive test truly has the disease.

==== Problems 6.4–6.6: Performance Metrics ====
A new screening test for diabetes is compared against the current gold standard:

<center>
{|class="wikitable" style="text-align:center;"
|-
! !! Disease Present !! Disease Absent !! Total
|-
| Test Positive || 80 || 70 || 150
|-
| Test Negative || 10 || 240 || 250
|-
| Total || 90 || 310 || 400
|}
</center>

Compute the sensitivity (6.4), specificity (6.5), and positive predictive value (6.6) of the test.

* 6.4 Sensitivity: <math>80/90 \approx 89\%</math>
* 6.5 Specificity: <math>240/310 \approx 77\%</math>
* 6.6 PPV: <math>80/150 \approx 53\%</math>

=== References ===
* [https://sda.statisticalcomputing.org/learning See the SOCR SDA ROC/AUC Learning Module].
* Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. ''Radiology''. 1982;143(1):29–36.
* Metz CE. Basic principles of ROC analysis. ''Seminars in Nuclear Medicine''. 1978;8(4):283–298.
* SOCR Data: Global Gray Matter Volume (GMV) Alzheimer’s Dataset
* pROC and ROCR R package documentation

<hr>
* SOCR Home page: https://socr.umich.edu

{{translate|pageName=https://wiki.socr.umich.edu/index.php?title=SMHS_ROC}}

Latest revision as of 09:42, 23 February 2026

Scientific Methods for Health Sciences - Receiver Operating Characteristic (ROC) Curve

Overview

The Receiver Operating Characteristic (ROC) curve is a fundamental graphical tool used to evaluate the performance of a binary classifier system as its discrimination threshold varies. It illustrates the diagnostic ability of a classifier by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) across all possible threshold settings.

By visualizing these trade-offs, the ROC curve aids in selecting optimal models and discarding suboptimal ones. The Area Under the Curve (AUC) serves as a single scalar measure of aggregate diagnostic performance.

Motivation

In binary classification tasks—such as diagnosing Disease vs. No Disease—outcomes are often determined by whether a continuous test statistic falls above or below a chosen cutoff. While sensitivity and specificity describe accuracy at a single threshold, classifier performance changes as this threshold shifts.

Key objectives of ROC analysis include:

  • Visualizing trade-offs: Demonstrating the dynamic relationship between sensitivity (TPR) and specificity (\(1 - \text{FPR}\)).
  • Assessing accuracy: The closer the curve hugs the top-left corner of the ROC space, the better the test. A curve along the 45° diagonal represents random guessing (no discriminative ability).
  • Selecting optimal thresholds: The slope of the tangent to the ROC curve at a given point reflects the likelihood ratio, enabling threshold selection based on clinical costs or benefits.
  • Summarizing performance: The AUC quantifies the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one.

Theory

The Confusion Matrix and Core Metrics

A binary classifier produces four possible outcomes based on a decision threshold:

Rows give the test result; columns give the true condition (gold standard).

Test Result | Disease (Positive)                            | No Disease (Negative)
Positive    | True Positive (TP): hit                       | False Positive (FP): Type I error, \(\alpha\)
Negative    | False Negative (FN): Type II error, \(\beta\) | True Negative (TN): correct rejection

Fundamental metrics derived from this matrix:

  • Sensitivity (True Positive Rate): \(\text{Sensitivity} = \frac{TP}{TP + FN}\)
  • Specificity (True Negative Rate): \(\text{Specificity} = \frac{TN}{TN + FP}\)
  • False Positive Rate: \(\text{FPR} = 1 - \text{Specificity} = \frac{FP}{TN + FP}\)
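
As a quick numerical check of these three formulas, the base-R sketch below applies them to the counts at the T4 cut-point of 5 from the hypothyroidism example in the next subsection. The helper name `binary_metrics` is only an illustrative choice, not a function from any package.

```r
# Counts at the T4 cut-point of 5 (T4 <= 5 counts as test positive),
# taken from the hypothyroid/euthyroid frequency table in this section.
tp <- 18  # hypothyroid, test positive
fn <- 14  # hypothyroid, test negative
fp <- 1   # euthyroid, test positive
tn <- 92  # euthyroid, test negative

# binary_metrics() is a hypothetical helper written for this illustration
binary_metrics <- function(tp, fn, fp, tn) {
  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    fpr         = fp / (tn + fp))
}

m <- binary_metrics(tp, fn, fp, tn)
print(round(m, 2))
```

By construction the false positive rate and the specificity always sum to 1, which is why the ROC x-axis can be labelled either way.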

Constructing the ROC Curve

To construct an ROC curve, sensitivity and specificity are computed for every feasible cutoff value of the diagnostic test.

Example: Hypothyroidism Diagnosis Using T4 Levels

The table below shows the distribution of T4 measurements in hypothyroid (diseased) and euthyroid (non-diseased) individuals.

Frequency of T4 Levels in Hypothyroid vs. Euthyroid Patients
Group       | <1 | 1–2 | 2–3 | 3–4 | 4–5 | 5–6 | 6–7 | 7–8 | 8–9 | 9–10 | 10–11 | >11
Hypothyroid | 2  | 3   | 1   | 8   | 4   | 4   | 3   | 3   | 1   | 0    | 2     | 1
Euthyroid   | 0  | 0   | 0   | 0   | 1   | 6   | 11  | 19  | 17  | 20   | 11    | 8

We compute performance metrics at different thresholds:

  • Cut-point = 5 (strict; \(T4 \leq 5\) counts as test positive): \(\text{Sensitivity} = 18/32 = 0.56\), \(\text{Specificity} = 92/93 = 0.99\).
  • Cut-point = 7 (moderate): \(\text{Sensitivity} = 25/32 = 0.78\), \(\text{Specificity} = 75/93 = 0.81\).
  • Cut-point = 9 (lenient): \(\text{Sensitivity} = 29/32 = 0.91\), \(\text{Specificity} = 39/93 = 0.42\).

Plotting Sensitivity (y-axis) versus FPR = \(1 - \text{Specificity}\) (x-axis) for all thresholds yields the ROC curve.
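
The same calculation can be repeated for every bin edge at once. The following base-R sketch is one way to do it, assuming (as in the cut-points above) that a T4 value at or below the cut-point counts as test positive; the variable names are illustrative.

```r
# Frequency table from the T4 example (bins: <1, 1-2, ..., 10-11, >11)
hypo <- c(2, 3, 1, 8, 4, 4, 3, 3, 1, 0, 2, 1)       # hypothyroid (diseased)
eu   <- c(0, 0, 0, 0, 1, 6, 11, 19, 17, 20, 11, 8)  # euthyroid (non-diseased)
cutpoint <- 1:11  # upper edges of the first 11 bins

# Cumulative counts give the number of test positives at each cut-point
tp <- cumsum(hypo)[1:11]
fp <- cumsum(eu)[1:11]

roc_tab <- data.frame(
  cutpoint    = cutpoint,
  sensitivity = tp / sum(hypo),
  specificity = 1 - fp / sum(eu)
)
roc_tab$fpr <- 1 - roc_tab$specificity

# Rows for the three cut-points discussed in the text
print(round(roc_tab[roc_tab$cutpoint %in% c(5, 7, 9), ], 2))

# Empirical ROC curve, anchored at (0, 0) and (1, 1)
plot(c(0, roc_tab$fpr, 1), c(0, roc_tab$sensitivity, 1), type = "b",
     xlab = "False Positive Rate", ylab = "Sensitivity",
     main = "Empirical ROC curve for T4")
abline(a = 0, b = 1, lty = 2, col = "gray")
```

The rows for cut-points 5, 7, and 9 reproduce the sensitivity and specificity values listed above.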

Applications and Interpretation

Area Under the Curve (AUC)

The AUC provides a standardized measure of overall diagnostic accuracy:

  • 0.90–1.00: Excellent
  • 0.80–0.90: Good
  • 0.70–0.80: Fair
  • 0.60–0.70: Poor
  • 0.50–0.60: Fail (no better than chance)

In the hypothyroidism example, the AUC is 0.86, indicating good discriminative ability.
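
To make the AUC value concrete, the base-R sketch below estimates it from the binned T4 frequencies in two ways: with the trapezoidal rule on the empirical ROC points, and as the proportion of (hypothyroid, euthyroid) pairs in which the hypothyroid patient falls in the lower T4 bin, counting same-bin pairs as one half. It assumes lower T4 indicates disease and works on bins rather than raw measurements, so it is an approximation of what a package such as pROC would report on unbinned data.

```r
# Binned T4 frequencies (bins: <1, 1-2, ..., 10-11, >11)
hypo <- c(2, 3, 1, 8, 4, 4, 3, 3, 1, 0, 2, 1)       # hypothyroid (diseased)
eu   <- c(0, 0, 0, 0, 1, 6, 11, 19, 17, 20, 11, 8)  # euthyroid (non-diseased)

# ROC points, starting at (0, 0); the last cumulative value is (1, 1)
tpr <- c(0, cumsum(hypo) / sum(hypo))
fpr <- c(0, cumsum(eu) / sum(eu))

# Trapezoidal rule: width of each FPR step times the average TPR over it
auc_trap <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# Pairwise (concordance) view: pair_counts[i, j] is the number of pairs
# with the hypothyroid patient in bin i and the euthyroid patient in bin j
pair_counts <- outer(hypo, eu)
bin_i <- row(pair_counts)
bin_j <- col(pair_counts)
auc_conc <- (sum(pair_counts[bin_i < bin_j]) +
             0.5 * sum(pair_counts[bin_i == bin_j])) / sum(pair_counts)

cat("Trapezoidal AUC:", round(auc_trap, 3), "\n")
cat("Concordance AUC:", round(auc_conc, 3), "\n")
```

Both estimates agree at about 0.855, consistent with the 0.86 quoted above, and illustrate that the area under the empirical curve and the ranking probability are the same quantity.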

Optimal Threshold Selection and Cost Analysis

While AUC summarizes global performance, clinical decisions require a single operating threshold. The optimal choice depends on the relative costs of false positives (FP) and false negatives (FN).

The slope method identifies the optimal operating point as the point on the ROC curve where the tangent slope equals
\[\text{Slope} = \frac{\text{Cost}(FP) \times P(\text{Negative})}{\text{Cost}(FN) \times P(\text{Positive})}.\]

  • If missing a disease (FN) is 8 times as costly as a false alarm (FP), and positives and negatives are equally prevalent, the target slope is \(1/8\).
  • If treatment risks make an FP twice as costly as an FN, again with equal prevalence, the target slope is 2.

This approach balances clinical priorities with statistical performance.
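
The slope formula is simple enough to evaluate directly. The sketch below wraps it in a small helper; the name `target_slope` and the prevalence values are assumptions made for illustration, and \(P(\text{Negative})\) is taken to be one minus the prevalence.

```r
# target_slope() is a hypothetical helper implementing the slope formula:
#   slope = (Cost(FP) * P(Negative)) / (Cost(FN) * P(Positive))
target_slope <- function(cost_fp, cost_fn, prevalence) {
  stopifnot(prevalence > 0, prevalence < 1)
  (cost_fp * (1 - prevalence)) / (cost_fn * prevalence)
}

# FN eight times as costly as FP, equal prevalence
s1 <- target_slope(cost_fp = 1, cost_fn = 8, prevalence = 0.5)

# FP twice as costly as FN, equal prevalence
s2 <- target_slope(cost_fp = 2, cost_fn = 1, prevalence = 0.5)

# Same 8:1 cost ratio, but only 10% of patients have the disease
s3 <- target_slope(cost_fp = 1, cost_fn = 8, prevalence = 0.1)

print(c(s1 = s1, s2 = s2, s3 = s3))
```

The third call shows why prevalence matters: with a rare disease the same cost ratio gives a much steeper target slope (1.125 instead of 0.125), which moves the operating point toward the lower-left, more specific end of the curve.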

Practical Implementation in R

Modern ROC analysis leverages R packages such as `pROC` and `ROCR` for robust computation and visualization.

Example 1: Basic ROC Analysis for T4 Test

# Install and load required package
if (!require("pROC")) install.packages("pROC")
library(pROC)

# Simulated data based on T4 distribution
response <- c(rep(1, 32), rep(0, 93))  # 1 = Hypothyroid, 0 = Euthyroid

# Simulated T4 values (lower = more likely diseased)
predictor_pos <- c(rep(4, 18), rep(6, 7), rep(8, 4), rep(10, 3))
predictor_neg <- c(rep(4, 1), rep(6, 17), rep(8, 36), rep(10, 39))
predictor <- c(predictor_pos, predictor_neg)

# Build ROC object (higher predictor = less likely diseased)
roc_obj <- roc(response, predictor, direction = ">")

# Plot ROC curve with AUC
plot(roc_obj, main = "ROC Curve for T4", col = "blue", print.auc = TRUE)

# Identify optimal threshold (Youden index)
coords(roc_obj, "best", ret = c("threshold", "specificity", "sensitivity"))
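
The Youden index used in the last step is \(J = \text{Sensitivity} + \text{Specificity} - 1\), and the threshold with the largest \(J\) is selected. As a hedged cross-check that does not need pROC, the base-R sketch below evaluates \(J\) at the three cut-points worked out in the Theory section; it only considers those three candidates, whereas pROC searches every observed threshold.

```r
# Sensitivity and specificity at the three T4 cut-points from the text
sens <- c("5" = 18/32, "7" = 25/32, "9" = 29/32)
spec <- c("5" = 92/93, "7" = 75/93, "9" = 39/93)

# Youden index J = sensitivity + specificity - 1
youden <- sens + spec - 1
print(round(youden, 3))

# Cut-point with the largest J among the three candidates
best_cut <- names(which.max(youden))
cat("Cut-point maximizing Youden's J:", best_cut, "\n")
```

Among these candidates the moderate cut-point of 7 gives the largest index, because it loses relatively little specificity for a sizeable gain in sensitivity.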

Example 2: Comparing Classifiers for Alzheimer’s Disease

This example compares a Random Forest classifier with a Logistic Regression model. Both use Global Gray Matter Volume (GMV), age, and sex as predictors to separate the AD and NC groups, and both are evaluated on a held-out test set.

# Load required libraries
if (!require("randomForest")) install.packages("randomForest")
if (!require("ROCR")) install.packages("ROCR")
if (!require("pROC")) install.packages("pROC")
if (!require("caret")) install.packages("caret")
library(randomForest)
library(ROCR)
library(pROC)
library(caret)
library("XML"); library("xml2"); library("rvest")

# Load data
wiki_url <- read_html("https://wiki.socr.umich.edu/index.php/SOCR_Data_July2009_ID_NI#Curvedness_Data")
dataset <- html_table(html_nodes(wiki_url, "table")[[2]])

# Clean and preprocess
dataset_clean <- subset(dataset, Group %in% c("AD", "NC"))
dataset_clean$Group <- factor(dataset_clean$Group, levels = c("NC", "AD"))
# Convert to binary: 1 for AD (positive class), 0 for NC
dataset_clean$Group_binary <- ifelse(dataset_clean$Group == "AD", 1, 0)

# Ensure predictors are correct type
dataset_clean$GMV <- as.numeric(dataset_clean$GMV)
dataset_clean$Age <- as.numeric(dataset_clean$Age)
dataset_clean$Sex <- as.factor(dataset_clean$Sex)

# Remove NAs
dataset_clean <- na.omit(dataset_clean[, c("Group_binary", "GMV", "Age", "Sex")])

# Check class balance
cat("Class distribution:\n")
print(table(dataset_clean$Group_binary))
cat("\nTotal observations:", nrow(dataset_clean), "\n")

# Set seed for reproducibility
set.seed(123)

# Split data into training and testing (70/30)
train_idx <- sample(1:nrow(dataset_clean), 0.7 * nrow(dataset_clean))
train_data <- dataset_clean[train_idx, ]
test_data <- dataset_clean[-train_idx, ]

# Random Forest model (with all features including Sex)
rf_model <- randomForest(as.factor(Group_binary) ~ GMV + Age + Sex, 
                         data = train_data, 
                         ntree = 500,
                         importance = TRUE)

# Logistic regression model (with all features)
glm_model <- glm(Group_binary ~ GMV + Age + Sex, 
                 data = train_data, 
                 family = binomial(link = "logit"))

# Get predictions on TEST data (not training data)
rf_probs <- predict(rf_model, test_data, type = "prob")[, "1"]  # Probability of class 1 (AD)
glm_probs <- predict(glm_model, test_data, type = "response")

# Check prediction distributions
cat("\nPrediction summary:\n")
cat("RF probabilities range:", round(range(rf_probs, na.rm = TRUE), 3), "\n")
cat("GLM probabilities range:", round(range(glm_probs, na.rm = TRUE), 3), "\n")

# ROC curves using ROCR package (corrected)
pred_rf <- prediction(rf_probs, test_data$Group_binary)
perf_rf <- performance(pred_rf, "tpr", "fpr")
auc_rf <- as.numeric(performance(pred_rf, "auc")@y.values[[1]])

pred_glm <- prediction(glm_probs, test_data$Group_binary)
perf_glm <- performance(pred_glm, "tpr", "fpr")
auc_glm <- as.numeric(performance(pred_glm, "auc")@y.values[[1]])

# Alternative: Use pROC for better diagnostics and plotting
roc_rf <- roc(test_data$Group_binary, rf_probs)
roc_glm <- roc(test_data$Group_binary, glm_probs)

# Comparative plot with BOTH ROCR and pROC approaches
par(mfrow = c(1, 2))

# Plot 1: Using ROCR
plot(perf_rf, col = "blue", lwd = 2, main = "ROC Curves (ROCR Package)")
plot(perf_glm, col = "red", lwd = 2, add = TRUE)
abline(a = 0, b = 1, lty = 2, col = "gray")
legend("bottomright", 
       legend = c(paste("Random Forest (AUC =", round(auc_rf, 3), ")"),
                  paste("Logistic Reg (AUC =", round(auc_glm, 3), ")")),
       col = c("blue", "red"), lwd = 2, cex = 0.8)

# Plot 2: Using pROC (often more robust)
plot(roc_rf, col = "blue", lwd = 2, main = "ROC Curves (pROC Package)")
lines(roc_glm, col = "red", lwd = 2)
legend("bottomright", 
       legend = c(paste("Random Forest (AUC =", round(auc(roc_rf), 3), ")"),
                  paste("Logistic Reg (AUC =", round(auc(roc_glm), 3), ")")),
       col = c("blue", "red"), lwd = 2, cex = 0.8)

# Reset plot layout
par(mfrow = c(1, 1))

# Print model performance metrics
cat("\n=== Model Performance ===\n")
cat("Random Forest AUC (ROCR):", round(auc_rf, 3), "\n")
cat("Logistic Regression AUC (ROCR):", round(auc_glm, 3), "\n")
cat("Random Forest AUC (pROC):", round(auc(roc_rf), 3), "\n")
cat("Logistic Regression AUC (pROC):", round(auc(roc_glm), 3), "\n")

# If an AUC is below 0.5, flip the predictions and recompute the curve
if (auc_rf < 0.5) {
  cat("\nNote: RF AUC < 0.5. Flipping predictions...\n")
  rf_probs <- 1 - rf_probs
  pred_rf <- prediction(rf_probs, test_data$Group_binary)
  perf_rf <- performance(pred_rf, "tpr", "fpr")  # refresh the curve used in the final plot
  auc_rf <- as.numeric(performance(pred_rf, "auc")@y.values[[1]])
  cat("Corrected RF AUC:", round(auc_rf, 3), "\n")
}

if (auc_glm < 0.5) {
  cat("Note: GLM AUC < 0.5. Flipping predictions...\n")
  glm_probs <- 1 - glm_probs
  pred_glm <- prediction(glm_probs, test_data$Group_binary)
  perf_glm <- performance(pred_glm, "tpr", "fpr")  # refresh the curve used in the final plot
  auc_glm <- as.numeric(performance(pred_glm, "auc")@y.values[[1]])
  cat("Corrected GLM AUC:", round(auc_glm, 3), "\n")
}

# Final improved plot with corrected AUCs
plot(perf_rf, col = "blue", lwd = 2, 
     main = "ROC Comparison: Random Forest vs Logistic Regression",
     xlab = "False Positive Rate", 
     ylab = "True Positive Rate")
plot(perf_glm, col = "red", lwd = 2, add = TRUE)
abline(a = 0, b = 1, lty = 2, col = "gray")

# Legend
legend("bottomright", 
       legend = c(paste("RF (AUC =", round(auc_rf, 3), ")"),
                  paste("GLM (AUC =", round(auc_glm, 3), ")")),
       col = c("blue", "red"), 
       lwd = 2, 
       cex = 0.75,             # smaller text so the legend does not cover the curves
       bty = "n",              # no box around the legend
       inset = c(0.02, 0.02),  # nudge the legend away from the axes
       y.intersp = 0.8)        # tighten the vertical spacing between entries

Problems

Problem 6.1: ROC Construction

A study evaluates a biomarker for distinguishing lung cancer subtypes:

Biomarker Range | Type A (Positive) | Type B (Negative)
< 2             | 3                 | 4
2–4             | 6                 | 2
4–6             | 15                | 7
6–8             | 7                 | 33
> 8             | 1                 | 38

  • Task: Construct the ROC curve using cut-points at 2, 4, 6, and 8 (Type A = positive).
  • Compute the AUC and interpret whether the test is clinically useful.

Problem 6.2: Clinical Application

True or False: When screening for a serious, treatable disease (e.g., early-stage cancer), it is generally more important to have a test with high specificity than high sensitivity.

Answer: False. For treatable but serious diseases, high sensitivity is prioritized to minimize false negatives (missed cases). Specificity can be improved in follow-up confirmatory testing.

Problem 6.3: Definitions

The Positive Predictive Value (PPV) is calculated as:

  • (a) True Positives / Total Population
  • (b) True Negatives / (True Negatives + False Positives)
  • (c) True Positives / (True Positives + False Positives)
  • (d) True Negatives / Test Negatives

Answer: (c). PPV is the probability that a person with a positive test truly has the disease.

Problems 6.4–6.6: Performance Metrics

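A new screening test for diabetes is compared against the current gold standard:

              | Disease Present | Disease Absent | Total
Test Positive | 80              | 70             | 150
Test Negative | 10              | 240            | 250
Total         | 90              | 310            | 400

Compute the sensitivity (6.4), specificity (6.5), and positive predictive value (6.6) of the test.

  • 6.4 Sensitivity: \(80/90 \approx 89\%\)
  • 6.5 Specificity: \(240/310 \approx 77\%\)
  • 6.6 PPV: \(80/150 \approx 53\%\)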

References

  • See the SOCR SDA ROC/AUC Learning Module.
  • Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36.
  • Metz CE. Basic principles of ROC analysis. Seminars in Nuclear Medicine. 1978;8(4):283–298.
  • SOCR Data: Global Gray Matter Volume (GMV) Alzheimer’s Dataset
  • pROC and ROCR R package documentation


