Difference between revisions of "SMHS Usage Rfundamentals HTML Output Example"
(Created page with "== R Fundamentals - Example R-generated HTML Output== The HTML file below is an example of an SMHS_Usage_Rfundamentals|R-output automatically e...") |
(No difference)
|
Revision as of 08:04, 12 April 2016
Contents
- 1 R Fundamentals - Example R-generated HTML Output
- 2 R HTML Export Demo using the SOCR PD Data
- 2.1 Ivo Dinov
- 2.2 April 13, 2016
- 2.3 Data Import from SOCR Data server
- 2.4 Data Preparation (factorization of patient diagnosis)
- 2.5 Multivariate Linear Regression Analysis (predict binary DX)
- 2.6 Variable Selection
- 2.7 Forward Selection
- 2.8 Backward Selection
- 2.9 Reduced Model
- 2.10 Removing Bad Leverage Points
- 2.11 Log Transformation
- 2.12 Multicollinearity Check
- 2.13 Likelihood Ratio Test
- 2.14 Linear Discriminant Analysis
- 2.15 Logistic Regression Analysis
R Fundamentals - Example R-generated HTML Output
The HTML file below is an example of an R-output automatically exported from RStudio using knitr (see details here).
<!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8"> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <meta name="generator" content="pandoc" /> <meta name="author" content="Ivo Dinov" /> <meta name="date" content="2016-04-13" /> <title>R HTML Export Demo using the SOCR PD Data</title> <style type="text/css">code{white-space: pre;}</style> <style type="text/css"> pre:not([class]) { background-color: white; } </style> <script type="text/javascript"> if (window.hljs && document.readyState && document.readyState === "complete") { window.setTimeout(function() { hljs.initHighlighting(); }, 0); } </script> </head> <body> <style type="text/css"> .main-container { max-width: 940px; margin-left: auto; margin-right: auto; } code { color: inherit; background-color: rgba(0, 0, 0, 0.04); } img { max-width:100%; height: auto; } </style>
R HTML Export Demo using the SOCR PD Data
Ivo Dinov
April 13, 2016
Data Import from SOCR Data server
<code>library(rvest)</code>
<code>## Warning: package 'rvest' was built under R version 3.2.2</code>
<code>## Loading required package: xml2</code>
<code>## Warning: package 'xml2' was built under R version 3.2.2</code>
<code>wiki <- read_html("http://wiki.socr.umich.edu/index.php/SOCR_Data_PD_BiomedBigMetadata") html_nodes(wiki, "#content")</code>
<code>## {xml_nodeset (1)} ## [1] <div id="content" class="mw-body-primary" role="main">\n\t<a id="top ...</code>
<code>pd <- html_table(html_nodes(wiki,"table")[[1]])</code>
Data Preparation (factorization of patient diagnosis)
<code>## Convert Dx to a dichotomous variable ## pd$Dx <- gsub("PD", 0, pd$Dx) ## pd$Dx <- gsub("HC", 1, pd$Dx) ## pd$Dx <- gsub("SWEDD", 2, pd$Dx) pd$Dx <- as.factor(pd$Dx)</code>
Multivariate Linear Regression Analysis (predict binary DX)
<code>## Full Model m1 <- glm(Dx ~ L_caudate_Volume + R_caudate_Volume + L_putamen_Volume + R_putamen_Volume + L_hippocampus_Volume + R_hippocampus_Volume + cerebellum_Volume + L_lingual_gyrus_Volume + R_lingual_gyrus_Volume + L_fusiform_gyrus_Volume + R_fusiform_gyrus_Volume + Sex + Weight + Age + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + chr12_rs34637584_GT + chr17_rs11868035_GT, data=pd, family="binomial") summary(m1)</code>
<code>## ## Call: ## glm(formula = Dx ~ L_caudate_Volume + R_caudate_Volume + L_putamen_Volume + ## R_putamen_Volume + L_hippocampus_Volume + R_hippocampus_Volume + ## cerebellum_Volume + L_lingual_gyrus_Volume + R_lingual_gyrus_Volume + ## L_fusiform_gyrus_Volume + R_fusiform_gyrus_Volume + Sex + ## Weight + Age + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + ## chr12_rs34637584_GT + chr17_rs11868035_GT, family = "binomial", ## data = pd) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -2.3126 -0.9250 0.5125 0.8234 1.9586 ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -3.610e+01 1.950e+01 -1.852 0.064086 . ## L_caudate_Volume -2.210e-03 2.666e-03 -0.829 0.407062 ## R_caudate_Volume -2.253e-03 2.226e-03 -1.012 0.311614 ## L_putamen_Volume 3.911e-03 1.985e-03 1.970 0.048863 * ## R_putamen_Volume -4.274e-03 1.305e-03 -3.274 0.001058 ** ## L_hippocampus_Volume 2.417e-03 1.333e-03 1.813 0.069879 . ## R_hippocampus_Volume -1.219e-03 1.160e-03 -1.051 0.293196 ## cerebellum_Volume 5.974e-04 6.215e-04 0.961 0.336450 ## L_lingual_gyrus_Volume 4.474e-04 6.471e-04 0.691 0.489338 ## R_lingual_gyrus_Volume 3.031e-04 6.885e-04 0.440 0.659740 ## L_fusiform_gyrus_Volume 1.896e-03 6.581e-04 2.882 0.003957 ** ## R_fusiform_gyrus_Volume 8.328e-04 7.843e-04 1.062 0.288270 ## Sex 1.668e-01 1.478e-01 1.129 0.259059 ## Weight -2.691e-02 7.195e-03 -3.740 0.000184 *** ## Age -1.520e-02 7.348e-03 -2.069 0.038568 * ## UPDRS_part_I 4.891e-02 1.006e-01 0.486 0.626761 ## UPDRS_part_II 4.854e-02 1.487e-02 3.265 0.001095 ** ## UPDRS_part_III 9.177e-02 1.032e-02 8.894 < 2e-16 *** ## chr12_rs34637584_GT 1.136e+00 1.527e-01 7.438 1.02e-13 *** ## chr17_rs11868035_GT -7.815e-01 1.481e-01 -5.277 1.31e-07 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: 1467.0 on 1127 degrees of freedom ## Residual deviance: 1204.2 on 1108 degrees of freedom ## AIC: 1244.2 ## ## Number of Fisher Scoring iterations: 4</code>
<code>par(mfrow=c(2,4)) # default layout() call may not be sufficiently large to contain the entire plot, so we extent it plot(m1)</code>
Variable Selection
<code>library(leaps)</code>
<code>## Warning: package 'leaps' was built under R version 3.2.2</code>
<code>library(MASS) reg1 <- regsubsets(Dx ~ L_caudate_Volume + R_caudate_Volume + L_putamen_Volume + R_putamen_Volume + L_hippocampus_Volume + R_hippocampus_Volume + cerebellum_Volume + L_lingual_gyrus_Volume + R_lingual_gyrus_Volume + L_fusiform_gyrus_Volume + R_fusiform_gyrus_Volume + Sex + Weight + Age + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + chr12_rs34637584_GT + chr17_rs11868035_GT, data=pd) summary(reg1)</code>
<code>## Subset selection object ## Call: regsubsets.formula(Dx ~ L_caudate_Volume + R_caudate_Volume + ## L_putamen_Volume + R_putamen_Volume + L_hippocampus_Volume + ## R_hippocampus_Volume + cerebellum_Volume + L_lingual_gyrus_Volume + ## R_lingual_gyrus_Volume + L_fusiform_gyrus_Volume + R_fusiform_gyrus_Volume + ## Sex + Weight + Age + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + ## chr12_rs34637584_GT + chr17_rs11868035_GT, data = pd) ## 19 Variables (and intercept) ## Forced in Forced out ## L_caudate_Volume FALSE FALSE ## R_caudate_Volume FALSE FALSE ## L_putamen_Volume FALSE FALSE ## R_putamen_Volume FALSE FALSE ## L_hippocampus_Volume FALSE FALSE ## R_hippocampus_Volume FALSE FALSE ## cerebellum_Volume FALSE FALSE ## L_lingual_gyrus_Volume FALSE FALSE ## R_lingual_gyrus_Volume FALSE FALSE ## L_fusiform_gyrus_Volume FALSE FALSE ## R_fusiform_gyrus_Volume FALSE FALSE ## Sex FALSE FALSE ## Weight FALSE FALSE ## Age FALSE FALSE ## UPDRS_part_I FALSE FALSE ## UPDRS_part_II FALSE FALSE ## UPDRS_part_III FALSE FALSE ## chr12_rs34637584_GT FALSE FALSE ## chr17_rs11868035_GT FALSE FALSE ## 1 subsets of each size up to 8 ## Selection Algorithm: exhaustive ## L_caudate_Volume R_caudate_Volume L_putamen_Volume ## 1 ( 1 ) " " " " " " ## 2 ( 1 ) " " " " " " ## 3 ( 1 ) " " " " " " ## 4 ( 1 ) " " " " " " ## 5 ( 1 ) " " " " " " ## 6 ( 1 ) " " " " " " ## 7 ( 1 ) " " "*" " " ## 8 ( 1 ) " " "*" " " ## R_putamen_Volume L_hippocampus_Volume R_hippocampus_Volume ## 1 ( 1 ) " " " " " " ## 2 ( 1 ) " " " " " " ## 3 ( 1 ) " " " " " " ## 4 ( 1 ) " " " " " " ## 5 ( 1 ) " " " " " " ## 6 ( 1 ) "*" " " " " ## 7 ( 1 ) "*" " " " " ## 8 ( 1 ) "*" " " " " ## cerebellum_Volume L_lingual_gyrus_Volume R_lingual_gyrus_Volume ## 1 ( 1 ) " " " " " " ## 2 ( 1 ) " " " " " " ## 3 ( 1 ) " " " " " " ## 4 ( 1 ) " " " " " " ## 5 ( 1 ) " " " " " " ## 6 ( 1 ) " " " " " " ## 7 ( 1 ) " " " " " " ## 8 ( 1 ) " " " " " " ## L_fusiform_gyrus_Volume R_fusiform_gyrus_Volume Sex Weight Age ## 1 ( 1 ) " " " " " " " " " " ## 2 ( 1 ) " " " " " " " " " " ## 3 ( 1 ) " " " " " " " " " " ## 4 ( 1 ) " " " " " " " " "*" ## 5 ( 1 ) "*" " " " " " " "*" ## 6 ( 1 ) "*" " " " " " " "*" ## 7 ( 1 ) "*" " " " " " " "*" ## 8 ( 1 ) "*" " " " " "*" "*" ## UPDRS_part_I UPDRS_part_II UPDRS_part_III chr12_rs34637584_GT ## 1 ( 1 ) " " " " " " "*" ## 2 ( 1 ) "*" " " " " "*" ## 3 ( 1 ) "*" " " "*" "*" ## 4 ( 1 ) "*" " " "*" "*" ## 5 ( 1 ) "*" " " "*" "*" ## 6 ( 1 ) "*" " " "*" "*" ## 7 ( 1 ) "*" " " "*" "*" ## 8 ( 1 ) "*" " " "*" "*" ## chr17_rs11868035_GT ## 1 ( 1 ) " " ## 2 ( 1 ) " " ## 3 ( 1 ) " " ## 4 ( 1 ) " " ## 5 ( 1 ) " " ## 6 ( 1 ) " " ## 7 ( 1 ) " " ## 8 ( 1 ) " "</code>
Forward Selection
<code>reg2 <- regsubsets(Dx ~ L_caudate_Volume + R_caudate_Volume + L_putamen_Volume + R_putamen_Volume + L_hippocampus_Volume + R_hippocampus_Volume + cerebellum_Volume + L_lingual_gyrus_Volume + R_lingual_gyrus_Volume + L_fusiform_gyrus_Volume + R_fusiform_gyrus_Volume + Sex + Weight + Age + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + chr12_rs34637584_GT + chr17_rs11868035_GT, data=pd, method="forward") summary(reg2)</code>
<code>## Subset selection object ## Call: regsubsets.formula(Dx ~ L_caudate_Volume + R_caudate_Volume + ## L_putamen_Volume + R_putamen_Volume + L_hippocampus_Volume + ## R_hippocampus_Volume + cerebellum_Volume + L_lingual_gyrus_Volume + ## R_lingual_gyrus_Volume + L_fusiform_gyrus_Volume + R_fusiform_gyrus_Volume + ## Sex + Weight + Age + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + ## chr12_rs34637584_GT + chr17_rs11868035_GT, data = pd, method = "forward") ## 19 Variables (and intercept) ## Forced in Forced out ## L_caudate_Volume FALSE FALSE ## R_caudate_Volume FALSE FALSE ## L_putamen_Volume FALSE FALSE ## R_putamen_Volume FALSE FALSE ## L_hippocampus_Volume FALSE FALSE ## R_hippocampus_Volume FALSE FALSE ## cerebellum_Volume FALSE FALSE ## L_lingual_gyrus_Volume FALSE FALSE ## R_lingual_gyrus_Volume FALSE FALSE ## L_fusiform_gyrus_Volume FALSE FALSE ## R_fusiform_gyrus_Volume FALSE FALSE ## Sex FALSE FALSE ## Weight FALSE FALSE ## Age FALSE FALSE ## UPDRS_part_I FALSE FALSE ## UPDRS_part_II FALSE FALSE ## UPDRS_part_III FALSE FALSE ## chr12_rs34637584_GT FALSE FALSE ## chr17_rs11868035_GT FALSE FALSE ## 1 subsets of each size up to 8 ## Selection Algorithm: forward ## L_caudate_Volume R_caudate_Volume L_putamen_Volume ## 1 ( 1 ) " " " " " " ## 2 ( 1 ) " " " " " " ## 3 ( 1 ) " " " " " " ## 4 ( 1 ) " " " " " " ## 5 ( 1 ) " " " " " " ## 6 ( 1 ) " " " " " " ## 7 ( 1 ) " " "*" " " ## 8 ( 1 ) " " "*" " " ## R_putamen_Volume L_hippocampus_Volume R_hippocampus_Volume ## 1 ( 1 ) " " " " " " ## 2 ( 1 ) " " " " " " ## 3 ( 1 ) " " " " " " ## 4 ( 1 ) " " " " " " ## 5 ( 1 ) " " " " " " ## 6 ( 1 ) "*" " " " " ## 7 ( 1 ) "*" " " " " ## 8 ( 1 ) "*" " " " " ## cerebellum_Volume L_lingual_gyrus_Volume R_lingual_gyrus_Volume ## 1 ( 1 ) " " " " " " ## 2 ( 1 ) " " " " " " ## 3 ( 1 ) " " " " " " ## 4 ( 1 ) " " " " " " ## 5 ( 1 ) " " " " " " ## 6 ( 1 ) " " " " " " ## 7 ( 1 ) " " " " " " ## 8 ( 1 ) " " " " " " ## L_fusiform_gyrus_Volume R_fusiform_gyrus_Volume Sex Weight Age ## 1 ( 1 ) " " " " " " " " " " ## 2 ( 1 ) " " " " " " " " " " ## 3 ( 1 ) " " " " " " " " " " ## 4 ( 1 ) " " " " " " " " "*" ## 5 ( 1 ) "*" " " " " " " "*" ## 6 ( 1 ) "*" " " " " " " "*" ## 7 ( 1 ) "*" " " " " " " "*" ## 8 ( 1 ) "*" " " " " "*" "*" ## UPDRS_part_I UPDRS_part_II UPDRS_part_III chr12_rs34637584_GT ## 1 ( 1 ) " " " " " " "*" ## 2 ( 1 ) "*" " " " " "*" ## 3 ( 1 ) "*" " " "*" "*" ## 4 ( 1 ) "*" " " "*" "*" ## 5 ( 1 ) "*" " " "*" "*" ## 6 ( 1 ) "*" " " "*" "*" ## 7 ( 1 ) "*" " " "*" "*" ## 8 ( 1 ) "*" " " "*" "*" ## chr17_rs11868035_GT ## 1 ( 1 ) " " ## 2 ( 1 ) " " ## 3 ( 1 ) " " ## 4 ( 1 ) " " ## 5 ( 1 ) " " ## 6 ( 1 ) " " ## 7 ( 1 ) " " ## 8 ( 1 ) " "</code>
Backward Selection
<code>reg3 <- regsubsets(Dx ~ L_caudate_Volume + R_caudate_Volume + L_putamen_Volume + R_putamen_Volume + L_hippocampus_Volume + R_hippocampus_Volume + cerebellum_Volume + L_lingual_gyrus_Volume + R_lingual_gyrus_Volume + L_fusiform_gyrus_Volume + R_fusiform_gyrus_Volume + Sex + Weight + Age + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + chr12_rs34637584_GT + chr17_rs11868035_GT, data=pd, method="backward") summary(reg3)</code>
<code>## Subset selection object ## Call: regsubsets.formula(Dx ~ L_caudate_Volume + R_caudate_Volume + ## L_putamen_Volume + R_putamen_Volume + L_hippocampus_Volume + ## R_hippocampus_Volume + cerebellum_Volume + L_lingual_gyrus_Volume + ## R_lingual_gyrus_Volume + L_fusiform_gyrus_Volume + R_fusiform_gyrus_Volume + ## Sex + Weight + Age + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + ## chr12_rs34637584_GT + chr17_rs11868035_GT, data = pd, method = "backward") ## 19 Variables (and intercept) ## Forced in Forced out ## L_caudate_Volume FALSE FALSE ## R_caudate_Volume FALSE FALSE ## L_putamen_Volume FALSE FALSE ## R_putamen_Volume FALSE FALSE ## L_hippocampus_Volume FALSE FALSE ## R_hippocampus_Volume FALSE FALSE ## cerebellum_Volume FALSE FALSE ## L_lingual_gyrus_Volume FALSE FALSE ## R_lingual_gyrus_Volume FALSE FALSE ## L_fusiform_gyrus_Volume FALSE FALSE ## R_fusiform_gyrus_Volume FALSE FALSE ## Sex FALSE FALSE ## Weight FALSE FALSE ## Age FALSE FALSE ## UPDRS_part_I FALSE FALSE ## UPDRS_part_II FALSE FALSE ## UPDRS_part_III FALSE FALSE ## chr12_rs34637584_GT FALSE FALSE ## chr17_rs11868035_GT FALSE FALSE ## 1 subsets of each size up to 8 ## Selection Algorithm: backward ## L_caudate_Volume R_caudate_Volume L_putamen_Volume ## 1 ( 1 ) " " " " " " ## 2 ( 1 ) " " " " " " ## 3 ( 1 ) " " " " " " ## 4 ( 1 ) " " " " " " ## 5 ( 1 ) " " " " " " ## 6 ( 1 ) " " " " " " ## 7 ( 1 ) " " "*" " " ## 8 ( 1 ) " " "*" " " ## R_putamen_Volume L_hippocampus_Volume R_hippocampus_Volume ## 1 ( 1 ) " " " " " " ## 2 ( 1 ) " " " " " " ## 3 ( 1 ) " " " " " " ## 4 ( 1 ) " " " " " " ## 5 ( 1 ) " " " " " " ## 6 ( 1 ) "*" " " " " ## 7 ( 1 ) "*" " " " " ## 8 ( 1 ) "*" " " " " ## cerebellum_Volume L_lingual_gyrus_Volume R_lingual_gyrus_Volume ## 1 ( 1 ) " " " " " " ## 2 ( 1 ) " " " " " " ## 3 ( 1 ) " " " " " " ## 4 ( 1 ) " " " " " " ## 5 ( 1 ) " " " " " " ## 6 ( 1 ) " " " " " " ## 7 ( 1 ) " " " " " " ## 8 ( 1 ) " " " " " " ## L_fusiform_gyrus_Volume R_fusiform_gyrus_Volume Sex Weight Age ## 1 ( 1 ) " " " " " " " " " " ## 2 ( 1 ) " " " " " " " " " " ## 3 ( 1 ) " " " " " " " " " " ## 4 ( 1 ) " " " " " " " " "*" ## 5 ( 1 ) "*" " " " " " " "*" ## 6 ( 1 ) "*" " " " " " " "*" ## 7 ( 1 ) "*" " " " " " " "*" ## 8 ( 1 ) "*" " " " " "*" "*" ## UPDRS_part_I UPDRS_part_II UPDRS_part_III chr12_rs34637584_GT ## 1 ( 1 ) " " " " " " "*" ## 2 ( 1 ) "*" " " " " "*" ## 3 ( 1 ) "*" " " "*" "*" ## 4 ( 1 ) "*" " " "*" "*" ## 5 ( 1 ) "*" " " "*" "*" ## 6 ( 1 ) "*" " " "*" "*" ## 7 ( 1 ) "*" " " "*" "*" ## 8 ( 1 ) "*" " " "*" "*" ## chr17_rs11868035_GT ## 1 ( 1 ) " " ## 2 ( 1 ) " " ## 3 ( 1 ) " " ## 4 ( 1 ) " " ## 5 ( 1 ) " " ## 6 ( 1 ) " " ## 7 ( 1 ) " " ## 8 ( 1 ) " "</code>
Reduced Model
<code>m2 <- glm(Dx ~ R_caudate_Volume + R_fusiform_gyrus_Volume + Weight + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + chr12_rs34637584_GT + chr17_rs11868035_GT, data=pd, family="binomial") summary(m2)</code>
<code>## ## Call: ## glm(formula = Dx ~ R_caudate_Volume + R_fusiform_gyrus_Volume + ## Weight + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + ## chr12_rs34637584_GT + chr17_rs11868035_GT, family = "binomial", ## data = pd) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -2.1034 -0.9941 0.5432 0.8438 1.9065 ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -4.6549832 7.7471401 -0.601 0.547930 ## R_caudate_Volume -0.0024399 0.0021297 -1.146 0.251937 ## R_fusiform_gyrus_Volume 0.0008031 0.0007508 1.070 0.284794 ## Weight -0.0269228 0.0069556 -3.871 0.000109 *** ## UPDRS_part_I 0.0865148 0.0962768 0.899 0.368863 ## UPDRS_part_II 0.0471636 0.0143506 3.287 0.001014 ** ## UPDRS_part_III 0.0860347 0.0099415 8.654 < 2e-16 *** ## chr12_rs34637584_GT 1.1672942 0.1421042 8.214 < 2e-16 *** ## chr17_rs11868035_GT -0.7393041 0.1429901 -5.170 2.34e-07 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: 1467.0 on 1127 degrees of freedom ## Residual deviance: 1238.5 on 1119 degrees of freedom ## AIC: 1256.5 ## ## Number of Fisher Scoring iterations: 4</code>
<code>par(mfrow=c(2,2)) plot(m2)</code>
Removing Bad Leverage Points
<code>n <- dim(pd)[1] case <- c(1:n) cooks <- cooks.distance(m2) cooks_outlier <- case[cooks>4/(length(pd$Dx)-2)] cooks_outlier</code>
<code>## [1] 653 654 655 656 673 674 675 676 1025 1026 1027 1028</code>
<code>m2 <- glm(Dx ~ R_caudate_Volume + R_fusiform_gyrus_Volume + Weight + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + chr12_rs34637584_GT + chr17_rs11868035_GT, data=pd[-cooks_outlier,], family="binomial") summary(m2)</code>
<code>## ## Call: ## glm(formula = Dx ~ R_caudate_Volume + R_fusiform_gyrus_Volume + ## Weight + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + ## chr12_rs34637584_GT + chr17_rs11868035_GT, family = "binomial", ## data = pd[-cooks_outlier, ]) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -2.1806 -0.9525 0.5194 0.8178 1.8609 ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -6.6561223 7.9427555 -0.838 0.402 ## R_caudate_Volume -0.0026930 0.0021848 -1.233 0.218 ## R_fusiform_gyrus_Volume 0.0010330 0.0007694 1.343 0.179 ## Weight -0.0310176 0.0071300 -4.350 1.36e-05 *** ## UPDRS_part_I 0.1332157 0.0987274 1.349 0.177 ## UPDRS_part_II 0.0595924 0.0148204 4.021 5.80e-05 *** ## UPDRS_part_III 0.0947331 0.0103083 9.190 < 2e-16 *** ## chr12_rs34637584_GT 1.2794104 0.1458133 8.774 < 2e-16 *** ## chr17_rs11868035_GT -0.7348668 0.1466157 -5.012 5.38e-07 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: 1446.8 on 1115 degrees of freedom ## Residual deviance: 1190.5 on 1107 degrees of freedom ## AIC: 1208.5 ## ## Number of Fisher Scoring iterations: 4</code>
<code>par(mfrow=c(2,2)) plot(m2)</code>
Log Transformation
<code>m3 <- glm(Dx ~ log(R_caudate_Volume) + log(R_fusiform_gyrus_Volume) + log(Weight) + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + chr12_rs34637584_GT + chr17_rs11868035_GT, data=pd[-cooks_outlier,], family="binomial") summary(m3)</code>
<code>## ## Call: ## glm(formula = Dx ~ log(R_caudate_Volume) + log(R_fusiform_gyrus_Volume) + ## log(Weight) + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + ## chr12_rs34637584_GT + chr17_rs11868035_GT, family = "binomial", ## data = pd[-cooks_outlier, ]) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -2.1723 -0.9456 0.5177 0.8192 1.8614 ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|) ## (Intercept) -63.35131 71.87608 -0.881 0.378 ## log(R_caudate_Volume) -2.84111 2.19100 -1.297 0.195 ## log(R_fusiform_gyrus_Volume) 10.00512 7.69485 1.300 0.194 ## log(Weight) -2.44090 0.56224 -4.341 1.42e-05 *** ## UPDRS_part_I 0.13423 0.09872 1.360 0.174 ## UPDRS_part_II 0.06010 0.01482 4.054 5.02e-05 *** ## UPDRS_part_III 0.09425 0.01029 9.160 < 2e-16 *** ## chr12_rs34637584_GT 1.28158 0.14584 8.788 < 2e-16 *** ## chr17_rs11868035_GT -0.73351 0.14663 -5.003 5.66e-07 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for binomial family taken to be 1) ## ## Null deviance: 1446.8 on 1115 degrees of freedom ## Residual deviance: 1190.3 on 1107 degrees of freedom ## AIC: 1208.3 ## ## Number of Fisher Scoring iterations: 4</code>
<code>par(mfrow=c(2,2)) plot(m3)</code>
Multicollinearity Check
<code>library(car) vif(m2)</code>
<code>## R_caudate_Volume R_fusiform_gyrus_Volume Weight ## 1.024786 1.053087 1.056551 ## UPDRS_part_I UPDRS_part_II UPDRS_part_III ## 1.019879 1.027360 1.046336 ## chr12_rs34637584_GT chr17_rs11868035_GT ## 1.054836 1.060063</code>
<code>## Here, each of the variance inflaction factors is less than 5, so the multicollinearity between these predictors does not affect the result significantly.</code>
Likelihood Ratio Test
<code>with(m2, null.deviance - deviance)</code>
<code>## [1] 256.336</code>
<code>with(m2, df.null - df.residual)</code>
<code>## [1] 8</code>
<code>with(m2, pchisq(null.deviance - deviance, df.null - df.residual, lower.tail = FALSE))</code>
<code>## [1] 7.811597e-51</code>
<code>## The chi-square of 403 with 8 degrees of freedom and an associated p-value of less than 0.001 indicates that the model as a whole fits significantly better than an empty model.</code>
Linear Discriminant Analysis
<code>## Consider time = 0 only pdzero <- pd[which(pd$Time==0),] ## Split dataset into 70% training and 30% testing set set.seed(666) train_index <- sample(dim(pdzero)[1], trunc(dim(pdzero)[1]*0.7)) train <- pdzero[train_index,] test <- pdzero[-train_index,] library(MASS) lda.fit <- lda(Dx~L_caudate_Volume + R_caudate_Volume + L_putamen_Volume + R_putamen_Volume + L_hippocampus_Volume + R_hippocampus_Volume + cerebellum_Volume + L_lingual_gyrus_Volume + R_lingual_gyrus_Volume + L_fusiform_gyrus_Volume + R_fusiform_gyrus_Volume + Sex + Weight + Age + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + chr12_rs34637584_GT + chr17_rs11868035_GT, data=train) lda.predict <- predict(lda.fit, test) table(lda.predict$class, test$Dx)</code>
<code>## ## HC PD SWEDD ## HC 17 3 8 ## PD 9 22 3 ## SWEDD 7 11 5</code>
Logistic Regression Analysis
<code>## Break up the three-level dependent variable into two binary variables pdzero$PD_YN <- ifelse(pdzero$Dx=="PD", 1, 0) pdzero$SWEDD_YN <- ifelse(pdzero$Dx=="SWEDD", 1, 0) ## Split the dataset into 70% training and 30% testing set set.seed(666) train_index <- sample(dim(pdzero)[1], trunc(dim(pdzero)[1]*0.7)) train <- pdzero[train_index,] test <- pdzero[-train_index,] ## PD_YN as outcome log1 <- glm(PD_YN ~ L_caudate_Volume + R_caudate_Volume + L_putamen_Volume + R_putamen_Volume + L_hippocampus_Volume + R_hippocampus_Volume + cerebellum_Volume + L_lingual_gyrus_Volume + R_lingual_gyrus_Volume + L_fusiform_gyrus_Volume + R_fusiform_gyrus_Volume + Sex + Weight + Age + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + chr12_rs34637584_GT + chr17_rs11868035_GT, data=train, family="binomial") probPD <- predict(log1, test, type="response") median(probPD) # 0.30</code>
<code>## [1] 0.2983791</code>
<code>resultPD <- ifelse(probPD>0.3, 1, 0) table(resultPD, test$PD_YN)</code>
<code>## ## resultPD 0 1 ## 0 30 13 ## 1 19 23</code>
<code>## SWEDD as outcome log2 <- glm(SWEDD_YN ~ L_caudate_Volume + R_caudate_Volume + L_putamen_Volume + R_putamen_Volume + L_hippocampus_Volume + R_hippocampus_Volume + cerebellum_Volume + L_lingual_gyrus_Volume + R_lingual_gyrus_Volume + L_fusiform_gyrus_Volume + R_fusiform_gyrus_Volume + Sex + Weight + Age + UPDRS_part_I + UPDRS_part_II + UPDRS_part_III + chr12_rs34637584_GT + chr17_rs11868035_GT, data=train, family="binomial") probSW <- predict(log2, test, type="response") median(probSW) # 0.301</code>
<code>## [1] 0.3013772</code>
<code>resultSW <- ifelse(probSW>0.2, 1, 0) table(resultSW, test$SWEDD_YN)</code>
<code>## ## resultSW 0 1 ## 0 19 2 ## 1 50 14</code>
<code>## Combine the results of two logistic model together finalpred <- NULL for (i in 1:dim(test)[1]) { if (resultPD[i]==1 & resultSW[i]==0) { finalpred[i] <- "PD" } if (resultPD[i]==1 & resultSW[i]==1) { finalpred[i] <- ifelse(probPD[i]>probSW[i], "PD", "SW") } if (resultPD[i]==0 & resultSW[i]==1) { finalpred[i] <- "SW" } if (resultPD[i]==0 & resultSW[i]==0) { finalpred[i] <- "HC" } } table(finalpred, test$Dx)</code>
<code>## ## finalpred HC PD SWEDD ## HC 3 1 1 ## PD 15 21 3 ## SW 15 14 12</code>
<script> // add bootstrap table styles to pandoc tables $(document).ready(function () { $('tr.header').parent('thead').parent('table').addClass('table table-condensed'); }); </script> <script> (function () { var script = document.createElement("script"); script.type = "text/javascript"; script.src = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"; document.getElementsByTagName("head")[0].appendChild(script); })(); </script> </body> </html>
- SOCR Home page: http://www.socr.umich.edu
Translate this page: