Difference between revisions of "SMHS MethodsHeterogeneity"
Revision as of 13:27, 7 March 2016
Scientific Methods for Health Sciences - Methods for Studying Heterogeneity of Treatment Effects, Case-Studies of Comparative Effectiveness Research
Adapted from: http://dx.doi.org/10.1186/1471-2288-12-185
- *CART: Classification and regression tree (CART) analysis
- ** LGM/GMM: Latent growth modeling/Growth mixture modeling.
- *** QTE: Quantile Treatment Effect.
- **** Standard meta-analysis methods, such as fixed- and random-effects models and tests of heterogeneity, together with various plots and summaries, are available in the R package rmeta (http://cran.r-project.org/web/packages/rmeta). Non-parametric R approaches are included in the np package (http://cran.r-project.org/web/packages/np/vignettes/np.pdf).
Methods Summaries
Overview
Recursive partitioning is a data mining technique for exploring structure and patterns in complex data. It produces easily visualized decision rules for predicting categorical (classification tree) or continuous (regression tree) outcome variables. The R package rpart provides the tools for Classification and Regression Tree (CART) modeling; conditional inference trees and random forests are implemented in other R packages (e.g., party and randomForest). Additional resources include An Introduction to Recursive Partitioning Using the RPART Routines. The Appendix includes a description of the main CART analysis steps.
install.packages("rpart")
library("rpart")
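To make the classification/regression distinction concrete, here is a minimal sketch that fits one tree of each kind with rpart. It assumes the built-in iris and mtcars data sets (they are not part of this case study) so that it runs without extra packages; the object names class_tree and reg_tree are illustrative only.

```r
library(rpart)

# Categorical outcome (Species): classification tree, method = "class"
class_tree <- rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                    data = iris, method = "class")

# Continuous outcome (mpg): regression tree, method = "anova"
reg_tree <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova")

class_pred <- predict(class_tree, newdata = iris, type = "class")  # factor of predicted labels
reg_pred <- predict(reg_tree, newdata = mtcars)                    # numeric predicted mpg values

head(class_pred)
head(reg_pred)
```

The only change between the two fits is the type of response and the method argument; the splitting and pruning machinery is shared.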
I. CART (Classification and Regression Tree) is a decision-tree-based technique that examines how the variation observed in a response variable (continuous or categorical) can be understood by systematically dividing the overall study population into subgroups using explanatory variables of interest. For the analysis of heterogeneity of treatment effects (HTE), CART is best suited to early-stage, exploratory analyses. Its relative simplicity makes it effective at identifying basic relationships between variables of interest, and thus at identifying potential subgroups for more advanced analyses. The key to CART is its ‘systematic’ approach to developing subgroups, which are constructed sequentially through repeated binary splits of the population of interest, one explanatory variable at a time. In other words, each ‘parent’ group is divided into two ‘child’ groups, with the objective of creating increasingly homogeneous subgroups. The process is repeated, and the subgroups are split further until a stopping criterion is reached (for example, a minimum subgroup size or no further gain in homogeneity); the same explanatory variable may be used for more than one split. The resulting tree is often overgrown, so additional techniques are used to ‘trim’ (prune) it to a size at which its predictive power is balanced against the risk of over-fitting. Because CART makes no assumptions about the distribution of the dependent variable, it can be used where other multivariate techniques commonly used for exploratory predictive risk modeling would not be appropriate, for example when the data are not normally distributed.
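A single binary split can be illustrated in a few lines of base R. The sketch below assumes a categorical response and a numeric predictor, and measures homogeneity with the Gini impurity (rpart's default splitting criterion for classification trees). The helper names gini and best_split are hypothetical; real CART implementations also handle categorical predictors, missing values, and stopping rules, and repeat this search over every predictor and then recursively within each child group.

```r
# Gini impurity of a set of class labels: 1 - sum_k p_k^2
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Hypothetical helper: scan every threshold on one numeric predictor x and
# return the binary split "x < cut" that most reduces the size-weighted
# Gini impurity of the labels y (i.e., the most homogeneous pair of children).
best_split <- function(x, y) {
  xs <- sort(unique(x))
  if (length(xs) < 2) return(NULL)            # nothing to split on
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2   # midpoints between observed values
  n <- length(y)
  gains <- sapply(cuts, function(cut) {
    left <- y[x < cut]
    right <- y[x >= cut]
    child <- (length(left) * gini(left) + length(right) * gini(right)) / n
    gini(y) - child                           # impurity decrease for this split
  })
  i <- which.max(gains)
  list(cut = cuts[i], gain = gains[i])
}

# Toy example: two well-separated groups
x <- c(1, 2, 3, 10, 11, 12)
y <- factor(c("a", "a", "a", "b", "b", "b"))
best_split(x, y)  # cut = 6.5, gain = 0.5 (both child groups are pure)
```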
CART analyses are useful in situations where there is some evidence to suggest that HTE exists, but the subgroups defining the heterogeneous response are not well understood. CART allows for an exploration of response in a myriad of complex subpopulations, and more recently developed ensemble methods (such as Bayesian Additive Regression Trees) allow for more robust analyses through the combination of multiple CART analyses.
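BART itself is a Bayesian model and is not shown here. As a much simpler illustration of the general idea of combining many trees, the sketch below uses bagging (bootstrap aggregation), a different ensemble technique: it fits rpart trees to bootstrap resamples and takes a majority vote. The built-in iris data, the ensemble size of 25, and the seed are arbitrary assumptions.

```r
library(rpart)

set.seed(1)      # arbitrary seed so the bootstrap samples are reproducible
n_trees <- 25    # arbitrary ensemble size

# Fit one classification tree per bootstrap resample of the rows
trees <- lapply(seq_len(n_trees), function(b) {
  idx <- sample(nrow(iris), replace = TRUE)
  rpart(Species ~ ., data = iris[idx, ], method = "class")
})

# Each column holds one tree's predicted labels for every observation
votes <- sapply(trees, function(tr) as.character(predict(tr, newdata = iris, type = "class")))

# Majority vote across trees for each observation (row)
bagged <- apply(votes, 1, function(v) names(which.max(table(v))))

mean(bagged == iris$Species)  # training-set (resubstitution) accuracy of the ensemble
```

Averaging over resamples smooths out the instability of any single tree, which is one reason ensembles tend to be more robust than one CART analysis.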
Example Fifth Dutch growth study
# Let’s use the Fifth Dutch Growth Study (2009) data set, fdgs. Is it true that “the world’s tallest nation has stopped growing taller: the height of Dutch children from 1955 to 2009”?
# install.packages("mice")
library("mice")
?fdgs
head(fdgs)
  | ID | Reg | Age | Sex | HGT | WGT | HGT.Z | WGT.Z |
1 | 100001 | West | 13.09514 | boy | 175.5 | 75.0 | 1.751 | 2.410 |
2 | 100003 | West | 13.81793 | boy | 148.4 | 40.0 | 2.292 | 1.494 |
3 | 100004 | West | 13.97125 | boy | 159.9 | 46.5 | 0.743 | 0.783 |
4 | 100005 | West | 13.98220 | girl | 159.7 | 46.5 | 0.743 | 0.783 |
5 | 100006 | West | 13.52225 | girl | 160.3 | 47.8 | 0.414 | 0.355 |
6 | 100018 | East | 10.21492 | boy | 157.8 | 39.7 | 2.025 | 0.823 |
summary(fdgs)
ID | Reg | Age | Sex | HGT |
Min.:100001 | North:732 | Min.:0.008214 | boy:4829 | Min.:46.0 |
1st Qu.:106353 | East:2528 | 1st Qu.:1.618754 | girl:5201 | 1st Qu.:83.8 |
Median:203855 | South:2931 | Median:8.084873 | | Median:131.5 |
Mean:180091 | West:2578 | Mean:8.157936 | | Mean:123.9 |
3rd Qu.:210591 | City:1261 | 3rd Qu.:13.547570 | | 3rd Qu.:162.3 |
Max.:401955 | | Max.:21.993155 | | Max.:208.0 |
 | | | | NA's:23 |
(1) Classification Tree
Let's use the data frame fdgs to predict region (reg) from age, height (hgt), and weight (wgt).
# grow tree (fdgs[, -1] drops the first column, the child ID)
fit.1 <- rpart(reg ~ age + hgt + wgt, method = "class", data = fdgs[, -1])
printcp(fit.1)  # display the results
plotcp(fit.1)   # visualize cross-validation results
summary(fit.1)  # detailed summary of splits
# plot tree
par(oma = c(0, 0, 2, 0))
plot(fit.1, uniform = TRUE, margin = 0.3, main = "Classification Tree for Region (FDGS Data)")
text(fit.1, use.n = TRUE, all = TRUE, cex = 1.0)
# create a better plot of the classification tree (file = "" draws on the current device)
post(fit.1, title = "Classification Tree for Region (FDGS Data)", file = "")
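One way to summarize how well a classification tree separates the classes is to cross-tabulate observed and predicted classes. The sketch below defines a hypothetical helper, confusion, built on predict() with type = "class". Accuracy measured on the same data used to grow the tree is optimistic, so treat it as a rough check rather than an estimate of predictive performance. It is demonstrated on the built-in iris data (an assumption, not the FDGS data) so it runs without the mice package.

```r
library(rpart)

# Hypothetical helper: cross-tabulate observed vs. predicted classes for a
# fitted rpart classification tree and report the proportion classified correctly.
confusion <- function(fit, data, response) {
  pred <- predict(fit, newdata = data, type = "class")   # predicted class labels
  tab <- table(observed = data[[response]], predicted = pred)
  list(table = tab, accuracy = sum(diag(tab)) / sum(tab))
}

# Self-contained check on the built-in iris data
iris_tree <- rpart(Species ~ ., data = iris, method = "class")
confusion(iris_tree, iris, "Species")

# Usage with the FDGS tree grown above (requires fit.1 and fdgs from the mice package):
# confusion(fit.1, fdgs, "reg")
```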
(2) Pruning the tree
pruned.fit.1 <- prune(fit.1, cp = fit.1$cptable[which.min(fit.1$cptable[, "xerror"]), "CP"])
# plot the pruned tree
plot(pruned.fit.1, uniform = TRUE, main = "Pruned Classification Tree for Region (FDGS Data)")
text(pruned.fit.1, use.n = TRUE, all = TRUE, cex = 1.0)
# without file = "", post() would write a PostScript file named after the tree object
post(pruned.fit.1, title = "Pruned Classification Tree for Region (FDGS Data)", file = "")
# not much change, as the initial tree is not complex!
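The pruning step above keeps the subtree with the smallest cross-validated error (xerror). A common, more conservative alternative is the one-standard-error (1-SE) rule: choose the simplest tree whose xerror is within one standard error (xstd) of the minimum. The sketch below implements both choices; the helper name choose_cp is hypothetical, and it is demonstrated on the built-in iris data (an assumption) so it runs without the mice package.

```r
library(rpart)

# Hypothetical helper: pick a complexity parameter (cp) from a fitted tree's cptable.
#   rule = "min": cp of the row with the smallest cross-validated error (xerror)
#   rule = "1se": largest cp (smallest tree) whose xerror is within one
#                 standard error (xstd) of that minimum
choose_cp <- function(fit, rule = c("min", "1se")) {
  rule <- match.arg(rule)
  cpt <- fit$cptable                 # rows ordered from the smallest tree to the largest
  i_min <- which.min(cpt[, "xerror"])
  if (rule == "min") return(cpt[i_min, "CP"])
  threshold <- cpt[i_min, "xerror"] + cpt[i_min, "xstd"]
  i_1se <- which(cpt[, "xerror"] <= threshold)[1]   # first (smallest) tree under the threshold
  cpt[i_1se, "CP"]
}

set.seed(2016)   # xerror comes from random cross-validation folds
iris_tree <- rpart(Species ~ ., data = iris, method = "class")
cp_min <- choose_cp(iris_tree, "min")
cp_1se <- choose_cp(iris_tree, "1se")
pruned_1se <- prune(iris_tree, cp = cp_1se)

# For the FDGS tree from the text: prune(fit.1, cp = choose_cp(fit.1, "1se"))
```

Because the 1-SE rule never selects a larger tree than the minimum-xerror rule, it trades a small amount of cross-validated accuracy for a simpler, more stable set of subgroups.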