Difference between revisions of "SMHS BigDataBigSci SEM sem vs cfa"
(Created page with "== Structural Equation Modeling (SEM) Example 2 (Parkinson’s Disease Data) - sem() vs. cfa() === The function <b>sem()</b> is very similar t...") |
(→See also) |
||
(One intermediate revision by the same user not shown) | |||
Line 1: | Line 1: | ||
− | ==[[SMHS_BigDataBigSci_SEM_Ex2| Structural Equation Modeling (SEM) Example 2 (Parkinson’s Disease Data)]] - sem() vs. cfa() | + | ==[[SMHS_BigDataBigSci_SEM_Ex2| Structural Equation Modeling (SEM) Example 2 (Parkinson’s Disease Data)]] - sem() vs. cfa() == |
The function <b>sem()</b> is very similar to the function <b>cfa()</b>. As we did not include <b>fit.measures=TRUE</b>, the report only includes the basic chi-square test statistic. The argument <b>standardized=TRUE</b>, reports standardized parameter values. Two extra columns of standardized parameter values are printed. In the first column (labeled, Std.lv), only the latent variables are standardized. In the second column (labeled Std.all), both latent and observed variables are standardized. The latter is often called the `completely standardized solution'. | The function <b>sem()</b> is very similar to the function <b>cfa()</b>. As we did not include <b>fit.measures=TRUE</b>, the report only includes the basic chi-square test statistic. The argument <b>standardized=TRUE</b>, reports standardized parameter values. Two extra columns of standardized parameter values are printed. In the first column (labeled, Std.lv), only the latent variables are standardized. In the second column (labeled Std.all), both latent and observed variables are standardized. The latter is often called the `completely standardized solution'. | ||
Line 104: | Line 104: | ||
==See also== | ==See also== | ||
* [[SMHS_BigDataBigSci_SEM| Back to Structural Equation Modeling (SEM)]] | * [[SMHS_BigDataBigSci_SEM| Back to Structural Equation Modeling (SEM)]] | ||
− | * [[ | + | * [[SMHS_BigDataBigSci_SEM_Ex2| Back to SEM Example 2: Parkinson’s Disease Data]] |
* [[SMHS_BigDataBigSci_GCM| Next: (Latent) Growth Curve Modeling (GCM)]] | * [[SMHS_BigDataBigSci_GCM| Next: (Latent) Growth Curve Modeling (GCM)]] | ||
Latest revision as of 12:06, 5 May 2016
Structural Equation Modeling (SEM) Example 2 (Parkinson’s Disease Data) - sem() vs. cfa()
The function sem() is very similar to the function cfa(). As we did not include fit.measures=TRUE, the report only includes the basic chi-square test statistic. The argument standardized=TRUE, reports standardized parameter values. Two extra columns of standardized parameter values are printed. In the first column (labeled, Std.lv), only the latent variables are standardized. In the second column (labeled Std.all), both latent and observed variables are standardized. The latter is often called the `completely standardized solution'.
# remove all objects from the R console (current workspace) rm(list = ls())
# library("lavaan")
#load data 05_PPMI_top_UPDRS_Integrated_LongFormat1.csv ( dim(myData) 1764 31 ), long format # myData <- read.csv("https://umich.instructure.com/files/330397/download?download_frd=1&verifier=3bYRT9FXgBGMCQv8MNxsclWnMgodiJRYo3ODFtDq",header=TRUE) # dichotomize the "ResearchGroup" variable
myData$\$$ResearchGroup <- ifelse(myData$\$$ResearchGroup == "Control", 1, 0)
# library("MASS") myData2<-scale(myData); head(myData2) myData2[, 20] <- myData$\$$ResearchGroup; # replace the ResearchGroup by orig binary labels (do not rescale) attach(myData2) head(myData2) # model3 <- ' # latent variable definitions - defining how the latent variables are “manifested by” a set of observed # (or manifest) variables, aka “indicators” # (1) Measurement Model # Imaging =~ L_cingulate_gyrus_ComputeArea + cerebellum_Volume Imaging =~ R_insular_cortex_ComputeArea + R_insular_cortex_Volume DemoGeno =~ Weight+Sex+Age # UPDRS =~ UPDRS_Part_I_Summary_Score_Baseline+X_Assessment_Non.Motor_Geriatric_Depression_Scale_GDS_Short_Summary_Score_Baseline UPDRS =~ UPDRS_part_I +UPDRS_part_II + UPDRS_part_III # (2) Regressions ResearchGroup ~ Imaging + DemoGeno + UPDRS ' Here is why we needed to scale the data first before we fit the SEM model: # using raw data provides unreliable estimates <font color="red"># some observed variances are (at least) a factor 1000 times larger than others;</font> fit3 <- sem(model3, data=<b>myData</b>, estimator="MLM") summary(fit3) # using scaled data provides stable estimates fit3 <- sem(model3, data=myData2, estimator="MLM") summary(fit3) # report the standardized coefficients of the model standardizedSolution(fit3) # variation explained by model components (The R-square value for all endogenous variables) inspect(fit3, "r2") Note that all variances are supposed to be positive, however, occasionally, model estimates may generate a residual variance that is negative. This may happen due to random sampling error where a very small true value may sometimes produce a negative estimate, or it can occur when the model is unstable. # Inspect the fitted values variance-covariance matrix: fitted(fit3)$\$$cov
# Model fitting Name Command fit CFA to data cfa(model, data=Data) fit SEM to data sem(model, data=Data) standardized solution sem(model, data=Data, std.ov=TRUE) orthogonal factors cfa(model, data=Data, orthogonal=TRUE) Matrices Name Command Factor covariance matrix inspect(fit, "coefficients")$\$$psi Fitted covariance matrix fitted(fit)$\$$cov Observed covariance matrix inspect(fit, 'sampstat')$\$$cov Residual covariance matrix resid(fit)$\$$cov Factor correlation matrix cov2cor(inspect(fit, "coefficients")$\$$psi) or use covariance command with standardised solution e.g., cfa(..., std.ov=TRUE) Fit Measures <b>Name Command</b> Fit measures: fitMeasures(fit) Specific fit measures e.g.: fitMeasures(fit)[c('chisq', 'df', 'pvalue', 'cfi', 'rmsea', 'srmr')] Parameters <b>Name Command</b> Parameter information parTable(fit) Standardised estimates standardizedSolution(fit) or summary(fit, standardized=TRUE) R-squared | inspect(fit, 'r2') Compare models <b>Name Command</b> Compare fit measures cbind(m1=inspect(m1_fit, 'fit.measures'), m2=inspect(m2_fit, 'fit.measures')) Chi-square difference test anova(m1_fit, m2_fit) Model improvement <b>Name Command</b> Modification indices mod_ind <- modificationindices(fit) 10 greatest head(mod_ind[order(mod_ind$\$$mi, decreasing=TRUE), ], 10) mi > 5 subset(mod_ind[order(mod_ind$\$$mi, decreasing=TRUE), ], mi > 5)
# to account for groupings (gender) # fit3 <- sem(model3, data=myData, group="Sex", estimator="MLM")
# install.packages("semPlot") library(semPlot)
# Plot standardized model (numerical): # semPaths(fit3, what = "est", layout = "tree", title = TRUE, style = "LISREL") semPaths(fit3)
See also
- Back to Structural Equation Modeling (SEM)
- Back to SEM Example 2: Parkinson’s Disease Data
- Next: (Latent) Growth Curve Modeling (GCM)
- SOCR Home page: http://www.socr.umich.edu
Translate this page: