Structural Equation Modelling (SEM) and Multi-group SEM using R
Structural Equation Modeling (SEM) is a multivariate statistical analysis technique that is used to analyze structural relationships among variables. SEM is the combination of factor analysis and multiple regression analysis. Usually factors are created using multiple observed variables through factor analysis. Those factors are called latent variables. Thereafter, multiple regression analysis is performed on latent variables level, not in observed variables level. Let’s assume that you have proper theoretical knowledge about Structural Equation Modeling. If not, it is suggested that you read the books- “Ullman, J.B. and Bentler, P.M., 2003. Structural equation modeling. John Wiley & Sons, Inc.” and “Bollen, K.A. and Curran, P.J., 2006. Latent curve models: A structural equation perspective (Vol. 467). John Wiley & Sons.”
Common softwares for conducting SEM analysis are MPLUS, AMOS, SmartPLS, STATA and R. All the mentioned softwares come with a price but R. R is a free statistical analysis tool and here the codes of doing SEM and multi-group SEM using the ‘lavaan’ package are presented. For detail you may read “Rosseel, Y., 2011. lavaan: an R package for structural equation modeling and more.”
##to load data in R
data<- read.csv(“F:/1. STUDY /WORK in Progress/working data/version7/worldbankdataset16.csv”)
summary(data)
#select observations of required variables without missing values #’na.omit’ for dropping variables with missing values #subset for creating subset of the main datafile
portmodel=na.omit(subset(data, select=c(LPIAT12, LPICQ12, LPIEA12, LPIEC12, LPIFS12, LPIQT12, CT12,LC12, QPI12, PPGDPc12, UNDC)))
summary (portmodel)
#required packages for descriptive stats, normality tests & SEM
library(psych)
library(MVN)
library(lavaan)
library(semPlot)
library(xlsx)
#Descriptive stats
summary(portmodel) #all at once but limited
describe(portmodel$QPI12)
describe(portmodel$LPIAT12)
describe(portmodel$LPICQ12)
describe(portmodel$LPIEA12)
describe(portmodel$LPIEC12)
describe(portmodel$LPIFS12)
describe(portmodel$LPIQT12)
describe(portmodel$CT12)
describe(portmodel$LC12)
describe(portmodel$PPGDPc12)
#normality tests
install.packages(“MVN”) #to install MVN package
library(MVN) #to use MVN package
uniPlot(subset, type=”qqplot”) #creates univariate Q-Q plots
uniPlot(subset, type=”histogram”) #creates univariate histograms
#Linear Regression just to check different relationships
fit = lm(PPGDPc12 ~ CT12+LC12+LPIAT12, data=portmodel)
fit = lm(PPGDPc12 ~ LPIAT12 + CT12+LC12+QPI12+ GERS12 + UP12, data=portmodel)
summary(fit) # show results
corr.test(portmodel$CT12, portmodel$LPIAT12)
#factor reliability check
LPI <- subset(portmodel, select=c(LPIAT12, LPICQ12, LPIEA12, LPIEC12,LPIFS12,LPIQT12))
alpha(LPI) #0.97
ST<-subset(portmodel, select=c(CT12, LC12))
alpha(ST) #0.86
##Confirmatory factor analysis (CFA) model
cfamodel1 <- ‘QPI=~ QPI12
LPI =~ LPIAT12 + LPICQ12 + LPIEA12 + LPIEC12+LPIFS12+LPIQT12
POUT =~ CT12 +LC12
ECON=~PPGDPc12’
#unfortunately I have two latent variables (QPI and ECON) with only one observed variable. Always try to have multiple observed variables.
#CFA results
fit.cfamodel1<- cfa(cfamodel1, data=portmodel, estimator=”MLM”, mimic= “Mplus”,std.lv=T)
#MLM estimation used as data are not normally distributed, if normal use “ML”
summary(fit.cfamodel1, fit.measures=TRUE, standardized=TRUE, rsq=TRUE)
fitMeasures(fit.cfamodel1)
#reliability check
reliability(fit.cfamodel1)
inspect(fit.cfamodel1,”cov.lv”)
inspect(fit.cfamodel1,”cor.lv”)
inspect(fit.cfamodel1,”theta”)
#Structural Equation Model (SEM)
semmodel1 <- ‘LPI =~ LPIAT12 + LPICQ12 + LPIEA12 + LPIEC12 + LPIFS12+ LPIQT12
POUT =~ CT12 +LC12
QPI=~QPI12
PPGDPc=~PPGDPc12
LPI~a*QPI
POUT~b*LPI+c*QPI
PPGDPc~d*POUT+e*QPI+f*LPI
#indirect effect
ab:= a*b
bd:= b*d
cd:= c*d
abd:=a*b*d
LPIEC12 ~~ LPIQT12’ #correlation between error terms
#SEM results and analysis
fit.semmodel1<- sem(semmodel1, data=portmodel, estimator=”MLM”, mimic= “Mplus”)
summary(fit.semmodel1, fit.measures=TRUE, standardized=TRUE, rsq=TRUE)
fitMeasures(fit.semmodel1)
semPaths(fit.semmodel1, whatLabels = “std”, layout=”tree2″, curve = TRUE)
MI<- modindices(fit.semmodel1)
subset(MI, mi > 1.8410)
inspect(fit.semmodel1,”cor.lv”)
inspect(fit.semmodel1,”theta”)
#Further investigations
#to separate latent correlations
pred<- predict(fit.semmodel1)
head(pred)
pred
#creating dataset with the latent factor values
latentdata<-data.frame(pred)
#correlation among latent and observed variables
corrdata<-data.frame(latentdata,portmodel$QPI12,portmodel$PPGDPc12)
cor(corrdata, use =”pairwise.complete.obs”, method=”pearson”)
##Univariate Normality Test
#shapiro-wilk test
shapiro.test(latentdata$LPI)
shapiro.test(latentdata$POUT)
shapiro.test(portmodel$QPI12)
shapiro.test(portmodel$PPGDPc12)
#Multi-group Structural Equation Model (SEM)
#Measurement Invariance check for Multi-group modelling
library(semTools)
measurementInvariance(semmodel2, data=portmodel, estimator=”MLM”,group=”UNDC”, mimic= “Mplus”)
# Multi-group model specification
semmodel2 <- ‘LPI =~ LPIAT12 + LPICQ12 + LPIEA12 + LPIEC12 + LPIFS12+ LPIQT12
POUT =~ CT12 +LC12
LPI~c(a,g)*QPI12
POUT~c(b,h)*LPI+c(c,i)*QPI12
PPGDPc12~c(d,j)*POUT+c(e,k)*QPI12+c(f,l)*LPI
LPIEC12 ~~ LPIQT12
CT12~~c(a,a)*CT12 #c(a,a) to constrain variances to be equal over groups
#indirect effect
bd:=b*d
hj:=h*j
cd:=c*d
ij:=i*j
abd:=a*b*d
ghj:=g*h*j’
#LPIQT12~~c(a,a)*LPIQT12 #to constrain variances to be equal over groups
#LPIQT12~~c(NA,0.001)*LPIQT12 #to fix variance of a particular groups
#Multi-group SEM results
fit.semmodel2<- sem(semmodel2, data=portmodel, group=”UNDC”, group.equal = c(“loadings”), estimator=”MLM”)
summary(fit.semmodel2, fit.measures=TRUE, standardized=TRUE, rsq=TRUE)
inspect(fit.semmodel2,”theta”)
inspect(fit.semmodel2,”cor.lv”)
Some comments:
In the fit.semmodel2, you would notice that I have kept the factor loadings equal across groups while comparing regression co-efficients of the developed and developing country groups. The comment presented below, by Linda K. Muthen explains the reason.
You may consider the following books for learning SEM:
Click to access the published article of this code!
Other published studies using SEM by ResearchHUB members: