Because each increasingly restrictive model estimates the same parameters as the less constrained model but adds equality constraints across groups, it is nested within the comparison model and has more degrees of freedom. Goodness of fit can therefore be compared statistically by testing whether the increase in chi-square is significant relative to the increase in degrees of freedom. As these tests were based on the rescaled Satorra-Bentler (S-B) chi-square values, the difference tests were adjusted for non-normality following the procedure of Satorra and Bentler (2001). If significant differences emerged, post-hoc analyses at the factor or item level were performed to identify the source of misfit. The changes (Δ) in RMSEA and CFI were also reported, although their substantive meaning is harder to interpret in invariance testing. A simulation study by Chen (2007) indicated that ΔRMSEA and ΔCFI values beyond .0139 and -.0030 for loadings, .0124 and -.0038 for intercepts, and .0118 and -.0032 for residuals might be considered significant. However, as these values were based on eight-indicator, one-factor models, which differ somewhat from the models compared here, we put more weight on the S-B chi-square difference tests.
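The scaled difference test can be sketched in Python as follows. This is a minimal illustration of the Satorra and Bentler (2001) formula, not the code used in the study: it assumes that, for each model, you have the uncorrected (normal-theory) chi-square T, its degrees of freedom, and the scaling correction factor c (obtainable as T divided by the S-B scaled chi-square). The function names and the fit values in the usage line are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df.

    With y = x / 2 and a = df / 2 this is the regularized upper incomplete gamma
    function Q(a, y), built up via Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    from Q(1/2, y) = erfc(sqrt(y)) (odd df) or Q(1, y) = exp(-y) (even df).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    a, q = (0.5, math.erfc(math.sqrt(y))) if df % 2 else (1.0, math.exp(-y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def sb_scaled_difference(t0, df0, c0, t1, df1, c1):
    """Satorra-Bentler (2001) scaled chi-square difference test.

    t0, df0, c0: uncorrected chi-square, df and scaling factor of the nested
                 (more restrictive) model; t1, df1, c1: same for the comparison model.
    Returns (scaled difference statistic, difference in df, p-value).
    """
    ddf = df0 - df1
    if ddf <= 0:
        raise ValueError("the nested model must have more degrees of freedom")
    cd = (df0 * c0 - df1 * c1) / ddf  # scaling factor for the difference
    if cd <= 0:
        # The 2001 formula can occasionally yield a non-positive factor.
        raise ValueError("non-positive scaling factor for the difference test")
    trd = (t0 - t1) / cd
    return trd, ddf, chi2_sf(trd, ddf)


# Hypothetical fit values for a constrained model (0) and its comparison model (1).
trd, ddf, p = sb_scaled_difference(t0=200.0, df0=110, c0=1.2, t1=170.0, df1=100, c1=1.25)
print(f"scaled diff = {trd:.2f}, df = {ddf}, p = {p:.4g}")
```

The chi-square tail probability is computed with the standard library only, so the sketch runs without SciPy; with SciPy installed, `scipy.stats.chi2.sf` could replace `chi2_sf`.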
Equivalence of test score reliability was not supported, as the S-B χ² difference test was significant (model M4 fit worse than M3). The ΔCFI also exceeded the suggested cutoff, although the ΔRMSEA was minor. Modification indices were used to identify items whose score reliabilities differed significantly between the groups. Seven error variances had to be freed to achieve invariance between the groups (model M4a). As equivalence in score reliability is a rather stringent test and very seldom fully supported in psychological measures, the percentage of items causing non-invariance (7 of 33, 21%) was considered small.
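The fit-index side of these decisions can be expressed as a small check. The sketch below applies the Chen (2007) values quoted earlier in this section to a single model comparison. It assumes each delta is computed as the constrained model's value minus the comparison model's value, and it treats a rise in RMSEA above the cutoff, or a drop in CFI below the (negative) cutoff, as a flag. The names and the example deltas are hypothetical; the only figures taken from the text are the cutoffs and the 7-of-33 count.

```python
# Cutoffs as quoted in the text from Chen (2007): (delta RMSEA, delta CFI).
CHEN_2007_CUTOFFS = {
    "loadings": (0.0139, -0.0030),
    "intercepts": (0.0124, -0.0038),
    "residuals": (0.0118, -0.0032),
}


def flag_delta_fit(level: str, delta_rmsea: float, delta_cfi: float) -> tuple[bool, bool]:
    """Return (rmsea_flag, cfi_flag) for the step that constrains `level`.

    Deltas are assumed to be constrained-model value minus comparison-model value.
    """
    rmsea_cut, cfi_cut = CHEN_2007_CUTOFFS[level]
    return delta_rmsea > rmsea_cut, delta_cfi < cfi_cut


def percent_freed(n_freed: int, n_items: int) -> float:
    """Share of items whose equality constraints had to be released, in percent."""
    return 100.0 * n_freed / n_items


# Hypothetical deltas for the residual (score reliability) step.
print(flag_delta_fit("residuals", delta_rmsea=0.002, delta_cfi=-0.005))  # (False, True)
print(f"{percent_freed(7, 33):.0f}%")  # 21%
```

A pattern like the hypothetical one above (CFI flagged, RMSEA not) mirrors the kind of mixed signal the text describes, which is one reason the S-B difference test is given more weight.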
Scalar invariance is the most stringent test of invariance, as it requires all estimated item intercepts to be equal across groups. Support for scalar invariance makes direct comparisons of observed mean scores across countries possible. As expected, scalar invariance was not supported: the S-B χ² difference test indicated a significant worsening (M5 fit worse than M4a). The ΔCFI also indicated a considerable worsening, although the ΔRMSEA was minor. Non-invariant items were again identified by checking the modification indices. The intercepts of twenty-two items had to be freed to achieve invariance (M5a not different from M4a). This indicates that observed RSA subscale mean score differences between the countries are confounded mainly by differing intercepts and partly by differing measurement errors.
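The item-by-item release of constraints can be sketched as a loop. This is an assumption about the procedure, not a description of what was done: the text does not say whether parameters were freed one at a time, so the sketch follows one common practice of freeing the constraint with the largest modification index, refitting, and stopping once the difference test against the comparison model (here M4a) is no longer significant. The `refit` callable, which must return that test's p-value and the updated modification indices, is hypothetical and stands in for the SEM software.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: refit(freed) fits the constrained model with the listed
# parameters released and returns (p-value of the difference test, modification indices).
RefitFn = Callable[[List[str]], Tuple[float, Dict[str, float]]]


def free_until_invariant(refit: RefitFn, alpha: float = 0.05, max_steps: int = 50) -> List[str]:
    """Release equality constraints one at a time, largest modification index first."""
    freed: List[str] = []
    for _ in range(max_steps):
        p_value, mod_indices = refit(freed)
        if p_value >= alpha:
            return freed  # no longer significantly worse than the comparison model
        candidates = {name: mi for name, mi in mod_indices.items() if name not in freed}
        if not candidates:
            break
        freed.append(max(candidates, key=candidates.get))
    raise RuntimeError("partial invariance was not reached")


# Toy stand-in for the SEM software: fit becomes acceptable after two releases.
def toy_refit(freed: List[str]) -> Tuple[float, Dict[str, float]]:
    mi = {"item_a": 12.0, "item_b": 3.0, "item_c": 25.0}
    p_value = 0.20 if len(freed) >= 2 else 0.01
    return p_value, {k: v for k, v in mi.items() if k not in freed}


print(free_until_invariant(toy_refit))  # ['item_c', 'item_a']
```

Refitting after every single release matters because modification indices change once a constraint is freed; releasing several items from one set of indices can free more parameters than necessary.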