Rotational invariance in factor models

Long time no post. Meanwhile I got my Ph.D. and I am still MCMC’ing. In particular, I am interested in estimating latent variable models via MCMC, with possible extensions to multilevel latent variable models. To get a grip on this, I started by defining a simple one-factor latent variable model and stumbled over the phenomenon of rotational invariance, which is described in some detail in this technical report by Erosheva and McKay Curtis. In short, the likelihood surface of factor models is multimodal, which makes this type of model difficult to estimate with Monte Carlo methods: if one runs multiple chains, they may settle on different modes of the likelihood surface. The phenomenon is also known as pole switching, or as label switching in latent class analysis and mixture modeling. Erosheva and McKay Curtis offer a solution based on postprocessing the Markov chains. I wonder whether this remains feasible as the models become more complicated.
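To see the invariance directly: in the model below the conditional mean of item j for person i is mu[i,j] = beta[j] + lambda[j]*eta[i], so flipping the signs of all loadings and all factor scores leaves every mu[i,j], and hence the whole likelihood, unchanged. A minimal R sketch with made-up numbers, just to illustrate the point:

# Two sign-flipped parameterizations yield identical likelihoods
set.seed(1)
n <- 5; k <- 3
beta   <- rnorm(k)
lambda <- rnorm(k)
eta    <- rnorm(n)
sigma  <- rep(1, k)
y <- matrix(rnorm(n * k), n, k)

loglik <- function(lambda, eta) {
  mu <- outer(eta, lambda) + matrix(beta, n, k, byrow = TRUE)
  sum(dnorm(y, mean = mu, sd = matrix(sigma, n, k, byrow = TRUE), log = TRUE))
}

loglik(lambda, eta)    # original solution
loglik(-lambda, -eta)  # reflected solution: exactly the same value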

If you want a more hands-on experience with the problem, have a look at this JAGS model.

model{
  # Likelihood
  # i: persons (1..n)
  # j: items (1..k)

  for(i in 1:n)
  {
    for(j in 1:k)
    {
      mu[i,j] <- beta[j] + lambda[j]*eta[i]
      y[i,j] ~ dnorm(mu[i,j], tau.y[j])
    }
  }

  # Priors
  for(j in 1:k)
  {
    # Item residuals (precision parameterization)
    tau.y[j] <- pow(sigma.y[j], -2)
    sigma.y[j] ~ dunif(0, 100)
    # Intercepts
    beta[j] ~ dnorm(0, 1.0E-3)
    # Loadings
    lambda[j] ~ dnorm(0, 1.0E-3)
  }

  # Factor scores
  for(i in 1:n)
  {
    # Variance of the latent score distribution fixed to 1
    eta[i] ~ dnorm(0, 1)
  }
}

In psychometrics this model is also known as the tau-congeneric measurement model for continuous observed variables. Here I have fixed the variance of the latent trait distribution to 1 for identifiability.
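Note that fixing the latent variance pins down the scale of eta, but it does not remove the reflection: (lambda, eta) and (-lambda, -eta) fit equally well. One common device, sketched here only as an option rather than what the technical report prescribes, is to additionally constrain one loading (or all of them) to be positive, e.g. via JAGS’s truncation construct:

  # Alternative: force the first loading to be positive,
  # which removes the sign indeterminacy
  lambda[1] ~ dnorm(0, 1.0E-3) T(0,)
  for(j in 2:k)
  {
    lambda[j] ~ dnorm(0, 1.0E-3)
  }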

The following R script simply generates some data from the model with known parameters and hands the data over to JAGS.

library(R2jags)

# Number of persons
n <- 100
# Number of items
k <- 10
# Difficulties (item intercepts)
beta.gen <- rnorm(k, mean = 0, sd = 1)
# Abilities, centered to mean zero
a <- rnorm(n, 0, sd = 1)
eta.gen <- a - mean(a)

# Error standard deviations
sigma.gen <- runif(k) * 3
y <- matrix(0, nrow = n, ncol = k)

# Produce data (true loadings are implicitly 1)
for(i in 1:n)
{
  for(j in 1:k)
  {
    y[i, j] <- rnorm(1, mean = beta.gen[j] + eta.gen[i], sd = sigma.gen[j])
  }
}

data <- list(y = y, n = n, k = k)

parameters <- c("beta", "sigma.y", "lambda", "eta")

# Starting values
inits <- function(){
  list(beta = rnorm(k), lambda = rnorm(k), eta = rnorm(n), sigma.y = runif(k) * 100)
}

output <- jags(data, inits = inits, parameters.to.save = parameters,
               model.file = "1f_lat_var.txt", n.chains = 2,
               n.iter = 5000, n.burnin = 2500)

attach.jags(output)
plot(output)

The main interest here is to check whether the data-generating parameters can be reasonably recovered. If you run the code, and if you are lucky, the two Markov chains will converge to one solution. But it may well be that the two chains end up operating in two different modes of the likelihood surface. In this case you will see, in the plots, a reflection of the two chains around the y-axis for the loadings and the factor scores. This does not mean that the model is ‘wrong’ or anything like that; it simply shows that there are two equivalent solutions. In practice one could simply choose one chain and be happy with that, but if one wants to extend the model by introducing more latent variables or even multilevel structures, the multimodality could become problematic for MCMC methods. So, if anyone has a simple solution for dealing with this phenomenon, let me know! Meanwhile I am having a look at pymc.
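For this one-factor model, the postprocessing idea boils down to a simple sign alignment: after sampling, flip the signs of lambda and eta in every draw whose loadings point in the “wrong” direction. A minimal sketch, assuming the simulation above has been run; the flip criterion (the sign of the summed loadings per draw) is my own choice for illustration, not necessarily what Erosheva and McKay Curtis use:

# Align all posterior draws to a common sign convention:
# flip any draw whose loadings have a negative overall direction.
sims <- output$BUGSoutput$sims.list
flip <- sign(rowSums(sims$lambda))   # +1 or -1 per posterior draw
sims$lambda <- sims$lambda * flip    # reflect loadings ...
sims$eta    <- sims$eta * flip       # ... and factor scores together

# After alignment the loadings should form a single mode:
hist(sims$lambda[, 1], main = "lambda[1] after sign alignment")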
