Wow, I have a blog?

Almost three years have passed since I last updated this blog, a long time!
What happened in the meantime? Well, I have been working a bit in the domain of aesthetics and, among other things, thinking a little about Egon Brunswik’s probabilistic functionalism.

Tomorrow, I am leaving for Vienna to get some feedback on an R package.

Well, feedback from friends with regard to this blog was more like “I don’t understand a word!” or “Can’t you just post pictures???”.

OK. I have neglected photography for quite some time, but here are some more recent shots I like.

This is a shot of Marzahn. Now, everyone seems to have an opinion about Marzahn. If you get the chance, go and see it for yourself.


This is Stefan, a good friend. He is a sysadmin. Apart from that, he has a good eye for forms and loves taking high-contrast black and white macro shots of plants on film.


The botanical garden of Berlin is a good place if you are interested in plants. Sometimes it makes me feel like I am on a space station.


And this is me in front of a painting by Hieronymus Bosch; the photo was taken by Lem.

JAGS as a tool for simulation studies

In one of my last posts I hinted that JAGS could be used for simulation studies. Let’s have a look! Simulation studies are very useful if one wants to check whether a model one is developing makes sense. The main aim is to assess whether the parameters plugged into a data-simulating process based on the model can be reasonably recovered by an estimation procedure. In JAGS it is possible to simulate data given a model and a set of known parameters, and to estimate the parameters given the model and the simulated data set. For today’s example I have chosen a simple psychometric latent variable measurement model from classical test theory, which is very well understood and can be easily estimated with most software capable of latent variable modeling. Here it is:

\mu_{1i}=\lambda_1\cdot\eta_i+\beta_1
\mu_{2i}=\lambda_2\cdot\eta_i+\beta_2
\mu_{3i}= 1\cdot\eta_i+\beta_3.

y_{1i} \sim \mbox{dnorm} (\mu_{1i}, \sigma^2_{e1})
y_{2i} \sim \mbox{dnorm} (\mu_{2i}, \sigma^2_{e2})
y_{3i} \sim \mbox{dnorm} (\mu_{3i}, \sigma^2_{e3})

\eta_i \sim \mbox{dnorm} (0, \sigma^2_{\eta})

The basic idea is that a latent trait \eta is measured by three items (y_{1i} to y_{3i}) with different difficulties (\beta_1 to \beta_3), different loadings (\lambda_1 and \lambda_2) and different measurement error variances (\sigma^2_{e1} to \sigma^2_{e3}). For identification reasons, the loading of item 3 is set to 1. Further, the variance of the trait (\sigma^2_{\eta}) has to be estimated from the data.
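
Just to make the notation concrete, the data-generating process can also be written down directly in R. The parameter values below are arbitrary and purely for illustration; in the post itself the simulation is handled by JAGS’s data block:

# direct R translation of the measurement model (illustrative parameter values)
n <- 200
eta <- rnorm(n, 0, 2)              # latent trait with sd sigma_eta = 2

lambda <- c(1.2, 0.8, 1)           # loadings; item 3 fixed to 1 for identification
beta   <- c(0.5, -0.3, 1)          # item difficulties
sigma  <- c(1, 0.7, 0.9)           # measurement error standard deviations

y1 <- rnorm(n, lambda[1] * eta + beta[1], sigma[1])
y2 <- rnorm(n, lambda[2] * eta + beta[2], sigma[2])
y3 <- rnorm(n, lambda[3] * eta + beta[3], sigma[3])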

For simulating data with JAGS we have to specify a data block. The model specification is straightforward; note that we have to hand over the true parameters to JAGS, for example with an R script. This is pretty handy because one could design a thorough simulation study for different sample sizes and parameter combinations. The estimation part has to be specified in a model block.

# Simulation: the data block draws the item responses from the true parameters
data{
 
  for(i in 1:n)
  {
      y1[i]~dnorm(mu1.true[i], tau.y1.true);
      y2[i]~dnorm(mu2.true[i], tau.y2.true);
      y3[i]~dnorm(mu3.true[i], tau.y3.true);
 
      mu1.true[i]<-lambda1.true*eta.true[i]+beta1.true
      mu2.true[i]<-lambda2.true*eta.true[i]+beta2.true
      mu3.true[i]<-eta.true[i]+beta3.true
  }
 
  tau.y1.true<-pow(sigma.y1.true, -2); 
  tau.y2.true<-pow(sigma.y2.true, -2);
  tau.y3.true<-pow(sigma.y3.true, -2);
 
}
 
# Estimation: the model block recovers the parameters from the simulated data
model{
 
  for(i in 1:n)
  {
    y1[i]~dnorm(mu1[i], tau.y1);
    y2[i]~dnorm(mu2[i], tau.y2);
    y3[i]~dnorm(mu3[i], tau.y3);
 
    mu1[i]<-lambda1*eta[i]+beta1
    mu2[i]<-lambda2*eta[i]+beta2
    mu3[i]<-eta[i]+beta3
  }
 
  for(i in 1:n)
  {
    eta[i]~dnorm(0,tau.eta);
  }
 
  # Priors for the loadings
  lambda1~dnorm(0,1.0E-3)
  lambda2~dnorm(0,1.0E-3)
  # posterior difference between the sampled loading and its true value
  bias_lambda1<-lambda1-lambda1.true
 
  # Priors for the easiness parameters
  beta1~dnorm(0,1.0E-3)
  beta2~dnorm(0,1.0E-3)
  beta3~dnorm(0,1.0E-3)
  # posterior difference between the sampled easiness parameter and its true value
  bias_beta1<-beta1-beta1.true
 
  # Hyperprior for latent trait distribution
  tau.eta<-pow(sigma.eta,-2)
  sigma.eta~dunif(0,10) 
 
  # Priors for the measurement errors
   tau.y1<-pow(sigma.y1, -2); 
   tau.y2<-pow(sigma.y2, -2);
   tau.y3<-pow(sigma.y3, -2);
   sigma.y1~dunif(0,10);
   sigma.y2~dunif(0,10); 
   sigma.y3~dunif(0,10); 
}

This may look a bit daunting, but it really took only about 15 minutes to set up.

The simulation is controlled by an R script, which hands over the true parameters to JAGS:

library(R2jags)
library(mcmcplots)
 
n=500 # number of subjects
 
# generate 'true' latent trait values
eta.true<-rnorm(n, 0, 4)
# generate 'true' factor loadings
lambda1.true=1
lambda2.true=0.5
 
# set 'true' easiness parameters
beta1.true=1
beta2.true=2
beta3.true=3
 
# generate 'true' error standard deviations
sigma.y1.true=1
sigma.y2.true=0.5
sigma.y3.true=0.75
 
data<-list("n", "eta.true", "lambda1.true", "lambda2.true", "sigma.y1.true", "sigma.y2.true", "sigma.y3.true",
           "beta1.true", "beta2.true", "beta3.true")
 
parameters<-c("lambda1","lambda2", "sigma.eta", "sigma.y1", "sigma.y2", "sigma.y3", 
              "beta1", "beta2", "beta3", "bias_beta1")
 
output<-jags.parallel(data, inits= NULL, parameters, model.file="simple_1f.txt",n.chains=2, n.iter=35000, n.burnin=30000)
attach.jags(output)
denplot(output)
traplot(output)
plot(output)

If we inspect the posterior densities based on one dataset with a sample size of 500, we see that we pretty much hit the spot. That is, all posterior densities cover the true generating parameters.
[Figure: posterior densities of the estimated parameters (JAGS_sim)]
Please note that in the estimation code a posterior density for the difference between the data-generating parameter \beta_1 and its samples from the posterior is specified. The zero is very well covered, but keep in mind that this is based on only one dataset. To replicate a more traditional approach to bias, a viable route could be to simulate several hundred datasets and to record, for example, the posterior mean of each parameter for each dataset in order to assess bias.
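
A minimal sketch of such a replication loop could look like the following. This is only an illustration built on the script above; the number of replications, the reduced iteration counts and the bookkeeping in a matrix are my ad-hoc choices:

# sketch of a replication loop for assessing bias
# assumes the objects and the model file from the script above are in the workspace
n.sim <- 100
post.means <- matrix(NA, nrow = n.sim, ncol = length(parameters),
                     dimnames = list(NULL, parameters))

for (s in 1:n.sim) {
  # draw fresh latent trait values for every replication
  eta.true <- rnorm(n, 0, 4)
  fit <- jags(data, inits = NULL, parameters, model.file = "simple_1f.txt",
              n.chains = 2, n.iter = 5000, n.burnin = 2500)
  post.means[s, ] <- fit$BUGSoutput$summary[parameters, "mean"]
}

# average deviation of the posterior mean from the true value, e.g. for beta1
mean(post.means[, "beta1"]) - beta1.true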

As always, keep in mind that this is pretty experimental code and that estimating latent variable models with JAGS is not very well explored yet, as far as I know. However, the main point of this post was to show how JAGS can be used in combination with R to run simulation studies in a probabilistic graphical framework. This should work for multilevel models or generalized linear mixed models too.

Edit: I just stumbled upon this recent blog post by Robert Grand. So Stan should work too. Interesting!

Pictures!

This blog looks super technical right now and one may get the impression that I am only munching probability distributions day in and day out! No! I am also munching pizza!

Keep in mind that this is my personal blog, so I may also post things that interest me apart from statistics. One of those things is pictures!

There are quite a few photos of mine online but not so many in which I am depicted. So here is a recent one of me making pizza dough at a friend’s place:

[Photo: georg_making_pizza_dough]

Picture by Martin.

 

Bayesian estimation with JAGS and Mplus: a quick practical comparison

Bayesian estimation techniques were added to Mplus version 7 some time ago. Of course I was interested in whether Mplus and JAGS come to comparable results when let loose on data. 😉

In particular I was interested in the cross-classified module of Mplus, so I wrote some code to simulate data with R and subsequently recovered the parameters with JAGS and Mplus. Recently I found out how JAGS can also be used as a data simulator for simulation studies, but that is a different topic.

So graphically, the model I used for simulation looks like this:
[Figure: path diagram of the simulated cross-classified model (Diagram1)]
I have omitted the level-1 residual variances and the variances of the latent distributions in the figure for simplicity. The full model specification can be gleaned from the source code attached to this post.
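
For readers who have not seen a cross-classified structure in JAGS before, here is a generic random-intercept sketch. This is a minimal illustration with made-up variable names, not the model in the attached test.txt:

# generic cross-classified random intercept sketch (not the attached model):
# observation i belongs to one unit of classification A and one unit of classification B
model{
  for(i in 1:n)
  {
    y[i]~dnorm(mu[i], tau.y)
    mu[i]<-b0+u.A[idA[i]]+u.B[idB[i]]
  }

  for(j in 1:nA) { u.A[j]~dnorm(0, tau.A) }
  for(k in 1:nB) { u.B[k]~dnorm(0, tau.B) }

  b0~dnorm(0,1.0E-3)
  tau.y<-pow(sigma.y,-2); sigma.y~dunif(0,10)
  tau.A<-pow(sigma.A,-2); sigma.A~dunif(0,10)
  tau.B<-pow(sigma.B,-2); sigma.B~dunif(0,10)
}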

To make a long story short: both approaches, JAGS and Mplus, pleasantly come to very comparable results with regard to the parameters. At the moment I am too busy with other things to harp on this more, but I thought it would be useful to put the code on the blog in case someone is interested in working on a similar topic in more detail. Thanks to Mr Asparouhov for clarifying some questions with regard to Mplus.

The files:
test.R: code to simulate data from the model and recover the parameters with JAGS, plus a bit of convergence diagnostics

test.txt: JAGS model file

cc.inp: Mplus input file

cc.dat: Simulated data. Will be overwritten once the simulation part of test.R is executed.

Get the files here. The code is not very polished but should give an idea of how to specify the model in Mplus and JAGS.