JAGS as a tool for simulation studies

In one of my last posts I hinted at the fact that JAGS could be used for simulation studies. Let’s have a look! Simulation studies are very useful if one wants to check whether a model one is developing makes sense. The main aim is to assess whether the parameters plugged into a data-simulating process based on the model can be reasonably recovered by an estimation procedure. In JAGS it is possible to simulate data given a model and a set of known parameters, and to estimate the parameters given the model and the simulated data set. For today’s example I have chosen a simple psychometric latent variable measurement model from classical test theory, which is very well understood and can be easily estimated with most software capable of latent variable modeling. Here it is:

\mu_{1i}=\lambda_1\cdot\eta_i+\beta_1
\mu_{2i}=\lambda_2\cdot\eta_i+\beta_2
\mu_{3i}= 1\cdot\eta_i+\beta_3.

y_{1i} \sim \mbox{dnorm} (\mu_{1i}, \sigma^2_{e1})
y_{2i} \sim \mbox{dnorm} (\mu_{2i}, \sigma^2_{e2})
y_{3i} \sim \mbox{dnorm} (\mu_{3i}, \sigma^2_{e3})

\eta_i \sim \mbox{dnorm} (0, \sigma^2_{\eta})

The basic idea is that a latent trait \eta is measured by three items (y_{1i} to y_{3i}) with different difficulties (\beta_1 to \beta_3), different loadings (\lambda_1 and \lambda_2) and different measurement error variances (\sigma^2_{e1} to \sigma^2_{e3}). For identification reasons, the loading of item 3 is set to 1. Further, the variance of the trait (\sigma^2_{\eta}) has to be estimated from the data.
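These equations imply a specific covariance structure for the three items (e.g. Cov(y_1,y_2)=\lambda_1\lambda_2\sigma^2_{\eta}, Cov(y_1,y_3)=\lambda_1\sigma^2_{\eta}, Var(y_3)=\sigma^2_{\eta}+\sigma^2_{e3}), which is what makes the parameters recoverable in the first place. Here is a quick numerical sanity check, written in Python for brevity, using the same true parameter values that the R script below plugs in (but with a much larger n so the sample moments stabilize):

```python
import random

random.seed(1)
n = 200_000                    # large n so sample moments are close to their targets
lam1, lam2 = 1.0, 0.5          # loadings (the loading of item 3 is fixed to 1)
b1, b2, b3 = 1.0, 2.0, 3.0     # difficulties
s1, s2, s3 = 1.0, 0.5, 0.75    # error standard deviations
s_eta = 4.0                    # trait standard deviation

eta = [random.gauss(0, s_eta) for _ in range(n)]
y1 = [lam1 * e + b1 + random.gauss(0, s1) for e in eta]
y2 = [lam2 * e + b2 + random.gauss(0, s2) for e in eta]
y3 = [e + b3 + random.gauss(0, s3) for e in eta]

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

print(cov(y1, y2))  # approx. lam1*lam2*s_eta^2 = 8
print(cov(y1, y3))  # approx. lam1*s_eta^2 = 16
print(cov(y3, y3))  # approx. s_eta^2 + s3^2 = 16.5625
```

If the estimation part of the JAGS code works, the recovered parameters have to reproduce exactly these moments.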

To simulate data with JAGS we have to specify a data block. The model specification is straightforward; note that we have to hand the true parameters over to JAGS, for example with an R script. This is pretty handy because one could design a thorough simulation study for different sample-size and parameter combinations. The estimation part has to be specified in a model block.

#Simulation
data{
 
  for(i in 1:n)
  {
      y1[i]~dnorm(mu1.true[i], tau.y1.true);
      y2[i]~dnorm(mu2.true[i], tau.y2.true);
      y3[i]~dnorm(mu3.true[i], tau.y3.true);
 
      mu1.true[i]<-lambda1.true*eta.true[i]+beta1.true
      mu2.true[i]<-lambda2.true*eta.true[i]+beta2.true
      mu3.true[i]<-eta.true[i]+beta3.true
  }
 
  tau.y1.true<-pow(sigma.y1.true, -2); 
  tau.y2.true<-pow(sigma.y2.true, -2);
  tau.y3.true<-pow(sigma.y3.true, -2);
 
}
 
# Estimation
model{
 
  for(i in 1:n)
  {
    y1[i]~dnorm(mu1[i], tau.y1);
    y2[i]~dnorm(mu2[i], tau.y2);
    y3[i]~dnorm(mu3[i], tau.y3);
 
    mu1[i]<-lambda1*eta[i]+beta1
    mu2[i]<-lambda2*eta[i]+beta2
    mu3[i]<-eta[i]+beta3
  }
 
  for(i in 1:n)
  {
    eta[i]~dnorm(0,tau.eta);
  }
 
  # Priors for the loadings
  lambda1~dnorm(0,1.0E-3)
  lambda2~dnorm(0,1.0E-3)
  bias_lambda1<-lambda1-lambda1.true
 
  # Priors for the easiness parameters
  beta1~dnorm(0,1.0E-3)
  beta2~dnorm(0,1.0E-3)
  beta3~dnorm(0,1.0E-3)
  bias_beta1<-beta1-beta1.true
 
  # Hyperprior for latent trait distribution
  tau.eta<-pow(sigma.eta,-2)
  sigma.eta~dunif(0,10) 
 
  # Priors for the measurement errors
   tau.y1<-pow(sigma.y1, -2); 
   tau.y2<-pow(sigma.y2, -2);
   tau.y3<-pow(sigma.y3, -2);
   sigma.y1~dunif(0,10);
   sigma.y2~dunif(0,10); 
   sigma.y3~dunif(0,10); 
}

This may look a bit daunting, but it really took only 15 minutes to set that up.

The simulation is controlled by an R script, which hands over the true parameters to JAGS:

library(R2jags)
library(mcmcplots)
 
n=500 # number of subjects
 
# generate 'true' latent trait values
eta.true<-rnorm(n, 0, 4)
# generate 'true' factor loadings
lambda1.true=1
lambda2.true=0.5
 
beta1.true=1
beta2.true=2
beta3.true=3
 
# generate 'true' error standard deviations
sigma.y1.true=1
sigma.y2.true=0.5
sigma.y3.true=0.75
 
data<-list("n", "eta.true", "lambda1.true", "lambda2.true", "sigma.y1.true", "sigma.y2.true", "sigma.y3.true",
           "beta1.true", "beta2.true", "beta3.true")
 
parameters<-c("lambda1","lambda2", "sigma.eta", "sigma.y1", "sigma.y2", "sigma.y3", 
              "beta1", "beta2", "beta3", "bias_beta1")
 
output<-jags.parallel(data, inits= NULL, parameters, model.file="simple_1f.txt",n.chains=2, n.iter=35000, n.burnin=30000)
attach.jags(output)
denplot(output)
traplot(output)
plot(output)

If we inspect the posterior densities based on one dataset with a sample size of 500, we see that we pretty much hit the spot: all posterior densities cover the true generating parameters.
[Figure: posterior densities of the recovered parameters (JAGS_sim)]
Please note that the estimation code also specifies a posterior density for the difference between the data-generating parameter \beta_1 and its samples from the posterior. Zero is very well covered, but keep in mind that this is based on only one dataset. To replicate a more traditional approach to bias, a viable route could be to simulate several hundred datasets and to record, for example, the posterior mean of each parameter per dataset to assess bias.
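That replication scheme fits in a few lines. Here is a minimal Python sketch of the bookkeeping, where a hypothetical stand-in estimator takes the place of the actual simulate-and-run-JAGS step:

```python
import random

random.seed(7)
beta1_true = 1.0
n_reps, n = 200, 500   # number of replications, sample size per dataset

post_means = []
for _ in range(n_reps):
    # Stand-in for: simulate one dataset, run JAGS on it, and store the
    # posterior mean of beta1. Here a fake estimator whose sampling
    # error shrinks with n takes that place.
    post_means.append(beta1_true + random.gauss(0, 1 / n ** 0.5))

bias = sum(post_means) / n_reps - beta1_true
print(abs(bias) < 0.02)  # True: the recorded means scatter tightly around the truth
```

In the real study, the inner step would be the JAGS call from the script above, and one would repeat the whole loop for several sample-size and parameter combinations.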

As always, keep in mind that this is pretty experimental code, and estimating latent variable models with JAGS is, as far as I know, not very well explored yet. However, the main point of this post was to show how JAGS can be used in combination with R to run simulation studies in a probabilistic graphical framework. This should work for multilevel models or generalized mixed models too.

Edit: I just stumbled upon this recent blog post by Robert Grand. So Stan should work too. Interesting!

Pictures!

This blog looks super technical right now and one may get the impression that I am only munching probability distributions day in and day out! No! I am also munching pizza!

Keep in mind that this is my personal blog, so I may also post things that interest me apart from statistics. One of those things is pictures!

There are quite a few photos by me online, but not so many in which I am depicted. So here is a recent one of me making pizza dough at a friend’s place:

[Photo: Georg making pizza dough (georg_making_pizza_dough)]

Picture by Martin.

 

Bayesian estimation with JAGS and Mplus: a quick practical comparison

Bayesian estimation techniques were added to Mplus version 7 some time ago. Of course I was interested in whether Mplus and JAGS come to comparable results when let loose on data. 😉

In particular I was interested in the cross-classified module of Mplus, so I wrote some code to simulate data with R and subsequently recovered the parameters with JAGS and Mplus. Recently I found out how JAGS could also be used as a data simulator for simulation studies, but this is a different topic.

So graphically, the model I used for simulation looks like this:
[Figure: path diagram of the simulated cross-classified model (Diagram1)]
I have omitted the level-1 residual variances and the variances of the latent distributions in the figure for simplicity. The full model specification can be gleaned from the source code attached to this post.

To make a long story short: both approaches, JAGS and Mplus, pleasantly come to very comparable results with regard to the parameters. At the moment I am too busy with other things to elaborate on this further, but I thought it would be useful to put the code on the blog in case someone is interested in working on a similar topic in more detail. Thanks to Mr Asparouhov for clarifying some questions with regard to Mplus.

The files:
test.R: code to simulate data from the model and recover the parameters with JAGS, plus a bit of convergence diagnostics

test.txt: JAGS model file

cc.inp: Mplus input file

cc.dat: Simulated data. Will be overwritten once the simulation part of test.R is executed.

Get the files here. The code is not very polished but should give an idea of how to specify the model in Mplus and JAGS.

Blogroll

The blog has been neglected for quite some time. At the moment I am using it more or less as a public notepad, but comments are definitely invited! However, I added a blogroll.

The book by Gelman and Hill helped me quite a bit with regard to multilevel/hierarchical models and is still very useful, especially the part about modeling multivariate normal distributions for random effects with MCMC.

Felix maintains quite an active and interesting blog with topics related to science, statistical methods and R.

John R. Kruschke is known for his book on doing Bayesian data analysis, as far as I know one of the first in the field for psychologists. A book review by Wolf Vanpaemel and Francis Tuelinckx appeared in the Journal of Mathematical Psychology.

R-Bloggers is a central hub of content collected from bloggers who write about R.

Rotational Invariance in factor models II

I have been working a bit more on estimating simple factor models with JAGS. I posted the problem from the last post over at the open discussion forum and got some very helpful input from Vincent Arel-Bundock from the University of Michigan and one of his co-authors, Walter R. Mebane, which ultimately led to the code in the current post.

Here I wanted to know how to estimate a simple factor model for continuous manifest variables and three latent, intercorrelated factors. The code is simple and straightforward. I put a scaled inverse Wishart prior on the variance-covariance matrix of the factor values, following the approach in Gelman & Hill (2007, p. 376). To solve the invariance problem I set the loading of one item on each factor to 1. In addition, the means of the latent trait distributions (factor values) were all constrained to zero. This is a very traditional approach that is also implemented in, e.g., Mplus, the Stata program GLLAMM, or the R package lavaan.
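A note on the scaled inverse Wishart prior used below: multiplying each raw factor by a scale parameter xi rescales the implied standard deviations but leaves the correlations untouched, which is why the rho parameters in the model code can be computed from the raw covariance matrix directly. A small numerical check, in Python with made-up values:

```python
# Hypothetical "raw" covariance matrix (as implied by a Wishart draw)
# and scale factors xi, standing in for sigma.eta.raw and xi.eta1..3
sigma_raw = [[1.0, 0.3, 0.2],
             [0.3, 2.0, 0.5],
             [0.2, 0.5, 1.5]]
xi = [2.0, 0.5, 3.0]

# Covariance of the scaled factors eta_j = xi_j * eta_raw_j
sigma = [[xi[j] * xi[k] * sigma_raw[j][k] for k in range(3)] for j in range(3)]

def corr(m):
    return [[m[j][k] / (m[j][j] * m[k][k]) ** 0.5 for k in range(3)]
            for j in range(3)]

# The standard deviations are rescaled by xi ...
sds = [sigma[j][j] ** 0.5 for j in range(3)]
print(sds)  # xi[j] * sqrt(sigma_raw[j][j]) for each j
# ... but the correlations are untouched by the scaling
same = all(abs(corr(sigma)[j][k] - corr(sigma_raw)[j][k]) < 1e-12
           for j in range(3) for k in range(3))
print(same)  # True
```

This mirrors the lines sigma.eta1 <- xi.eta1*sqrt(sigma.eta.raw[1,1]) and rho_eta12 <- sigma.eta.raw[1,2]/sqrt(...) in the JAGS code.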

In the following code I use the Holzinger & Swineford dataset from the lavaan tutorial for comparison. First I estimate the model with lavaan, and in a second step with JAGS. Both approaches show very comparable results, but keep in mind that this is somewhat experimental. When comparing coefficients, please note that I coded the model such that JAGS returns estimated standard deviations, not variances, for the manifest items and latent variables.

Ok, enough talk, here is the code. First the R-script:

## holzinger.R
# Experimental!
 
library(lavaan)
 
# From the lavaan tutorial
HS.model <- ' visual  =~ x1 + x2 + x3      
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
 
fit <- cfa(HS.model, data=HolzingerSwineford1939, meanstructure=TRUE)
 
summary(fit, standardized=TRUE)
 
library(R2jags)
data<-HolzingerSwineford1939
## Transform data to long format----
#  the complicated way
 
# Dependent variable
y<-c(data$x1, data$x2, data$x3, data$x4, data$x5, data$x6, data$x7, data$x8, data$x9)
 
# Item identifier
item<-c(rep(1, 301), rep(2, 301), rep(3, 301), rep(4, 301), rep(5, 301), rep(6, 301), rep(7, 301), rep(8, 301), rep(9, 301))
 
# Construct identifier (1: visual, 2: textual, 3: speed)
const<-c(rep(1,3*301 ), rep(2,3*301 ), rep(3,3*301 ))
 
# Person id
id<-rep(seq(1:301),9)
## ---------------------------------
# Fitting the model with JAGS
W=(diag(3))
 
dat<-list("y", "item", "const", "id", "W")
parameters<-(c("beta", "sigma.y", "lam", "sigma.eta1", "sigma.eta2", "sigma.eta3", "rho_eta13", "rho_eta12", "rho_eta23", "eta"))
 
output<-jags.parallel(dat, inits=NULL, parameters,  model.file="3fa.txt",n.chains=2, n.iter=3000, n.burnin=2000)
plot(output)

Next the model code for JAGS:

## 3fa.txt
# Experimental!
model{
 
  # Likelihood
  for(i in 1:2709) # Loop over observations
  {
      # Linear decomposition could probably be done more
      # elegantly with matrix algebra; lam is fixed to zero
      # for non-matching construct-item pairs, so only the
      # item's own construct contributes to the sum
      mu[i]<-beta[item[i]] +
         lam[1, item[i]]*eta[1, id[i]]+
         lam[2, item[i]]*eta[2, id[i]]+
         lam[3, item[i]]*eta[3, id[i]]
 
      # Distributional assumptions for the 
      # observed variables
         y[i]~dnorm(mu[i], tau.y[item[i]])
  }
 
  # Priors for the items
    for(i in 1:9)
    {
      beta[i]~dnorm(0,1.0E-3) 
      tau.y[i] <- pow(sigma.y[i], -2)  
      sigma.y[i] ~ dunif (0, 100)   
    }
 
    # Matrix of loadings [construct, item]
    # Constraints and priors 
     lam[1,1]<-1
     lam[2,1]<-0
     lam[3,1]<-0
     lam[1,2]~dnorm(0,1.0E-3) 
     lam[2,2]<-0
     lam[3,2]<-0
     lam[1,3]~dnorm(0,1.0E-3) 
     lam[2,3]<-0
     lam[3,3]<-0
     lam[1,4]<-0
     lam[2,4]<-1
     lam[3,4]<-0
     lam[1,5]<-0
     lam[2,5]~dnorm(0,1.0E-3)
     lam[3,5]<-0
     lam[1,6]<-0
     lam[2,6]~dnorm(0,1.0E-3)
     lam[3,6]<-0
     lam[1,7]<-0
     lam[2,7]<-0
     lam[3,7]<-1
     lam[1,8]<-0
     lam[2,8]<-0
     lam[3,8]~dnorm(0,1.0E-3)
     lam[1,9]<-0
     lam[2,9]<-0
     lam[3,9]~dnorm(0,1.0E-3)
 
    # Scaled inverse Wishart prior for
    # variance-covariance matrix of factor values
    # after Gelman and Hill (2007)
 
    for(i in 1:301)
    {  
      eta[1,i]<-xi.eta1*eta.raw[i,1]
      eta[2,i]<-xi.eta2*eta.raw[i,2]
      eta[3,i]<-xi.eta3*eta.raw[i,3]
      eta.raw[i,1:3]~dmnorm(eta.raw.hat[i,],tau.eta.raw[,])
      eta.raw.hat[i,1]<-mu.eta1.raw
      eta.raw.hat[i,2]<-mu.eta2.raw
      eta.raw.hat[i,3]<-mu.eta3.raw
    }
      mu.eta1<-xi.eta1*mu.eta1.raw
      mu.eta2<-xi.eta2*mu.eta2.raw
      mu.eta3<-xi.eta3*mu.eta3.raw
      mu.eta1.raw<-0
      mu.eta2.raw<-0
      mu.eta3.raw<-0
      xi.eta1~dunif(0,100)
      xi.eta2~dunif(0,100)
      xi.eta3~dunif(0,100)
      tau.eta.raw[1:3,1:3] ~ dwish(W[,],df)
      df<-4
      sigma.eta.raw[1:3,1:3]<-inverse(tau.eta.raw[,])
      sigma.eta1<-xi.eta1*sqrt(sigma.eta.raw[1,1])
      sigma.eta2<-xi.eta2*sqrt(sigma.eta.raw[2,2])
      sigma.eta3<-xi.eta3*sqrt(sigma.eta.raw[3,3])
      rho_eta12<-sigma.eta.raw[1,2]/sqrt(sigma.eta.raw[1,1]*sigma.eta.raw[2,2])
      rho_eta13<-sigma.eta.raw[1,3]/sqrt(sigma.eta.raw[1,1]*sigma.eta.raw[3,3])
      rho_eta23<-sigma.eta.raw[2,3]/sqrt(sigma.eta.raw[2,2]*sigma.eta.raw[3,3])
}

Rotational invariance in factor models

Long time no post. Meanwhile I got my Ph.D. and I am still MCMC’ing. In particular I am interested in estimating latent variable models via MCMC, with possible extensions to multilevel latent variable models. To get a grip on it I started by defining a simple one-factor latent variable model and stumbled over the phenomenon of rotational invariance, which is described in quite some detail in this technical report by Erosheva and McKay Curtis. In short, the likelihood surface of factor models is multimodal, which makes it difficult to estimate this type of model with Monte Carlo methods. If one uses multiple chains, they may operate on different modes of the likelihood surface. This phenomenon is also known as pole switching or label switching in latent class analysis and mixture modeling. Erosheva and McKay Curtis offer a solution based on postprocessing the Markov chains. I wonder whether this remains feasible if the models become more complicated.

If you want a more hands-on experience of the problem, have a look at this JAGS model.

model{
  # Likelihood
  # i: Persons
  # k: Items
 
  for(i in 1:n)
  {
    for(j in 1:k)
    {
      mu[i,j]<-beta[j]+lambda[j]*eta[i]
      y[i,j]~dnorm(mu[i,j], tau.y[j])
    }
  }
 
  # Priors
  for(j in 1:k)
  {
  # Item-Residuals
    tau.y[j] <- pow(sigma.y[j], -2)
    sigma.y[j] ~ dunif (0, 100)
  # Intercepts
    beta[j]~ dnorm(0,1.0E-3) 
  # Loadings 
   lambda[j]~dnorm(0,1.0E-3)
  }
 
 
  # Factor-Values
  for(i in 1:n)
  {
  # Variance of latent score distribution fixed to 1
    eta[i]~dnorm(0,1)
  }
 
}

In psychometrics this model is also known as the tau-congeneric measurement model for continuous observed variables. Here I have fixed the variance of the latent trait distribution to 1 for identifiability reasons.

The following R script simply generates some data from the model with known parameters and hands the data over to JAGS.

library(R2jags)
 
# Number of Persons
n=100
# Number of Items
k=10
# Difficulties
beta.gen=rnorm(k, mean=0, sd=1)
# Abilities
a<-(rnorm(n, 0, sd=1))
eta.gen<-a-mean(a)
 
# Error standard deviations
sigma.gen=runif(k)*3
y<-matrix(rep(0,n*k), nrow=n, ncol=k)
 
# Produce data
for(i in 1:n)
{
  for(j in 1:k)
  {
    y[i,j]=rnorm(1,mean=beta.gen[j]+eta.gen[i], sd=sigma.gen[j])
  }
}
 
data<-list("y", "n", "k")
 
parameters<-c("beta", "sigma.y", "lambda", "eta")
 
# Starting values
inits <- function(){
list(beta = rnorm(k), lambda=c(rnorm(k)), eta=c(rnorm(n)), sigma.y=runif(k)*100)
}
 
output<-jags(data, inits=inits, parameters, model.file="1f_lat_var.txt",n.chains=2, n.iter=5000, n.burnin=2500)
 
attach.jags(output)
plot(output)

The main interest here is to check whether the data-generating parameters can be reasonably recovered. If you run the code – and if you are lucky – the two Markov chains will converge to one solution. But it may well be that the two chains end up operating in two different modes of the likelihood surface. In that case you will see a reflection of the two chains for the loadings and the factor values around the y-axis in the plots. This does not mean that the model is ‘wrong’ or anything like that; it simply shows that there are two equivalent solutions. In practice one could simply choose one chain and be happy with that, but if one wants to extend the model by introducing more latent variables or even multilevel structures, the multimodality could become problematic when using MCMC methods. So, if anyone has a simple solution for dealing with this phenomenon, let me know! Meanwhile I am having a look at PyMC.
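The reflection can be seen directly from the likelihood: flipping the signs of all loadings and all factor values leaves every model-implied mean \mu_{ij}=\beta_j+\lambda_j\cdot\eta_i, and hence the likelihood, unchanged. A minimal numerical check, in Python with arbitrary values:

```python
import random

random.seed(0)
n, k = 100, 10
beta = [random.gauss(0, 1) for _ in range(k)]   # intercepts
lam = [random.gauss(0, 1) for _ in range(k)]    # loadings
eta = [random.gauss(0, 1) for _ in range(n)]    # factor values

# Model-implied means under the original and the sign-flipped solution
mu = [[beta[j] + lam[j] * eta[i] for j in range(k)] for i in range(n)]
mu_flip = [[beta[j] + (-lam[j]) * (-eta[i]) for j in range(k)] for i in range(n)]

print(mu == mu_flip)  # True: the two modes fit the data equally well
```

Since the priors on the loadings and factor values are symmetric around zero, the posterior inherits this bimodality, which is exactly what the two chains can expose.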

Some short thoughts on MCMC methods

While MCMC is certainly a useful method and has the potential to address several difficulties that may exist with more traditional methods of parameter estimation, in my opinion it is not a magic bullet, for several reasons:

1. The estimation procedure can be slow, especially for larger datasets.
2. The use of prior distributions may deter some researchers.
3. Sometimes it is not clear which prior distribution is appropriate.
4. Model specification in code requires a thorough understanding of the model in question from a formal point of view.

On the other hand, the potential advantages are:

1. Unified handling of a wide scope of models within the framework of graphical models.
2. Unified method for checking a model (posterior predictive checks).
3. Possibility of obtaining posterior distributions of derived statistics, such as effect size indices.
4. Model based handling of missing values and possibility of obtaining posterior distributions of missings in the dependent variable.
5. Possibility of rapid model prototyping.

For building confidence it may be useful to apply the MCMC method to the wide range of models used in the social and behavioral sciences, to cross-check against the traditional criteria of parameter estimation (bias, consistency, efficiency) by simulation studies, and to examine the influence of the prior distributions on the posterior distributions. But I guess this has already been done. I should check the literature.

The Partial Credit Model as PGM

The last post’s topic of multinomial logistic regression leads very nicely to a wide range of psychometric models for categorical responses, such as the Partial Credit Model (PCM) by Masters (1982). The PCM is a psychometric model that estimates the probability of a person’s response to an item based on characteristics of the item and the person’s latent ability. One particularly nice feature of this model is the fact that Andrich’s rating scale model (RSM) and the dichotomous Rasch model (RM) are – as e.g. Mair and Hatzinger (2007) demonstrated – special cases of the PCM. It is fair to say that Samejima seems to have worked on a similar structure back in the sixties.

The focus of this post is on specifying the PCM as a graphical model, estimating the parameters via MCMC, and cross-checking the results against the eRm package by Mair and Hatzinger (2007).

Let’s have a look at the formal structure of the PCM first. Starting point is the base distribution of multinomial logistic regression, sometimes called the “softmax activation function” in the machine learning literature:

(1)    \begin{equation*} p(y_i=c)=\frac{\mbox{exp}\left\{ \lambda_{ci}\right\}}{\sum_{c=1}^C\mbox{exp}\left\{ \lambda_{ci}\right\}}. \end{equation*}

The next step is to linearly decompose the parameters \lambda_{ci} with respect to the PCM. For simplicity I will only consider the case of four categories per item. Generalization to C categories is straightforward.

(2)    \begin{eqnarray*} \lambda_{1i}&=&0 \\ \lambda_{2i}&=&\eta_v \cdot 1 - \tau_{j,1}\\ \lambda_{3i}&=&\eta_v \cdot 2 - \tau_{j,1} - \tau_{j,2}\\ \lambda_{4i}&=&\eta_v \cdot 3 - \tau_{j,1} - \tau_{j,2} - \tau_{j,3} \end{eqnarray*}

So, \eta_v is the ability of person v on the latent trait scale, and \tau_{j,c-1} is the so-called threshold parameter of item j for the categories c-1 and c. For example, a threshold parameter \tau_{1,1}=-1.8 means that for item 1, given a latent ability of \eta_v=-1.8, the probability of choosing the category scored 0 is equal to the probability of choosing the category scored 1. This can readily be verified by, for example, plotting the item response functions for a set of threshold parameters. To estimate the PCM as a PGM, it is useful to introduce a multilevel perspective with respect to the persons’ abilities:

 \eta_v \sim N(0, \sigma_{\eta}^2).

We assume that the latent abilities are normally distributed with a mean of zero and a variance of \sigma_{\eta}^2, which can be estimated from data. This is actually quite nice, because approaches exist to estimate the separability or reliability based on the variance of the latent trait distribution. Please note that the parametrization of the PCM above is a bit different from the original specification by Masters (1982).
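The threshold interpretation from above is easy to verify numerically: when \eta_v equals a threshold, the two adjacent categories are equally probable. A small Python check with hypothetical threshold values:

```python
import math

def pcm_probs(eta, tau):
    # Category probabilities for one item under the PCM parametrization
    # used above (4 categories, 3 thresholds)
    lam = [0.0,
           1 * eta - tau[0],
           2 * eta - tau[0] - tau[1],
           3 * eta - tau[0] - tau[1] - tau[2]]
    z = sum(math.exp(l) for l in lam)       # partition function
    return [math.exp(l) / z for l in lam]

tau = [-1.8, 0.2, 1.1]           # hypothetical thresholds for one item
p = pcm_probs(-1.8, tau)         # ability exactly at the first threshold
print(abs(p[0] - p[1]) < 1e-12)  # True: categories 0 and 1 are equally likely
```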

The BUGS code for the model is straightforward:

# PCM.txt
model{
 
  for (i in 1:n)
  {
 
        lambda[1,i]<-0
        lambda[2,i]<-1*eta[id[i]]-tau[item[i],1]
        lambda[3,i]<-2*eta[id[i]]-tau[item[i],1]-tau[item[i],2]
        lambda[4,i]<-3*eta[id[i]]-tau[item[i],1]-tau[item[i],2]-tau[item[i],3]
 
        Z[i]<-exp(lambda[1,i])+exp(lambda[2,i])+exp(lambda[3,i])+exp(lambda[4,i])
 
        for(c in 1:4)
        {
        p[i,c]<-exp(lambda[c,i])/Z[i]
        }
        y[i]~dcat(p[i,1:4])
    }
 
 
 
# Priors
  for(j in 1:k)
  {
      tau[j,1]~dnorm(0,1.0E-6)
      tau[j,2]~dnorm(0,1.0E-6)
      tau[j,3]~dnorm(0,1.0E-6)  
  }
 
  for(i in 1:N)
  {
    eta[i]~dnorm(0,tau.eta)
  }
 
    tau.eta<-pow(sigma.eta,-2)
    sigma.eta~dunif(0,2)
}

Please note that the data have to be submitted in long format.

Now, let’s do something interesting with this. The following R script uses the rsmdat example dataset from the eRm package, transforms it into long format and submits it to JAGS. JAGS generates the posterior distributions the Bayesian way and returns the output to R. Subsequently, the threshold and ability parameters are estimated with the PCM() function of the eRm package and the results are compared graphically.

# PCM.R
library(R2jags)
library(eRm)
 
# Load rsmdat
data<-rsmdat
# Number of thresholds per item
t<-max(data)
# Number of persons
N<-dim(data)[1]
# Number of items
k<-dim(data)[2]
# Transform data into long format
index=1
id<-c(0)
item<-c(0)
y<-c(0)
 
for(i in 1:N)
{
    for(j in 1:k)
    {
      y[index]=data[i,j]+1
      id[index]<-i
      item[index]<-j
      index<-index+1
    }
}
n<-length(y)
 
# Prepare arguments for jags()
data<-list("y","n","N","id", "item", "k")
parameters<-c("tau","eta", "sigma.eta")
inits <- function(){
list(tau = array(rnorm(t*k),c(k,t)), eta=rnorm(N), sigma.eta=runif(1)*2)
}
 
# Fit model
output<-jags(data, inits=inits, parameters, model.file="PCM.txt",
    n.chains=2, n.iter=4000, n.burnin=2000)
 
attach.jags(output)
 
# Checking against eRm ########
# Fit model with eRm
out<-PCM(rsmdat)
# Collect posterior means of threshold parameters 
# from jags output  
t1<-apply(tau[,1,],2,mean)
t2<-apply(tau[,2,],2,mean)
t3<-apply(tau[,3,],2,mean)
t4<-apply(tau[,4,],2,mean)
t5<-apply(tau[,5,],2,mean)
t6<-apply(tau[,6,],2,mean)
thresholds_pgm<-c(t1,t2,t3,t4,t5,t6)
# Collect threshold parameters from eRm
thresholds_eRm<-thresholds(out)$threshpar
# Plot PGM thresholds against eRm thresholds
plot(thresholds_eRm, ylim=c(-2,2.5), col=c("red"), ylab="tau")
points(thresholds_pgm, ylim=c(-2,2.5), col=c("green"))
 
# Collect posterior means of
# ability parameters from jags
eta_pgm<-apply(eta,2,mean)
# Collect ability parameters
# from eRm
eta_eRm<-person.parameter(out)$theta.table[,1]
plot(apply(rsmdat,1,mean),eta_eRm, xlim=c(0,3), col=c("red"), xlab="mean score",ylab="eta")
points(apply(rsmdat,1,mean),eta_pgm, xlim=c(0,3), col=c("green"))

I will spare you the outputs of JAGS and eRm but encourage you to download the scripts and have a look yourself. Let’s jump directly to the plots.


Fig. 1.: Threshold parameters estimated by eRm (red) and JAGS (green)

Visual inspection shows that eRm and JAGS essentially agree with regard to the locations of the threshold parameters. But there seems to be a difference between JAGS and eRm that is more pronounced for higher threshold parameters. The reason for this is probably the fact that JAGS samples from the posterior distributions of the parameters while eRm is using the CML-method for threshold estimation.

Let’s have a look at the ability estimates.

Fig. 2.: Ability estimates obtained by eRm (red) and JAGS (green) vs. mean score

In Figure 2 the differences between eRm and JAGS seem to be more pronounced. While the relationship between the mean scores and the ability estimates seems to be more or less linear for the estimates produced by JAGS, this is not the case for eRm. The reason for this is probably the fact that the ability estimates generated by eRm were produced using the ML method with CML estimates for the thresholds plugged in. Warm’s weighted ML method of person parameter estimation will most likely be included in future versions of eRm.

Which estimation method is better, Bayesian estimation with JAGS or CML combined with ML as implemented in eRm? It is hard to say from this comparison; a more systematic and thorough simulation study, which could easily be constructed from the code above, would be needed. But note that MCMC methods can be slow, especially for larger datasets.

The point of this post was mainly to demonstrate how psychometric models can be fitted to data with MCMC and to cross-check the results against eRm.

The scripts can be obtained here.

Multinomial Logit Regression as PGM

I decided to switch to English as I believe this topic may be interesting for a wider audience. In the last post I gave a simple example of how the t-test for independent samples can be understood from a generalized linear modeling (GLM) perspective, how the model translates into a probabilistic graphical model (PGM), and how to use Bayesian estimation techniques for parameter estimation. The main motivation behind that post was to show how generalized linear models, probabilistic graphical models and Bayesian estimation techniques can work together for solving statistical inference tasks, particularly in but not limited to psychology.

I can hear one or the other say: “Pah, t-test, peanuts!”. So in this post I would like to turn to something a bit more advanced: Multinomial Logit Regression (MLR). One task of MLR is to model the probability of a categorical response c out of a set of C mutually exclusive possible responses based on a set of predictors that may characterize, for example, the respondents. Typical questions are: which characteristics of a person predict the choice of a specific party (political science), a specific product (economics), or a specific category on a multiple-choice questionnaire (psychology)? This type of model is also quite prominent in the machine learning community, where one task is to classify text documents into C mutually exclusive categories based on a set of text features as predictors.

Before we can specify the model as a PGM, we should have a look at its formal structure. First we need a probability distribution that is capable of modeling the probability of a certain categorical response c based on a set of parameters. One such distribution is:

(1)    \begin{equation*} p(y_i=c)=\frac{\mbox{exp}\left\{ \lambda_{ci}\right\}}{\sum_{c=1}^C\mbox{exp}\left\{ \lambda_{ci}\right\}}. \end{equation*}

This is quite neat. The denominator of this distribution is sometimes called the partition function, or in German Zustandssumme, from its use in statistical physics. It ensures that the probabilities of the possible events, states or responses sum to one. Please note that the distribution above has a little identification problem, which can be solved either by setting \lambda_{1i} to zero or by imposing a sum constraint on the parameters.

Here we set \lambda_{1i} to zero:

(2)    \begin{equation*} \lambda_{1i}=0. \end{equation*}

This also means that we are using category 1 as a reference category for inference. More on that later.
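The identification problem is easy to demonstrate numerically: adding the same constant to all \lambda_{ci} leaves the probabilities unchanged, so without a constraint infinitely many parameter vectors describe the same distribution. A small Python check with made-up values:

```python
import math

def softmax(lam):
    m = max(lam)                        # subtract max for numerical stability
    e = [math.exp(l - m) for l in lam]
    z = sum(e)                          # partition function
    return [v / z for v in e]

lam = [0.4, 1.3, -0.7]                  # hypothetical category parameters
shifted = [l + 5.0 for l in lam]        # same constant added to every category

# Shifting all lambdas leaves the probabilities unchanged ...
print(all(abs(a - b) < 1e-12 for a, b in zip(softmax(lam), softmax(shifted))))
# ... so fixing lambda_1 = 0 just picks one representative parametrization
print(softmax([l - lam[0] for l in lam]))  # same probabilities, identified scale
```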

The trick in MLR is to simply decompose the parameters of the base distribution by simple linear regression equations:

(3)    \begin{eqnarray*} \lambda_{ci}&=&\beta_{c0}+\beta_{c1} \cdot x_{1i} + \ldots + \beta_{ck}\cdot x_{ki}. \\ \end{eqnarray*}

That’s basically it. The \beta-weights have to be estimated from data, either by maximum likelihood or MCMC. To clarify the interpretation of the parameters and to elaborate a bit more, let’s assume we have three possible categories C=3 and only one predictor x_{1i}.

The probabilities of choosing a certain category c are:

(4)    \begin{eqnarray*} p(y_i=1)&=&\frac{1}{Z}\\ p(y_i=2)&=&\frac{\mbox{exp}\left\{\beta_{20}+\beta_{21} \cdot x_{1i} \right\}}{Z}\\ p(y_i=3)&=&\frac{\mbox{exp}\left\{\beta_{30}+\beta_{31} \cdot x_{1i}\right\}}{Z}, \end{eqnarray*}

with partiton function

 Z=1+\mbox{exp}\left\{\beta_{20}+\beta_{21} \cdot x_{1i}\right\}+\mbox{exp}\left\{\beta_{30}+\beta_{31} \cdot x_{1i}\right\}.

Please note that Z runs over all three possible states of the system.

If we write down the log-probability-ratio (or log-risk-ratio) of choosing category 2 over category 1 we get:

(5)    \begin{eqnarray*} \mbox{log}\left\{ \frac{p(y_i=2)}{p(y_i=1)}\right\}=\beta_{20}+\beta_{21} \cdot x_{1i} . \end{eqnarray*}

From this equation the meanings of the parameters become clear. \beta_{20} is nothing but the log-probability-ratio of choosing category 2 over category 1 if x_{1i} is zero. \beta_{21} is the expected change in terms of log-probability-ratios per unit increase of x_{1i}.

The log-probability-ratio of choosing category 3 over category 1 is:

(6)    \begin{equation*} \mbox{log}\left\{ \frac{p(y_i=3)}{p(y_i=1)}\right\}= \beta_{30}+\beta_{31} \cdot x_{1i}. \end{equation*}

Here we are predicting log-probability-ratios of choosing category 3 over category 1 based on the predictor x_{1i}. \beta_{30} is the log-probability-ratio of choosing category 3 over category 1, if x_{1i} is zero and \beta_{31} is the expected change in terms of log-probability-ratios per unit increase of x_{1i}. Quite easy and very similar to multiple regression. The only thing one has to keep in mind is that we are predicting C-1 log-probability-ratios instead of changes in one continuous variable.
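To make the interpretation concrete, here is a small numerical example (in Python, with hypothetical coefficients): the log-probability-ratios computed from the category probabilities recover the linear predictors exactly.

```python
import math

# Hypothetical coefficients for C = 3 categories and one predictor
b20, b21 = 1.0, -0.5
b30, b31 = 0.2, 1.5

def probs(x):
    lam = [0.0, b20 + b21 * x, b30 + b31 * x]   # category 1 as reference
    z = sum(math.exp(l) for l in lam)           # partition function
    return [math.exp(l) / z for l in lam]

p = probs(1.0)
# The log-probability-ratios recover the linear predictors
print(round(math.log(p[1] / p[0]), 6))  # 0.5 = b20 + b21*1
print(round(math.log(p[2] / p[0]), 6))  # 1.7 = b30 + b31*1
```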

We are now in a position to take a graphical look at this simple model.

Fig. 1: Graphical representation of the simple multinomial logit regression model in the example

The variable y[i] is categorically distributed with parameters l1[i], l2[i] and l3[i]. l1[i] is set to zero for identification purposes, and l2[i] and l3[i] are decomposed linearly, e.g. according to the underlying design if we are working experimentally. x0[i] is a dummy variable for the constants (\beta_{20} and \beta_{30}) of the regression equations and x1[i] is the predictor. The arrows symbolize dependencies between the nodes of the network.

To show how the model’s parameters are estimated via MCMC in a Bayesian fashion, let’s introduce a little example. Let’s assume we have sampled n=41 individuals who fall into two groups (x_i \in \left\{0,1 \right\}). Each individual had the choice between 3 alternatives (y_i \in \left\{1,2,3 \right\}). We are interested in the question: Do the groups differ with respect to the chosen alternatives? Or: Does the chosen category depend on group membership? If we crosstabulate the data in R we find:

> table(y,x)
   x
y    0  1
  1  5  3
  2 13  4
  3  2 14
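Since the model has a single dummy predictor and is therefore saturated for this 2×3 table, rough maximum-likelihood estimates of the four parameters can be read directly off the cell counts. A small Python sketch using the counts above:

```python
import math

# cell counts from the crosstab: (category, group) -> count
counts = {(1, 0): 5, (2, 0): 13, (3, 0): 2,
          (1, 1): 3, (2, 1): 4, (3, 1): 14}

b20 = math.log(counts[(2, 0)] / counts[(1, 0)])        # log(13/5)
b30 = math.log(counts[(3, 0)] / counts[(1, 0)])        # log(2/5)
b21 = math.log(counts[(2, 1)] / counts[(1, 1)]) - b20  # group-1 contrast
b31 = math.log(counts[(3, 1)] / counts[(1, 1)]) - b30  # group-1 contrast

print(round(b20, 2), round(b30, 2), round(b21, 2), round(b31, 2))
# prints 0.96 -0.92 -0.67 2.46
```

These quick estimates should land in the same ballpark as the posterior means from the MCMC run further below.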

By visual inspection we see that group 0 seems to have a preference for alternative 2 and group 1 seems to have a preference for alternative 3. To model these data and to estimate the parameters via MCMC we have to translate the model into OpenBUGS code. This is straightforward from the graphical representation or the equations:

#uo_multinom_reg.txt
model
{
  # Likelihood
  for(i in 1:n)
  {
 
    lambda[1,i]<-0
    lambda[2,i]<-beta_20+beta_21*x[i]
    lambda[3,i]<-beta_30+beta_31*x[i]
 
    # Partition function
    Z[i]<-exp(lambda[1,i])+exp(lambda[2,i])+exp(lambda[3,i])
 
    for(c in 1:3)
    {
      p[i,c]<-exp(lambda[c,i])/Z[i]
    }
    y[i]~dcat(p[i,1:3])
  }
 
  # Priors
  beta_20~dnorm(0,1.0E-6)
  beta_21~dnorm(0,1.0E-6)
  beta_30~dnorm(0,1.0E-6)
  beta_31~dnorm(0,1.0E-6)
}

The choice of normal priors for the parameters is motivated by the theoretical result that maximum likelihood estimates are asymptotically normally distributed. In addition, the precision of the priors is set to a very low value. So these distributions encode the prior information that we really do not know anything about the distributions of the parameters in the population yet and want to let the data speak for themselves as much as possible.
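Keep in mind that BUGS parameterizes dnorm by the precision \tau=1/\sigma^2, not by the variance, so the precision of 1.0E-6 used above corresponds to a prior standard deviation of 1000. A one-line check:

```python
tau = 1.0e-6      # prior precision as used in the BUGS code
sd = tau ** -0.5  # sigma = 1 / sqrt(tau)
print(sd)         # approximately 1000
```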

The R package R2OpenBUGS is used to control the OpenBUGS script:

# uo_multinom_reg.R
library(R2OpenBUGS)
data<-read.table("uo_multinom_reg.csv", sep=",", head=TRUE)
 
# data preparation
y<-data$wahl
x<-data$geschl
n<-length(y)
 
data<-list("y","x", "n")
 
parameters<-c("beta_20", "beta_30", "beta_21", "beta_31")
 
# Initial values for the markov chains
inits <- function(){
list(beta_20 = rnorm(1), beta_21=rnorm(1), beta_30=rnorm(1), beta_31=rnorm(1))
}
 
# Handing over to OpenBUGS
output<-bugs(data, inits=inits, parameters, model.file="uo_multinom_reg.txt",
    n.chains=2, n.iter=11000, n.burnin=1000)
 
# Printing the results
print(output, digits=2)

The output of OpenBUGS is

Inference for Bugs model at "uo_multinom_reg.txt", 
Current: 2 chains, each with 11000 iterations (first 1000 discarded)
Cumulative: n.sims = 20000 iterations saved
          mean   sd  2.5%   25%   50%   75% 97.5% Rhat n.eff
beta_20   1.01 0.55 -0.02  0.64  0.98  1.36  2.13    1  2200
beta_30  -1.09 0.94 -3.22 -1.67 -1.01 -0.44  0.49    1  2600
beta_21  -0.68 1.01 -2.66 -1.35 -0.69 -0.01  1.31    1  1400
beta_31   2.78 1.16  0.76  1.98  2.68  3.48  5.35    1  2100
deviance 74.84 3.01 71.11 72.61 74.17 76.25 82.69    1 20000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)
pD = 4.1 and DIC = 79.0
DIC is an estimate of expected predictive error (lower deviance is better).

The \hat{R} statistics indicate that the Markov chains seem to have reached the stationary distribution.

The parameter beta_20 indicates a preference of category 2 over category 1 for group 0. But this difference is not statistically significant, probably due to the small sample size. beta_30 indicates a preference of category 1 over category 3 for group 0, but from the percentiles of the posterior distribution we can see that this difference is not statistically significant either.

beta_21 is the contrast of group 1 against beta_20. As we can see, group 1 seems to have a slightly lower preference for category 2 over category 1 than group 0 in terms of log-probability-ratios, but this contrast is not statistically significant either.

beta_31 is statistically significant. Group 1 has a much stronger preference for category 3 over category 1 than group 0 in terms of log-probability-ratios.
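Plugging the posterior means into the model yields predicted category probabilities per group, which line up well with the observed proportions in the crosstab (5/20, 13/20, 2/20 for group 0 and 3/21, 4/21, 14/21 for group 1). A short Python sketch:

```python
import math

def probs(x, b20=1.01, b21=-0.68, b30=-1.09, b31=2.78):
    # defaults are the posterior means from the OpenBUGS output above
    lam = [0.0, b20 + b21 * x, b30 + b31 * x]
    Z = sum(math.exp(l) for l in lam)
    return [math.exp(l) / Z for l in lam]

print([round(p, 2) for p in probs(0)])  # group 0: [0.24, 0.67, 0.08]
print([round(p, 2) for p in probs(1)])  # group 1: [0.13, 0.18, 0.69]
```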

Please note that this setup can be extended to include a larger set of continuous and categorical predictors. A multilevel extension of the model also seems straightforward, by “simply” including random effects in the linear regression equations.

The main objective of this post was to show in a simple way how categorical data analysis techniques translate into a graphical representation that in turn can be used to estimate model parameters with MCMC methods.

Grab the code here.

The t-Test for Independent Samples as a GLM and as a PGM

The proof of the pudding is in the eating. So I would like to use a small example to show how generalized linear models (GLMs), probabilistic graphical models (PGMs), and Bayesian methods can work together in practice. What better candidate than the t-test for independent samples? To ensure some comparability, I refer to the t-test example in Bortz, Statistik, 6th edition (Table 5.1, page 142). For the practical demonstration I use the software R and OpenBUGS. The files used can be found here.

The t-Test for Independent Samples as a GLM

Most readers will know that the question behind the t-test can also be addressed with multiple regression, i.e. the general linear model. The example data set from Bortz asks whether men and women differ with respect to their resilience. In R this is easily done with the function glm().

# t-test_ua.R
 
# Load the data
data<-read.table("B_5_1.csv", sep=",", head=T)
 
# Fit a GLM with identity link function
m1<-glm(belast~geschl, data=data, family="gaussian")
summary(m1)

A GLM with identity link function is fitted to the data. R produces the following output:

Call:
glm(formula = belast ~ geschl, family = "gaussian", data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-18.242   -9.732   -2.221    9.289   25.800  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  103.200      2.130  48.452   <2e-16 ***
geschlw        1.042      3.057   0.341    0.734    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for gaussian family taken to be 158.7827)

    Null deviance: 10498  on 67  degrees of freedom
Residual deviance: 10480  on 66  degrees of freedom
AIC: 541.54

Number of Fisher Scoring iterations: 2

The estimate labeled Intercept is the predicted mean resilience of the men, and the estimate geschlw is the contrast of the women against this baseline. This means that women have a predicted value on the resilience scale that is 1.042 points higher. However, the t-value of 0.341 makes it clear that this difference is not statistically significant. This result essentially agrees with the result in Bortz, where a t-value of -0.33 is reported.

The t-Test for Independent Samples as a PGM

The GLM just used can also be represented as a graphical model. Figure 1 shows the graphical representation.

Figure 1: The GLM for the t-test for independent samples as a PGM

y[i] denotes the values of the dependent variable, and x1[i] and x2[i] denote the values of the respective predictors in the design matrix. mu[i] is the linear predictor of the model, which is decomposed linearly according to the underlying design. sigma.y is the standard deviation of the residuals in the population. The arrows denote dependencies between the nodes of the network. The frame around the nodes (plate notation) indicates that there are n observations. Formally, the model is fairly simple:

y_i \sim N(\mu_i, \sigma_y^2)
\mu_i=\beta_0 \cdot x_{1i} + \beta_1 \cdot x_{2i}

So the observed values y_i are assumed to come from a normal distribution with mean structure \mu_i and variance \sigma_y^2. Put differently, the residuals in the population are assumed to be normally distributed with variance \sigma_y^2. If the dummy coding for x_{1i} corresponds to the specification of a constant, this is nothing but a simple linear regression in graphical form.
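With this dummy coding the least-squares solution has a simple closed form: the coefficient of the constant is the mean of the baseline group and the dummy coefficient is the difference between the group means. A tiny Python sketch with made-up numbers:

```python
from statistics import mean

# made-up resilience scores for two small groups (illustration only)
men   = [100.0, 104.0, 102.0]  # baseline group (dummy = 0)
women = [103.0, 105.0, 104.0]  # dummy = 1

beta0 = mean(men)                # predicted mean of the baseline group
beta1 = mean(women) - mean(men)  # contrast of the dummy-coded group

print(beta0, beta1)  # prints 102.0 2.0
```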

Bayesian Estimation of the Parameters

The parameters \beta_0, \beta_1 and the variance \sigma_y^2 have to be estimated from the data. Let me briefly show how this can be done in a Bayesian fashion using MCMC, R, and OpenBUGS. First, a file with the model specification for OpenBUGS has to be created and saved.

# t-test_ua.txt
model
{
  # Likelihood
  for (i in 1:n)
  {
  y[i] ~ dnorm (mu[i], tau.y)
  mu[i]<-beta0*X[i,1]+beta1*X[i,2]
  }
  # Prior-Distributions
  beta0 ~ dnorm(0, 1.0E-6)
  beta1 ~ dnorm(0, 1.0E-6)
  tau.y <- pow(sigma.y, -2)
  sigma.y ~ dunif (0, 1000)
  # Cohen's delta
  delta<-(beta1)/sigma.y
}

The model is specified in the #Likelihood section. After that the prior distributions are specified, which are kept relatively uninformative here. As a special bonus, the posterior distribution of Cohen's delta is generated as part of the MCMC process.
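The same derived-quantity trick can be mimicked outside BUGS: every pair of posterior draws (beta1, sigma.y) yields one draw of delta, so the full posterior of the effect size comes for free. A Python sketch with hypothetical normal "draws" standing in for real MCMC output:

```python
import random
random.seed(1)

# hypothetical posterior draws (illustration only, not real MCMC output)
beta1_draws = [random.gauss(0.99, 3.07) for _ in range(8000)]
sigma_draws = [random.gauss(12.84, 1.15) for _ in range(8000)]

# one delta draw per (beta1, sigma) pair
delta_draws = sorted(b / s for b, s in zip(beta1_draws, sigma_draws))

mean_delta = sum(delta_draws) / len(delta_draws)
# crude 2.5% and 97.5% quantiles from the sorted draws
ci = (delta_draws[200], delta_draws[7800])
print(round(mean_delta, 2), round(ci[0], 2), round(ci[1], 2))
```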

OpenBUGS is controlled via the R script below, which also hands the data over to OpenBUGS.

 
library(R2OpenBUGS)
# Prepare the data for
# handing over to OpenBUGS
y<-data$belast
X<-model.matrix(m1)
n<-length(y)
data<-list("y","X", "n")
 
# Specify the parameters to be estimated
parameters<-c("beta0", "beta1", "sigma.y", "delta")
 
# Hand over to OpenBUGS
output<-bugs(data, inits=NULL, parameters, model.file="t-test_ua.txt",
    n.chains=2, n.iter=5000, n.burnin=1000)
 
# Print the results
print(output, digits=2)
 
# Histogram of beta1
hist(output$sims.list$beta1,100)
# Histogram of Cohen's delta
hist(output$sims.list$delta,100)

To check the convergence of the Markov process, two Markov chains are used. The process runs for 5000 iterations, of which the first 1000 form the burn-in phase. In this example the starting values of the Markov chains are generated randomly.

OpenBUGS produces the following output:

Inference for Bugs model at "t-test_ua.txt", 
Current: 2 chains, each with 5000 iterations (first 1000 discarded)
Cumulative: n.sims = 8000 iterations saved
              mean   sd   2.5%    25%    50%    75%  97.5% Rhat n.eff
beta0       103.25 2.14  99.06 101.80 103.20 104.70 107.40    1  8000
beta1         0.99 3.07  -5.06  -1.02   0.97   2.98   7.06    1  8000
sigma.y      12.84 1.15  10.82  12.03  12.75  13.55  15.35    1  8000
delta         0.08 0.24  -0.39  -0.08   0.08   0.23   0.55    1  8000
deviance    538.62 2.55 535.80 536.80 538.00 539.70 545.00    1  4300

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)
pD = 2.9 and DIC = 541.5
DIC is an estimate of expected predictive error (lower deviance is better).

Of particular interest here are the means and percentiles of the posterior distributions as well as Rhat. The estimate for the mean resilience of the men is 103.25, and the estimated contrast of the women against the men's mean is 0.99. The 95% credible interval for \beta_1 ranges from -5.06 to 7.06, so the difference between men and women is not statistically significant. The estimate for Cohen's delta is 0.08; this result is congruent with the result in Bortz. In addition, the 95% credible interval for Cohen's delta was obtained. It ranges from -0.39 to 0.55, which likewise indicates that the difference between men and women with respect to resilience is not statistically significant. The Rhat statistic indicates that the two Markov chains have converged to the posterior distribution of the parameters.

One thing that stands out is that the output contains no p-values. This is no great loss, since inference is based on the credible intervals. Those who still do not want to do without p-values can easily compute them from the posterior distributions. To conclude, here is the histogram of the posterior distribution of Cohen's delta.


Figure 2: Posterior distribution of Cohen's delta
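As noted above, p-value-like quantities can be computed directly from posterior draws. A sketch with hypothetical draws for beta1, computing the posterior probability that the contrast is negative:

```python
import random
random.seed(42)

# hypothetical posterior draws for beta1 (illustration only)
draws = [random.gauss(0.99, 3.07) for _ in range(8000)]

# posterior probability that the contrast between the groups is negative
p_neg = sum(d <= 0 for d in draws) / len(draws)
print(round(p_neg, 2))
```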

Admittedly, that was quite a lot of information at once, and this blog post cannot claim to explain the background of the methods used here comprehensively. I refer the interested reader to the literature. However, it was important to me to show, using a simple example, how GLMs, PGMs, and Bayesian methods can mesh.

As an exercise I recommend thinking about what the graphical model and the OpenBUGS code would have to look like in the case of multiple regression. Bonus points for generating the posterior distribution of the coefficient of determination, i.e. \eta^2. A systematic comparison of the method presented here for generating credible intervals for effect-size indices with the “frequentist” method of generating confidence intervals based on noncentral distributions would also be interesting.