{"id":409,"date":"2012-11-23T22:05:13","date_gmt":"2012-11-23T22:05:13","guid":{"rendered":"http:\/\/www.georg-hosoya.de\/wordpress\/?p=409"},"modified":"2012-12-17T15:11:20","modified_gmt":"2012-12-17T15:11:20","slug":"multinomial-logit-regression-as-pgm","status":"publish","type":"post","link":"https:\/\/www.georg-hosoya.de\/wordpress\/?p=409","title":{"rendered":"Multinomial Logit Regression as PGM"},"content":{"rendered":"<p>I decided to switch to English as I believe that this topic may be interesting for a wider audience. In the last post I gave a simple example of how the t-test for independent samples can be understood from a generalized linear modeling (GLM) perspective, how the model translates into a probabilistic graphical model (PGM) and how to use Bayesian techniques for parameter estimation. The main motivation behind that post was to show how generalized linear models, probabilistic graphical models and Bayesian estimation techniques can work together to solve statistical inference tasks, particularly, but not exclusively, in Psychology.<\/p>\n<p>I can hear some of you say: &#8220;Pah, t-test, peanuts!&#8221;. So in this post I would like to turn to something a bit more advanced: Multinomial Logit Regression (MLR). 
One task of MLR is to model the probability of a categorical response <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-6b81d1a91c68e1cf3a5248d42a4633b0_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#99;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"8\" style=\"vertical-align: 0px;\"\/> out of a set of <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-9b61ec27349875afb6f95c0043a59a60_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#67;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"14\" style=\"vertical-align: 0px;\"\/> mutually exclusive possible responses, based on a set of predictors that may, for example, characterize the respondents. Typical questions are: Which characteristics of a person predict the choice of a specific party (Political Science), a specific product (Economics) or a specific category on a multiple-choice questionnaire (Psychology)? This type of model is also quite prominent in the machine learning community, where a typical task is to classify text documents into <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-9b61ec27349875afb6f95c0043a59a60_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#67;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"14\" style=\"vertical-align: 0px;\"\/> mutually exclusive categories based on a set of text features as predictors. <\/p>\n<p>Before we can specify the model as a PGM, we should have a look at its formal structure. 
First, we need a probability distribution that can model the probability of a particular categorical response <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-6b81d1a91c68e1cf3a5248d42a4633b0_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#99;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"8\" style=\"vertical-align: 0px;\"\/> given a set of parameters. One such distribution is:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 46px;\"><span class=\"ql-right-eqno\"> (1) <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-5256b518239af51079821effeb9d4146_l3.png\" height=\"46\" width=\"216\" class=\"ql-img-displayed-equation \" alt=\" &#92;&#98;&#101;&#103;&#105;&#110;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125; &#112;&#40;&#121;&#95;&#105;&#61;&#99;&#41;&#61;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#109;&#98;&#111;&#120;&#123;&#101;&#120;&#112;&#125;&#92;&#108;&#101;&#102;&#116;&#92;&#123;&#32;&#92;&#108;&#97;&#109;&#98;&#100;&#97;&#95;&#123;&#99;&#105;&#125;&#92;&#114;&#105;&#103;&#104;&#116;&#92;&#125;&#125;&#123;&#92;&#115;&#117;&#109;&#95;&#123;&#99;&#61;&#49;&#125;&#94;&#67;&#92;&#109;&#98;&#111;&#120;&#123;&#101;&#120;&#112;&#125;&#92;&#108;&#101;&#102;&#116;&#92;&#123;&#32;&#92;&#108;&#97;&#109;&#98;&#100;&#97;&#95;&#123;&#99;&#105;&#125;&#92;&#114;&#105;&#103;&#104;&#116;&#92;&#125;&#125;&#46; &#92;&#101;&#110;&#100;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125; \" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>This is quite neat. The denominator of this distribution is called the partition function (in German <em>Zustandssumme<\/em>), a term that stems from statistical physics. It ensures that the probabilities of the possible events, states or responses sum to one. 
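<\/p>
<p>To make equation (1) concrete, here is a minimal sketch in base R. The function name <code>category_probs<\/code> and the example values are hypothetical and not part of the original code. It computes the probabilities via the partition function, checks that they sum to one, and shows that adding the same constant to every parameter leaves the probabilities unchanged, which is why an identification constraint is needed.<\/p>
<pre lang=\"rsplus\" line=\"1\">
# Equation (1): p(y_i = c) = exp(lambda_ci) / Z_i, where the partition
# function Z_i sums exp(lambda_ci) over all C categories.
# category_probs is a hypothetical helper written for this illustration.
category_probs <- function(lambda) {
  Z <- sum(exp(lambda))   # partition function
  exp(lambda) / Z
}

lambda <- c(0, 1.5, -0.5)   # illustrative values for C = 3 categories
p <- category_probs(lambda)
print(round(p, 3))
print(sum(p))               # the probabilities sum to one

# Adding the same constant to every lambda leaves the probabilities
# unchanged, so the lambdas are only identified up to an additive constant.
print(all.equal(p, category_probs(lambda + 10)))   # TRUE
<\/pre>
<p>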
Note that the distribution above has an identification problem: adding the same constant to all parameters of a respondent leaves all probabilities unchanged. It can be solved either by setting <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-19c73f97fcbf7ea2bc00f79517e4da58_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#92;&#108;&#97;&#109;&#98;&#100;&#97;&#95;&#123;&#49;&#105;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"17\" width=\"22\" style=\"vertical-align: -4px;\"\/> to zero or by imposing a sum constraint on the parameters.<\/p>\n<p>Here we set <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-19c73f97fcbf7ea2bc00f79517e4da58_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#92;&#108;&#97;&#109;&#98;&#100;&#97;&#95;&#123;&#49;&#105;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"17\" width=\"22\" style=\"vertical-align: -4px;\"\/> to zero:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 16px;\"><span class=\"ql-right-eqno\"> (2) <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-b323918b5e7ca418a34a943111ff0900_l3.png\" height=\"16\" width=\"59\" class=\"ql-img-displayed-equation \" alt=\" &#92;&#98;&#101;&#103;&#105;&#110;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125; &#92;&#108;&#97;&#109;&#98;&#100;&#97;&#95;&#123;&#49;&#105;&#125;&#38;&#61;&#38;&#48;&#32;&#46;&#92;&#92; &#92;&#101;&#110;&#100;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125; \" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>This also means that we are using category 1 as the reference category for inference. 
More on that later.<\/p>\n<p>The trick in MLR is to decompose the parameters of the base distribution into linear regression equations, one for each category except the reference category:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 17px;\"><span class=\"ql-right-eqno\"> (3) <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-ceb14ba39b0bd498f4b73f4a1fc808b2_l3.png\" height=\"17\" width=\"301\" class=\"ql-img-displayed-equation \" alt=\" &#92;&#98;&#101;&#103;&#105;&#110;&#123;&#101;&#113;&#110;&#97;&#114;&#114;&#97;&#121;&#42;&#125; &#92;&#108;&#97;&#109;&#98;&#100;&#97;&#95;&#123;&#99;&#105;&#125;&#38;&#61;&#38;&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#99;&#48;&#125;&#43;&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#99;&#49;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#120;&#95;&#123;&#49;&#105;&#125;&#32;&#43;&#32;&#92;&#108;&#100;&#111;&#116;&#115;&#32;&#43;&#32;&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#99;&#107;&#125;&#92;&#99;&#100;&#111;&#116;&#32;&#120;&#95;&#123;&#107;&#105;&#125;&#46;&#32;&#92;&#92; &#92;&#101;&#110;&#100;&#123;&#101;&#113;&#110;&#97;&#114;&#114;&#97;&#121;&#42;&#125; \" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>That&#8217;s basically it. The <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-e9817af8e6f3b8cef7858648d0aade1a_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#92;&#98;&#101;&#116;&#97;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"11\" style=\"vertical-align: -4px;\"\/>-weights have to be estimated from data, either by maximum likelihood or by MCMC. 
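<\/p>
<p>The linear decomposition of equation (3) can be sketched in a few lines of base R for an arbitrary number of predictors. This is an illustration under stated assumptions, not the original post&#8217;s code: the function <code>mlr_probs<\/code>, the design matrix <code>X<\/code> (first column constant 1) and the coefficient matrix <code>beta<\/code> (one column per non-reference category) are hypothetical names, and the numbers are made up. Category 1 is the reference, so its linear predictor is fixed at zero.<\/p>
<pre lang=\"rsplus\" line=\"1\">
# Sketch of equations (1)-(3) with category 1 as reference (lambda_1i = 0).
# mlr_probs is a hypothetical helper written for this illustration.
# X:    n x (k+1) design matrix; the first column is 1 (the constant)
# beta: (k+1) x (C-1) matrix; column c-1 holds beta_c0, ..., beta_ck
mlr_probs <- function(X, beta) {
  lambda <- cbind(0, X %*% beta)   # n x C linear predictors, first column fixed at 0
  e <- exp(lambda)
  e / rowSums(e)                   # divide each row by its partition function Z_i
}

# Made-up values: n = 3 respondents, k = 2 predictors, C = 3 categories
X <- cbind(1, x1 = c(0, 1, 2), x2 = c(1.5, -0.3, 0.8))
beta <- matrix(c( 0.5, 1.0, -0.2,    # beta_20, beta_21, beta_22
                 -1.0, 0.3,  0.7),   # beta_30, beta_31, beta_32
               nrow = 3)
P <- mlr_probs(X, beta)
print(round(P, 3))
print(rowSums(P))                  # each row sums to one
<\/pre>
<p>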
To clarify the interpretation of the parameters and to elaborate a bit more, let&#8217;s assume we have three possible categories <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-e15feb85c16600f2e22cd6cb5be629f7_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#67;&#61;&#51;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"47\" style=\"vertical-align: 0px;\"\/> and only one predictor <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-8625d9d96ef3cf242c22dd12a36e60c3_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#120;&#95;&#123;&#49;&#105;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"22\" style=\"vertical-align: -4px;\"\/>. <\/p>\n<p>The probabilities of choosing a certain category <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-6b81d1a91c68e1cf3a5248d42a4633b0_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#99;\" title=\"Rendered by QuickLaTeX.com\" height=\"8\" width=\"8\" style=\"vertical-align: 0px;\"\/> are:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 124px;\"><span class=\"ql-right-eqno\"> (4) <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-1d7dd20cf99dcfdc153647756c61cbf3_l3.png\" height=\"124\" width=\"278\" class=\"ql-img-displayed-equation \" alt=\" &#92;&#98;&#101;&#103;&#105;&#110;&#123;&#101;&#113;&#110;&#97;&#114;&#114;&#97;&#121;&#42;&#125; &#112;&#40;&#121;&#95;&#105;&#61;&#49;&#41;&#38;&#61;&#38;&#92;&#102;&#114;&#97;&#99;&#123;&#49;&#125;&#123;&#90;&#125;&#92;&#92; 
&#112;&#40;&#121;&#95;&#105;&#61;&#50;&#41;&#38;&#61;&#38;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#109;&#98;&#111;&#120;&#123;&#101;&#120;&#112;&#125;&#92;&#108;&#101;&#102;&#116;&#92;&#123;&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#50;&#48;&#125;&#43;&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#50;&#49;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#120;&#95;&#123;&#49;&#105;&#125;&#32;&#92;&#114;&#105;&#103;&#104;&#116;&#92;&#125;&#125;&#123;&#90;&#125;&#92;&#92; &#112;&#40;&#121;&#95;&#105;&#61;&#51;&#41;&#38;&#61;&#38;&#92;&#102;&#114;&#97;&#99;&#123;&#92;&#109;&#98;&#111;&#120;&#123;&#101;&#120;&#112;&#125;&#92;&#108;&#101;&#102;&#116;&#92;&#123;&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#51;&#48;&#125;&#43;&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#51;&#49;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#120;&#95;&#123;&#49;&#105;&#125;&#92;&#114;&#105;&#103;&#104;&#116;&#92;&#125;&#125;&#123;&#90;&#125;&#44; &#92;&#101;&#110;&#100;&#123;&#101;&#113;&#110;&#97;&#114;&#114;&#97;&#121;&#42;&#125; \" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>with partition function<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-2c4bfe7889f7499c60720c1e75e29b7b_l3.png\" class=\"ql-img-inline-formula \" alt=\" &#90;&#61;&#49;&#43;&#92;&#109;&#98;&#111;&#120;&#123;&#101;&#120;&#112;&#125;&#92;&#108;&#101;&#102;&#116;&#92;&#123;&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#50;&#48;&#125;&#43;&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#50;&#49;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#120;&#95;&#123;&#49;&#105;&#125;&#92;&#114;&#105;&#103;&#104;&#116;&#92;&#125;&#43;&#92;&#109;&#98;&#111;&#120;&#123;&#101;&#120;&#112;&#125;&#92;&#108;&#101;&#102;&#116;&#92;&#123;&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#51;&#48;&#125;&#43;&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#51;&#49;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#120;&#95;&#123;&#49;&#105;&#125;&#92;&#114;&#105;&#103;&#104;&#116;&#92;&#125;&#46; \" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"403\" style=\"vertical-align: -5px;\"\/> <\/p>\n<p>Note that the partition function <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-1e8bcbb35449268a3c3a16a95fe49e22_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#90;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"12\" style=\"vertical-align: 0px;\"\/> sums over all three possible states of the system; the reference category contributes exp(0) = 1, the leading 1 in the sum.<\/p>\n<p>If we write down the log-probability-ratio (or log-risk-ratio) of choosing category 2 over category 1, we get:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 43px;\"><span class=\"ql-right-eqno\"> (5) <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-08f3eac46b61dd61ebab05eacc22aaba_l3.png\" height=\"43\" width=\"259\" class=\"ql-img-displayed-equation \" alt=\" &#92;&#98;&#101;&#103;&#105;&#110;&#123;&#101;&#113;&#110;&#97;&#114;&#114;&#97;&#121;&#42;&#125; &#92;&#109;&#98;&#111;&#120;&#123;&#108;&#111;&#103;&#125;&#92;&#108;&#101;&#102;&#116;&#92;&#123;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#112;&#40;&#121;&#95;&#105;&#61;&#50;&#41;&#125;&#123;&#112;&#40;&#121;&#95;&#105;&#61;&#49;&#41;&#125;&#92;&#114;&#105;&#103;&#104;&#116;&#92;&#125;&#61;&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#50;&#48;&#125;&#43;&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#50;&#49;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#120;&#95;&#123;&#49;&#105;&#125;&#32;&#46; &#92;&#101;&#110;&#100;&#123;&#101;&#113;&#110;&#97;&#114;&#114;&#97;&#121;&#42;&#125; \" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>From this equation, the meaning of the parameters becomes clear. 
<img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-179de99bf05549b4b56277a28d9535e9_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#50;&#48;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"24\" style=\"vertical-align: -4px;\"\/> is simply the log-probability-ratio of choosing category 2 over category 1 when <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-8625d9d96ef3cf242c22dd12a36e60c3_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#120;&#95;&#123;&#49;&#105;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"22\" style=\"vertical-align: -4px;\"\/> is zero. <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-51bf078bb028592139bc636e39381c3b_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#50;&#49;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"23\" style=\"vertical-align: -4px;\"\/> is the expected change in terms of log-probability-ratios per unit increase of <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-8625d9d96ef3cf242c22dd12a36e60c3_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#120;&#95;&#123;&#49;&#105;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"22\" style=\"vertical-align: -4px;\"\/>. 
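<\/p>
<p>A quick numerical check of equations (4) and (5) can make this interpretation tangible. The base-R sketch below uses made-up coefficients; the names <code>b20<\/code>, <code>b21<\/code>, <code>b30<\/code>, <code>b31<\/code>, <code>probs_eq4<\/code> and <code>log_ratio_21<\/code> are hypothetical and not estimates from the example data. It shows that the log-probability-ratio of category 2 over category 1 equals the linear predictor, and that a unit increase of the predictor shifts it by exactly <code>b21<\/code>.<\/p>
<pre lang=\"rsplus\" line=\"1\">
# Made-up coefficients for C = 3 categories and one predictor (hypothetical)
b20 <- 0.4;  b21 <- 1.2
b30 <- -0.5; b31 <- 0.8

# Equation (4): category probabilities for one predictor value x
probs_eq4 <- function(x) {
  Z <- 1 + exp(b20 + b21 * x) + exp(b30 + b31 * x)   # partition function
  c(p1 = 1 / Z, p2 = exp(b20 + b21 * x) / Z, p3 = exp(b30 + b31 * x) / Z)
}

# Equation (5): log-probability-ratio of category 2 over category 1
log_ratio_21 <- function(x) {
  p <- probs_eq4(x)
  log(p[['p2']] / p[['p1']])
}

print(log_ratio_21(0))                         # 0.4, i.e. b20
print(log_ratio_21(1.5) - log_ratio_21(0.5))   # 1.2, i.e. b21 per unit of x
<\/pre>
<p>Replacing <code>p2<\/code> by <code>p3<\/code> and the coefficients <code>b20<\/code>, <code>b21<\/code> by <code>b30<\/code>, <code>b31<\/code> gives the analogous check for category 3.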
<\/p>\n<p>The log-probability-ratio of choosing category 3 over category 1 is:<\/p>\n<p class=\"ql-center-displayed-equation\" style=\"line-height: 43px;\"><span class=\"ql-right-eqno\"> (6) <\/span><span class=\"ql-left-eqno\"> &nbsp; <\/span><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-7a60cac5725355c32c44105fc8cb48a6_l3.png\" height=\"43\" width=\"259\" class=\"ql-img-displayed-equation \" alt=\" &#92;&#98;&#101;&#103;&#105;&#110;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125; &#92;&#109;&#98;&#111;&#120;&#123;&#108;&#111;&#103;&#125;&#92;&#108;&#101;&#102;&#116;&#92;&#123;&#32;&#92;&#102;&#114;&#97;&#99;&#123;&#112;&#40;&#121;&#95;&#105;&#61;&#51;&#41;&#125;&#123;&#112;&#40;&#121;&#95;&#105;&#61;&#49;&#41;&#125;&#92;&#114;&#105;&#103;&#104;&#116;&#92;&#125;&#61;&#32;&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#51;&#48;&#125;&#43;&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#51;&#49;&#125;&#32;&#92;&#99;&#100;&#111;&#116;&#32;&#120;&#95;&#123;&#49;&#105;&#125;&#46; &#92;&#101;&#110;&#100;&#123;&#101;&#113;&#117;&#97;&#116;&#105;&#111;&#110;&#42;&#125; \" title=\"Rendered by QuickLaTeX.com\"\/><\/p>\n<p>Here we are predicting log-probability-ratios of choosing category 3 over category 1 based on the predictor <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-8625d9d96ef3cf242c22dd12a36e60c3_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#120;&#95;&#123;&#49;&#105;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"22\" style=\"vertical-align: -4px;\"\/>. 
<img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-1205ad5714ad81b7b845a1f289c046a2_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#51;&#48;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"24\" style=\"vertical-align: -4px;\"\/> is the log-probability-ratio of choosing category 3 over category 1, if <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-8625d9d96ef3cf242c22dd12a36e60c3_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#120;&#95;&#123;&#49;&#105;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"22\" style=\"vertical-align: -4px;\"\/> is zero and  <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-ec535d8c3c341e2e8183dadda69de955_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#51;&#49;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"23\" style=\"vertical-align: -4px;\"\/> is the expected change in terms of log-probability-ratios per unit increase of <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-8625d9d96ef3cf242c22dd12a36e60c3_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#120;&#95;&#123;&#49;&#105;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"12\" width=\"22\" style=\"vertical-align: -4px;\"\/>. Quite easy and very similar to multiple regression. 
The only thing to keep in mind is that we are predicting <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-944d42efdb322d7034da3b343df70fc7_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#67;&#45;&#49;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"44\" style=\"vertical-align: -1px;\"\/> log-probability-ratios instead of a single continuous outcome.<\/p>\n<p>We are now in a position to take a graphical look at this simple model.<br \/>\n<a href=\"http:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/uploads\/2012\/11\/uo_multinom.jpg\"><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/uploads\/2012\/11\/uo_multinom.jpg\" alt=\"\" title=\"uo_multinom\" width=\"495\" height=\"594\" class=\"alignnone size-full wp-image-481\" srcset=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/uploads\/2012\/11\/uo_multinom.jpg 495w, https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/uploads\/2012\/11\/uo_multinom-250x300.jpg 250w\" sizes=\"(max-width: 495px) 100vw, 495px\" \/><\/a><br \/>\n<em>Fig. 1: Graphical representation of the simple multinomial logit regression model in the example<\/em><\/p>\n<p>The variable y[i] is categorically distributed with parameters l1[i], l2[i] and l3[i]. l1[i] is set to zero for identification purposes, and l2[i] and l3[i] are decomposed linearly, e.g. according to the underlying design if we are working experimentally. 
x0[i] is a dummy variable for the constants (<img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-179de99bf05549b4b56277a28d9535e9_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#50;&#48;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"24\" style=\"vertical-align: -4px;\"\/> and <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-1205ad5714ad81b7b845a1f289c046a2_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#92;&#98;&#101;&#116;&#97;&#95;&#123;&#51;&#48;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"16\" width=\"24\" style=\"vertical-align: -4px;\"\/>) of the regression equations, and x1[i] is the predictor. The arrows symbolize dependencies between the nodes of the network.<\/p>\n<p>To show how the model&#8217;s parameters are estimated via MCMC in a Bayesian fashion, let&#8217;s introduce a small example. Suppose we have sampled <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-1a3ef8dea52390c2eb03ba549f4e510c_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#110;&#61;&#52;&#49;\" title=\"Rendered by QuickLaTeX.com\" height=\"13\" width=\"51\" style=\"vertical-align: -1px;\"\/> individuals who fall into two groups (<img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-67c4505a4928c47e10d849e0b7e65857_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#120;&#95;&#105;&#32;&#92;&#105;&#110;&#32;&#92;&#108;&#101;&#102;&#116;&#92;&#123;&#48;&#44;&#49;&#32;&#92;&#114;&#105;&#103;&#104;&#116;&#92;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"80\" style=\"vertical-align: -5px;\"\/>). 
Each individual had the choice between 3 alternatives (<img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-4007df68a152390ffa532f2aa49be926_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#121;&#95;&#105;&#32;&#92;&#105;&#110;&#32;&#92;&#108;&#101;&#102;&#116;&#92;&#123;&#49;&#44;&#50;&#44;&#51;&#32;&#92;&#114;&#105;&#103;&#104;&#116;&#92;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"18\" width=\"95\" style=\"vertical-align: -5px;\"\/>). We are interested in the question: Do the groups differ with respect to the chosen alternatives? In other words: Does the chosen category depend on group membership? If we cross-tabulate the data in R, we find:<\/p>\n<pre lang=\"rsplus\" line=\"1\">\r\n> table(y,x)\r\n   x\r\ny    0  1\r\n  1  5  3\r\n  2 13  4\r\n  3  2 14\r\n<\/pre>\n<p>By visual inspection, group 0 seems to prefer alternative 2 and group 1 seems to prefer alternative 3. To model these data and estimate the parameters via MCMC, we have to translate the model into OpenBUGS code. This is straightforward from the graphical representation or the equations:<\/p>\n<pre lang=\"rsplus\" line=\"1\">\r\n#uo_multinom_reg.txt\r\nmodel\r\n{\r\n  # Likelihood\r\n  for(i in 1:n)\r\n  {\r\n    \r\n    lambda[1,i]<-0\r\n    lambda[2,i]<-beta_20+beta_21*x[i]\r\n    lambda[3,i]<-beta_30+beta_31*x[i]\r\n     \r\n    # Partition function\r\n    Z[i]<-exp(lambda[1,i])+exp(lambda[2,i])+exp(lambda[3,i])\r\n\r\n    for(c in 1:3)\r\n    {\r\n      p[i,c]<-exp(lambda[c,i])\/Z[i]\r\n    }\r\n    y[i]~dcat(p[i,1:3])\r\n  }\r\n  \r\n  # Priors: BUGS parameterizes dnorm by mean and precision (1\/variance),\r\n  # so a precision of 1.0E-6 corresponds to a standard deviation of 1000\r\n  beta_20~dnorm(0,1.0E-6)\r\n  beta_21~dnorm(0,1.0E-6)\r\n  beta_30~dnorm(0,1.0E-6)\r\n  beta_31~dnorm(0,1.0E-6)\r\n}\r\n<\/pre>\n<p>The choice of normal priors for the parameters is motivated by the theoretical result that maximum likelihood estimates are asymptotically normally distributed. 
In addition, the precision of the priors is set to a very low value (1.0E-6, which corresponds to a standard deviation of 1000). These priors therefore encode that we know essentially nothing about the parameters in the population yet and want to let the data speak for themselves as much as possible.<\/p>\n<p>The R package R2OpenBUGS is used to run the OpenBUGS model from R:<\/p>\n<pre lang=\"rsplus\" line=\"1\">\r\n# uo_multinom_reg.R\r\nlibrary(R2OpenBUGS)\r\ndata<-read.table(\"uo_multinom_reg.csv\", sep=\",\", header=TRUE)\r\n\r\n# data preparation\r\ny<-data$wahl     # chosen alternative (German 'Wahl' = choice), coded 1, 2, 3\r\nx<-data$geschl   # group indicator, coded 0, 1\r\nn<-length(y)\r\n\r\n# names of the data objects handed over to OpenBUGS\r\ndata<-list(\"y\",\"x\", \"n\")\r\n\r\nparameters<-c(\"beta_20\", \"beta_30\", \"beta_21\", \"beta_31\")\r\n\r\n# Initial values for the Markov chains\r\ninits <- function(){\r\nlist(beta_20 = rnorm(1), beta_21=rnorm(1), beta_30=rnorm(1), beta_31=rnorm(1))\r\n}\r\n\r\n# Handing over to OpenBUGS (n.iter includes the burn-in;\r\n# 11000 matches the output shown below)\r\noutput<-bugs(data, inits=inits, parameters, model.file=\"uo_multinom_reg.txt\",\r\n    n.chains=2, n.iter=11000, n.burnin=1000)\r\n\r\n# Printing the results\r\nprint(output, digits=2)\r\n<\/pre>\n<p>The output of OpenBUGS is:<\/p>\n<pre>\r\nInference for Bugs model at \"uo_multinom_reg.txt\", \r\nCurrent: 2 chains, each with 11000 iterations (first 1000 discarded)\r\nCumulative: n.sims = 20000 iterations saved\r\n          mean   sd  2.5%   25%   50%   75% 97.5% Rhat n.eff\r\nbeta_20   1.01 0.55 -0.02  0.64  0.98  1.36  2.13    1  2200\r\nbeta_30  -1.09 0.94 -3.22 -1.67 -1.01 -0.44  0.49    1  2600\r\nbeta_21  -0.68 1.01 -2.66 -1.35 -0.69 -0.01  1.31    1  1400\r\nbeta_31   2.78 1.16  0.76  1.98  2.68  3.48  5.35    1  2100\r\ndeviance 74.84 3.01 71.11 72.61 74.17 76.25 82.69    1 20000\r\n\r\nFor each parameter, n.eff is a crude measure of effective sample size,\r\nand Rhat is the potential scale reduction factor (at convergence, Rhat=1).\r\n\r\nDIC info (using the rule, pD = Dbar-Dhat)\r\npD = 4.1 and DIC = 79.0\r\nDIC is an estimate of expected predictive error (lower deviance is better).\r\n<\/pre>\n<p>The <img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.georg-hosoya.de\/wordpress\/wp-content\/ql-cache\/quicklatex.com-4110c8fdab17ed1b3ef017149639250f_l3.png\" class=\"ql-img-inline-formula \" alt=\"&#92;&#104;&#97;&#116;&#123;&#82;&#125;\" title=\"Rendered by QuickLaTeX.com\" height=\"17\" width=\"14\" style=\"vertical-align: 0px;\"\/> statistics (all equal to 1) indicate that the Markov chains seem to have reached their stationary distribution. <\/p>\n<p>The parameter <code>beta_20<\/code> indicates a preference for category 2 over category 1 in group 0. However, its 95% posterior interval (the 2.5% and 97.5% percentiles, -0.02 to 2.13) includes zero, so this difference is not statistically significant, probably due to the small sample size. <code>beta_30<\/code> indicates a preference for category 1 over category 3 in group 0, but its 95% posterior interval (-3.22 to 0.49) also includes zero, so this difference is not statistically significant either.<\/p>\n<p><code>beta_21<\/code> is the contrast of group 1 to <code>beta_20<\/code>, i.e. the change in the log-probability-ratio of category 2 over category 1 when moving from group 0 to group 1. Group 1 seems to have a somewhat lower preference for category 2 over category 1 than group 0, but this contrast is not statistically significant either (95% posterior interval -2.66 to 1.31). <\/p>\n<p><code>beta_31<\/code> is statistically significant: its 95% posterior interval (0.76 to 5.35) excludes zero. Group 1 has a much stronger preference for category 3 over category 1 than group 0 in terms of log-probability-ratios. <\/p>\n<p>Note that this setup can be extended to a larger set of continuous and categorical predictors. A multilevel extension of the model also seems straightforward, by \"simply\" including random effects in the linear regression equations. 
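<\/p>
<p>For readers without OpenBUGS, the following base-R sketch is one way to cross-check the results; it is not part of the original code. It rebuilds the 41 observations from the cross-table above (the CSV file itself is not shown here), computes the maximum likelihood estimates, which have a closed form because a single binary predictor makes the model saturated, and then draws from the posterior with a simple random-walk Metropolis sampler under the same vague normal priors. The sampler, its tuning (proposal sd 0.5, 20000 iterations, 2000 burn-in) and all object names are assumptions made for this sketch, and this is not necessarily the algorithm OpenBUGS uses internally.<\/p>
<pre lang=\"rsplus\" line=\"1\">
# Cross-check without OpenBUGS: a sketch; object names and tuning are assumptions.
# 1) Rebuild the n = 41 observations from the cross-table table(y, x) shown above.
counts <- matrix(c(5, 13, 2,    # x = 0: counts for y = 1, 2, 3
                   3, 4, 14),   # x = 1: counts for y = 1, 2, 3
                 nrow = 3, dimnames = list(y = 1:3, x = 0:1))
y <- rep(rep(1:3, times = 2), times = as.vector(counts))
x <- rep(rep(0:1, each = 3), times = as.vector(counts))

# 2) Closed-form ML estimates: with one binary predictor the model is saturated,
#    so the fitted log-probability-ratios equal the observed log count ratios.
mle <- c(beta_20 = log(counts[2, 1] / counts[1, 1]),
         beta_30 = log(counts[3, 1] / counts[1, 1]))
mle <- c(mle,
         beta_21 = log(counts[2, 2] / counts[1, 2]) - mle[['beta_20']],
         beta_31 = log(counts[3, 2] / counts[1, 2]) - mle[['beta_30']])
print(round(mle, 2))

# 3) Log posterior with the same priors as the BUGS model
#    (BUGS dnorm(0, 1.0E-6) uses a precision, i.e. sd = 1000).
log_post <- function(b) {   # b = (beta_20, beta_30, beta_21, beta_31)
  lambda <- cbind(0, b[1] + b[3] * x, b[2] + b[4] * x)   # n x 3, category 1 fixed at 0
  loglik <- sum(lambda[cbind(seq_along(y), y)] - log(rowSums(exp(lambda))))
  loglik + sum(dnorm(b, mean = 0, sd = 1000, log = TRUE))
}

# 4) Random-walk Metropolis sampler, used here as a stand-in for OpenBUGS
set.seed(1)
n_iter <- 20000
n_burn <- 2000
draws <- matrix(NA_real_, nrow = n_iter, ncol = 4,
                dimnames = list(NULL, names(mle)))
b <- rep(0, 4)
lp <- log_post(b)
for (it in seq_len(n_iter)) {
  prop <- b + rnorm(4, sd = 0.5)        # symmetric normal proposal
  lp_prop <- log_post(prop)
  if (log(runif(1)) < lp_prop - lp) {   # accept with probability min(1, posterior ratio)
    b <- prop
    lp <- lp_prop
  }
  draws[it, ] <- b
}
post <- draws[-seq_len(n_burn), ]
print(round(colMeans(post), 2))
print(round(apply(post, 2, quantile, probs = c(0.025, 0.975)), 2))
<\/pre>
<p>With priors this vague, the Metropolis means should land close to the OpenBUGS posterior means shown above, up to Monte Carlo error. The maximum likelihood estimates (0.96, -0.92, -0.67, 2.46) are of similar size; the largest gap is for <code>beta_31<\/code>, whose posterior is right-skewed.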
<\/p>\n<p>The main objective of this post was to show in a simple way how categorical data analysis techniques translate into a graphical representation that in turn can be used to estimate model parameters with MCMC methods.<\/p>\n<p>Grab the code <a href=\"http:\/\/www.georg-hosoya.de\/wordpress\/?attachment_id=704\" rel=\"attachment wp-att-704\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I decided to switch to English as I believe that this topic may be interesting for a wider audience. In the last post I have given a simple example of how the t-test for independent samples can be understood from &hellip; <a href=\"https:\/\/www.georg-hosoya.de\/wordpress\/?p=409\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.georg-hosoya.de\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/409"}],"collection":[{"href":"https:\/\/www.georg-hosoya.de\/wordpress\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.georg-hosoya.de\/wordpress\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.georg-hosoya.de\/wordpress\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.georg-hosoya.de\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=409"}],"version-history":[{"count":182,"href":"https:\/\/www.georg-hosoya.de\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/409\/revisions"}],"predecessor-version":[{"id":706,"href":"https:\/\/www.georg-hosoya.de\/wordpress\/index.php?rest_route=\/wp\/v2\/posts\/409\/revisions\/706"}],"wp:attachment":[{"href":"https:\/\/www.georg-hosoya.de\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=409"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.georg-hosoya.de\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=409"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.georg-hosoya.de\/wordpress\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=409"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}