<- function(nIter, param){
SimDirichlet <- length(param)
nCat <- as.data.frame(matrix(NA, nIter, nCat)) # Storage.
thetaDraws for (j in 1:nCat){
<- rgamma(nIter,param[j],1)
thetaDraws[,j]
} for (i in 1:nIter){
= thetaDraws[i,]/sum(thetaDraws[i,])
thetaDraws[i,]
} return(thetaDraws)
}
Bayesian analysis of multinomial data
Prepared for the course: Bayesian Learning Author: Mattias Villani, Stockholm and Linköping University, http://mattiasvillani.com
Data
A company has conduct a survey of mobile phone usage. 513 participants were asked the question: ‘What kind of mobile phone do you main use?’ with the four options:
- iPhone
- Android Phone
- Windows Phone
- Other
The responses in the four categories were: 180, 230, 62, 41.
Model
Prior
The conjugate prior for multinomial data is the Dirichlet prior.
note that
Posterior
Ok, let’s use this piece of code on the survey data to obtain a sample from the posterior.
# Data and prior
set.seed(123) # Set the seed for reproducibility
<- c(180,230,62,41) # The cell phone survey data (K=4)
y <- c(15,15,10,10) # Dirichlet prior hyperparameters
alpha <- 1000 # Number of posterior draws
nIter <- SimDirichlet(nIter,y + alpha)
thetaDraws names(thetaDraws) <- c('theta1','theta2','theta3','theta4')
head(thetaDraws)
theta1 theta2 theta3 theta4
1 0.3530702 0.4539865 0.1079301 0.08501313
2 0.3676441 0.4020532 0.1406601 0.08964270
3 0.3347276 0.4782798 0.1030357 0.08395696
4 0.3395357 0.4487847 0.1217003 0.08997931
5 0.3769253 0.4175829 0.1180137 0.08747814
6 0.3607870 0.4186397 0.1275708 0.09300241
So thetaDraws is a nIter-by-4 matrix, where the
par(mfrow = c(1,2)) # Splits the graphical window in four parts
hist(thetaDraws[,1], breaks = 25, xlab = 'Fraction IPhone users', main ='iPhone', freq = FALSE)
lines(density(thetaDraws[,1]), col = "blue", lwd = 2)
hist(thetaDraws[,2], breaks = 25, xlab = 'Fraction Android users', main ='Android', freq = FALSE)
lines(density(thetaDraws[,2]), col = "blue", lwd = 2)
We can also compute the probability that Android has the largest market share by simply the proportion of posterior draws where Android is largest. You can for example see that this was the case in the first six draws shown above. This code does this calculation.
# Computing the posterior probability that Android is the largest
<- sum(thetaDraws[,2]>apply(thetaDraws[,c(1,3,4)],1,max))/nIter
PrAndroidLargest message(paste('Pr(Android has the largest market share) = ', PrAndroidLargest))
Pr(Android has the largest market share) = 0.991