Chapter 7 - Exercise solutions

Author

Mattias Villani

Click on the arrow to see a solution.

Exercise 7.2

This exercise continues the analysis of the lung cancer data in Exercise 2.3

Following Exercise 2.3, assume that the survival time \(X\) of the lung cancer patients are independent Weibull distributed \[ X_1,\ldots,X_n \vert \lambda, k \overset{\mathrm{iid}}{\sim} \mathrm{Weibull}(\lambda,k). \] In Exercise 2.3 the value of \(k\) was fixed; here we will treat both \(\lambda\) and \(k\) as unknown. Since both these parameters are positive we reparameterize first by taking logs: \[\tilde\lambda := \log\lambda \text{ and } \tilde k := \log k,\] which usually improves the normal approximation.
We assume prior independence between the two parameters and priors: \[\lambda \sim \mathrm{LogNormal}(5,1^2) \text{ and } k \sim \mathrm{LogNormal}(0,2^2), \] which by the definition of the log-normal distribution implies that \[\tilde\lambda \sim N(5,1^2) \text{ and } \tilde k \sim N(0,2^2).\]

  1. Compute a bivariate normal approximation of the joint posterior distribution in the log-parameterization \(p(\tilde\lambda, \tilde k \vert x_1,\ldots,x_n)\) using numerical optimization.
  2. Use the results from (a) to obtain a Log-normal approximation for the two marginal posterior distributions \(p(\lambda \vert x_1,\ldots,x_n)\) and \(p(k \vert x_1,\ldots,x_n)\) in the original parameterization. Plot the prior and posterior density for both parameters.
    hint: remember that the marginal distributions from a bivariat normal distribution are both normal. Also, recall the relationship between normal and log-normal distributions.
  3. If \(X \sim \mathrm{Weibull}(\lambda, k)\) then \[ \mu := \mathbb{E}(X)=\lambda\Gamma(1+1/k), \] where \(\Gamma()\) is the gamma function. Obtain an approximate posterior distribution for \(\mu\) using the results from (a) and simulation.
library(tidyverse) # loads data manipulation and visualization packages
library(survival) # loads the lung cancer data as `lung`
colors = c("#6C8EBF", "#c0a34d", "#780000","#007878","#B5C6DF","#EADAAA","#AE6666")

Set data and set up prior hyperparameters

x = lung$time
censored = (lung$status == 1)
log_lambda_mean <- 5
log_lambda_sd <- 1
log_k_mean <- 0
log_k_sd <- 2

Set up function that computes the log posterior for any \(\boldsymbol{\theta}=(\tilde\lambda,\tilde k)\) vector. The first argument of this function must be a vector containing all parameters.

# Function for computing the log posterior for any given parameter vector 
logpost_weibull <- function(theta, x, censored){
  
  # Compute the parameters in the original scale
  lambda = exp(theta[1])
  k = exp(theta[2])
  
   # Compute the (log) joint prior density
  logPrior = dnorm(theta[1], log_lambda_mean, log_lambda_sd, log = TRUE) +
             dnorm(theta[2], log_k_mean, log_k_sd, log = TRUE)
  
  # Compute the log-likelihood
  loglik_uncensored = sum(dweibull(x[-censored], shape = k, scale = lambda, 
                                   log = TRUE))
  loglik_censored = sum(pweibull(x[censored], shape = k, scale = lambda, 
                                 lower.tail = FALSE, log.p = TRUE))
  logLik = loglik_uncensored + loglik_censored
  
  # Return the log posterior
  return(logLik + logPrior) 
}

Use optim to find the posterior mode and the observed information matrix J

initVal <- c(log_lambda_mean, log_k_mean) # Start optimizer at prior means
OptimResults<-optim(initVal, logpost_weibull, gr=NULL, x, censored,
  method = c("BFGS"), control=list(fnscale=-1), hessian=TRUE)
postMode = OptimResults$par
postCov = -solve(OptimResults$hessian) # inv(J) - Approx posterior covar matrix
postStd <- sqrt(diag(postCov))         # Approximate stdev

The bivariate normal approximation for the transformed parameter vector \(\boldsymbol{\theta}=(\tilde\lambda,\tilde k)\) has mean vector

postMode
[1] 6.0171524 0.3593135

and covariance matrix

postCov
             [,1]         [,2]
[1,] 0.0021649252 0.0002793462
[2,] 0.0002793462 0.0026940478

from which we can compute approximate posterior standard deviations for \(\tilde\lambda\) and \(\tilde k\)

postStd
[1] 0.04652876 0.05190422

Since the marginal distributions from a bivariate normal distribution are both normal, we have that the following normal posterior approximations \[ \begin{align} \tilde\lambda \vert x_1,\ldots,x_n & \sim N(6.017,0.047) \\ \tilde k \vert x_1,\ldots,x_n & \sim N(0.359,0.052) \\ \end{align} \] and therefore Log-Normal approximations on the original scale \[ \begin{align} \lambda \vert x_1,\ldots,x_n & \sim \mathrm{LogNormal}(6.017,0.047) \\ k \vert x_1,\ldots,x_n & \sim \mathrm{LogNormal}(0.359,0.052) \\ \end{align} \]

# Plot the prior and posterior densities
lambdaGrid <- seq(1, 600, length = 1000)
kGrid <- seq(0.01, 3, length = 1000)

# Plot the prior and posterior densities
prior_dens_lambda <- dlnorm(lambdaGrid, log_lambda_mean, log_lambda_sd)
post_dens_lambda <- dlnorm(lambdaGrid, postMode[1], postStd[1])
prior_dens_k <- dlnorm(kGrid, log_k_mean, log_k_sd)
post_dens_k <- dlnorm(kGrid, postMode[2], postStd[2])

df_lambda <- data.frame(
  paramGrid = lambdaGrid, 
  prior = prior_dens_lambda, 
  posterior = post_dens_lambda
)

df_k <- data.frame(
  paramGrid = kGrid, 
  prior = prior_dens_k, 
  posterior = post_dens_k
)

df_long_lambda <- df_lambda %>% pivot_longer(-paramGrid, names_to = "density_type", values_to = "density")

p_lambda = ggplot(df_long_lambda) +
  aes(x = paramGrid, y = density, color = density_type) +
  geom_line() +
  scale_colour_manual(
    breaks = c("prior", "posterior"), 
    values = c(colors[2], colors[3])) +
  labs(title = "", x = expression(lambda), y = "Density", color = "") + 
  theme_minimal()

df_long_k <- df_k %>% pivot_longer(-paramGrid, names_to = "density_type", values_to = "density")

p_k = ggplot(df_long_k) +
  aes(x = paramGrid, y = density, color = density_type) +
  geom_line() +
  scale_colour_manual(
    breaks = c("prior", "posterior"), 
    values = c(colors[2], colors[3])) +
  labs(title = "" , x = expression(k), y = "Density", color = "") + 
  theme_minimal()

gridExtra::grid.arrange(p_lambda, p_k, nrow = 2)

We can simulate from the multivariate normal posterior approximation of \(\tilde\lambda,\tilde k\) from (a), transform the draws to the orginal parameterization \(\lambda,k\) and the finally compute the mean \(\mu = \lambda\Gamma(1 + 1/k)\) for each draw. Like this:

library(mvtnorm)
nDraws = 5000
thetaDraws = rmvnorm(nDraws, postMode, postCov)
lambdaDraws = exp(thetaDraws[,1])
kDraws = exp(thetaDraws[,2])
muDraws = lambdaDraws*gamma(1 + (1/kDraws))
hist(muDraws, 50, col = colors[1], freq = FALSE, xlab = expression(mu), 
     ylab = "density", main = "Posterior for the Weibull mean")

Exercise 7.3

This exercise uses the lung cancer data presented in Exercise 2.3

Here we model the survival times of the lung cancer patients in Exercise 2.3 as independent Weibull distributed with a scale parameter \(\lambda\) that is a function of covariates, i.e. using a Weibull regression model. The response variable time is denoted by \(y\) and is modelled as a function of the three covariates age, sex and ph.ecog (ECOG performance score). The model for patient \(i\) is: \[ y_i \vert \mathbf{x}_i, \boldsymbol{\beta}, k \overset{\mathrm{ind}}{\sim} \mathrm{Weibull}\big(\lambda_i = \exp(\mathbf{x}_i^\top \boldsymbol{\beta}),k\big). \] where \(\boldsymbol{\beta}\) is the vector with regression coefficients. Note that by the properties of the Weibull distribution, the conditional mean in this model is \(\mathbb{E}(y \vert \mathbf{x}_i) = \lambda_i\Gamma(1+1/k)\), so the regression coefficients do not quite have the usual interpretation of the effect on the conditional mean. The three covariates are placed in a \(n\times p\) matrix \(\mathbf{X}\) with the first column being one for all observations to model the intercept. Use a multivariate normal prior for \(\boldsymbol{\beta} \sim N(\mathbf{0},\tau^2\mathbf{I}_p)\) with the non-informative choice \(\tau = 100\). Reparameterize \(\tilde k := \log k\) and use the prior \(\tilde k \sim N(0,2^2)\). Remove the patients with missing values in the selected covariates.

  1. Compute a normal approximation of the joint posterior distribution \(p(\boldsymbol{\beta}, \tilde k \vert \mathbf{y}, \mathbf{X})\) using numerical optimization. Plot the marginal posteriors of each regression coefficient and the marginal posterior for \(k\) on the original scale.
  2. Use the result from (a) and Monte Carlo simulation to compute the predictive densities for the following two new patients:
    • 80-year female with an ECOG performance score of 0 (0=asymptomatic), i.e. \(\mathbf{x} = (1, 80, 1, 0)^\top\)
    • 80-year female with an ECOG performance score of 4 (4=bedbound) \(\mathbf{x} = (1, 80, 1, 4)^\top\).
    Plot the two predictive densities and compare. Compute the predictive probability of living for at least another 1000 days for both patients.
library(tidyverse) # loads data manipulation and visualization packages
library(mvtnorm)
colors = c("#6C8EBF", "#c0a34d", "#780000","#007878","#B5C6DF","#EADAAA","#AE6666")

Set data and set up prior hyperparameters

library(survival) # loads the lung cancer data as `lung`
lung <- lung %>% select(c("time", "status", "age", "sex", "ph.ecog")) %>% drop_na()
y = lung$time
X = cbind(1, lung$age, lung$sex == 2, lung$ph.ecog) # sex = 1 is female
p = dim(X)[2]
censored = (lung$status == 1)
mu <- rep(0,p)  # beta ~ N(mu, tau^2*I)
tau <- 100    
log_k_mean <- 0
log_k_sd <- 2

Set up a function that computes the log posterior for any \(\boldsymbol{\theta}=(\boldsymbol{\beta}^\top,\tilde k)^\top\) vector. The first argument of this function must be a vector containing all parameters.

# Function for computing the log posterior for any given parameter vector 
logpost_weibullreg <- function(theta, y, X, censored){
  
  p = dim(X)[2]
  
  # Compute the parameters in the original scale
  beta_ = theta[1:p]
  k = exp(theta[p+1])
  
   # Compute the (log) joint prior density
  logPrior = dmvnorm(beta_, mu, tau^2*diag(p), log = TRUE) +
             dnorm(theta[p+1], log_k_mean, log_k_sd, log = TRUE)
  
  # Compute the log-likelihood
  lambda_uncensored = exp(X[-censored,]%*%beta_)
  loglik_uncensored = sum(dweibull(y[-censored], shape = k, 
                                   scale = lambda_uncensored, log = TRUE))
  lambda_censored = exp(X[censored,]%*%beta_)
  loglik_censored = sum(pweibull(y[censored], shape = k, 
                                  scale = lambda_censored, 
                                  lower.tail = FALSE, log.p = TRUE))
  logLik = loglik_uncensored + loglik_censored
  
  # Return the log posterior
  return(logLik + logPrior) 
}

Use optim to find the posterior mode and the observed information matrix J:

initVal <- c(5, 0, 0, 0, log_k_mean) # Start optimizer at prior means
OptimResults<-optim(initVal, logpost_weibullreg, gr=NULL, y, X, censored,
  method = c("BFGS"), control=list(fnscale=-1), hessian=TRUE)
Warning in dweibull(y[-censored], shape = k, scale = lambda_uncensored, : NaNs
produced
Warning in dweibull(y[-censored], shape = k, scale = lambda_uncensored, : NaNs
produced
Warning in dweibull(y[-censored], shape = k, scale = lambda_uncensored, : NaNs
produced
Warning in dweibull(y[-censored], shape = k, scale = lambda_uncensored, : NaNs
produced
Warning in dweibull(y[-censored], shape = k, scale = lambda_uncensored, : NaNs
produced
Warning in dweibull(y[-censored], shape = k, scale = lambda_uncensored, : NaNs
produced
postMode = OptimResults$par
postCov = -solve(OptimResults$hessian) # inv(J) - Approx posterior covar matrix

The multivariate normal approximation for \(\boldsymbol{\theta}=(\boldsymbol{\beta},\tilde k)\) has mean vector

postMode
[1]  6.29535724 -0.00240418  0.23063114 -0.25847948  0.37972847

and covariance matrix

postCov
              [,1]          [,2]          [,3]          [,4]          [,5]
[1,]  0.1019922630 -1.589499e-03 -3.809323e-03  8.236358e-04 -8.249516e-04
[2,] -0.0015894994  2.666312e-05  9.209196e-06 -8.162842e-05  1.683131e-05
[3,] -0.0038093228  9.209196e-06  8.756587e-03 -3.349311e-04 -5.095982e-04
[4,]  0.0008236358 -8.162842e-05 -3.349311e-04  4.673567e-03  2.946088e-04
[5,] -0.0008249516  1.683131e-05 -5.095982e-04  2.946088e-04  2.656437e-03

from which we can compute approximate posterior standard deviations for each parameter

postStd <- sqrt(diag(postCov))         # Approximate stdev
postStd
[1] 0.319362275 0.005163634 0.093576639 0.068363492 0.051540635

Since the marginal distributions from a multivariate normal distribution are all normal, the marginal posterior for \(\tilde k\) is normal and the posterior for \(k\) is therefore Log-Normal \[ \begin{equation} k \vert \mathbf{y},\mathbf{X} \sim \mathrm{LogNormal}(0.38,0.052) \\ \end{equation} \] This marginal posterior is plotted below

kGrid <- seq(0.01, 2, length = 1000)
prior_dens_k <- dlnorm(kGrid, log_k_mean, log_k_sd)
post_dens_k <- dlnorm(kGrid, postMode[p+1], postStd[p+1])
plot(kGrid, prior_dens_k, col = colors[1], type = "l", ylim = c(0,5.5), lwd = 2)
lines(kGrid, post_dens_k, col = colors[3], lwd  = 2)

The marginal posteriors for the four beta coefficients are plotted below.

varNames = c("intercept", "age", "sex", "ph.ecog")
par(mfrow = c(2,2))
for (j in 1:p){
  betaGrid = seq(postMode[j] - 4*postStd[j], postMode[j] + 4*postStd[j], 
                 length = 1000)
  plot(betaGrid, dnorm(betaGrid, postMode[j], postStd[j]), col = colors[1], 
       xlab = expression(beta), ylab = "density", main = varNames[j], 
       type = "l", lwd = 2)
}

The predictive distribution for the lifetime of a new person \(\tilde y\) with covariate vector \(\tilde{\mathbf{x}}\) given the training data \((\mathbf{y}, \mathbf{X})\) is given by the integral: \[ p(\tilde y \vert \tilde{\mathbf{x}}, \mathbf{y}, \mathbf{X}) = \int p(\tilde y \vert \tilde{\mathbf{x}}, \boldsymbol{\beta}, k) p(\boldsymbol{\beta}, k \vert \mathbf{y}, \mathbf{X})\mathrm{d}\boldsymbol{\beta} \mathrm{d} k \] where \(p(\boldsymbol{\beta}, k \vert \mathbf{y}, \mathbf{X})\) is the posterior distribution and \(p(\tilde y \vert \tilde{\mathbf{x}}, \boldsymbol{\beta}, k)\) is the Weibull density. A Monte Carlo evaluation of this integral is obtained by:

  • simulating parameter draws \(\boldsymbol{\beta}^{(i)}, \tilde{k}^{(i)}\) for \(i=1,\ldots,m\) from the multivariate normal approximation in (a)
  • compute \(k^{(i)} = \exp\big(\tilde{k}^{(i)}\big)\) for each draw
  • simulate a predictive draw \(\tilde y^{(i)}\) from the Weibull model given those parameters \[ \tilde y^{(i)} \vert \tilde{\mathbf{x}}, \boldsymbol{\beta}^{(i)}, k^{(i)} \overset{\mathrm{ind}}{\sim} \mathrm{Weibull}\big(\lambda_i = \exp(\tilde{\mathbf{x}}^\top \boldsymbol{\beta}^{(i)}), k^{(i)} \big). \] This code below does exactly this for the two patients. Note that this can be done much efficiently by making all draws in one shot, but this code is connects better to the algorithm.
nDraws = 5000
x1_tilde = c(1,80,1,0) # first patient
x2_tilde = c(1,80,1,5) # second patient
y1_tilde = rep(0,nDraws)
y2_tilde = rep(0,nDraws)
for (i in 1:nDraws){
  
  # Simulate from multivariate normal posterior approximation
  theta <- rmvnorm(1, postMode, postCov)
  beta_ = theta[1:p]
  k = exp(theta[p+1])
  
  # Simulate predictive draws from the model
  y1_tilde[i] = rweibull(1, shape = k, scale = exp(x1_tilde %*% beta_))
  y2_tilde[i] = rweibull(1, shape = k, scale = exp(x2_tilde %*% beta_))
}

# Plot the predictive densities as kernel density estimates
kde1 = density(y1_tilde)
kde2 = density(y2_tilde)
plot(kde1$x, kde1$y, col = colors[1], type = "l", , lwd = 2, ylim = c(0, 0.005),
     xlab = "days", ylab = "predictive density",
     main = "predictive distributions for two patients")
lines(kde2$x, kde2$y, col = colors[3], lwd = 2)
legend(x = "topright", inset=.05, legend = c("asymptomatic", "bedbound"), 
         lty = c(1,1), col = c(colors[1], colors[3]), box.lty=1)

The posterior predictive probability of living for at least another 1000 days is

message("The probability that a 80 year female asymptomatic patient lives for at least 1000 days is ", mean(y1_tilde >= 1000))  
The probability that a 80 year female asymptomatic patient lives for at least 1000 days is 0.106
message("The probability that a 80 year female bedbound patient lives for at least 1000 days is ", mean(y2_tilde >= 1000))  
The probability that a 80 year female bedbound patient lives for at least 1000 days is 6e-04