library(tidyverse) # loads data manipulation and visualization packages
library(survival) # loads the lung cancer data as `lung`
colors = c("#6C8EBF", "#c0a34d", "#780000","#007878","#B5C6DF","#EADAAA","#AE6666")
Chapter 2 - Exercise solutions
Exercise 2.1
Let \(X_1,\ldots,X_n \vert \theta \overset{\mathrm{iid}}{\sim} \mathrm{Expon}(\theta)\) be iid exponentially distributed data. Show that the Gamma distribution is the conjugate prior for this model.
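A sketch of the argument, assuming the rate parameterization of the Gamma prior, \(p(\theta) \propto \theta^{\alpha-1} e^{-\beta\theta}\) (the parameterization that the prior in Exercise 2.2 appears to use). The likelihood is
\[ p(x_1,\ldots,x_n \vert \theta) = \prod_{i=1}^n \theta e^{-\theta x_i} = \theta^{n} e^{-\theta \sum_{i=1}^n x_i}, \]
so multiplying by the prior gives
\[ p(\theta \vert x_1,\ldots,x_n) \propto \theta^{n} e^{-\theta \sum_{i=1}^n x_i} \cdot \theta^{\alpha-1} e^{-\beta\theta} = \theta^{\alpha+n-1} e^{-\theta\left(\beta + \sum_{i=1}^n x_i\right)}. \]
This is the kernel of a \(\mathrm{Gamma}\left(\alpha+n,\ \beta+\sum_{i=1}^n x_i\right)\) density, so the posterior stays in the Gamma family.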
Exercise 2.2
The dataset `lung` in the R package `survival` contains data on 228 patients with advanced lung cancer. We will here analyze the patients' survival time in days, which is recorded in the variable `time`. The variable `status` is binary, with `status = 1` if the survival time is censored (the patient was still alive at the end of the study) and `status = 2` if the survival time is uncensored (the patient died before the end of the study).
- Consider first only the uncensored patients (`status = 2`). Assume that the survival times \(X\) of the patients are independent and exponentially distributed with a common rate parameter \(\theta\), so that \(\mathbb{E}(X \vert \theta) = 1/\theta\). Assume the conjugate prior \(\theta \sim \mathrm{Gamma}(\alpha,\beta)\). A doctor tells you that the expected time until death (\(1/\theta\)) for this population is around \(150\) days. Since \(1/\theta\) then follows an inverse-gamma distribution with mean \(\beta/(\alpha-1)\), setting \(\alpha=3\) and \(\beta=300\) implies that the prior mean for \(\mathbb{E}(X \vert \theta) = 1/\theta\) is \(150\) days, so use that prior. Plot the prior and posterior densities for \(\theta\) over a suitable grid of \(\theta\)-values.
- Now consider all patients, both censored and uncensored, using the same prior as in (a). Plot the prior and posterior densities for \(\theta\) over a suitable grid of \(\theta\)-values.
  Hint: The conjugate update from (a) no longer applies as is, because the censored patients contribute differently to the likelihood. For the censored patients we only know that they lived at least the number of days recorded in the dataset. The likelihood contribution \(p(x_c \vert \theta)\) for the \(c\)th censored patient with recorded time \(x_c\) is therefore \(p(X \geq x_c \vert \theta) = e^{-\theta x_c}\), which follows from the distribution function of the exponential distribution \(p(X \leq x \vert \theta) = 1 - e^{-\theta x}\).
- Plot a histogram of `time` and overlay the pdf of the exponential model with the parameter \(\theta\) estimated by the posterior mode.
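The censored likelihood described in the hint can be evaluated numerically on a grid. The following is a minimal sketch, not the course solution: it uses a small made-up toy dataset instead of `lung`, and the function name `log_post`, the grid limits and the grid size are assumptions chosen for illustration only.

```r
# Toy data, made up for illustration (NOT the lung data)
time   <- c(30, 120, 250, 400, 90, 610)  # recorded times in days
status <- c(2, 2, 1, 2, 1, 2)            # 2 = death observed, 1 = censored

alpha <- 3; beta <- 300                  # Gamma(shape, rate) prior from the exercise

# Unnormalized log posterior for a single theta (hypothetical helper)
log_post <- function(theta, time, status, alpha, beta) {
  dead <- status == 2
  # observed deaths contribute the density theta * exp(-theta * x)
  ll_dead <- sum(dexp(time[dead], rate = theta, log = TRUE))
  # censored patients contribute the survival probability exp(-theta * x_c)
  ll_cens <- sum(pexp(time[!dead], rate = theta, lower.tail = FALSE, log.p = TRUE))
  ll_dead + ll_cens + dgamma(theta, shape = alpha, rate = beta, log = TRUE)
}

theta_grid <- seq(1e-4, 0.02, length.out = 2000)  # assumed grid, adjust to the data
step <- theta_grid[2] - theta_grid[1]
lp <- sapply(theta_grid, log_post, time = time, status = status,
             alpha = alpha, beta = beta)
post <- exp(lp - max(lp))                # subtract the max for numerical stability
post <- post / sum(post * step)          # normalize to integrate to one on the grid
theta_mode <- theta_grid[which.max(post)]
```

Working on the log scale and subtracting the maximum before exponentiating avoids underflow when the number of patients is large. With the real data, `time` and `status` would be taken from `lung`, and `post` can be plotted against `theta_grid` together with the prior density from `dgamma`.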
Exercise 2.3
This exercise continues the analysis of the lung cancer data in Exercise 2.2.
Assume that the survival times \(X\) of the lung cancer patients in Exercise 2.2 are independent and Weibull distributed \[ X_1,\ldots,X_n \vert \lambda, k \overset{\mathrm{iid}}{\sim} \mathrm{Weibull}(\lambda,k). \] The value of \(k\) determines how the failure rate changes with time:
- \(k=1\) gives a failure (death) rate that is constant over time and corresponds to the special case of an exponential distribution \(\mathrm{Expon}(\theta=1/\lambda)\) used in Exercise 2.2. Note that (following Wikipedia) the exponential distribution is parameterized with a rate (inverse scale) parameter \(\theta\), while the Weibull is parameterized with a scale parameter \(\lambda= 1/\theta\) 🤷
- \(k<1\) gives a decreasing failure rate over time
- \(k>1\) gives an increasing failure rate over time.
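These three cases can be read off the Weibull failure rate (hazard) function. As a short derivation, assuming the scale-shape form of the density, \(p(x \vert \lambda, k) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}\), with survival function \(p(X \geq x \vert \lambda, k) = e^{-(x/\lambda)^k}\):
\[ h(x) = \frac{p(x \vert \lambda, k)}{p(X \geq x \vert \lambda, k)} = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}. \]
Since \(h(x) \propto x^{k-1}\), the failure rate is constant for \(k=1\), decreasing in \(x\) for \(k<1\), and increasing in \(x\) for \(k>1\).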
- Plot the posterior distribution of \(\lambda\) conditional on \(k=1\), \(k=3/2\) and \(k=2\). For all \(k\), use the prior \(\lambda \sim \mathrm{Gamma}(\alpha,\beta)\) with \(\alpha=3\) and \(\beta=1/50\) (which gives a prior for \(\theta=1/\lambda\) similar to the one in Exercise 2.2). Hint: the posterior distribution for \(k\neq 1\) is intractable, so use numerical evaluation of the posterior over a grid of \(\lambda\)-values.
- Plot the `time` variable as a histogram and overlay the fitted model for the three different \(k\)-values; use the posterior mode for \(\lambda\) in each model when plotting the fitted model density.
- Use `stan` to sample from the posterior distribution of \(\lambda\) for a given \(k=3/2\). This should replicate your results in (a). Read the part of the Stan User Guide on how to implement censoring in the model before starting. The example in the User Guide has the same censoring point for all patients, which is not the case in the `lung` dataset, so you need to generalize it to a vector of censoring points, one for each patient.
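The grid evaluation in (a) can be sketched in the same way as for the exponential model, with the Weibull density for observed deaths and the Weibull survival function for censored patients. This is only an illustrative sketch on made-up toy data, not the `lung` data; the names `log_post_weibull` and `grid_posterior` and the grid limits are assumptions. A grid result of this kind is also a convenient cross-check for posterior draws from Stan.

```r
# Toy data, made up for illustration (NOT the lung data)
time   <- c(30, 120, 250, 400, 90, 610)  # recorded times in days
status <- c(2, 2, 1, 2, 1, 2)            # 2 = death observed, 1 = censored

alpha <- 3; beta <- 1 / 50               # Gamma(shape, rate) prior for lambda

# Unnormalized log posterior of the scale lambda for a fixed shape k (hypothetical helper)
log_post_weibull <- function(lambda, k, time, status, alpha, beta) {
  dead <- status == 2
  sum(dweibull(time[dead], shape = k, scale = lambda, log = TRUE)) +
    sum(pweibull(time[!dead], shape = k, scale = lambda,
                 lower.tail = FALSE, log.p = TRUE)) +
    dgamma(lambda, shape = alpha, rate = beta, log = TRUE)
}

lambda_grid <- seq(1, 1500, length.out = 3000)  # assumed grid, adjust to the data
step <- lambda_grid[2] - lambda_grid[1]

# Normalized posterior density on the grid for one value of k (hypothetical helper)
grid_posterior <- function(k) {
  lp <- sapply(lambda_grid, log_post_weibull, k = k, time = time,
               status = status, alpha = alpha, beta = beta)
  dens <- exp(lp - max(lp))              # subtract the max for numerical stability
  dens / sum(dens * step)
}

k_values <- c(1, 3 / 2, 2)
post_by_k <- sapply(k_values, grid_posterior)   # one column per k
lambda_modes <- lambda_grid[apply(post_by_k, 2, which.max)]
```

Each column of `post_by_k` can be plotted against `lambda_grid`, and `lambda_modes` holds the grid-based posterior mode of \(\lambda\) for each \(k\), which is the value to plug into `dweibull` when overlaying the fitted density on the histogram.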
Exercise 2.4
Let \(X_1,\ldots,X_n\) be an iid sample from a distribution with density function \[ p(x) \propto \theta^2 x \exp (-x\theta)\quad \text{ for } x>0 \text{ and } \theta>0. \] Find the conjugate prior for this distribution and derive the posterior distribution from an iid sample \(x_1,\ldots,x_n\).
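As a starting point, and again assuming a Gamma prior in the rate parameterization, \(p(\theta) \propto \theta^{\alpha-1} e^{-\beta\theta}\), the likelihood as a function of \(\theta\) is
\[ p(x_1,\ldots,x_n \vert \theta) \propto \prod_{i=1}^n \theta^2 x_i e^{-\theta x_i} \propto \theta^{2n} e^{-\theta \sum_{i=1}^n x_i}, \]
which has the same functional form in \(\theta\) as the Gamma kernel. Multiplying by the prior gives
\[ p(\theta \vert x_1,\ldots,x_n) \propto \theta^{\alpha+2n-1} e^{-\theta\left(\beta + \sum_{i=1}^n x_i\right)}, \]
that is, a \(\mathrm{Gamma}\left(\alpha+2n,\ \beta+\sum_{i=1}^n x_i\right)\) posterior.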