Chapter 2 - Exercise solutions

Author

Mattias Villani

Click on the arrow to see a solution.

Exercise 2.1

Let \(X_1,\ldots,X_n \vert \theta \overset{\mathrm{iid}}{\sim} \mathrm{Expon}(\theta)\) be iid exponentially distributed data. Show that the Gamma distribution is the conjugate prior for this model.

Solution Exercise 2.1

The likelihood from an iid sample from \(\mathrm{Expon}(\theta)\) is \[ p(x_1,\ldots,x_n \vert \theta)= \prod_{i=1}^n p(x_i \vert \theta) = \prod_{i=1}^n \theta e^{-\theta x_i} = \theta^n e^{-\theta\sum_{i=1}^n x_i} \] The density of the \(\theta \sim \mathrm{Gamma}(\alpha,\beta)\) prior is \[ p(\theta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\theta^{\alpha-1}e^{-\beta\theta} \propto \theta^{\alpha-1}e^{-\beta\theta} \]

By Bayes’ theorem, the posterior distribution is \[ \begin{align} p(\theta \vert x_1,\ldots,x_n) &\propto p(x_1,\ldots,x_n \vert \theta)p(\theta) \\ & \propto \theta^n e^{-\theta\sum_{i=1}^n x_i}\theta^{\alpha-1}e^{-\beta\theta} \\ & = \theta^{\alpha + n - 1} e^{ -\theta(\beta + \sum_{i=1}^n x_i)}, \end{align} \] which can be recognized as proportional to the \(\theta \sim \mathrm{Gamma}(\alpha +n,\beta + \sum\nolimits_{i=1}^n x_i)\) distribution. Since the prior and posterior belongs to the same (Gamma) distributional family, the Gamma prior is indeed conjugate to the exponential likelihood.

Exercise 2.2

The dataset lung in the R package survival contains data on 228 patients with advanced lung cancer. We will here analyze the survival time in days for the patients which is recorded by the variable time. The variable status is a binary variable with status = 1 if the survival time of the patient is censored (patient still alive at the end of the study) and status = 2 if the survival time was uncensored (patient dead before the end of the study).

Consider first only the uncensored patients (status = 2). Assume that the survival time \(X\) of the patients are independent exponentially distributed with a common rate parameter \(\theta\) such that \(\mathbb{E}(X \vert \theta) = 1/\theta\). Assume the conjugate prior \(\theta \sim \mathrm{Gamma}(\alpha,\beta)\). A doctor tells you that the expected time until death (\(1/\theta\)) for this population is around \(200\) days. It can be shown that setting \(\alpha=3\) and \(\beta=300\) implies that the prior mean for \(\mathbb{E}(X \vert \theta) = 1/\theta\) is \(200\) days, so use that prior. Plot the prior and posterior densities for \(\theta\) over a suitable grid of \(\theta\)-values.
Now consider all patients, both censored and uncensored, using the same prior as in (a). Plot the prior and posterior densities for \(\theta\) over a suitable grid of \(\theta\)-values.
Hint: The posterior is no longer tractable due to contributions of the censored patients to the likelihood. For the censored patients we only know that they lived at least the number of days recorded in the dataset. The likelihood contribution \(p(x_c \vert \theta)\) for the \(c\)th censored patient with recorded time \(x_c\) is therefore \(p(X \geq x_c \vert \theta) = e^{-\theta x_c}\), which follows from the distribution function of the exponential distribution \(p(X \leq x \vert \theta) = 1 - e^{-\theta x}\).
Plot a histogram of time and overlay the pdf of the exponential model with the parameter \(\theta\) estimated with the posterior mode.

Solution Exercise 2.2a

From Exercise 2.1, we know that the posterior distribution is \[\theta \sim \mathrm{Gamma}(\alpha + n_u, \beta + \sum\nolimits_{u \in \mathcal{U}} x_u),\] where \(n_u\) is the number of uncensored observations and \(\mathcal{U}\) is the set of observation indices for the uncensored data.

The following code plots the prior, likelihood (normalized) and posterior over a grid of values for \(\theta\). Note that the data is so much stronger than the prior that the posterior is virtually identical to the likelihood, which is why the normalized likelihood is not visible in the plot.

library(tidyverse) # loads data manipulation and visualization packages
library(survival) # loads the lung cancer data as `lung`
colors = c("#6C8EBF", "#c0a34d", "#780000","#007878","#B5C6DF","#EADAAA","#AE6666")

# Summarize the data needed for the posterior, filter out censored data
data_summary <- lung %>% filter(status == 2) %>% summarize(n = n(), sum_x = sum(time))

# Set up prior hyperparameters
alpha_prior <- 3   # shape parameter
beta_prior <- 300  # rate parameter

# Compute posterior hyperparameters
alpha_post <- alpha_prior + data_summary$n  
beta_post <- beta_prior + data_summary$sum_x

# Plot the prior and posterior densities, and the (normalized) likelihood as a bon 
thetaGrid <- seq(0, 0.03, length.out = 1000)
prior_density <- dgamma(thetaGrid, shape = alpha_prior, rate = beta_prior)
likelihood_density <- dgamma(thetaGrid, shape = data_summary$n, rate = data_summary$sum_x)
posterior_density <- dgamma(thetaGrid, shape = alpha_post, rate = beta_post)

df <- data.frame(
  thetaGrid = thetaGrid, 
  prior = prior_density, 
  likelihood = likelihood_density,
  posterior = posterior_density
)

df_long <- df %>% pivot_longer(-thetaGrid, names_to = "density_type", values_to = "density")

# Plot using ggplot2
ggplot(df_long) +
  aes(x = thetaGrid, y = density, color = density_type) +
  geom_line() +
  scale_colour_manual(
    breaks = c("prior", "likelihood", "posterior"), 
    values = c(colors[2], colors[1], colors[3])) +
  labs(title = "Exercise 2.2", x = expression(theta), y = "Density", color = "") + 
  theme_minimal()

Solution Exercise 2.2b

The likelihood for all data, censored and uncensored, is \[ \begin{align} p(x_1,\ldots,x_n \vert \theta) & = \prod_{i=1}^n p(x_i \vert \theta) \\ & = \prod_{u \in \mathcal{U}} p(x_u \vert \theta) \prod_{c \in \mathcal{C}} p(x_c \vert \theta) \\ & = \prod_{u \in \mathcal{U}} p(x_u \vert \theta) \prod_{c \in \mathcal{C}} \left(1 - F(x_c \vert \theta)\right) \end{align} \] where \(\mathcal{U}\) and \(\mathcal{C}\) are the sets of observation indicies for the uncensored and censored data, respectively. The likelihood for the uncensored data (the first product) is the same as before \[ \prod_{u \in \mathcal{U}} p(x_u \vert \theta) = \prod_{u \in \mathcal{U}} \theta e^{-\theta x_u} = \theta^{n_u} e^{-\theta\sum_{u \in \mathcal{U}} x_u}, \] where \(n_u\) is the number of uncensored observations. The likelihood contribution for each observation in the censored set (the second product) is the survival function \[ \mathrm{Pr}(X \geq x_c) = 1 - F(x_c \vert \theta), \] where \(F(x_c \vert \theta) = 1 - e^{-x_c \theta}\) is the cumulative distribution function of the exponential distribution evaluated at \(x_c\).

So the likelihood function is \[ p(x_1,\ldots,x_n \vert \theta) = \theta^n e^{-\theta\sum_{u \in \mathcal{U}} x_u} \times e^{-\theta \sum_{u \in \mathcal{U}} x_c} = \theta^{n_u} e^{-\theta\sum_{i = 1}^n x_i}. \] where one should note that \(\theta\) is raised to the number of uncensored observations, \(n_u\) while the sum in the exponential term includes both uncensored and censored observations.

By Bayes’ theorem, the posterior distribution is again a Gamma distribution \[ \begin{align} p(\theta \vert x_1,\ldots,x_n) & \propto p(x_1,\ldots,x_n \vert \theta)p(\theta) \\ & \propto \theta^{n_u} e^{-\theta\sum_{i = 1}^n x_i} \times \theta^{\alpha-1}e^{-\beta\theta} \\ & = \theta^{\alpha + n_u - 1} e^{ -\theta(\beta + \sum_{i = 1}^n x_i)}, \end{align} \] which we recognize as proportional to the following Gamma distribution \[ \theta \vert x_1,\ldots,x_n \sim \mathrm{Gamma}(\alpha + n_u,\beta + \sum\nolimits_{i=1}^n x_i). \]

The code below plots both:

the posterior from the previous exercise (a) with only the uncensored data and
the posterior from with all data.

The posterior with all data is more informative and concentrates on smaller \(\theta\) values. Since smaller \(\theta\) values correspond to longer expected survival times, this is makes sense since the censored patients were still alive at the end of the study.

# Summarize the data needed for the posterior, grouped by `status`:
data_summary <- lung %>% group_by(status) %>% summarize(n = n(), sum_x = sum(time))

# Set up prior hyperparameters
alpha_prior <- 3   # shape parameter
beta_prior <- 300  # rate parameter

# Compute posterior hyperparameters - only uncensored data
alpha_post_u <- alpha_prior + data_summary$n[2] # second row is uncensored data (status = 2)  
beta_post_u <- beta_prior + data_summary$sum_x[2] # sum over uncensored observations

# Compute posterior hyperparameters - all data
alpha_post_all <- alpha_prior + data_summary$n[2] # note: this is still n_u 
beta_post_all <- beta_prior + sum(data_summary$sum_x) # sum over all observations

# Plot the prior and the two posterior densities 
thetaGrid <- seq(0, 0.03, length.out = 1000)
prior_density <- dgamma(thetaGrid, shape = alpha_prior, rate = beta_prior)
posterior_density_u <- dgamma(thetaGrid, shape = alpha_post_u, rate = beta_post_u)
posterior_density_all <- dgamma(thetaGrid, shape = alpha_post_all, rate = beta_post_all)


df <- data.frame(
  thetaGrid = thetaGrid, 
  prior = prior_density, 
  posterior_uncensored = posterior_density_u,
  posterior_all = posterior_density_all
)

df_long <- df %>% pivot_longer(-thetaGrid, names_to = "density_type", values_to = "density")

ggplot(df_long) +
  aes(x = thetaGrid, y = density, color = density_type) +
  geom_line() +
  scale_colour_manual(
    breaks = c("prior", "posterior_uncensored", "posterior_all"), 
    values = c(colors[2], colors[3], colors[4])) +
  labs(title = "Exercise 2.2", x = expression(theta), y = "Density", color = "") + 
  theme_minimal()

Solution Exercise 2.2 c

The code below plots the histogram and the pdf of the exponential model with the parameter \(\theta\) set equal to the posterior mode. It is clear that the exponential model with its monotonically decreasing density is not fitting the data well.

postMode = df$thetaGrid[which.max(df$posterior_all)]

ggplot(lung, aes(time)) +
  geom_histogram(aes(y = after_stat(density), fill = "Data"), bins = 30) +
  stat_function(fun = dexp, args = list(rate = postMode), lwd = 1, 
                aes(color = "Exponential fit"),
  ) +
  labs(title = "Exercise 2.2c - Exponential model fit to lung cancer survival", x = "days", y = "Density") + 
  scale_fill_manual("", values = colors[5]) +
  scale_color_manual("", values = colors[3]) +
  theme_minimal()

Exercise 2.3

This exercise continues the analysis of the lung cancer data in Exercise 2.2

Assume that the survival time \(X\) of the lung cancer patients in Exercise 2.2 are independent Weibull distributed \[ X_1,\ldots,X_n \vert \lambda, k \overset{\mathrm{iid}}{\sim} \mathrm{Weibull}(\lambda,k). \] The value of \(k\) determines how the failure rate changes with time:

\(k=1\) gives a failure (death) rate that is constant over time and corresponds to the special case of a exponential distribution \(\mathrm{Expon}(\theta=1/\lambda)\) used in Exercise 2.2. Note that (following Wikipedia) the exponential distribution is parameterized with a rate (inverse scale) parameter \(\theta\), while the Weibull is parameterized with a scale parameter \(\lambda= 1/\theta\) 🤷
\(k<1\) gives a decreasing failure rate over time
\(k>1\) gives an increasing failure rate over time.

Plot the posterior distribution of \(\lambda\) conditional on \(k=1\), \(k=3/2\) and \(k=2\). For all \(k\), use the prior \(\lambda \sim \mathrm{Gamma}(\alpha,\beta)\) with \(\alpha=3\) and \(\beta=1/50\) (which a similar prior for \(\theta=1/\lambda\) as in Exercise 2.2). Hint: the posterior distribution for \(k\neq 1\) is intractable, so use numerical evaluation of the posterior over a grid of \(\lambda\)-values.
Plot the time variable as a histogram and overlay the fitted model for the three different \(k\)-values; use the posterior mode for \(\theta\) in each model when plotting the fitted model density.
Use stan to sample from the posterior distribution of \(\lambda\) for a given \(k=3/2\). This should replicate your results in (a). Read this part of the Stan User Guide on how to implement censoring in the model before starting. The example in the User Guide has the same censoring point for all patients, which is not the case in the lung dataset. So you need to generalize that to a vector of censoring points, one for each patient.

Solution Exercise 2.3a

Similar to Exercise 2.2b, the likelihood can be computed with separate treatment of the uncensored and censored observations: \[ \begin{align} p(x_1,\ldots,x_n \vert \lambda, k) & = \prod_{i=1}^n p(x_i \vert \lambda, k) \\ & = \prod_{u \in \mathcal{U}} p(x_u \vert \lambda, k) \prod_{c \in \mathcal{C}} \Big(1 - F(x_c \vert \lambda, k)\Big) \end{align} \] where \(p(x \vert \lambda, k)\) is the pdf of a Weibull variable \[ p(x \vert \lambda, k) = \frac{k}{\lambda}\Big( \frac{x}{\lambda} \Big)^{k-1}e^{-(x/\lambda)^k}\quad\text{ for }x>0 \] which is implemented in R as dweibull. The cdf of the Weibull distribution is of rather simple form \[ F(x \vert \lambda, k) = 1 - e^{-(x/\lambda)^k} \] and is implemented in R as pweibull.

The code below plots the prior and posterior distribution for \(\lambda\) for the three different \(k\)-values. We could have inserted the mathematical expressions for the pdf and cdf and simplified the final likelihood expression; we will instead use the dweibull and pweibull functions without simplifications since it gives a more general template that can be used for any distribution, not just the Weibull model. For numerical stability we usually compute the posterior distribution on the log scale \[ \log p(\lambda^{(j)} \vert x_1,\ldots,x_n) \propto \log p(x_1,\ldots,x_n \vert \lambda_j) + \log p(\lambda_j) \] for a grid of equally spaced \(\lambda\)-values: \(\lambda^{(1)}\ldots,\lambda^{(J)}\). The \(\propto\) sign now means that there is a missing additive constant \(\log p(x_1,\ldots,x_n)\) which does not depend on the unknown parameter \(\lambda\). When we have computed \(\log p(\lambda \vert x_1,\ldots,x_n)\) over a grid of \(\lambda\) values we compute the posterior on the original scale by \[ p(\lambda^{(j)} \vert x_1,\ldots,x_n) \propto \exp\Big( \log p(x_1,\ldots,x_n \vert \lambda_j) + \log p(\lambda_j) \Big) \] and then divide all numbers with the normalizing constant to make sure that the posterior integrates to one. This is done numerically by approximating the integral by a Riemann rectangle sum \[ p(\lambda^{(j)} \vert x_1,\ldots,x_n) = \frac{\exp\Big( \log p(x_1,\ldots,x_n \vert \lambda^{(j)}) + \log p(\lambda^{(j)}) \Big)} {\sum_{h=1}^J \exp\Big( \log p(x_1,\ldots,x_n \vert \lambda^{(h)}) + \log p(\lambda^{(h)}) \Big) \Delta} \] where \(\Delta\) is the spacing between the grid points of \(\lambda\)-values: \(\lambda^{(1)}, \ldots, \lambda^{(J)}\).

library(tidyverse) # loads data manipulation and visualization packages
library(survival) # loads the lung cancer data as `lung`
colors = c("#6C8EBF", "#c0a34d", "#780000","#007878","#B5C6DF","#EADAAA","#AE6666")

Set up prior hyperparameters

alpha_prior <- 3     # shape parameter
beta_prior <- 1/50   # rate parameter

Set up function that computes the likelihood for any \(\lambda\) value:

# Make a function that computes the likelihood
weibull_loglike <- function(lambda, x, censored, k){
  loglik_uncensored = sum(dweibull(x[-censored], shape = k, scale = lambda, 
                                   log = TRUE))
  loglik_censored = sum(pweibull(x[censored], shape = k, scale = lambda, 
                                 lower.tail = FALSE, log.p = TRUE))
  return(loglik_uncensored + loglik_censored)
}

Set up a function that computes the posterior density over a grid of \(\lambda\):

weibull_posterior <- function(lambdaGrid, x, censored, k, alpha_prior, beta_prior){
  Delta = lambdaGrid[2] - lambdaGrid[1] # Grid step size
  logPrior <- dgamma(lambdaGrid, shape = alpha_prior, rate = beta_prior, log = TRUE)
  logLike <- sapply(lambdaGrid, weibull_loglike, x, censored, k)
  logPost <- logLike + logPrior
  logPost <- logPost - max(logPost) # subtract constant to avoid overflow
  post <- exp(logPost)/(sum(exp(logPost))*Delta) # original scale and normalize
  logLike <- logLike - max(logLike)
  likeNorm <- exp(logLike)/(sum(exp(logLike))*Delta) # normalized likelihood
  return(list(post = post, prior = exp(logPrior), likeNorm = likeNorm))
}

# Plot the prior and posterior densities

lambdaGrid <- seq(200, 800, length.out = 1000)
# Compute to get the prior
postRes <- weibull_posterior(lambdaGrid, lung$time, lung$status == 1, k = 1, 
                             alpha_prior, beta_prior)
df <- data.frame(
  lambdaGrid = lambdaGrid, 
  prior = postRes$prior
)

# Compute for all selected k values
postModes = c()
for (k in c(1, 3/2, 2)){
  postRes <- weibull_posterior(lambdaGrid, lung$time, lung$status == 1, k, alpha_prior, beta_prior)
  df[str_glue("posterior k={k}")] <- postRes$post
  postModes = c(postModes, lambdaGrid[which.max(postRes$post)])
}

df_long <- df %>% pivot_longer(-lambdaGrid, names_to = "density_type", values_to = "density")

# Plot using ggplot2
ggplot(df_long) +
  aes(x = lambdaGrid, y = density, color = density_type) +
  geom_line() +
  xlim(250,600) +
  scale_colour_manual(
    breaks = c("prior", "posterior k=1", "posterior k=1.5", "posterior k=2"), 
    values = c(colors[2], colors[1], colors[3], colors[4])) +
  labs(title = "Exercise 2.3", x = expression(lambda), y = "Density", color = "") + 
  theme_minimal()

Warning: Removed 1668 rows containing missing values or values outside the scale range
(`geom_line()`).

Solution Exercise 2.3b

The fit of the three Weibull models are plotted below. The best fit seems to be for \(k=3/2\), but it is still not very good. In a later exercise you will be asked to freely estimate both \(\lambda\) and \(k\), and even later to fit a Weibull regression model with covariates.

ggplot(lung, aes(time)) +
  geom_histogram(aes(y = after_stat(density), fill = "Data"), bins = 30) +
  stat_function(fun = dweibull, args = list(shape = 1, scale = postModes[1]), lwd = 1, 
                aes(color = "Weibull fit k = 1"),
  ) +
  stat_function(fun = dweibull, args = list(shape = 3/2, scale = postModes[2]), lwd = 1, 
                aes(color = "Weibull fit k = 3/2"),
  ) +
  stat_function(fun = dweibull, args = list(shape = 2, scale = postModes[3]), lwd = 1, 
                aes(color = "Weibull fit k = 2"),
  ) +
  labs(title = "Weibull model fits", x = "days", y = "Density") + 
  scale_fill_manual("", values = colors[6]) +
  scale_color_manual("", values = c(colors[1], colors[3], colors[4])) +
  theme_minimal()

Solution Exercise 2.3c

The code below defines the iid Weibull survival model with censored data in stan. The code here extends this example in the Stan User Guide to the case with different censoring points for each patient. Note the target += construction where the censored data points are added to the target (the log posterior) after the initial uncensored (observed) data are included in the log posterior with the y_obs ~ weibull(k, lambda) statement. The weibull_lccdf function in stan is a convenience function that computes the survival probability \(\mathrm{Pr}(X >= x) = 1 - F(x)\), where \(F()\) is the cdf of the Weibull distribution. There are _lccdf versions of all distribution in stan.

weibull_survivalmodel <- '
data {

  // Data
  int<lower=0> N_obs;
  int<lower=0> N_cens;
  array[N_obs] real y_obs;
  array[N_cens] real y_cens;
  
  // Model setting
  real<lower=0> k;
  
  // Prior hyperparameters theta ~ Gamma(alpha, beta)
  real<lower=0> alpha;
  real<lower=0> beta;
}
parameters {
  real lambda;
}
model {
  lambda ~ gamma(alpha, beta); // specifies the prior
  y_obs ~ weibull(k, lambda);  // add the observed (non-censored) data
  target += weibull_lccdf(y_cens | k, lambda); // add censored. lccdf is 1-cdf
}
'

We set up the data and prior lists that will be supplied to stan:

k = 3/2
y_obs <- lung %>% filter(status == 2) %>% pull(time)
y_cens <- lung %>% filter(status == 1) %>% pull(time)

data <- list(N_obs = length(y_obs), N_cens = length(y_cens), 
             y_obs = y_obs, y_cens = y_cens, k = k)
prior <- list(alpha = alpha_prior, beta = beta_prior)

Load rstan and set some options

#install.packages("rstan", repos = c('https://stan-dev.r-universe.dev', 
#                                    getOption("repos")))
suppressMessages(library(rstan))
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

Sample from the posterior distribution using HMC in stan

nDraws = 5000
fit = stan(model_code = weibull_survivalmodel, data = c(data, prior), iter = nDraws)

Summarize the results from the posterior sampling. The number of effective draws n_eff is not much lower than the \(5000\) nominal number of draws, so the HMC sampling is efficient. The Rhat is also close to one, suggesting that the different runs gave similar results.

s <- summary(fit, pars = "lambda", probs = c(0.025, 0.975))
s$summary  # results from all the different runs (chains) merged.

           mean   se_mean       sd     2.5%    97.5%   n_eff     Rhat
lambda 415.6186 0.3582918 20.68638 376.7969 457.8914 3333.46 1.001112

Compare the posterior from HMC sampling with the gridded version above, as a bug check. Hmm, they should agree. Can’t seem to find the bug now, will fix later. Well, you get the idea.

# Plot histogram from stan draws
postsamples <- extract(fit, pars = c("lambda"))
hist(postsamples$lambda, 50, freq = FALSE, col = colors[5], 
     xlab = expression(lambda), ylab = "posterior density", 
     main = expression(lambda), ylim = c(0,0.025))

# Adding the gridded version from above
lambdaGrid <- seq(200, 800, length.out = 1000)
postRes <- weibull_posterior(lambdaGrid, lung$time, lung$status == 1, k = k, 
                             alpha_prior, beta_prior)
lines(lambdaGrid, postRes$post, col = colors[3], lw = 2)

legend(x = "topright", inset=.05, legend = c("Stan", "Gridded"), lty = c(0,1),
         fill = c(colors[5], NA), border = c(1,0),
         col = c(colors[5], colors[3]), box.lty=1
  )

Exercise 2.4

Let \(X_1,\ldots,X_n\) be an iid sample from a distribution with density function \[ p(x) \propto \theta^2 x \exp (-x\theta)\quad \text{ for } x>0 \text{ and } \theta>0. \] Find the conjugate prior for this distribution and derive the posterior distribution from an iid sample \(x_1,\ldots,x_n\).

Solution Exercise 2.4

The likelihood function from a sample \(x_1,\ldots,x_n\) is

\[ p(x_1,\ldots,x_n \vert \theta) = \prod_{i=1}^n\theta^2 x_i \exp (-x_i\theta) \propto \theta^{2n}\exp\Big(-\theta \sum_{i=1}^n x_i \Big) \]

This likelihood resembles a Gamma distribution, so a good guess for a conjugate prior would be the \(\theta \sim \mathrm{Gamma}(\alpha,\beta)\) distribution; to see that this is indeed a reasonable guess, note that the particular form of the Gamma density (a power of \(\theta\) times an exponential in \(\theta\)) makes it closed under multiplication. The posterior distribution is then

\[ \begin{align} p(\theta|x_1,\ldots,x_n) & \propto p(x_1,\ldots,x_n \vert \theta)p(\theta) \\ & \propto \theta^{2n}\exp\Big(-\theta \sum_{i=1}^n x_i \Big)\theta^{\alpha-1}\exp(-\theta\beta) \\ & \propto \theta^{\alpha + 2n - 1}\exp\Big(-\theta (\beta+\sum_{i=1}^n x_i) \Big) \end{align} \] and the posterior is therefore \(\theta \vert x_1,\ldots,x_n \sim \mathrm{Gamma}(\alpha + 2n,\beta + \sum_{i=1}^n x_i)\). Since the posterior belongs to the same (Gamma) family as the prior, the Gamma prior is indeed conjugate to this likelihood.