Mattias Villani

Julia has a fantastic unified system for statistical distributions, implemented mainly in the Distributions.jl package. Similar to Python, but different from R, distributions are objects.

using Distributions
dist = Normal(1, 3)
Normal{Float64}(μ=1.0, σ=3.0)
Normal(2,4) is a \(N(2,4^2)\) distribution object, and Poisson(4) is a \(\mathrm{Poisson}(4)\) distribution object.
We can call functions on distribution objects, for example to:

- evaluate the pdf at a point \(x=0\)
- evaluate the cumulative distribution function (cdf)
- compute quantiles
- generate random numbers
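For example, with the dist = Normal(1, 3) object defined above (a minimal sketch; the points and probabilities evaluated here are just for illustration):

pdf(dist, 0)            # density at x = 0
cdf(dist, 0)            # P(X ≤ 0)
quantile(dist, 0.975)   # 97.5% quantile
rand(dist, 5)           # five random draws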
Location-Scale families are generated by addition and multiplication
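The output below was presumably produced by an expression along these lines (a sketch: a Student-t with 3 degrees of freedom shifted by 1 and scaled by 2):

1 + 2*TDist(3)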
LocationScale{Int64, Continuous, TDist{Float64}}(
μ: 1
σ: 2
ρ: TDist{Float64}(ν=3.0)
)
so one can easily define location-scale versions of standard distributions, like the shifted and scaled Student-t above.
MixtureModel
# Mixture of two normals
mixdist = MixtureModel([Normal(0, 1), Normal(5, 3)], [0.3, 0.7])
pdf(mixdist, 1.0)
quantile(mixdist, [0.025, 0.5, 0.975])
3-element Vector{Float64}:
-1.628795787867508
3.303956200415996
10.408229272202913
# Any number of mixture components, also discrete.
mixdist = MixtureModel([Poisson(1), NegativeBinomial(2,0.3), Geometric(0.4)], [0.1, 0.2, 0.7])
pdf(mixdist, 4)
0.059429831004881015
using Plots
# Normal(0,1) truncated to [-1,1]
dist = Truncated(Normal(0,1), -1, 1)
xgrid = range(quantile(dist, 0.001), quantile(dist, 0.999), length = 500)
plot(xgrid, pdf.(dist, xgrid), lw = 2)
# Gamma truncated from below at 1
dist = Truncated(Gamma(2,2), 1, Inf)
xgrid = range(0, quantile(dist, 0.999), length = 500)
plot(xgrid, pdf.(dist, xgrid), lw = 2)
Censoring
# Censored Weibull
cdfval = cdf(Weibull(2,2),3)
println("cdf at x=3 of Weibull without truncation is $(cdfval)")
dist = censored(Weibull(2,2), upper = 3)
println("cdf at x=3 of Weibull with truncation at x=3 is $(cdf(dist,3))")
println("pdf at x=3 of Weibull with truncation at x=3 is $(pdf(dist,3))")
cdf at x=3 of Weibull without truncation is 0.8946007754381357
cdf at x=3 of Weibull with truncation at x=3 is 1.0
pdf at x=3 of Weibull with truncation at x=3 is 0.10539922456186433
We can plot this by adding the extra point mass at the censoring point:
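One way to do this, as a sketch (assuming the censored Weibull dist and the Plots package loaded above; labels and styling are illustrative):

xgrid = range(0, 3, length = 500)[1:end-1]   # grid strictly below the censoring point x = 3
plot(xgrid, pdf.(dist, xgrid), lw = 2, label = "density")
scatter!([3], [pdf(dist, 3)], label = "point mass at x = 3")   # pdf at the bound equals the censored probability mass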
Optim.jl is the main Julia package for numerical function optimization.
As a simple example, consider finding the maximum likelihood estimate in Poisson regression with the BFGS optimizer.
First, let us simulate some data from a Poisson regression model with an intercept and three covariates:
using Plots, Distributions, LinearAlgebra, Optim, ForwardDiff, Random, LaTeXStrings
# Generate data from Poisson regression with β = [1,-1,1,-1]
n = 500
p = 4
X = [ones(n,1) randn(n,p-1)]
β = 0.5*[1,-1,1,-1]
λ = exp.(X*β)
y = rand.(Poisson.(λ));
scatter(X[:,2], y, title = "Scatter of y against the first covariate",
    xlabel = L"x_1", ylabel = L"y")
To find the ML estimate we need to define the log-likelihood function:
# Setting up the log likelihood function for Poisson regression
function poisreg_loglik(β, y, X)
    return sum(logpdf.(Poisson.(exp.(X*β)), y))
end
poisreg_loglik(β, y, X) # Test drive the function to see that it works.
-801.1549598275437
We can now use the maximize function to find the ML estimate:

- The algorithm starts at the random initial value \(\beta_0\).
- The first argument in the function call is the log-likelihood function, written as an anonymous function of the single argument, the vector \(\beta\).
- The argument autodiff = :forward tells Julia to use automatic differentiation from the ForwardDiff.jl package to compute the gradient vector, which is then used in the algorithm for finding the maximum.
- The output from the maximize function is an object with information about the optimization result; the maximizing \(\beta\) can be extracted with Optim.maximizer.
# Find the MLE of β using Optim.jl
β₀ = randn(p) # Initial guess for the optimization
optres = maximize(β -> poisreg_loglik(β, y, X), β₀, BFGS(),
    autodiff = :forward)
βmle = Optim.maximizer(optres)
4-element Vector{Float64}:
0.4997494654733283
-0.495306992311452
0.49073110848704
-0.5209075829242628
We can approximate the standard errors of the ML estimator using the Hessian of the log-likelihood. This matrix of second partial derivatives can also be obtained from automatic differentiation. The diagonal elements of the negative inverse Hessian approximate the variances of the individual \(\hat\beta_j\) estimates:
# Compute Hessian to get approximate standard errors
H(β) = ForwardDiff.hessian(β -> poisreg_loglik(β, y, X), β)
Ωᵦ = Symmetric(-inv(H(βmle))) # This approximates the covariance matrix of MLE
diag(Ωᵦ) # Diagonal elements are the variances of the MLEs
se = sqrt.(diag(Ωᵦ)) # Standard errors of the MLEs
println("βmle: \n", round.(βmle, digits = 3))
println("Standard errors: \n", round.(se, digits = 3))
βmle:
[0.5, -0.495, 0.491, -0.521]
Standard errors:
[0.038, 0.03, 0.024, 0.028]
Note how the ForwardDiff.jl package computes the Hessian matrix as a function, which can then be rapidly evaluated for any \(\beta\) vector. Similarly, we can obtain the gradient as an explicit function and check that it is (numerically close to) the zero vector at the ML estimate:
∂loglik(β) = ForwardDiff.gradient(β -> poisreg_loglik(β, y, X), β)
∂loglik(β₀) # Gradient at the initial value, just because we can.
∂loglik(βmle) # Gradient at the MLE, should be close to zero
4-element Vector{Float64}:
5.9190430334865596e-12
-7.505773780280833e-12
2.4567015088905464e-12
-1.0436096431476471e-14