Statistics with Julia

Author

Mattias Villani

Statistical distributions

  • Julia has a fantastic unified system for statistical distributions, implemented mainly in the Distributions.jl package.

  • As in Python, but unlike in R, distributions are objects.

    • Normal(2,4) is a \(N(2,4^2)\) distribution object.

    • Poisson(4) is a \(\mathrm{Poisson}(4)\) distribution object.

    using Distributions
    dist = Normal(1,3)
    Normal{Float64}(μ=1.0, σ=3.0)
  • We can call functions on distribution objects.

  • Evaluate the pdf at a point \(x=0\):

    pdf(dist, 0)
    0.12579440923099772
  • Evaluate the cumulative distribution function (cdf):

    cdf(dist, 0) 
    0.36944134018176367
  • Compute quantiles:

    quantile(dist, [0.025, 0.5, 0.975])
    3-element Vector{Float64}:
     -4.879891953620177
      1.0
      6.879891953620173
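  • The cdf and quantile functions are inverses of each other, which we can verify with a quick check:

    cdf.(dist, quantile(dist, [0.025, 0.5, 0.975])) ≈ [0.025, 0.5, 0.975]
    true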
  • Generate random numbers:

    rand(dist, 10) 
    10-element Vector{Float64}:
      2.133840009274354
      3.7514097146843053
     -2.980432916964554
      4.056663153113082
     -2.8383892115337233
      1.334808378644338
     -4.0041927562895845
      1.3005483572613565
      5.558396602184795
      3.6836291874390166
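  • Generic functions such as mean, std, var and params work on any distribution object. A small sketch, using the dist = Normal(1,3) object from above:

    mean(dist), std(dist), var(dist)
    (1.0, 3.0, 9.0)
    params(dist)
    (1.0, 3.0)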
  • Location-scale families are generated by addition and multiplication:

using Distributions
1 + 2*TDist(3)
LocationScale{Int64, Continuous, TDist{Float64}}(
μ: 1
σ: 2
ρ: TDist{Float64}(ν=3.0)
)

so one can easily define a location-scale version of the t distribution (this defines a new three-argument method for TDist):

TDist(μ, σ, ν) = μ + σ*TDist(ν) # location μ, scale σ, degrees of freedom ν
dist = TDist(1, 2, 3)
pdf(dist, 0)
0.1565904555044143
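As a quick check, the same value follows from the location-scale density formula \(f(x) = f_\nu((x-\mu)/\sigma)/\sigma\):

pdf(TDist(3), (0 - 1)/2) / 2
0.1565904555044143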
  • Mixtures can be built up from component distributions with MixtureModel:
# Mixture of two normals
mixdist = MixtureModel([Normal(0, 1), Normal(5, 3)], [0.3, 0.7])
pdf(mixdist, 1.0)
quantile(mixdist, [0.025, 0.5, 0.975])
3-element Vector{Float64}:
 -1.628795787867508
  3.303956200415996
 10.408229272202913
# Any number of mixture components is allowed, and the components may be discrete.
mixdist = MixtureModel([Poisson(1), NegativeBinomial(2,0.3), Geometric(0.4)], [0.1, 0.2, 0.7])
pdf(mixdist, 4)
0.059429831004881015
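The component distributions and mixing weights can be retrieved from the mixture object, and rand samples from it in the usual way. A small sketch:

components(mixdist) # Vector of the three component distributions
probs(mixdist)      # Mixing weights: [0.1, 0.2, 0.7]
rand(mixdist, 5)    # Pick a component by weight, then sample from it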
  • Truncation
using Plots
# Normal(0,1) truncated to [-1,1]
dist = truncated(Normal(0,1), -1, 1)
xgrid = range(quantile(dist, 0.001), quantile(dist, 0.999), length = 500)
plot(xgrid, pdf.(dist, xgrid), lw = 2)
# Gamma(2,2) truncated from below at 1
dist = truncated(Gamma(2,2), 1, Inf)
xgrid = range(0, quantile(dist, 0.999), length = 500)
plot(xgrid, pdf.(dist, xgrid), lw = 2)
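Moments of truncated distributions are implemented analytically for some families, such as the truncated normal; in general they can be estimated by simulation. A small sketch:

dist = truncated(Normal(0,1), -1, 1)
mean(dist), std(dist)   # analytic for the truncated normal
mean(rand(truncated(Gamma(2,2), 1, Inf), 10^6)) # Monte Carlo estimate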
  • Censoring

    # Censored Weibull
    cdfval = cdf(Weibull(2,2), 3)
    println("cdf at x=3 of Weibull without censoring is $(cdfval)")
    dist = censored(Weibull(2,2), upper = 3)
    println("cdf at x=3 of Weibull censored at x=3 is $(cdf(dist,3))")
    println("pdf at x=3 of Weibull censored at x=3 is $(pdf(dist,3))")
    cdf at x=3 of Weibull without censoring is 0.8946007754381357
    cdf at x=3 of Weibull censored at x=3 is 1.0
    pdf at x=3 of Weibull censored at x=3 is 0.10539922456186433

We can plot this by adding the extra point mass at the censoring point:

dist = censored(Weibull(2,2), upper = 3)
plot(0:0.01:4, pdf.(dist, 0:0.01:4), lw = 2)

# Point mass
plot!([3, 3], [0, pdf(dist,3)], lw=3, color=:indianred, legend=false)
scatter!([3], [pdf(dist,3)], m = :circle, mc = :indianred, ms = 8, 
  lc = :indianred, lw=2)
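
The height of the point mass equals the probability that the uncensored Weibull exceeds the censoring point, which we can verify:

1 - cdf(Weibull(2,2), 3) ≈ pdf(dist, 3) # true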

Optimization and Autodiff

Optim.jl is the main Julia package for numerical optimization.

As a simple example, consider finding the maximum likelihood estimate in Poisson regression with the BFGS optimizer.

First, let us simulate some data from a Poisson regression model with an intercept and three covariates:

using Plots, Distributions, LinearAlgebra, Optim, ForwardDiff, Random, LaTeXStrings

# Generate data from Poisson regression with β = 0.5*[1,-1,1,-1]
n = 500
p = 4
X = [ones(n,1) randn(n,p-1)]
β = 0.5*[1,-1,1,-1]
λ = exp.(X*β)
y = rand.(Poisson.(λ));
scatter(X[:,2], y, title = "Scatter of y against the first covariate", 
  xlabel = L"x_1", ylabel = L"y")

To find the ML estimate we need to define the log-likelihood function:

# Setting up the log likelihood function for Poisson regression
function poisreg_loglik(β, y, X)                        
    return sum(logpdf.(Poisson.(exp.(X*β)), y))
end
poisreg_loglik(β, y, X) # Test drive the function to see that it works.
-801.1549598275437
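
For the Poisson regression model the log-likelihood is \(\sum_{i=1}^n \left(y_i x_i^\top \beta - e^{x_i^\top \beta} - \log y_i!\right)\), so a hand-coded version should agree with the logpdf-based one. A small sketch, using loggamma from the SpecialFunctions.jl package for \(\log y!\):

using SpecialFunctions
loglik_manual(β, y, X) = sum(y .* (X*β) .- exp.(X*β) .- loggamma.(y .+ 1))
loglik_manual(β, y, X) ≈ poisreg_loglik(β, y, X) # true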

We can now use the maximize function to find the ML estimate:

  • the algorithm starts at the random initial value \(\beta_0\).

  • the first argument in the function call is the log-likelihood, wrapped as an anonymous function of a single argument, the parameter vector \(\beta\).

  • the argument autodiff = :forward tells Julia to use automatic differentiation from the ForwardDiff.jl package to find the gradient vector, which is then used in the algorithm for finding the maximum.

  • the output from the maximize function is an object with information about the optimization result; Optim.maximizer extracts the \(\beta\) that maximizes the log-likelihood function.

# Find the MLE of β using Optim.jl
β₀ = randn(p) # Initial guess for the optimization
optres = maximize(β -> poisreg_loglik(β, y, X), β₀, BFGS(), 
  autodiff = :forward)
βmle = Optim.maximizer(optres)
4-element Vector{Float64}:
  0.4997494654733283
 -0.495306992311452
  0.49073110848704
 -0.5209075829242628
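
The maximized log-likelihood can be extracted with Optim.maximum, and since the data were simulated we can compare the estimates with the true coefficients. A small sketch:

Optim.maximum(optres) # log-likelihood value at the MLE
[β βmle]              # true coefficients next to their estimates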

We can approximate the standard errors of the ML estimator using the Hessian of the log-likelihood. This matrix of second partial derivatives can also be obtained by automatic differentiation. The diagonal elements of the negative inverse Hessian approximate the variances of the individual \(\hat\beta_j\) estimates:

# Compute Hessian to get approximate standard errors
H(β) = ForwardDiff.hessian(β -> poisreg_loglik(β, y, X), β)
Ωᵦ = Symmetric(-inv(H(βmle))) # This approximates the covariance matrix of MLE
diag(Ωᵦ) # Diagonal elements are the variances of the MLEs
se = sqrt.(diag(Ωᵦ)) # Standard errors of the MLEs
println("βmle: \n", round.(βmle, digits = 3))
println("Standard errors: \n", round.(se, digits = 3))
βmle: 
[0.5, -0.495, 0.491, -0.521]
Standard errors: 
[0.038, 0.03, 0.024, 0.028]
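
The standard errors give approximate 95% Wald confidence intervals \(\hat\beta_j \pm 1.96\,\mathrm{se}(\hat\beta_j)\) for each coefficient:

# Approximate 95% Wald intervals, one row per coefficient
ci = [βmle .- 1.96 .* se βmle .+ 1.96 .* se]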

Note how the ForwardDiff.jl package computes the Hessian matrix as a function, which can then be rapidly evaluated at any \(\beta\) vector. Similarly, we can obtain the gradient as an explicit function and check that it is (numerically close to) the zero vector at the ML estimate:

∂loglik(β) = ForwardDiff.gradient(β -> poisreg_loglik(β, y, X), β)
∂loglik(β₀) # Gradient at the initial value, just because we can.
∂loglik(βmle) # Gradient at the MLE, should be close to zero
4-element Vector{Float64}:
  5.9190430334865596e-12
 -7.505773780280833e-12
  2.4567015088905464e-12
 -1.0436096431476471e-14
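
As a final sanity check on the automatic differentiation, a central finite-difference approximation should agree with the ForwardDiff gradient up to numerical error. A small sketch; fdgrad is a hypothetical helper, not part of ForwardDiff.jl:

# Central finite-difference gradient as an independent check of autodiff
function fdgrad(f, x; ϵ = 1e-6)
    g = similar(x)
    for j in eachindex(x)
        e = zeros(length(x)); e[j] = 1.0
        g[j] = (f(x .+ ϵ .* e) - f(x .- ϵ .* e)) / (2ϵ)
    end
    return g
end
# Maximum deviation from the AD gradient; tiny relative to the gradient's scale
maximum(abs.(fdgrad(β -> poisreg_loglik(β, y, X), β₀) .- ∂loglik(β₀)))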