The Energy Score for Multivariate Normal Distributions

12 Feb 2025

Introduction

The Energy Score serves as the multivariate extension of the Continuous Ranked Probability Score (CRPS), maintaining its desirable properties while handling vector-valued forecasts. This post presents the closed-form expression for multivariate normal distributions and its connection to classical CRPS results.

Insight:

The Energy Score generalizes CRPS through expectations of pairwise distances, avoiding direct integration over multivariate CDFs

Special function relationships enable closed-form solutions for normal distributions

Energy Score: Definition

For a $d$-dimensional forecast distribution $P$ and observation vector $\mathbf{y} \in \mathbb{R}^d$:

\[\text{ES}(P, \mathbf{y}) = \mathbb{E}\left[\|\mathbf{X} - \mathbf{y}\|\right] - \frac{1}{2} \mathbb{E}\left[\|\mathbf{X} - \mathbf{X'}\|\right]\]

where $\mathbf{X}, \mathbf{X’} \sim P$ are independent random vectors.

The Multivariate Normal Case

For $P = \mathcal{N}_d(\boldsymbol{\mu}, \mathbf{\Sigma})$ and observation $\mathbf{y}$:

\[\text{ES}(P, \mathbf{y}) = \sqrt{2\text{tr}(\mathbf{\Sigma})} \cdot \frac{\Gamma\left(\frac{d+1}{2}\right)}{\Gamma\left(\frac{d}{2}\right)} \left[ {}_1F_1\left(-\frac{1}{2}; \frac{d}{2}; -\frac{\delta^2}{2}\right) - 1 \right]\]

Components:

$\delta^2 = (\boldsymbol{\mu} - \mathbf{y})^\top \mathbf{\Sigma}^{-1} (\boldsymbol{\mu} - \mathbf{y})$ (Mahalanobis distance squared)
${}_1F_1$: Confluent hypergeometric function
$\Gamma$: Gamma function

Practical Note:

Requires special function implementations (e.g., scipy.special.hyp1f1)

Reduces to familiar CRPS when $d=1$

Special Cases

Isotropic Covariance

When $\mathbf{\Sigma} = \sigma^2\mathbf{I}_d$:

\[\text{ES}(P, \mathbf{y}) = \sqrt{2d}\sigma \cdot \frac{\Gamma\left(\frac{d+1}{2}\right)}{\Gamma\left(\frac{d}{2}\right)} \left[ {}_1F_1\left(-\frac{1}{2}; \frac{d}{2}; -\frac{\|\boldsymbol{\mu} - \mathbf{y}\|^2}{2\sigma^2}\right) - 1 \right]\]

Univariate Case (Classical CRPS)

For $P = \mathcal{N}(\mu, \sigma^2)$ and observation $y$:

\[\text{CRPS}(P, y) = \sigma \left[ z(2\Phi(z) - 1) + \sqrt{\frac{2}{\pi}}(e^{-z^2/2} - 1) \right]\]

where $z = \frac{y - \mu}{\sigma}$ and $\Phi$ is the standard normal CDF.

Computational Approaches

Component	Implementation Strategy
${}_1F_1$ function	scipy.special.hyp1f1
Gamma functions	Standard math libraries
High-dimensional cases	Monte Carlo sampling

Implementation Suggestion:

For d > 3, consider Monte Carlo approximation:

$$ \text{ES}(P, \mathbf{y}) \approx \frac{1}{N}\sum_{i=1}^N \|\mathbf{X}_i - \mathbf{y}\| - \frac{1}{2N^2}\sum_{i,j=1}^N \|\mathbf{X}_i - \mathbf{X}_j\| $$

Proof Sketch

Key Steps

Non-central Chi Distribution: $|\mathbf{X}-\mathbf{y}|$ follows a non-central chi distribution
Central Chi Moments: $\mathbb{E}[|\mathbf{X}-\mathbf{X’}|]$ uses central chi properties
Hypergeometric Connection: Relate Bessel functions to ${}_1F_1$ through series expansions

Technical Lemma

For $\mathbf{V} \sim \mathcal{N}_d(\boldsymbol{\nu}, \mathbf{\Omega})$:

\[\mathbb{E}[\|\mathbf{V}\|] = \sqrt{2\text{tr}(\mathbf{\Omega})} \cdot \frac{\Gamma\left(\frac{d+1}{2}\right)}{\Gamma\left(\frac{d}{2}\right)} \cdot {}_1F_1\left(-\frac{1}{2}; \frac{d}{2}; -\frac{\boldsymbol{\nu}^\top \mathbf{\Omega}^{-1} \boldsymbol{\nu}}{2}\right)\]

Comments

The Energy Score’s extension of CRPS to multivariate settings addresses a critical need in modern probabilistic forecasting. Where the univariate CRPS revolutionized verification of scalar forecasts [1], complex systems increasingly demand joint calibration assessments of vector-valued predictions - from weather models (temperature-pressure-wind vectors) to financial risk (correlated asset returns).

Some Motivations:

Dependency Awareness: Unlike marginal CRPS averaging, the Energy Score's pairwise distance terms (𝔼‖𝐗−𝐗'‖) directly penalize misrepresented correlations [2]

Properness Preservation: Maintains the CRPS' crucial property of being strictly proper - forecasters can't game the system by misrepresenting uncertainties [1]

Consistency: Reduces exactly to CRPS when 𝑑=1, ensuring backward compatibility

While computation requires special functions (${}_1F_1$, Γ) or Monte Carlo methods, this cost reflects the intrinsic complexity of multivariate dependence structures. As shown in [3], alternative scores like the variogram score make different tradeoffs, but the Energy Score remains uniquely tied to the CRPS.

Implementation Insight:

For high-𝑑 systems, the hypergeometric term ${}_1F_1(−1/2;𝑑/2;−𝛿^2/2)$ approaches $e^{−𝛿^2/(2𝑑)}$ - suggestion some interesting connections to Gaussian kernels in RKHS theory [2]

As multivariate probabilistic AI/ML systems proliferate, the Energy Score provides a principled verification framework - one that honors the CRPS’ “properness” philosophy.

References

Antonio Aguirre Data Modelling·Bayesian Statistics·Machine Learning