Lecture 12.1

Properties of Gaussian Distributions

Gaussian distributions have algebraic closure properties: marginals, conditionals, and sums of independent Gaussians are all Gaussian. These properties underpin exact Bayesian inference and, in the next lecture, Gaussian processes.

Learning Objectives

State the marginalization property: marginals of a joint Gaussian are Gaussian.
State the conditioning property: conditionals of a joint Gaussian are Gaussian, with computable mean and covariance.
State that the sum of two independent Gaussians is Gaussian.
Explain the reparameterization trick for sampling from correlated Gaussians.

1. Jointly Gaussian Random Variables

Suppose $\mathbf{x}_1$ and $\mathbf{x}_2$ are jointly Gaussian distributed:

$$\begin{pmatrix}\mathbf{x}_1 \\ \mathbf{x}_2\end{pmatrix} \sim \mathcal{N}\!\left(\begin{pmatrix}\boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2\end{pmatrix},\; \begin{pmatrix}\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{pmatrix}\right).$$

The block structure of the covariance matrix encodes both the marginal covariances ($\boldsymbol{\Sigma}_{11}$, $\boldsymbol{\Sigma}_{22}$) and the cross-covariances ($\boldsymbol{\Sigma}_{12} = \boldsymbol{\Sigma}_{21}^\top$).

2. Marginalization Property

Marginalization

Integrating out $\mathbf{x}_2$ from the joint Gaussian gives a Gaussian marginal for $\mathbf{x}_1$:

$$p(\mathbf{x}_1) = \mathcal{N}(\mathbf{x}_1 \mid \boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11}),$$

and similarly $p(\mathbf{x}_2) = \mathcal{N}(\mathbf{x}_2 \mid \boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22})$. The marginal reads off the corresponding block of the joint mean and covariance directly.
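As an illustration, the block read-off can be checked numerically. The following is a minimal sketch in NumPy; the three-dimensional joint, its parameter values, and the variable names are assumptions chosen for the example, not part of the lecture.

```python
import numpy as np

# Hypothetical 3-D joint Gaussian: x1 = first two coordinates, x2 = last one.
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([
    [2.0, 0.6, 0.3],
    [0.6, 1.0, -0.4],
    [0.3, -0.4, 1.5],
])
idx1 = [0, 1]  # indices that make up x1

# Marginal parameters of x1: read off the matching blocks, no integration needed.
mu_1 = mu[idx1]
Sigma_11 = Sigma[np.ix_(idx1, idx1)]

# Empirical check: sample the joint, discard x2, and compare moments.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
emp_mean = samples[:, idx1].mean(axis=0)
emp_cov = np.cov(samples[:, idx1], rowvar=False)

print("block mean:", mu_1, "empirical:", emp_mean)
print("block covariance:\n", Sigma_11, "\nempirical:\n", emp_cov)
```

Dropping the $\mathbf{x}_2$ coordinates from the samples plays the role of integrating out $\mathbf{x}_2$; the empirical moments should agree with the blocks up to Monte Carlo error.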

3. Conditioning Property

Gaussian Conditioning

Given a fixed value of $\mathbf{x}_2$, the conditional distribution of $\mathbf{x}_1$ is also Gaussian:

$$p(\mathbf{x}_1 \mid \mathbf{x}_2) = \mathcal{N}\bigl(\mathbf{x}_1 \mid \boldsymbol{\mu}_{1|2},\, \boldsymbol{\Sigma}_{1|2}\bigr),$$

where

$$\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),$$

$$\boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}.$$

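The conditional mean shifts from $\boldsymbol{\mu}_1$ by a linear function of the deviation $\mathbf{x}_2 - \boldsymbol{\mu}_2$, with gain matrix $\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}$. The conditional covariance $\boldsymbol{\Sigma}_{1|2}$ is never larger than the marginal $\boldsymbol{\Sigma}_{11}$ in the positive semidefinite sense, since $\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21} \succeq 0$: observing $\mathbf{x}_2$ cannot increase uncertainty about $\mathbf{x}_1$, and leaves it unchanged when $\boldsymbol{\Sigma}_{12} = \mathbf{0}$. Note that $\boldsymbol{\Sigma}_{1|2}$ does not depend on the observed value of $\mathbf{x}_2$.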
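The conditioning formulas translate almost directly into code. The sketch below assumes NumPy; the function name `condition_gaussian` and its index-list interface are hypothetical choices made for this example. It uses a linear solve rather than an explicit inverse of $\boldsymbol{\Sigma}_{22}$, which is a common numerical preference and not something the formulas themselves require.

```python
import numpy as np

def condition_gaussian(mu, Sigma, idx1, idx2, x2):
    """Mean and covariance of p(x1 | x2) for a joint Gaussian (illustrative helper).

    idx1, idx2 are lists of coordinate indices for x1 and x2; x2 is the observed value.
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    x2 = np.asarray(x2, dtype=float)

    mu_1, mu_2 = mu[idx1], mu[idx2]
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]

    # gain = S12 @ inv(S22), computed with a solve; valid because S22 is symmetric.
    gain = np.linalg.solve(S22, S12.T).T

    mu_cond = mu_1 + gain @ (x2 - mu_2)
    Sigma_cond = S11 - gain @ S12.T  # S21 = S12.T
    return mu_cond, Sigma_cond

# Two scalar variables with unit variances and correlation 0.8, observing x2 = 1.
mu_c, Sigma_c = condition_gaussian(
    [0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], idx1=[0], idx2=[1], x2=[1.0]
)
print(mu_c, Sigma_c)  # approximately [0.8] and [[0.36]]
```

In this example the mean moves to $0.8 \times 1 = 0.8$ and the variance drops from $1$ to $1 - 0.8^2 = 0.36$, regardless of which value of $\mathbf{x}_2$ was observed.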

This conditioning property is the engine behind Gaussian process regression (Lecture 12.5).

4. Sum of Independent Gaussians

Sum Property

If $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $\mathbf{y} \sim \mathcal{N}(\boldsymbol{\mu}', \boldsymbol{\Sigma}')$ are independent, then

$$\mathbf{x} + \mathbf{y} \sim \mathcal{N}(\boldsymbol{\mu} + \boldsymbol{\mu}',\; \boldsymbol{\Sigma} + \boldsymbol{\Sigma}').$$

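Means and covariances add. Independence is essential for the covariance rule: if $\mathbf{x}$ and $\mathbf{y}$ are jointly Gaussian with cross-covariance $\mathbf{C} = \operatorname{Cov}(\mathbf{x}, \mathbf{y})$, the means still add but the covariance of the sum becomes $\boldsymbol{\Sigma} + \boldsymbol{\Sigma}' + \mathbf{C} + \mathbf{C}^\top$. Without joint Gaussianity, the sum of two marginally Gaussian variables need not be Gaussian at all.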
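A quick Monte Carlo check of the sum property, as a minimal sketch in NumPy. The two distributions and all variable names are assumptions chosen for illustration; independence is obtained by drawing the two sample sets separately.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical 2-D Gaussians.
mu_x, Sigma_x = np.array([1.0, 0.0]), np.array([[1.0, 0.5], [0.5, 2.0]])
mu_y, Sigma_y = np.array([-1.0, 3.0]), np.array([[0.5, -0.2], [-0.2, 1.0]])

n = 200_000
x = rng.multivariate_normal(mu_x, Sigma_x, size=n)
y = rng.multivariate_normal(mu_y, Sigma_y, size=n)  # separate draws, so independent of x
s = x + y

sum_mean = s.mean(axis=0)
sum_cov = np.cov(s, rowvar=False)

print("predicted mean:", mu_x + mu_y, "empirical:", sum_mean)
print("predicted covariance:\n", Sigma_x + Sigma_y, "\nempirical:\n", sum_cov)
```

Matching first and second moments does not by itself show that the sum is Gaussian; the check only confirms the stated mean and covariance.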

5. Sampling Correlated Gaussians: The Reparameterization Trick

To sample $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ when only an uncorrelated sampler is available:

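Factorize $\boldsymbol{\Sigma} = \mathbf{A}\mathbf{A}^\top$ via Cholesky decomposition (which requires $\boldsymbol{\Sigma}$ to be positive definite) or via the eigendecomposition $\boldsymbol{\Sigma} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\top$ with $\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}^{1/2}$ (which also handles a positive semidefinite $\boldsymbol{\Sigma}$).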
Sample $\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.
Set $\mathbf{x} = \boldsymbol{\mu} + \mathbf{A}\boldsymbol{\varepsilon}$.

Why It Works

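$\mathbf{A}\boldsymbol{\varepsilon}$ is a linear transform of a Gaussian, so it is Gaussian with mean $\mathbf{0}$ and covariance $\mathbf{A}\mathbf{I}\mathbf{A}^\top = \boldsymbol{\Sigma}$. Adding the constant $\boldsymbol{\mu}$ shifts the mean without changing the covariance, so $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. This reparameterization trick also underlies the Gaussian process sampling procedure in Lecture 12.4.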
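The three sampling steps can be sketched in NumPy as follows. The function name `sample_gaussian` and the example parameters are hypothetical; the sketch uses the Cholesky route and therefore assumes $\boldsymbol{\Sigma}$ is positive definite.

```python
import numpy as np

def sample_gaussian(mu, Sigma, n_samples, rng):
    """Draw n_samples from N(mu, Sigma) using only standard normal draws (illustrative)."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)

    # Step 1: factorize Sigma = A @ A.T (A is lower triangular).
    A = np.linalg.cholesky(Sigma)

    # Step 2: uncorrelated standard normal noise, one row per sample.
    eps = rng.standard_normal((n_samples, mu.shape[0]))

    # Step 3: x = mu + A @ eps for each sample, written row-wise as eps @ A.T.
    return mu + eps @ A.T

rng = np.random.default_rng(2)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.9], [0.9, 1.0]])
draws = sample_gaussian(mu, Sigma, 200_000, rng)

print("empirical mean:", draws.mean(axis=0))
print("empirical covariance:\n", np.cov(draws, rowvar=False))
```

For a covariance that is only positive semidefinite, `np.linalg.cholesky` raises an error; the eigendecomposition factor $\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}^{1/2}$ (for instance from `np.linalg.eigh`) could be substituted in step 1.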