Lecture 12.1

Properties of Gaussian Distributions

Gaussian distributions are closed under several operations: marginals, conditionals, and sums of independent Gaussians are all Gaussian. These closure properties underpin exact Bayesian inference and, in the next lecture, Gaussian processes.

Learning Objectives
  • State the marginalization property: marginals of a joint Gaussian are Gaussian.
  • State the conditioning property: conditionals of a joint Gaussian are Gaussian, with computable mean and covariance.
  • State that the sum of two independent Gaussians is Gaussian.
  • Explain the reparameterization trick for sampling from correlated Gaussians.

1. Jointly Gaussian Random Variables

Suppose $\mathbf{x}_1$ and $\mathbf{x}_2$ are jointly Gaussian distributed:

$$\begin{pmatrix}\mathbf{x}_1 \\ \mathbf{x}_2\end{pmatrix} \sim \mathcal{N}\!\left(\begin{pmatrix}\boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2\end{pmatrix},\; \begin{pmatrix}\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{pmatrix}\right).$$

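The block structure of the covariance matrix encodes both the marginal covariances (the diagonal blocks $\boldsymbol{\Sigma}_{11}$ and $\boldsymbol{\Sigma}_{22}$) and the cross-covariances (the off-diagonal blocks, with $\boldsymbol{\Sigma}_{12} = \boldsymbol{\Sigma}_{21}^\top$ because the full covariance matrix is symmetric).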

2. Marginalization Property

Marginalization

Integrating out $\mathbf{x}_2$ from the joint Gaussian gives a Gaussian marginal for $\mathbf{x}_1$:

$$p(\mathbf{x}_1) = \mathcal{N}(\mathbf{x}_1 \mid \boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11}),$$

and similarly $p(\mathbf{x}_2) = \mathcal{N}(\mathbf{x}_2 \mid \boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22})$. The marginal reads off the corresponding block of the joint mean and covariance directly.
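As a minimal sketch of reading off the blocks, the NumPy snippet below picks the marginal of a small joint Gaussian and compares it with sample statistics. The helper name `marginal`, the 3-dimensional example values, and the choice of which coordinates form $\mathbf{x}_1$ are assumptions made for illustration only.

```python
import numpy as np

def marginal(mu, Sigma, idx):
    """Mean and covariance of the marginal over the coordinates in idx."""
    idx = np.asarray(idx)
    # Marginalization just selects the matching entries of mu
    # and the matching rows/columns of Sigma.
    return mu[idx], Sigma[np.ix_(idx, idx)]

# Hypothetical joint over 3 coordinates: x1 = coordinates (0, 1), x2 = coordinate (2,)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, -0.2],
                  [0.3, -0.2, 1.5]])

mu_1, Sigma_11 = marginal(mu, Sigma, [0, 1])

# Monte Carlo check: sample the joint, discard x2, and compare moments.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
x1 = samples[:, :2]
emp_mean = x1.mean(axis=0)
emp_cov = np.cov(x1, rowvar=False)

print("block mean      :", mu_1)
print("empirical mean  :", emp_mean)
print("block covariance:\n", Sigma_11)
print("empirical cov   :\n", emp_cov)
```

No integral is evaluated anywhere: dropping the unwanted coordinates from the samples plays the role of integrating out $\mathbf{x}_2$, and the empirical moments should agree with the selected blocks up to sampling noise.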

3. Conditioning Property

Gaussian Conditioning

Given a fixed value of $\mathbf{x}_2$, the conditional distribution of $\mathbf{x}_1$ is also Gaussian:

$$p(\mathbf{x}_1 \mid \mathbf{x}_2) = \mathcal{N}\bigl(\mathbf{x}_1 \mid \boldsymbol{\mu}_{1|2},\, \boldsymbol{\Sigma}_{1|2}\bigr),$$

where

$$\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),$$

$$\boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}.$$

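The conditional mean shifts from $\boldsymbol{\mu}_1$ by a linear function of the deviation $\mathbf{x}_2 - \boldsymbol{\mu}_2$, with $\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}$ acting as the gain. The conditional covariance $\boldsymbol{\Sigma}_{1|2}$ does not depend on the observed value of $\mathbf{x}_2$, and it is never larger than the marginal $\boldsymbol{\Sigma}_{11}$: the difference $\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$ is positive semidefinite, so observing $\mathbf{x}_2$ cannot increase uncertainty about $\mathbf{x}_1$. The two are equal when $\boldsymbol{\Sigma}_{12} = \mathbf{0}$, in which case $\mathbf{x}_1$ and $\mathbf{x}_2$ are independent and the observation carries no information about $\mathbf{x}_1$.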
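
As a quick sanity check of the two formulas, consider the scalar case. The symbols $\sigma_1$, $\sigma_2$ (standard deviations) and $\rho$ (correlation) are introduced here only for illustration. With

$$\boldsymbol{\Sigma}_{11} = \sigma_1^2, \qquad \boldsymbol{\Sigma}_{22} = \sigma_2^2, \qquad \boldsymbol{\Sigma}_{12} = \boldsymbol{\Sigma}_{21} = \rho\,\sigma_1\sigma_2,$$

the conditioning formulas reduce to

$$\mu_{1|2} = \mu_1 + \rho\,\frac{\sigma_1}{\sigma_2}\,(x_2 - \mu_2), \qquad \sigma_{1|2}^2 = \sigma_1^2 - \frac{(\rho\,\sigma_1\sigma_2)^2}{\sigma_2^2} = \sigma_1^2\,(1 - \rho^2).$$

The conditional variance equals the marginal variance only when $\rho = 0$ and shrinks toward zero as $|\rho| \to 1$.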

This conditioning property is the engine behind Gaussian process regression (Lecture 12.5).
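
The conditioning formulas can be sketched in a few lines of NumPy. The function name `condition`, its index-based interface, and the bivariate example values are assumptions for illustration. A linear solve is used in place of an explicit inverse of $\boldsymbol{\Sigma}_{22}$, which is a common numerical choice rather than something these notes prescribe.

```python
import numpy as np

def condition(mu, Sigma, idx1, idx2, x2):
    """Mean and covariance of p(x1 | x2) for a joint Gaussian N(mu, Sigma)."""
    idx1, idx2 = np.asarray(idx1), np.asarray(idx2)
    mu_1, mu_2 = mu[idx1], mu[idx2]
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]

    # K = S12 @ inv(S22), computed with a solve instead of forming the inverse.
    # S22 is symmetric, so solve(S22, S12.T) equals inv(S22) @ S21 = K.T.
    K = np.linalg.solve(S22, S12.T).T

    mu_cond = mu_1 + K @ (x2 - mu_2)   # mu_1 + S12 S22^{-1} (x2 - mu_2)
    Sigma_cond = S11 - K @ S12.T       # S11 - S12 S22^{-1} S21
    return mu_cond, Sigma_cond

# Hypothetical bivariate example: unit variances, correlation 0.8, observe x2 = 1.
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
x2_obs = np.array([1.0])

mu_cond, Sigma_cond = condition(mu, Sigma, [0], [1], x2_obs)
print("conditional mean:", mu_cond)
print("conditional cov :", Sigma_cond)
```

For this example the gain is $0.8$, so the conditional mean is $0.8$ and the conditional variance is $1 - 0.8^2 = 0.36$, smaller than the marginal variance of $1$.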

4. Sum of Independent Gaussians

Sum Property

If $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $\mathbf{y} \sim \mathcal{N}(\boldsymbol{\mu}', \boldsymbol{\Sigma}')$ are independent, then

$$\mathbf{x} + \mathbf{y} \sim \mathcal{N}(\boldsymbol{\mu} + \boldsymbol{\mu}',\; \boldsymbol{\Sigma} + \boldsymbol{\Sigma}').$$

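Means and covariances add. Independence is essential for the covariance rule: if $\mathbf{x}$ and $\mathbf{y}$ are jointly Gaussian with cross-covariance $\mathbf{C} = \operatorname{Cov}(\mathbf{x}, \mathbf{y})$, the sum is still Gaussian with mean $\boldsymbol{\mu} + \boldsymbol{\mu}'$, but its covariance becomes $\boldsymbol{\Sigma} + \boldsymbol{\Sigma}' + \mathbf{C} + \mathbf{C}^\top$. If $\mathbf{x}$ and $\mathbf{y}$ are dependent without being jointly Gaussian, the sum need not be Gaussian at all.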
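
The sum property can be checked empirically. The sketch below draws two independent Gaussians, adds them, and compares the sample moments with the predicted ones. It also shows one dependent case, $\mathbf{y} = \mathbf{x}$, where the covariances no longer simply add. All numerical values and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

mu_x = np.array([1.0, 0.0])
Sigma_x = np.array([[1.0, 0.5],
                    [0.5, 2.0]])
mu_y = np.array([-1.0, 3.0])
Sigma_y = np.array([[0.5, -0.2],
                    [-0.2, 1.0]])

# Two separate draws from the same generator: x and y are independent.
x = rng.multivariate_normal(mu_x, Sigma_x, size=n)
y = rng.multivariate_normal(mu_y, Sigma_y, size=n)
s = x + y

# Prediction from the sum property.
mu_sum = mu_x + mu_y
Sigma_sum = Sigma_x + Sigma_y

emp_mean = s.mean(axis=0)
emp_cov = np.cov(s, rowvar=False)
print("predicted mean:", mu_sum, " empirical:", emp_mean)
print("predicted cov:\n", Sigma_sum, "\nempirical cov:\n", emp_cov)

# Dependent case: adding x to itself gives 2x, whose covariance is
# 4 * Sigma_x, not Sigma_x + Sigma_x = 2 * Sigma_x.
s_dep = x + x
emp_cov_dep = np.cov(s_dep, rowvar=False)
print("cov of x + x:\n", emp_cov_dep)
```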

5. Sampling Correlated Gaussians: The Reparameterization Trick

To sample $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ when only a sampler for independent standard normal variables is available:

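  1. Factorize $\boldsymbol{\Sigma} = \mathbf{A}\mathbf{A}^\top$, either via Cholesky decomposition ($\mathbf{A}$ lower triangular) or via the eigendecomposition $\boldsymbol{\Sigma} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\top$ (taking $\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}^{1/2}$).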
  2. Sample $\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.
  3. Set $\mathbf{x} = \boldsymbol{\mu} + \mathbf{A}\boldsymbol{\varepsilon}$.

Why It Works

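$\mathbf{A}\boldsymbol{\varepsilon}$ is a linear transform of a Gaussian, so it is Gaussian with mean $\mathbf{0}$ and covariance $\mathbf{A}\mathbf{I}\mathbf{A}^\top = \boldsymbol{\Sigma}$. Adding the constant $\boldsymbol{\mu}$ shifts the mean without changing the covariance, so $\mathbf{x} = \boldsymbol{\mu} + \mathbf{A}\boldsymbol{\varepsilon} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. This reparameterization trick also underlies the Gaussian process sampling procedure in Lecture 12.4.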
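
The three steps above can be sketched in NumPy as follows. The function name `sample_gaussian`, its `method` argument, and the example mean and covariance are assumptions for illustration. Only `rng.standard_normal` is used as a source of randomness, which matches the setting where just a standard normal sampler is available.

```python
import numpy as np

def factor(Sigma, method="cholesky"):
    """Return A with A @ A.T == Sigma (up to floating-point error)."""
    if method == "cholesky":
        # Lower-triangular factor; requires Sigma to be positive definite.
        return np.linalg.cholesky(Sigma)
    # Eigendecomposition Sigma = U diag(lam) U^T, so A = U diag(sqrt(lam)).
    lam, U = np.linalg.eigh(Sigma)
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative round-off
    return U * np.sqrt(lam)        # scales column j of U by sqrt(lam[j])

def sample_gaussian(mu, Sigma, n, rng, method="cholesky"):
    """Draw n samples from N(mu, Sigma) using only standard normal draws."""
    A = factor(Sigma, method)                  # step 1: Sigma = A A^T
    eps = rng.standard_normal((n, mu.shape[0]))  # step 2: eps ~ N(0, I), one per row
    return mu + eps @ A.T                      # step 3: x = mu + A eps, row-wise

# Hypothetical example values.
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])

rng = np.random.default_rng(0)
x_chol = sample_gaussian(mu, Sigma, 200_000, rng, method="cholesky")
x_eig = sample_gaussian(mu, Sigma, 200_000, rng, method="eigh")

print("Cholesky sample cov:\n", np.cov(x_chol, rowvar=False))
print("eigh sample cov:\n", np.cov(x_eig, rowvar=False))
```

The two factors differ as matrices but give the same distribution, since only the product $\mathbf{A}\mathbf{A}^\top$ matters. In this sketch the eigendecomposition route also tolerates a positive semidefinite $\boldsymbol{\Sigma}$, whereas `np.linalg.cholesky` raises an error when the matrix is not positive definite.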