Lecture 2.2

The Gaussian Distribution

Learning Objectives

After this lecture you should be able to:

State the univariate Gaussian definition and explain intuitively the role of $\mu$ (location of the peak) and $\sigma^2$ (width of the distribution).
Derive $\mathbb{E}[x] = \mu$ analytically using the substitution $y = \tfrac{x - \mu}{\sqrt{2\sigma^2}}$, the odd-function argument, and the Gaussian integral $\int_{-\infty}^{\infty} e^{-y^2}\,dy = \sqrt{\pi}$.
Derive $\text{Var}[x] = \sigma^2$ using the same substitution and the derivative trick $\int_{-\infty}^{\infty} y^2 e^{-y^2}\,dy = -\tfrac{d}{da}\sqrt{\tfrac{\pi}{a}}\big|_{a=1}$.
State the multivariate Gaussian and identify how $\mu$, $\sigma^2$, and the normalization constant each generalize to the vector/matrix setting.
Explain what the covariance matrix $\boldsymbol{\Sigma}$ controls geometrically in the multivariate case.

The Gaussian (or normal) distribution is the single most important probability distribution in machine learning. It appears in noise models, prior distributions, likelihoods, and closed-form solutions throughout this course. This lecture defines it, explains its parameters, and derives its mean and variance analytically.

1. The Univariate Gaussian

Definition: Univariate Gaussian $$\mathcal{N}(x\,;\,\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Parameters: $\mu \in \mathbb{R}$ (mean) and $\sigma^2 > 0$ (variance). The prefactor $\tfrac{1}{\sqrt{2\pi\sigma^2}}$ is the normalization constant that ensures $\int_{-\infty}^{\infty} \mathcal{N}(x;\mu,\sigma^2)\,dx = 1$.

Interpreting the two parameters:

$\mu$: the squared deviation $(x-\mu)^2$ in the exponent is minimized (= 0) at $x = \mu$, so the density peaks at $\mu$. Moving away from $\mu$ makes the exponent $-\tfrac{(x-\mu)^2}{2\sigma^2}$ more negative, so the density decays toward zero.
$\sigma^2$: controls how quickly the density decays. Small $\sigma^2$ → narrow, tall peak (probability mass concentrated near $\mu$). Large $\sigma^2$ → wide, flat distribution (probability spread broadly).
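
The definition and the role of the two parameters can be checked numerically. The following is a minimal sketch using only the Python standard library; the names `gaussian_pdf` and `integrate`, the midpoint rule, and the parameter values are illustrative assumptions, not part of the lecture.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Density N(x; mu, sigma2) of the univariate Gaussian."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def integrate(f, lo, hi, n=20_000):
    """Composite midpoint rule on [lo, hi] with n equal-width cells."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


mu = 1.5  # illustrative value
for sigma2 in (0.25, 1.0, 4.0):
    sigma = math.sqrt(sigma2)
    # Integrate over mu +/- 10 sigma; the tails beyond that are negligible.
    area = integrate(lambda x: gaussian_pdf(x, mu, sigma2), mu - 10 * sigma, mu + 10 * sigma)
    peak = gaussian_pdf(mu, mu, sigma2)
    print(f"sigma2={sigma2}: area={area:.6f}, peak height={peak:.4f}")
```

The area should come out very close to 1 for every $\sigma^2$, while the peak height $\tfrac{1}{\sqrt{2\pi\sigma^2}}$ halves each time $\sigma^2$ is multiplied by 4: a wider distribution must be flatter to keep the total probability fixed.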

2. Computing the Mean: $\mathbb{E}[x] = \mu$

We compute $\mathbb{E}[x] = \int_{-\infty}^{\infty} x\,\mathcal{N}(x;\mu,\sigma^2)\,dx$ analytically via a change of variables.

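Substitution: let $y = \dfrac{x - \mu}{\sqrt{2\sigma^2}}$, so $x = \sqrt{2\sigma^2}\,y + \mu$ and $dx = \sqrt{2\sigma^2}\,dy$. The exponential factor becomes $e^{-y^2}$, and the factor $\sqrt{2\sigma^2}$ from $dx$ combines with the normalization constant $\tfrac{1}{\sqrt{2\pi\sigma^2}}$ to give $\tfrac{1}{\sqrt{\pi}}$. Substituting: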

$$\mathbb{E}[x] = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} \!\bigl(\sqrt{2\sigma^2}\,y + \mu\bigr)\, e^{-y^2}\, dy$$

This splits into two integrals:

$$= \frac{\sqrt{2\sigma^2}}{\sqrt{\pi}} \underbrace{\int_{-\infty}^{\infty} y\, e^{-y^2}\, dy}_{= \,0} + \frac{\mu}{\sqrt{\pi}} \underbrace{\int_{-\infty}^{\infty} e^{-y^2}\, dy}_{=\,\sqrt{\pi}}$$

The first integral is zero because $y e^{-y^2}$ is an odd function integrated over a symmetric domain. The second is the standard Gaussian integral $\int_{-\infty}^{\infty} e^{-y^2}\,dy = \sqrt{\pi}$. Therefore:

$$\boxed{\mathbb{E}[x] = \mu}$$
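
As a sanity check on the derivation above, the two integrals can be evaluated numerically and recombined. This is a sketch under stated assumptions: the helper `integrate` (a midpoint rule), the truncation of the real line to $[-8, 8]$, and the values of `mu` and `sigma2` are all illustrative choices.

```python
import math


def integrate(f, lo, hi, n=20_000):
    """Composite midpoint rule on [lo, hi] with n equal-width cells."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


mu, sigma2 = 1.5, 4.0  # illustrative values
scale = math.sqrt(2.0 * sigma2)

# e^{-y^2} is below 1e-27 at |y| = 8, so [-8, 8] stands in for the real line.
odd_term = integrate(lambda y: y * math.exp(-y * y), -8.0, 8.0)
gauss_term = integrate(lambda y: math.exp(-y * y), -8.0, 8.0)

# Recombine the two pieces exactly as in the split expression for E[x].
mean = (scale * odd_term + mu * gauss_term) / math.sqrt(math.pi)

print("odd integral      :", odd_term)
print("Gaussian integral :", gauss_term, "vs sqrt(pi) =", math.sqrt(math.pi))
print("E[x]              :", mean, "vs mu =", mu)
```

The odd integral vanishes up to floating-point rounding, so the result does not depend on $\sigma^2$ at all, which matches the analytic argument.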

3. Computing the Variance: $\text{Var}[x] = \sigma^2$

We compute $\text{Var}[x] = \int_{-\infty}^{\infty}(x-\mu)^2\,\mathcal{N}(x;\mu,\sigma^2)\,dx$ using the same substitution.

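With $y = \tfrac{x-\mu}{\sqrt{2\sigma^2}}$ we have $(x-\mu)^2 = 2\sigma^2 y^2$ and $dx = \sqrt{2\sigma^2}\,dy$. The factor $\sqrt{2\sigma^2}$ from $dx$ again combines with the normalization constant $\tfrac{1}{\sqrt{2\pi\sigma^2}}$ to give $\tfrac{1}{\sqrt{\pi}}$, so: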

$$\text{Var}[x] = \frac{2\sigma^2}{\sqrt{\pi}} \int_{-\infty}^{\infty} y^2\, e^{-y^2}\, dy$$

To evaluate $\int_{-\infty}^{\infty} y^2 e^{-y^2}\,dy$ we use a derivative trick. Start from the known result $\int_{-\infty}^{\infty} e^{-ay^2}\,dy = \sqrt{\pi/a}$, valid for $a > 0$, and differentiate both sides with respect to $a$ (on the left, the derivative is taken inside the integral):

$$-\int_{-\infty}^{\infty} y^2\, e^{-ay^2}\, dy = -\frac{\sqrt{\pi}}{2}\, a^{-3/2}$$

Setting $a = 1$:

$$\int_{-\infty}^{\infty} y^2\, e^{-y^2}\, dy = \frac{\sqrt{\pi}}{2}$$

Substituting back:

$$\text{Var}[x] = \frac{2\sigma^2}{\sqrt{\pi}} \cdot \frac{\sqrt{\pi}}{2}$$

$$\boxed{\text{Var}[x] = \sigma^2}$$

This confirms that $\mu$ and $\sigma^2$ are not just names — they are literally the mean and variance of the distribution.
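
The derivative trick can also be checked numerically. The sketch below compares a central finite difference of $\sqrt{\pi/a}$ at $a = 1$ with a direct quadrature of $\int y^2 e^{-y^2}\,dy$, then recovers the variance. The helper names, the step size `eps`, the integration range, and the value of `sigma2` are assumptions made for illustration.

```python
import math


def integrate(f, lo, hi, n=20_000):
    """Composite midpoint rule on [lo, hi] with n equal-width cells."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def gaussian_integral(a):
    """Closed form of the integral of exp(-a y^2) over the real line, a > 0."""
    return math.sqrt(math.pi / a)


# -d/da sqrt(pi / a) at a = 1, approximated by a central finite difference.
eps = 1e-5
minus_derivative = -(gaussian_integral(1.0 + eps) - gaussian_integral(1.0 - eps)) / (2.0 * eps)

# Direct quadrature of the same quantity; [-8, 8] stands in for the real line.
second_moment = integrate(lambda y: y * y * math.exp(-y * y), -8.0, 8.0)

sigma2 = 4.0  # illustrative value
variance = 2.0 * sigma2 / math.sqrt(math.pi) * second_moment

print("finite difference :", minus_derivative)
print("quadrature        :", second_moment)
print("sqrt(pi) / 2      :", math.sqrt(math.pi) / 2.0)
print("Var[x]            :", variance, "vs sigma2 =", sigma2)
```

All three values for the integral should agree to several decimal places, and the recovered variance should match `sigma2`.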

A note on mathematical practice. These derivations are exercises in the change-of-variables technique and the Gaussian integral. Throughout this course many results are stated without full proof. The expectation is not that you memorize every derivation, but that you trust the results because you have seen at least one example worked out from first principles — and ideally verify others yourself.

4. The Multivariate Gaussian

When $\mathbf{x} \in \mathbb{R}^D$ is a random vector, the Gaussian generalizes to:

Definition: Multivariate Gaussian $$\mathcal{N}(\mathbf{x}\,;\,\boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{\sqrt{(2\pi)^D |\boldsymbol{\Sigma}|}}\exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$

Parameters: $\boldsymbol{\mu} \in \mathbb{R}^D$ (mean vector) and $\boldsymbol{\Sigma} \in \mathbb{R}^{D \times D}$ (covariance matrix, symmetric positive definite). $|\boldsymbol{\Sigma}|$ denotes the determinant of $\boldsymbol{\Sigma}$.

The generalization from univariate to multivariate is direct:

$(x - \mu)^2 / \sigma^2$ → $(\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$: a squared Mahalanobis distance that measures how far $\mathbf{x}$ is from $\boldsymbol{\mu}$ in a metric defined by $\boldsymbol{\Sigma}^{-1}$.
$\sigma^2$ → $\boldsymbol{\Sigma}$: the covariance matrix controls the shape. A diagonal $\boldsymbol{\Sigma}$ produces axis-aligned ellipsoidal contours; nonzero off-diagonal entries rotate them away from the coordinate axes. $\boldsymbol{\Sigma}$ is the covariance of $\mathbf{x}$ with itself, $\text{Cov}[\mathbf{x}] = \mathbb{E}\bigl[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^\top\bigr]$.
$\sqrt{2\pi\sigma^2}$ → $\sqrt{(2\pi)^D |\boldsymbol{\Sigma}|}$: the determinant generalizes the scalar variance in the normalization.
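
The three correspondences above can be made concrete for $D = 2$, where the inverse and determinant of $\boldsymbol{\Sigma}$ have simple closed forms. This is a minimal sketch restricted to two dimensions and to the standard library; `mvn_pdf_2d`, `gaussian_pdf`, and all numerical values are hypothetical names and choices, and a general-$D$ version would normally rely on a linear algebra library instead.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Density N(x; mu, sigma2) of the univariate Gaussian."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def mvn_pdf_2d(x, mu, cov):
    """Density N(x; mu, cov) for D = 2, with cov = ((a, b), (b, c))."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    if a <= 0 or det <= 0:
        raise ValueError("cov must be symmetric positive definite")
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    # Squared Mahalanobis distance d^T cov^{-1} d via the closed-form 2x2 inverse.
    maha2 = (c * d0 * d0 - 2.0 * b * d0 * d1 + a * d1 * d1) / det
    # Normalization sqrt((2 pi)^D |cov|) with D = 2.
    norm = math.sqrt((2.0 * math.pi) ** 2 * det)
    return math.exp(-0.5 * maha2) / norm


mu = (0.0, 0.0)

# Diagonal covariance: the density factorizes into two univariate Gaussians.
diag_cov = ((2.0, 0.0), (0.0, 0.5))
joint = mvn_pdf_2d((0.3, -0.7), mu, diag_cov)
product = gaussian_pdf(0.3, 0.0, 2.0) * gaussian_pdf(-0.7, 0.0, 0.5)
print("diagonal:", joint, "vs product of marginals:", product)

# Positive off-diagonal entry: contours are stretched along the x0 = x1 direction.
full_cov = ((1.0, 0.8), (0.8, 1.0))
along = mvn_pdf_2d((1.0, 1.0), mu, full_cov)
across = mvn_pdf_2d((1.0, -1.0), mu, full_cov)
print("full: density at (1, 1) =", along, ", at (1, -1) =", across)
```

With the full covariance, the points $(1, 1)$ and $(1, -1)$ are equally far from $\boldsymbol{\mu}$ in Euclidean distance but not in Mahalanobis distance, so their densities differ.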

Mean of the multivariate Gaussian. Using the substitution $\mathbf{y} = \mathbf{x} - \boldsymbol{\mu}$, the integral for $\mathbb{E}[\mathbf{x}]$ splits into an odd-function term (integrates to zero by symmetry) and a constant $\boldsymbol{\mu}$ multiplied by the integral of the normalized distribution (= 1). Therefore $\mathbb{E}[\mathbf{x}] = \boldsymbol{\mu}$ — the same argument as in the univariate case, lifted to vectors. The proof that $\text{Cov}[\mathbf{x}] = \boldsymbol{\Sigma}$ follows by a similar but more involved calculation; see Bishop §2.3.
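
Both results can be checked empirically by sampling. The sketch below is a Monte Carlo check, not the analytic argument: it draws samples for $D = 2$ by applying the Cholesky factor of $\boldsymbol{\Sigma}$ to independent standard normal draws, a sampling technique that is not covered in this lecture. The function `sample_mvn_2d`, the seed, the sample size, and the parameter values are all illustrative assumptions.

```python
import math
import random


def sample_mvn_2d(mu, cov, rng):
    """Draw one sample from N(mu, cov) for D = 2 via the Cholesky factor of cov."""
    (a, b), (_, c) = cov
    # Lower-triangular L with L L^T = cov.
    l00 = math.sqrt(a)
    l10 = b / l00
    l11 = math.sqrt(c - l10 * l10)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    # x = mu + L z has mean mu and covariance L L^T = cov.
    return (mu[0] + l00 * z0, mu[1] + l10 * z0 + l11 * z1)


rng = random.Random(0)  # fixed seed so the run is repeatable
mu = (1.0, -2.0)
cov = ((2.0, 0.8), (0.8, 1.0))
n = 100_000
samples = [sample_mvn_2d(mu, cov, rng) for _ in range(n)]

# Sample mean and unbiased sample covariance.
mean = [sum(s[k] for s in samples) / n for k in range(2)]
emp_cov = [
    [sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1) for j in range(2)]
    for i in range(2)
]

print("sample mean      :", mean)
print("sample covariance:", emp_cov)
```

The sample mean and sample covariance should land close to `mu` and `cov`, with a discrepancy that shrinks roughly like $1/\sqrt{n}$ as the number of samples grows.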