Lecture 7.1

Classification With Basis Functions

Basis functions allow linear classifiers to operate in nonlinear feature spaces, turning otherwise inseparable problems into linearly separable ones — but at the cost of requiring hand-designed features. This lecture motivates moving toward learned basis functions.

Learning Objectives

Explain how Gaussian basis functions can linearize a nonlinear classification problem.
State the two-step pipeline: nonlinear feature map $\boldsymbol{\phi}(\mathbf{x})$, followed by a linear classifier on $\boldsymbol{\phi}$.
List the advantages of basis functions for classification.
Identify the limitations that motivate learning basis functions via neural networks.

1. The Problem with Linear Classifiers on Raw Inputs

A linear classifier partitions input space with a hyperplane. If the true decision boundary is nonlinear — for example, one class forms a central cluster surrounded by another — no linear boundary can correctly separate them. Basis functions solve this by remapping the data to a new feature space where the classes are linearly separable.

Gaussian Basis Functions

Place Gaussian basis functions centered at cluster locations $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$:

$$\phi_m(\mathbf{x}) = \exp\!\Bigl(-\tfrac{1}{2}\|\mathbf{x} - \boldsymbol{\mu}_m\|^2\Bigr).$$

Each $\phi_m(\mathbf{x})$ measures the affinity of $\mathbf{x}$ to center $\boldsymbol{\mu}_m$: it is close to 1 when $\mathbf{x}$ is near the center and decays toward 0 as distance grows. By mapping $\mathbf{x} \mapsto (\phi_1(\mathbf{x}), \phi_2(\mathbf{x}))$, the two clusters map to distinct regions in feature space, enabling a simple linear separator.
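The affinity interpretation can be checked numerically. The following is a minimal sketch, assuming NumPy and two made-up centers; the helper name `gaussian_features` and the probe points are illustrative and not part of the lecture.

```python
import numpy as np

def gaussian_features(X, centers):
    """Return Phi with Phi[n, m] = exp(-0.5 * ||x_n - mu_m||^2)."""
    # Pairwise differences via broadcasting: shape (N, M, D).
    diff = X[:, None, :] - centers[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)  # shape (N, M)
    return np.exp(-0.5 * sq_dist)

# Hypothetical centers mu_1 and mu_2 (illustrative values only).
centers = np.array([[-1.0, -1.0], [1.0, 1.0]])

# Three probe points: at mu_1, at mu_2, and far from both centers.
X = np.array([[-1.0, -1.0], [1.0, 1.0], [5.0, -5.0]])

Phi = gaussian_features(X, centers)
print(np.round(Phi, 3))
```

Each row of `Phi` is the feature-space image of one input. A point sitting on a center gets a feature value of exactly 1 for that center, while a point far from both centers maps close to the origin of feature space.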

2. The Two-Step Pipeline

Basis Function Classification Pipeline

Feature extraction: Map each input $\mathbf{x}$ to a feature vector $\boldsymbol{\phi}(\mathbf{x}) = (\phi_0(\mathbf{x}), \phi_1(\mathbf{x}), \dots, \phi_{M-1}(\mathbf{x}))^\top$, where $\phi_0 \equiv 1$ incorporates the bias.
Linear classification: Apply any linear classifier (logistic regression, LDA, perceptron, etc.) in feature space. The decision function $\mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x})$ is nonlinear in $\mathbf{x}$ but linear in $\mathbf{w}$, so the overall model is a generalized linear model.

All methods developed for linear models (MLE, Bayesian treatment, SGD, IRLS) apply unchanged once features are extracted.
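The two steps can be sketched end to end. This is an illustrative example rather than a reference implementation: the synthetic data (a central cluster surrounded by a ring), the single Gaussian center at the origin, and the helper names `features` and `fit_logistic` are all assumptions. Logistic regression is fit by plain batch gradient descent to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): class 1 is a central
# cluster, class 0 is a ring of radius about 3 around it.
n = 100
inner = rng.normal(scale=0.5, size=(n, 2))
angles = rng.uniform(0.0, 2.0 * np.pi, size=n)
radii = 3.0 + rng.normal(scale=0.2, size=n)
outer = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
X = np.vstack([inner, outer])
t = np.concatenate([np.ones(n), np.zeros(n)])

def features(X, centers):
    """Step 1: phi(x) = (1, phi_1(x), ..., phi_{M-1}(x))."""
    sq_dist = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return np.hstack([np.ones((X.shape[0], 1)), np.exp(-0.5 * sq_dist)])

def fit_logistic(Phi, t, lr=0.5, steps=2000):
    """Step 2: logistic regression on Phi by batch gradient descent."""
    w = np.zeros(Phi.shape[1])
    for _ in range(steps):
        y = 1.0 / (1.0 + np.exp(-Phi @ w))      # predicted P(class 1)
        w -= lr * Phi.T @ (y - t) / len(t)      # gradient of mean cross-entropy
    return w

centers = np.array([[0.0, 0.0]])  # one Gaussian on the central cluster
Phi = features(X, centers)
w = fit_logistic(Phi, t)

# Classify by the sign of the linear score in feature space.
accuracy = np.mean((Phi @ w > 0) == (t == 1))
print(accuracy)
```

The classifier never sees `X` directly, only `Phi`. Swapping `fit_logistic` for any other linear method leaves step 1 untouched, which is the point of separating the pipeline into two stages.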

3. Advantages of Basis Functions

Nonlinear models from linear machinery. A linear classifier in $\boldsymbol{\phi}$-space corresponds to a nonlinear classifier in $\mathbf{x}$-space.
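Analytical and Bayesian solutions remain tractable. Because the model is linear in $\mathbf{w}$, least-squares solutions keep their closed form and Bayesian linear models keep Gaussian posteriors. With a logistic likelihood the training objective stays convex, although the posterior over $\mathbf{w}$ then has to be approximated.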
Proven effectiveness. Many problems that are not linearly separable in raw input space become separable after a well-chosen feature map.
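As a small illustration of these points, the sketch below fits a least-squares classifier in closed form on XOR-style data, which is not linearly separable in the raw inputs. The four data points, the $\pm 1$ target coding, and the choice of centers are assumptions made for the example.

```python
import numpy as np

# XOR-style toy data (illustrative): opposite corners share a label.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])

# Assumed centers: one Gaussian on each of the two positive corners.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])

# Feature map with a constant bias feature phi_0 = 1.
sq_dist = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
Phi = np.hstack([np.ones((len(X), 1)), np.exp(-0.5 * sq_dist)])

# Closed-form least-squares fit: minimizes ||Phi w - t||^2.
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)

# Classify by the sign of the linear score in feature space.
pred = np.sign(Phi @ w)
print(pred)
```

No iterative optimizer is involved: the weights come from a single linear solve, yet the resulting decision boundary is curved when drawn in the original $\mathbf{x}$-space.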

4. Limitations

Fixed, handcrafted features. The basis functions are chosen before training and do not adapt to the data. In low dimensions, visualization guides the choice; in high dimensions the data cannot be plotted directly, so there is little to guide where the basis functions should go.
Curse of dimensionality. Covering a $d$-dimensional input space with localized basis functions on a regular grid of $k$ centers per axis takes $k^d$ functions, so the number needed grows exponentially with $d$ and the approach becomes intractable in high dimensions.
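The exponential growth is easy to see by counting grid centers. The sketch below assumes a regular grid with 5 centers per axis over the unit hypercube; the function name `grid_centers` and these values are chosen only for illustration.

```python
import itertools
import numpy as np

def grid_centers(k, d, low=0.0, high=1.0):
    """All k**d centers of a regular grid over [low, high]^d."""
    axis = np.linspace(low, high, k)
    return np.array(list(itertools.product(axis, repeat=d)))

# Small d: build the grid explicitly and count its centers.
for d in (1, 2, 3):
    print(d, grid_centers(5, d).shape[0])

# Larger d: only count, since the grid itself would be far too large to store.
for d in (10, 20):
    print(d, 5 ** d)
```

Even this coarse grid needs roughly ten million basis functions at $d = 10$, and each added dimension multiplies the count by another factor of 5.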

Motivation for Neural Networks

Both limitations — fixed features and exponential scaling — are addressed by learning the basis functions as part of training. Multi-layer perceptrons (Lectures 8.1–8.5) parameterize the feature map and optimize it end-to-end alongside the classifier weights.