Lecture 11.4

Intermezzo: The Dual Lagrangian

Solving the SVM requires optimization under inequality constraints. The key tool is the Lagrangian dual. It converts the primal constrained problem into a dual problem over the Lagrange multipliers alone, subject only to simple non-negativity constraints. Under strong duality, the dual's solution also solves the original problem.

Learning Objectives

Distinguish equality-constrained from inequality-constrained optimization.
State the Karush–Kuhn–Tucker (KKT) conditions and interpret complementary slackness.
Define the primal and dual Lagrangians for an inequality-constrained problem.
Explain weak versus strong duality, and state when solving the dual solves the primal.

1. Inequality-Constrained Optimization

We want to maximize $f(\mathbf{x})$ subject to $g(\mathbf{x}) \geq 0$. Two cases arise:

Interior optimum: the unconstrained maximum satisfies $g(\mathbf{x}^*) > 0$. The constraint is inactive and the solution is simply the unconstrained optimum.
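Boundary optimum: the unconstrained maximum violates $g(\mathbf{x}) \geq 0$. The constrained solution then lies on the boundary $g(\mathbf{x}^*) = 0$, and $\nabla f$ and $\nabla g$ must be anti-parallel. In that case $\nabla f$ points out of the feasible region (the direction in which $g$ decreases), so $f$ cannot be increased by moving along the boundary or into the interior.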
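Here is a minimal one-dimensional sketch of the two cases. It assumes an illustrative objective $f(x) = -(x - c)^2$ and constraint $g(x) = x - 1 \geq 0$; neither comes from the lecture, and `solve_1d` is a hypothetical helper. The unconstrained maximizer is $x = c$, so the optimum is interior when $c > 1$. When $c < 1$ it lies on the boundary $x = 1$, where $f'(1) < 0$ and $g'(1) = 1$ have opposite signs, which is what anti-parallel means in one dimension.

```python
def solve_1d(c):
    """Maximize f(x) = -(x - c)**2 subject to g(x) = x - 1 >= 0 (hypothetical toy problem).

    Returns the constrained maximizer, which case applies, and f'(x*), g'(x*).
    """
    x_unconstrained = c                  # f'(x) = -2(x - c) = 0  =>  x = c
    if x_unconstrained - 1 > 0:          # g(x) > 0: constraint inactive
        x_star, case = x_unconstrained, "interior"
    else:                                # unconstrained maximizer not strictly feasible
        x_star, case = 1.0, "boundary"   # best feasible point is on g(x) = 0
    df = 2.0 * (c - x_star)              # f'(x*) = -2(x* - c)
    dg = 1.0                             # g'(x) = 1 everywhere
    return x_star, case, df, dg


for c in (3.0, 0.0):
    x_star, case, df, dg = solve_1d(c)
    # df * dg < 0 means the gradients are anti-parallel (1D)
    print(f"c={c}: x*={x_star} ({case}), f'={df}, g'={dg}")
```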

2. KKT Conditions

Karush–Kuhn–Tucker (KKT) Conditions

For the problem $\max_\mathbf{x} f(\mathbf{x})$ subject to $g(\mathbf{x}) \geq 0$, introduce a Lagrange multiplier $\mu \geq 0$. At the optimum:

Stationarity: $\nabla_\mathbf{x} \bigl[f(\mathbf{x}) + \mu\, g(\mathbf{x})\bigr] = \mathbf{0}$.
Primal feasibility: $g(\mathbf{x}) \geq 0$.
Dual feasibility: $\mu \geq 0$.
Complementary slackness: $\mu\, g(\mathbf{x}) = 0$ — at least one of $\mu$ or $g(\mathbf{x})$ must be zero.

Complementary slackness captures the two cases: if $g(\mathbf{x}) > 0$ (interior), then $\mu = 0$ (constraint is inactive); if $\mu > 0$ (boundary optimum), then $g(\mathbf{x}) = 0$.
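The four conditions can be checked mechanically. This sketch assumes an illustrative 1D problem, $\max_x -(x - c)^2$ subject to $g(x) = x - 1 \geq 0$, and a hypothetical helper `kkt_check`. Because $g'(x) = 1$, stationarity fixes the multiplier as $\mu = -f'(x)$. The remaining three conditions then decide whether a candidate $x$ can be optimal.

```python
def kkt_check(x, c, tol=1e-9):
    """Check the KKT conditions for max f(x) = -(x - c)**2 s.t. g(x) = x - 1 >= 0.

    The multiplier is read off the stationarity condition
    f'(x) + mu * g'(x) = 0 with g'(x) = 1, i.e. mu = -f'(x).
    """
    df = -2.0 * (x - c)      # f'(x)
    g = x - 1.0              # g(x)
    mu = -df                 # from stationarity, since g'(x) = 1
    return {
        "mu": mu,
        "stationarity": abs(df + mu * 1.0) <= tol,   # holds by construction
        "primal_feasible": g >= -tol,
        "dual_feasible": mu >= -tol,
        "complementary_slackness": abs(mu * g) <= tol,
    }


for x, c in [(1.0, 0.0), (3.0, 3.0), (2.0, 0.0), (1.0, 3.0)]:
    print(x, c, kkt_check(x, c))
```

Two failure modes are worth noting. The feasible but non-optimal point $x = 2$ with $c = 0$ fails complementary slackness. The boundary point $x = 1$ with $c = 3$ would need $\mu = -4$ and fails dual feasibility.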

3. Primal and Dual Lagrangians

Primal and Dual Lagrangian

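The SVM is posed as a minimization, so we switch to that form here. Note that the multiplier term is now subtracted rather than added. The primal Lagrangian for $\min_\mathbf{x} f(\mathbf{x})$ s.t. $g(\mathbf{x}) \geq 0$ is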

$$\mathcal{L}(\mathbf{x}, \mu) = f(\mathbf{x}) - \mu\, g(\mathbf{x}), \quad \mu \geq 0.$$

The dual Lagrangian eliminates $\mathbf{x}$ by taking the minimum over the primal variable:

$$\ell(\mu) = \min_\mathbf{x}\, \mathcal{L}(\mathbf{x}, \mu).$$

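When $\mathcal{L}$ is convex and differentiable in $\mathbf{x}$, this is done in three steps. (1) Find the stationary point $\partial \mathcal{L}/\partial \mathbf{x} = 0$. (2) Express $\mathbf{x}$ in terms of $\mu$. (3) Substitute back. Convexity guarantees the stationary point is the minimizer. The result $\ell(\mu)$ depends only on $\mu$.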

The dual problem is then $\max_{\mu \geq 0}\, \ell(\mu)$.
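A small worked instance may help. Assume the illustrative problem $\min_x x^2$ s.t. $g(x) = x - 1 \geq 0$, a toy example not taken from the SVM. Its answer is $x^* = 1$ with $f(x^*) = 1$. Here $\mathcal{L}$ is convex in $x$, so the three steps apply directly:

$$\mathcal{L}(x, \mu) = x^2 - \mu\,(x - 1), \qquad \frac{\partial \mathcal{L}}{\partial x} = 2x - \mu = 0 \;\Longrightarrow\; x = \frac{\mu}{2},$$

$$\ell(\mu) = \frac{\mu^2}{4} - \mu\left(\frac{\mu}{2} - 1\right) = \mu - \frac{\mu^2}{4}.$$

Maximizing over $\mu \geq 0$, setting $\ell'(\mu) = 1 - \mu/2 = 0$ gives $\mu^* = 2$ and $\ell(\mu^*) = 1$, which matches the primal optimum. Stationarity then gives $x^* = \mu^*/2 = 1$. Since $\mu^* > 0$, complementary slackness forces $g(x^*) = 0$, consistent with the boundary solution.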

4. Weak vs. Strong Duality

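The dual Lagrangian always provides a lower bound on the primal objective. For every $\mu \geq 0$ and every feasible $\mathbf{x}$ (so $g(\mathbf{x}) \geq 0$),

$$\ell(\mu) \leq \mathcal{L}(\mathbf{x}, \mu) = f(\mathbf{x}) - \mu\, g(\mathbf{x}) \leq f(\mathbf{x}).$$

This is weak duality. Maximizing $\ell(\mu)$ gives the tightest such lower bound, so the dual optimum never exceeds the primal optimum.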
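A quick numerical check of weak duality follows. It assumes the toy problem $\min_x x^2$ s.t. $g(x) = x - 1 \geq 0$ (an illustrative choice) and approximates the minimum over $x$ by a grid on $[-5, 5]$, which is also an assumption. Every feasible grid point also appears in the minimization, so the grid estimate of $\ell(\mu)$ still satisfies the bound exactly.

```python
def f(x):
    return x * x          # primal objective (illustrative)


def g(x):
    return x - 1.0        # constraint g(x) >= 0 (illustrative)


XS = [i / 100 for i in range(-500, 501)]   # grid over [-5, 5] standing in for min over x


def dual_value(mu):
    """ell(mu) = min_x [f(x) - mu * g(x)], approximated on the grid XS."""
    return min(f(x) - mu * g(x) for x in XS)


mus = [j / 10 for j in range(0, 61)]        # multipliers mu in [0, 6]
feasible = [x for x in XS if g(x) >= 0]     # feasible grid points, x in [1, 5]
best_primal = min(f(x) for x in feasible)   # smallest feasible objective value
duals = [dual_value(mu) for mu in mus]

# Weak duality: each ell(mu) lower-bounds every feasible f(x),
# so comparing against the smallest feasible value is enough.
print(all(d <= best_primal + 1e-12 for d in duals), max(duals), best_primal)
```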

Strong Duality

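For convex optimization problems that satisfy a constraint qualification such as Slater's condition, the duality gap is zero: the maximum of the dual equals the optimal primal objective. The hard-margin SVM qualifies whenever the training data are linearly separable, because its constraints are affine. Solving the dual therefore solves the primal. The primal optimum $\mathbf{x}^*$ is recovered by substituting the optimal $\mu^*$ into the stationarity condition.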
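This sketch shows strong duality and primal recovery on the toy problem $\min_x x^2$ s.t. $x - 1 \geq 0$, chosen for illustration. For this problem, stationarity gives $x = \mu/2$ and the dual Lagrangian is $\ell(\mu) = \mu - \mu^2/4$. The code maximizes $\ell$ over $\mu \geq 0$ by projected gradient ascent (one simple choice among many). It then recovers $x^*$ from stationarity and checks that the duality gap vanishes.

```python
def ell(mu):
    """Dual Lagrangian of min x**2 s.t. x - 1 >= 0, derived in closed form."""
    return mu - mu * mu / 4.0


def d_ell(mu):
    """Derivative of ell with respect to mu."""
    return 1.0 - mu / 2.0


mu = 0.0      # start from a dual-feasible point
step = 0.5    # fixed step size (an assumption; small enough to converge here)
for _ in range(200):
    # gradient ascent on ell, then project back onto mu >= 0
    mu = max(0.0, mu + step * d_ell(mu))

x_star = mu / 2.0          # recover the primal optimum via stationarity x = mu / 2
p_star = x_star ** 2       # primal objective at the recovered point
gap = p_star - ell(mu)     # duality gap; zero under strong duality
print(mu, x_star, gap)
```

With this step size the update is $\mu \leftarrow 0.75\,\mu + 0.5$. It contracts to $\mu^* = 2$, giving $x^* = 1$ and a gap of zero up to rounding.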

5. Recipe for Solving Inequality-Constrained Problems

Define the primal Lagrangian $\mathcal{L}(\mathbf{x}, \boldsymbol{\mu}) = f(\mathbf{x}) - \sum_i \mu_i\, g_i(\mathbf{x})$, with one multiplier $\mu_i \geq 0$ per constraint $g_i(\mathbf{x}) \geq 0$.
Set $\partial \mathcal{L}/\partial \mathbf{x} = 0$ to find the stationarity condition.
Use the stationarity condition to eliminate $\mathbf{x}$ from $\mathcal{L}$, yielding the dual $\ell(\boldsymbol{\mu})$.
Maximize $\ell(\boldsymbol{\mu})$ subject to $\mu_i \geq 0$ (and any additional constraints derived from the KKT conditions).
Recover the primal solution $\mathbf{x}^*$ from the optimal $\boldsymbol{\mu}^*$ via the stationarity condition.
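The recipe can be sketched end to end for a problem whose dual has the same shape as the SVM dual. Assume the illustrative primal $\min_\mathbf{x} \tfrac12 \lVert\mathbf{x}\rVert^2$ s.t. $\mathbf{a}_i^\top \mathbf{x} - b_i \geq 0$. Steps 1 to 3 are done by hand. Stationarity gives $\mathbf{x} = \sum_i \mu_i \mathbf{a}_i$, and substituting back yields $\ell(\boldsymbol{\mu}) = \sum_i b_i \mu_i - \tfrac12 \sum_{i,j} \mu_i \mu_j\, \mathbf{a}_i^\top \mathbf{a}_j$. Step 4 uses projected gradient ascent, chosen here only for simplicity. Practical SVM solvers typically use SMO-style decomposition or a dedicated QP solver instead. `solve_dual`, `step`, and `iters` are hypothetical names.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def solve_dual(A, b, step=0.1, iters=5000):
    """Sketch of the recipe for
        min_x 0.5 * ||x||^2   s.t.   a_i . x - b_i >= 0  for each row a_i of A.

    Steps 1-3 (by hand): ell(mu) = b . mu - 0.5 * sum_ij mu_i mu_j (a_i . a_j).
    Step 4: maximize ell over mu >= 0 by projected gradient ascent.
    Step 5: recover x* = sum_i mu_i a_i from the stationarity condition.
    """
    m, n = len(A), len(A[0])
    G = [[dot(A[i], A[j]) for j in range(m)] for i in range(m)]   # Gram matrix a_i . a_j
    mu = [0.0] * m
    for _ in range(iters):
        grad = [b[i] - dot(G[i], mu) for i in range(m)]            # d ell / d mu_i
        mu = [max(0.0, mu[i] + step * grad[i]) for i in range(m)]  # ascend, project onto mu >= 0
    x = [sum(mu[i] * A[i][k] for i in range(m)) for k in range(n)]  # x* = sum_i mu_i a_i
    return mu, x


# Example: min 0.5||x||^2 s.t. x1 + x2 >= 2 and x1 >= 0.
mu, x = solve_dual([[1.0, 1.0], [1.0, 0.0]], [2.0, 0.0])
print(mu, x)  # mu close to [1, 0], x close to [1, 1]
```

In the example the second constraint is inactive at the solution ($x_1 = 1 > 0$), so its multiplier is driven to zero, as complementary slackness requires. The constraint vectors also enter only through the Gram matrix of inner products $\mathbf{a}_i^\top \mathbf{a}_j$.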

Advantage Over the Primal

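The dual $\ell(\boldsymbol{\mu})$ is always concave in $\boldsymbol{\mu}$, even if the primal $f$ is not convex, because it is a pointwise minimum of functions that are affine in $\boldsymbol{\mu}$. Maximizing it is therefore a convex optimization problem. It also depends only on the Lagrange multipliers rather than on both $\mathbf{x}$ and $\boldsymbol{\mu}$, which often makes the dual easier to solve numerically. For SVMs, the training points enter the dual only through inner products $\mathbf{x}_i^\top \mathbf{x}_j$, which leads naturally to a kernel formulation. Complementary slackness also forces $\mu_i = 0$ for every point strictly outside the margin. The solution is therefore sparse: only the support vectors contribute.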
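Here is a small numerical illustration of the concavity claim. It assumes the non-convex double-well objective $f(x) = x^4 - 3x^2$ and constraint $g(x) = x - 0.5 \geq 0$, both illustrative. The minimum over $x$ is approximated on a grid over $[-3, 3]$. That makes $\ell(\mu)$ a minimum of finitely many functions affine in $\mu$, so the midpoint concavity test should pass up to rounding, even though $f$ itself fails the convexity test.

```python
import random


def f(x):
    return x ** 4 - 3 * x ** 2      # non-convex double well (illustrative choice)


def g(x):
    return x - 0.5                  # constraint g(x) >= 0 (illustrative choice)


XS = [i / 100 for i in range(-300, 301)]   # grid over [-3, 3] standing in for min over x


def dual_value(mu):
    """ell(mu) = min_x [f(x) - mu * g(x)]: a pointwise minimum of functions affine in mu."""
    return min(f(x) - mu * g(x) for x in XS)


# f is not convex: its midpoint value exceeds the average of the endpoint values.
f_is_nonconvex = f(0.0) > (f(-1.0) + f(1.0)) / 2

# ell is concave: check the midpoint inequality on random pairs mu1, mu2 >= 0.
random.seed(0)
concave_ok = True
for _ in range(200):
    m1, m2 = random.uniform(0, 10), random.uniform(0, 10)
    if dual_value((m1 + m2) / 2) < (dual_value(m1) + dual_value(m2)) / 2 - 1e-9:
        concave_ok = False
print(f_is_nonconvex, concave_ok)
```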