Lecture 11.4
Intermezzo: The Dual Lagrangian
Solving the SVM requires optimization under inequality constraints. The key tool is the Lagrangian dual: replacing the primal constrained problem with a dual problem over the Lagrange multipliers alone (subject only to sign constraints on those multipliers), whose solution, under suitable conditions, also solves the original.
- Distinguish equality-constrained from inequality-constrained optimization.
- State the Karush–Kuhn–Tucker (KKT) conditions and interpret complementary slackness.
- Define the primal and dual Lagrangians for an inequality-constrained problem.
- Explain weak versus strong duality, and state when solving the dual solves the primal.
1. Inequality-Constrained Optimization
We want to maximize $f(\mathbf{x})$ subject to $g(\mathbf{x}) \geq 0$. Two cases arise:
- Interior optimum: the unconstrained maximum satisfies $g(\mathbf{x}^*) > 0$. The constraint is inactive and the solution is simply the unconstrained optimum.
- Boundary optimum: the unconstrained maximum violates $g(\mathbf{x}) \geq 0$. The constrained solution lies on the boundary $g(\mathbf{x}^*) = 0$, where $\nabla f$ and $\nabla g$ are anti-parallel: $f$ cannot be increased by moving along the boundary or into the feasible region.
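As a small illustration (a one-dimensional toy problem chosen here, not taken from the lecture), consider maximizing $f(x) = -(x - a)^2$ subject to $g(x) = x - 1 \geq 0$. The unconstrained maximum sits at $x = a$, so the value of $a$ decides which case applies:

$$
\begin{aligned}
a = 2:&\quad x^* = 2,\; g(x^*) = 1 > 0 && \text{(interior optimum, constraint inactive)} \\
a = 0:&\quad x^* = 1,\; g(x^*) = 0,\; f'(x^*) = -2,\; g'(x^*) = 1 && \text{(boundary optimum, derivatives of opposite sign)}
\end{aligned}
$$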
2. KKT Conditions
For the problem $\max_\mathbf{x} f(\mathbf{x})$ subject to $g(\mathbf{x}) \geq 0$, introduce a Lagrange multiplier $\mu \geq 0$. At the optimum:
- Stationarity: $\nabla_\mathbf{x} \bigl[f(\mathbf{x}) + \mu\, g(\mathbf{x})\bigr] = \mathbf{0}$.
- Primal feasibility: $g(\mathbf{x}) \geq 0$.
- Dual feasibility: $\mu \geq 0$.
- Complementary slackness: $\mu\, g(\mathbf{x}) = 0$, i.e. at least one of $\mu$ and $g(\mathbf{x})$ must be zero.
Complementary slackness captures the two cases: if $g(\mathbf{x}) > 0$ (interior), then $\mu = 0$ (constraint is inactive); if $\mu > 0$ (boundary optimum), then $g(\mathbf{x}) = 0$.
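The four conditions can be checked numerically on a toy problem. The sketch below assumes a hypothetical one-dimensional example, maximizing $f(x) = -(x - a)^2$ subject to $g(x) = x - 1 \geq 0$; the function names `solve_toy` and `kkt_residuals` are illustrative only.

```python
def solve_toy(a):
    """Maximize f(x) = -(x - a)**2 subject to g(x) = x - 1 >= 0 (toy problem)."""
    x = max(a, 1.0)            # clip the unconstrained maximizer a to the feasible set
    grad_f = -2.0 * (x - a)    # f'(x)
    grad_g = 1.0               # g'(x)
    mu = -grad_f / grad_g      # stationarity: f'(x) + mu * g'(x) = 0
    return x, mu


def kkt_residuals(a, x, mu):
    """Return the quantities appearing in the four KKT conditions."""
    g = x - 1.0
    return {
        "stationarity": -2.0 * (x - a) + mu * 1.0,   # should be 0
        "primal_feasibility": g,                     # should be >= 0
        "dual_feasibility": mu,                      # should be >= 0
        "complementary_slackness": mu * g,           # should be 0
    }


for a in (2.0, 0.0):           # interior case, then boundary case
    x_opt, mu_opt = solve_toy(a)
    print(a, x_opt, mu_opt, kkt_residuals(a, x_opt, mu_opt))
```

With $a = 2$ the constraint is inactive and the multiplier comes out as zero; with $a = 0$ the solution sits on the boundary and the multiplier is strictly positive.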
3. Primal and Dual Lagrangians
The primal Lagrangian (for a minimization problem $\min_\mathbf{x} f(\mathbf{x})$ s.t. $g(\mathbf{x}) \geq 0$) is
$$\mathcal{L}(\mathbf{x}, \mu) = f(\mathbf{x}) - \mu\, g(\mathbf{x}), \quad \mu \geq 0.$$
The dual Lagrangian eliminates $\mathbf{x}$ by taking the minimum over the primal variable:
$$\ell(\mu) = \min_\mathbf{x}\, \mathcal{L}(\mathbf{x}, \mu).$$
This is done by (1) finding the stationary point $\partial \mathcal{L}/\partial \mathbf{x} = 0$, (2) expressing $\mathbf{x}$ in terms of $\mu$, and (3) substituting back. The result $\ell(\mu)$ depends only on $\mu$.
The dual problem is then $\max_{\mu \geq 0}\, \ell(\mu)$.
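A short worked example may help. Assume the toy problem (chosen here for illustration) of minimizing $f(x) = x^2$ subject to $g(x) = x - 1 \geq 0$. The three steps give:

$$
\begin{aligned}
\mathcal{L}(x, \mu) &= x^2 - \mu\,(x - 1), \\
\frac{\partial \mathcal{L}}{\partial x} = 2x - \mu = 0 \;&\Rightarrow\; x = \tfrac{\mu}{2}, \\
\ell(\mu) &= \tfrac{\mu^2}{4} - \mu\left(\tfrac{\mu}{2} - 1\right) = \mu - \tfrac{\mu^2}{4}.
\end{aligned}
$$

Maximizing over $\mu \geq 0$ gives $\ell'(\mu) = 1 - \mu/2 = 0$, so $\mu^* = 2$ and $\ell(\mu^*) = 1$. Substituting back, $x^* = \mu^*/2 = 1$ with $f(x^*) = 1$, which is the constrained minimum one would find by inspection.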
4. Weak vs. Strong Duality
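The dual Lagrangian always provides a lower bound on the primal objective: $\ell(\mu) \leq f(\mathbf{x})$ for every feasible $\mathbf{x}$ and every $\mu \geq 0$ (weak duality). This follows because $\mu\, g(\mathbf{x}) \geq 0$ for feasible $\mathbf{x}$, so $\ell(\mu) \leq \mathcal{L}(\mathbf{x}, \mu) \leq f(\mathbf{x})$. Maximizing $\ell(\mu)$ gives the tightest such lower bound.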
For convex optimization problems that satisfy a constraint qualification such as Slater's condition (the SVM is one such problem), the duality gap is zero: the maximum of the dual equals the optimal primal objective. Solving the dual therefore solves the primal. The primal optimum $\mathbf{x}^*$ is recovered by substituting the optimal $\mu^*$ into the stationarity condition.
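The relationship between dual and primal values can be checked numerically. The sketch below assumes a toy problem chosen for illustration: minimize $f(x) = x^2$ subject to $x - 1 \geq 0$, whose dual function is $\ell(\mu) = \mu - \mu^2/4$ (obtained by substituting the stationary point $x = \mu/2$ into $x^2 - \mu(x - 1)$). The grids are arbitrary choices.

```python
def f(x):
    """Primal objective of the toy problem."""
    return x * x


def dual(mu):
    """Closed-form dual l(mu) = min_x [x^2 - mu * (x - 1)], attained at x = mu / 2."""
    return mu - mu * mu / 4.0


mus = [i / 20.0 for i in range(101)]         # mu on a grid in [0, 5]
xs = [1.0 + i / 20.0 for i in range(101)]    # feasible x on a grid in [1, 6]

# Every dual value should sit at or below every feasible primal value.
weak_duality_holds = all(dual(m) <= f(x) + 1e-12 for m in mus for x in xs)

best_dual = max(dual(m) for m in mus)        # tightest bound found on the grid
best_primal = min(f(x) for x in xs)          # best feasible primal value on the grid
gap = best_primal - best_dual

print(weak_duality_holds, best_dual, best_primal, gap)
```

Because this toy problem is convex and has a strictly feasible point (for instance $x = 2$), the gap on the grid closes at $\mu = 2$, $x = 1$.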
5. Recipe for Solving Inequality-Constrained Problems
- Define the primal Lagrangian $\mathcal{L}(\mathbf{x}, \boldsymbol{\mu})$ with multipliers $\mu_i \geq 0$.
- Set $\partial \mathcal{L}/\partial \mathbf{x} = 0$ to find the stationarity condition.
- Use the stationarity condition to eliminate $\mathbf{x}$ from $\mathcal{L}$, yielding the dual $\ell(\boldsymbol{\mu})$.
- Maximize $\ell(\boldsymbol{\mu})$ subject to $\mu_i \geq 0$ (and any additional constraints derived from the KKT conditions).
- Recover the primal solution $\mathbf{x}^*$ from the optimal $\boldsymbol{\mu}^*$ via the stationarity condition.
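The steps above can be traced on a small problem with a closed-form answer. The sketch below assumes a hypothetical example, minimizing $\tfrac{1}{2}\lVert\mathbf{x}\rVert^2$ subject to a single constraint $\mathbf{a}^\top\mathbf{x} - b \geq 0$ with $\mathbf{a} \neq \mathbf{0}$; the function name `solve_halfspace_projection` is illustrative only.

```python
def solve_halfspace_projection(a, b):
    """Minimize 0.5 * ||x||^2 subject to a.x - b >= 0 via the dual (toy problem).

    Assumes a is a non-zero list of floats.
    """
    # Step 1: L(x, mu) = 0.5 * ||x||^2 - mu * (a.x - b), with mu >= 0.
    # Step 2: stationarity dL/dx = x - mu * a = 0, so x = mu * a.
    # Step 3: substituting back gives l(mu) = -0.5 * mu^2 * ||a||^2 + mu * b.
    a_sq = sum(ai * ai for ai in a)

    # Step 4: l is a concave quadratic in mu; its unconstrained maximizer
    # b / ||a||^2 is clipped at zero to respect mu >= 0.
    mu_star = max(0.0, b / a_sq)

    # Step 5: recover the primal solution from the stationarity condition.
    x_star = [mu_star * ai for ai in a]

    primal_value = 0.5 * sum(xi * xi for xi in x_star)
    dual_value = -0.5 * mu_star ** 2 * a_sq + mu_star * b
    return x_star, mu_star, primal_value, dual_value


# Active constraint: the origin is infeasible, so the solution lies on the boundary.
print(solve_halfspace_projection([3.0, 4.0], 5.0))
# Inactive constraint: the origin is already feasible, so mu* = 0.
print(solve_halfspace_projection([3.0, 4.0], -5.0))
```

In the first call the primal and dual values agree and the constraint holds with equality; in the second the multiplier is zero and the unconstrained minimizer is returned, matching complementary slackness.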
The dual $\ell(\boldsymbol{\mu})$ is always concave in $\boldsymbol{\mu}$ (even if the primal problem is not convex), because it is a pointwise minimum of functions that are affine in $\boldsymbol{\mu}$; maximizing it is therefore a convex optimization problem. It also depends only on the Lagrange multipliers rather than on both $\mathbf{x}$ and $\boldsymbol{\mu}$. This often makes the dual easier to solve numerically. For SVMs, it naturally leads to a kernel formulation and reveals sparsity.