E2 236 Foundations of ML¶
Lab 3 Optimization¶
Part A — Robust Regression with Huber Loss via Convex Optimization¶
Learning objectives
- Derive and implement the Huber loss and its subgradient
- Solve robust regression using:
  - Subgradient descent (from scratch)
  - CVX / cvxpy (convex solver)
  - Iteratively Reweighted Least Squares (IRLS)
- Diagnose convexity and compare methods on synthetic data
Instructions: Complete every cell marked # YOUR CODE HERE. Do not change test assertions.
Imports and Setup¶
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
np.random.seed(42)
%matplotlib inline
plt.rcParams.update({'figure.dpi':110, 'font.size':11})
# Optional: install cvxpy if not present
# !pip install cvxpy --quiet
try:
    import cvxpy as cp
    CVXPY = True
    print("cvxpy available ✓")
except ImportError:
    CVXPY = False
    print("cvxpy not found — CVX section will be skipped")
1. Data Generation¶
We generate a synthetic regression dataset with deliberate outliers to motivate robust regression.
Q1 — Complete the function below:
- Draw $n$ inlier points: $y_i = \mathbf{x}_i^\top \boldsymbol{\beta}^* + \varepsilon_i$, where $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ and $\mathbf{x}_i \sim \mathcal{N}(\mathbf{0}, I)$.
- Replace n_outliers randomly selected $y$-values with large noise drawn from $\mathcal{N}(0, (10\sigma)^2)$.
def generate_data(n=150, d=5, sigma=0.5, n_outliers=20, seed=0):
"""
Generate a regression dataset with Gaussian outliers.
Returns
-------
X : ndarray (n, d) design matrix (NO bias column)
y : ndarray (n,) noisy targets
beta_star: ndarray (d,) true coefficient vector
outlier_idx: ndarray (int) indices of corrupted observations
"""
# YOUR CODE HERE
raise NotImplementedError
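A possible reference sketch is shown below (not necessarily the intended solution). It assumes $\boldsymbol{\beta}^*$ is drawn once from a standard normal and that corrupted targets are replaced outright by draws from $\mathcal{N}(0, (10\sigma)^2)$; how the outliers are injected is a modelling choice.
# Reference sketch only: assumes beta* ~ N(0, I) and that corrupted targets are
# replaced outright by draws from N(0, (10*sigma)^2); adapt as needed.
def generate_data_sketch(n=150, d=5, sigma=0.5, n_outliers=20, seed=0):
    rng = np.random.default_rng(seed)
    beta_star = rng.standard_normal(d)                  # true coefficients (assumed ~ N(0, I))
    X = rng.standard_normal((n, d))                     # design matrix, no bias column
    y = X @ beta_star + sigma * rng.standard_normal(n)  # inlier targets
    outlier_idx = rng.choice(n, size=n_outliers, replace=False)
    y[outlier_idx] = 10 * sigma * rng.standard_normal(n_outliers)  # corrupted targets
    return X, y, beta_star, outlier_idx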
# Generate and inspect
X, y, beta_star, outlier_idx = generate_data()
print(f"X shape: {X.shape}, y shape: {y.shape}")
print(f"True beta: {beta_star}")
print(f"Number of outliers: {len(outlier_idx)}")
2. Loss Functions¶
Huber loss¶
$$\mathcal{L}_{\text{Huber}}(\boldsymbol{\beta}) = \frac{1}{n}\sum_{i=1}^n h_\delta(y_i - \mathbf{x}_i^\top \boldsymbol{\beta})$$
where the per-sample Huber function is
$$h_\delta(r) = \begin{cases} \tfrac{1}{2}r^2 & |r| \le \delta \\ \delta\bigl(|r| - \tfrac{1}{2}\delta\bigr) & |r| > \delta \end{cases}$$
Q2.1 — Implement the loss function.
Q2.2 — Show that $h_\delta$ is convex and that its derivative/subgradient w.r.t. $r$ is
$$h_\delta'(r) = \begin{cases} r & |r| \le \delta \\ \delta \operatorname{sign}(r) & |r| > \delta \end{cases}$$
(Written answer — no code required for Q2.2.)
def huber_loss(beta, X, y, delta=1.0):
"""
Mean Huber loss.
Parameters
----------
beta : ndarray (d,)
X : ndarray (n, d)
y : ndarray (n,)
delta : float — transition point
Returns
-------
loss : float
"""
# YOUR CODE HERE
raise NotImplementedError
def huber_gradient(beta, X, y, delta=1.0):
"""
Gradient of the mean Huber loss w.r.t. beta.
Returns
-------
grad : ndarray (d,)
"""
# YOUR CODE HERE
raise NotImplementedError
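For reference, one possible implementation is sketched below. The sanity checks that follow also call an ols_loss helper that does not appear in this section; if it is not defined elsewhere in your notebook, the mean-squared-error version here (the exact scaling is an assumption) will satisfy them.
# ols_loss is referenced by the checks below but not defined in this section;
# the 1/2 * mean-squared-error convention used here is an assumption.
def ols_loss(beta, X, y):
    r = y - X @ beta
    return 0.5 * np.mean(r ** 2)

# Sketches of the Huber loss and its gradient (one possible implementation).
def huber_loss_sketch(beta, X, y, delta=1.0):
    r = y - X @ beta
    quad = 0.5 * r ** 2                        # |r| <= delta branch
    lin = delta * (np.abs(r) - 0.5 * delta)    # |r| >  delta branch
    return float(np.mean(np.where(np.abs(r) <= delta, quad, lin)))

def huber_gradient_sketch(beta, X, y, delta=1.0):
    r = y - X @ beta
    dh = np.clip(r, -delta, delta)             # h'(r): r if |r| <= delta, delta*sign(r) otherwise
    return -X.T @ dh / len(y)                  # chain rule: d r_i / d beta = -x_i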
# --- Sanity checks (do not modify) ---
beta0 = np.zeros(X.shape[1])
assert np.isscalar(ols_loss(beta0, X, y)), "OLS loss must return a scalar"
assert np.isscalar(huber_loss(beta0, X, y)), "Huber loss must return a scalar"
assert huber_gradient(beta0, X, y).shape == beta0.shape, "Gradient shape mismatch"
print("Loss function checks passed ✓")
# Q2.3 — Plot h_delta(r) for delta in {0.5, 1.0, 2.0} and compare with |r|
r = np.linspace(-4, 4, 400)
# YOUR CODE HERE
# Plot h_delta(r) and |r| on the same axes
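One way to fill in this cell (styling choices are arbitrary):
# Sketch: per-sample Huber function for several deltas, compared with |r|
def h_delta(r, delta):
    return np.where(np.abs(r) <= delta, 0.5 * r ** 2, delta * (np.abs(r) - 0.5 * delta))

plt.figure(figsize=(6, 4))
for delta in [0.5, 1.0, 2.0]:
    plt.plot(r, h_delta(r, delta), label=fr"Huber, $\delta={delta}$")
plt.plot(r, np.abs(r), 'k--', label=r"$|r|$")
plt.xlabel(r"residual $r$")
plt.ylabel(r"$h_\delta(r)$")
plt.legend()
plt.title("Huber loss vs absolute loss")
plt.show()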
Q2.2 — Convexity argument (written):
[Your answer here — show h_δ is convex by checking the second derivative or epigraph.]
3. Subgradient Descent¶
Subgradient descent updates: $$\boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} - \eta_t \, \nabla_\beta \mathcal{L}_{\text{Huber}}(\boldsymbol{\beta}^{(t)})$$
We use a diminishing step-size schedule: $\eta_t = \eta_0 / \sqrt{t+1}$.
Q3 — Implement subgradient descent for Huber regression. Return the parameter history and loss history so we can analyse convergence.
def subgradient_descent_huber(X, y, delta=1.0, eta0=0.1, n_iter=500):
"""
Subgradient descent for Huber regression.
Parameters
----------
X : ndarray (n, d)
y : ndarray (n,)
delta : float Huber threshold
eta0 : float initial step-size
n_iter : int number of iterations
Returns
-------
beta_hist : ndarray (n_iter+1, d) parameter history (including init)
loss_hist : ndarray (n_iter,) Huber loss at each step
"""
# YOUR CODE HERE
raise NotImplementedError
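One possible sketch of the update loop, assuming a zero initialisation and the completed huber_loss / huber_gradient from Section 2 (both assumptions, not requirements):
# Sketch: diminishing-step subgradient descent with eta_t = eta0 / sqrt(t + 1)
def subgradient_descent_huber_sketch(X, y, delta=1.0, eta0=0.1, n_iter=500):
    beta = np.zeros(X.shape[1])
    beta_hist = [beta.copy()]
    loss_hist = []
    for t in range(n_iter):
        g = huber_gradient(beta, X, y, delta)        # (sub)gradient at the current iterate
        beta = beta - eta0 / np.sqrt(t + 1) * g      # diminishing step size
        beta_hist.append(beta.copy())
        loss_hist.append(huber_loss(beta, X, y, delta))
    return np.array(beta_hist), np.array(loss_hist)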
beta_hist_sgd, loss_hist_sgd = subgradient_descent_huber(X, y)
print(f"Final Huber loss (subgradient): {loss_hist_sgd[-1]:.4f}")
# Q3.1 — Plot the convergence curve (loss vs iteration)
# YOUR CODE HERE
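A minimal convergence plot could be:
plt.figure(figsize=(6, 4))
plt.plot(loss_hist_sgd)
plt.xlabel("iteration")
plt.ylabel("Huber loss")
plt.title("Subgradient descent convergence")
plt.show()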
4. Solving with CVXPY (Convex Solver)¶
Q4 — Formulate Huber regression as a CVXPY problem and solve it.
Hint: cp.huber(expr, M=delta) returns the element-wise Huber function.
def solve_huber_cvxpy(X, y, delta=1.0):
"""
Solve Huber regression using CVXPY.
Returns
-------
beta_cvx : ndarray (d,) or None if cvxpy unavailable
"""
if not CVXPY:
print("cvxpy not available — skipping")
return None
# YOUR CODE HERE
# Hint: call prob.solve() with NO solver argument —
# CVXPY will automatically select the best installed solver.
# After solving, check prob.status before returning beta.value.
raise NotImplementedError
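A possible formulation is sketched below. Note that CVXPY's cp.huber(r, M) equals $r^2$ for $|r| \le M$ and $2M|r| - M^2$ otherwise, i.e. twice the $h_\delta$ defined above, so the objective is divided by $2n$ to match; the extra constant factor does not change the minimiser.
# Sketch: minimise the mean Huber penalty on the residuals with CVXPY
def solve_huber_cvxpy_sketch(X, y, delta=1.0):
    if not CVXPY:
        return None
    n, d = X.shape
    beta = cp.Variable(d)
    # cp.huber is 2 * h_delta, hence the division by 2*n
    objective = cp.Minimize(cp.sum(cp.huber(y - X @ beta, M=delta)) / (2 * n))
    prob = cp.Problem(objective)
    prob.solve()                       # let CVXPY pick an installed solver
    if prob.status not in ("optimal", "optimal_inaccurate"):
        print(f"Solver status: {prob.status}")
        return None
    return beta.value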
beta_cvx = solve_huber_cvxpy(X, y)
if beta_cvx is not None:
print(f"CVXPY solution: {beta_cvx}")
print(f"True beta: {beta_star}")
CVXPY solution: [ 0.462  0.543  1.338 -1.775  1.732]
True beta: [ 0.468  0.527  1.375 -1.815  1.739]
5. IRLS for Huber Regression¶
Iteratively Reweighted Least Squares (IRLS) exploits the fact that the gradient of the Huber loss can be written as
$$\nabla \mathcal{L} = -\frac{1}{n} X^\top W(\boldsymbol{\beta})(\mathbf{y} - X\boldsymbol{\beta})$$
where $W = \operatorname{diag}(w_i)$ with
$$w_i = \begin{cases} 1 & |r_i| \le \delta \\ \delta / |r_i| & |r_i| > \delta \end{cases}$$
At each iteration, solve the weighted least squares problem:
$$\boldsymbol{\beta}^{(t+1)} = (X^\top W^{(t)} X)^{-1} X^\top W^{(t)} \mathbf{y}$$
Q5 — Implement IRLS.
def irls_huber(X, y, delta=1.0, n_iter=50, tol=1e-6):
"""
IRLS for Huber regression.
Returns
-------
beta : ndarray (d,)
loss_hist : list of floats
"""
# YOUR CODE HERE
raise NotImplementedError
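One possible IRLS loop is sketched below; stopping on the change in successive iterates is an assumption, and the loss history is recorded with the completed huber_loss from Section 2.
# Sketch: weighted least squares with Huber weights, iterated until the
# coefficient update falls below tol
def irls_huber_sketch(X, y, delta=1.0, n_iter=50, tol=1e-6):
    beta = np.zeros(X.shape[1])
    loss_hist = []
    for _ in range(n_iter):
        r = y - X @ beta
        absr = np.maximum(np.abs(r), 1e-12)            # guard against division by zero
        w = np.minimum(1.0, delta / absr)              # Huber weights w_i
        XtW = X.T * w                                  # X^T W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)   # weighted least-squares step
        loss_hist.append(huber_loss(beta_new, X, y, delta))
        if np.linalg.norm(beta_new - beta) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta, loss_hist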
beta_irls, loss_hist_irls = irls_huber(X, y)
print(f"IRLS solution: {beta_irls}")
print(f"True beta: {beta_star}")
print(f"Final Huber loss (IRLS): {loss_hist_irls[-1]:.4f}")
IRLS solution: [ 0.462  0.543  1.338 -1.775  1.732]
True beta: [ 0.468  0.527  1.375 -1.815  1.739]
Final Huber loss (IRLS): 0.5309
6. Comparison¶
Q6.1 — Fill in the comparison table below.
Q6.2 — On a single figure, plot:
- The OLS fit (closed-form)
- Huber fit (IRLS or subgradient)
- Data points, highlighting outliers
Use the first feature of $X$ for a 2-D scatter plot (fix remaining features at 0).
Q6.3 (written) — When would you prefer IRLS over the subgradient method, and vice versa?
# OLS closed-form
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
# Q6.1 — Compute errors
for name, beta_hat in [('OLS', beta_ols), ('Huber-SGD', beta_hist_sgd[-1]),
                       ('Huber-IRLS', beta_irls)]:
    mse = np.mean((beta_hat - beta_star)**2)
    loss = ols_loss(beta_hat, X, y)
    print(f"{name:12s} | beta MSE: {mse:.4f} | OLS loss: {loss:.4f}")
# Q6.2 — 2-D plot
# YOUR CODE HERE
Method       | beta MSE
----------------------------------------
Huber-SGD    | 0.0197
Huber-IRLS   | 0.0007
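A sketch of one possible Q6.2 figure: each fitted model is evaluated along the first feature with the remaining features held at 0, as suggested above.
# Sketch: data and fitted lines along the first feature (other features fixed at 0)
grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
X_grid = np.zeros((grid.size, X.shape[1]))
X_grid[:, 0] = grid

inliers = np.setdiff1d(np.arange(len(y)), outlier_idx)
plt.figure(figsize=(6, 4))
plt.scatter(X[inliers, 0], y[inliers], s=15, label="inliers")
plt.scatter(X[outlier_idx, 0], y[outlier_idx], s=30, c='red', marker='x', label="outliers")
plt.plot(grid, X_grid @ beta_ols, label="OLS fit")
plt.plot(grid, X_grid @ beta_irls, label="Huber fit (IRLS)")
plt.xlabel("$x_1$")
plt.ylabel("$y$")
plt.legend()
plt.title("OLS vs Huber fit")
plt.show()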
Q6.3 — Written answer:
[Your answer here]
7. Effect of Delta¶
Q7 — Train IRLS for $\delta \in \{0.1, 0.5, 1.0, 2.0, 5.0\}$ and plot the final $\|\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*\|_2$ vs $\delta$.
Q7 (written) — Explain what happens as $\delta \to 0$ and $\delta \to \infty$.
# YOUR CODE HERE
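One possible sweep, reusing irls_huber once it is completed:
# Sketch: coefficient error as a function of the Huber threshold delta
deltas = [0.1, 0.5, 1.0, 2.0, 5.0]
errors = []
for delta in deltas:
    beta_d, _ = irls_huber(X, y, delta=delta)
    errors.append(np.linalg.norm(beta_d - beta_star))

plt.figure(figsize=(6, 4))
plt.plot(deltas, errors, 'o-')
plt.xscale('log')
plt.xlabel(r"$\delta$")
plt.ylabel(r"$\|\hat{\beta} - \beta^*\|_2$")
plt.title("Effect of the Huber threshold")
plt.show()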
Q7 — Written answer:
[Your answer here]