E2 236 Foundations of ML¶

Lab 3 Optimization¶

Part A — Robust Regression with Huber Loss via Convex Optimization¶

Learning objectives

  • Derive and implement the Huber loss and its subgradient
  • Solve robust regression using:
    • Subgradient descent (from scratch)
    • CVX / cvxpy (convex solver)
    • Iteratively Reweighted Least Squares (IRLS)
  • Diagnose convexity and compare methods on synthetic data

Instructions: Complete every cell marked # YOUR CODE HERE. Do not change test assertions.


Imports and Setup¶

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

np.random.seed(42)
%matplotlib inline
plt.rcParams.update({'figure.dpi':110, 'font.size':11})

# Optional: install cvxpy if not present
# !pip install cvxpy --quiet
try:
    import cvxpy as cp
    CVXPY = True
    print("cvxpy available ✓")
except ImportError:
    CVXPY = False
    print("cvxpy not found — CVX section will be skipped")

1. Data Generation¶

We generate a synthetic regression dataset with deliberate outliers to motivate robust regression.

Q1 — Complete the function below:

  • Draw $n$ inlier points: $y_i = \mathbf{x}_i^\top \boldsymbol{\beta}^* + \varepsilon_i$, where $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ and $\mathbf{x}_i \sim \mathcal{N}(\mathbf{0}, I)$.
  • Replace n_outliers randomly selected $y$-values with large noise drawn from $\mathcal{N}(0, (10\sigma)^2)$.
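If the corruption step is unclear, here is a minimal sketch of the mechanics on toy values (the names rng, y_toy, and the constants are ours, not part of the required function): np.random.choice with replace=False picks distinct indices, which are then overwritten with wide-variance noise.

rng = np.random.RandomState(0)                  # legacy API, matching the setup cell
n, sigma, n_outliers = 150, 0.5, 20             # toy values for this sketch
y_toy = rng.normal(size=n)                      # stand-in targets
outlier_idx = rng.choice(n, size=n_outliers, replace=False)   # distinct indices
y_toy[outlier_idx] = rng.normal(scale=10 * sigma, size=n_outliers)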
In [ ]:
def generate_data(n=150, d=5, sigma=0.5, n_outliers=20, seed=0):
    """
    Generate a regression dataset with Gaussian outliers.

    Returns
    -------
    X        : ndarray (n, d)   design matrix (NO bias column)
    y        : ndarray (n,)     noisy targets
    beta_star: ndarray (d,)     true coefficient vector
    outlier_idx: ndarray (int)  indices of corrupted observations
    """
    # YOUR CODE HERE
    raise NotImplementedError


# Generate and inspect
X, y, beta_star, outlier_idx = generate_data()
print(f"X shape: {X.shape}, y shape: {y.shape}")
print(f"True beta: {beta_star}")
print(f"Number of outliers: {len(outlier_idx)}")

2. Loss Functions¶

Huber loss¶

$$\mathcal{L}_{\text{Huber}}(\boldsymbol{\beta}) = \frac{1}{n}\sum_{i=1}^n h_\delta(y_i - \mathbf{x}_i^\top \boldsymbol{\beta})$$

where the per-sample Huber function is

$$h_\delta(r) = \begin{cases} \tfrac{1}{2}r^2 & |r| \le \delta \\ \delta\bigl(|r| - \tfrac{1}{2}\delta\bigr) & |r| > \delta \end{cases}$$

Q2.1 — Implement the loss function.
Q2.2 — Show that $h_\delta$ is convex and that its derivative/subgradient w.r.t. $r$ is

$$h_\delta'(r) = \begin{cases} r & |r| \le \delta \\ \delta \operatorname{sign}(r) & |r| > \delta \end{cases}$$

(Written answer — no code required for Q2.2.)
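As a starting point for Q2.1, here is a vectorized sketch of the per-sample function and its derivative (the names h_delta and h_delta_prime are ours, not part of the lab API). np.where evaluates both branches and selects element-wise:

def h_delta(r, delta=1.0):
    """Per-sample Huber function, vectorized over residuals r (a sketch)."""
    quad = 0.5 * r ** 2                          # quadratic branch, |r| <= delta
    lin = delta * (np.abs(r) - 0.5 * delta)      # linear branch,    |r| >  delta
    return np.where(np.abs(r) <= delta, quad, lin)


def h_delta_prime(r, delta=1.0):
    """Derivative of the per-sample Huber function (a sketch)."""
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))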

In [ ]:
def ols_loss(beta, X, y):
    """Mean squared error (OLS) loss. Provided: the sanity checks below
    and Section 6 call it."""
    return float(np.mean((y - X @ beta) ** 2))


def huber_loss(beta, X, y, delta=1.0):
    """
    Mean Huber loss.

    Parameters
    ----------
    beta  : ndarray (d,)
    X     : ndarray (n, d)
    y     : ndarray (n,)
    delta : float  — transition point

    Returns
    -------
    loss : float
    """
    # YOUR CODE HERE
    raise NotImplementedError


def huber_gradient(beta, X, y, delta=1.0):
    """
    Gradient of the mean Huber loss w.r.t. beta.

    Returns
    -------
    grad : ndarray (d,)
    """
    # YOUR CODE HERE
    raise NotImplementedError


# --- Sanity checks (do not modify) ---
beta0 = np.zeros(X.shape[1])
assert np.isscalar(ols_loss(beta0, X, y)),    "OLS loss must return a scalar"
assert np.isscalar(huber_loss(beta0, X, y)),  "Huber loss must return a scalar"
assert huber_gradient(beta0, X, y).shape == beta0.shape, "Gradient shape mismatch"
print("Loss function checks passed ✓")
In [ ]:
# Q2.3 — Plot h_delta(r) for delta in {0.5, 1.0, 2.0} and compare with |r|
r = np.linspace(-4, 4, 400)

# YOUR CODE HERE
# Plot h_delta(r) and |r| on the same axes

Q2.2 — Convexity argument (written):

[Your answer here — show h_δ is convex by checking the second derivative or epigraph.]

3. Subgradient Descent¶

Subgradient descent updates: $$\boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} - \eta_t \, \mathbf{g}^{(t)}, \qquad \mathbf{g}^{(t)} \in \partial \mathcal{L}_{\text{Huber}}(\boldsymbol{\beta}^{(t)})$$

Since $h_\delta$ is continuously differentiable, the subdifferential is a singleton and $\mathbf{g}^{(t)}$ is exactly the gradient you derive in Q2.2.

We use a diminishing step-size schedule: $\eta_t = \eta_0 / \sqrt{t+1}$.
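A generic skeleton of such a loop, for orientation only (gradient_fn and beta_init are placeholders, and this sketch does not record the histories that Q3 asks you to return):

def descent_skeleton(gradient_fn, beta_init, eta0=0.1, n_iter=500):
    """Diminishing-step-size (sub)gradient descent: illustrative sketch."""
    beta = beta_init.copy()
    for t in range(n_iter):
        eta_t = eta0 / np.sqrt(t + 1)   # the schedule defined above
        beta = beta - eta_t * gradient_fn(beta)
    return beta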

Q3 — Implement subgradient descent for Huber regression. Return the parameter history and loss history so we can analyse convergence.

In [ ]:
def subgradient_descent_huber(X, y, delta=1.0, eta0=0.1, n_iter=500):
    """
    Subgradient descent for Huber regression.

    Parameters
    ----------
    X      : ndarray (n, d)
    y      : ndarray (n,)
    delta  : float    Huber threshold
    eta0   : float    initial step-size
    n_iter : int      number of iterations

    Returns
    -------
    beta_hist : ndarray (n_iter+1, d)  parameter history (including init)
    loss_hist : ndarray (n_iter,)      Huber loss at each step
    """
    # YOUR CODE HERE
    raise NotImplementedError


beta_hist_sgd, loss_hist_sgd = subgradient_descent_huber(X, y)
print(f"Final Huber loss (subgradient): {loss_hist_sgd[-1]:.4f}")
In [ ]:
# Q3.1 — Plot the convergence curve (loss vs iteration)
# YOUR CODE HERE

4. Solving with CVXPY (Convex Solver)¶

Q4 — Formulate Huber regression as a CVXPY problem and solve it.

Hint: cp.huber(expr, M=delta) returns the element-wise Huber function.
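If you have not used CVXPY before, the pattern is: declare a cp.Variable, build a cp.Minimize objective, wrap both in a cp.Problem, and call solve(). A minimal sketch for plain least squares (deliberately not the Huber objective; swapping in cp.huber is your task, and the names b, objective, prob are ours):

if CVXPY:
    b = cp.Variable(X.shape[1])                               # decision variable
    objective = cp.Minimize(cp.sum_squares(y - X @ b) / len(y))
    prob = cp.Problem(objective)
    prob.solve()                                              # default solver selection
    print(prob.status, b.value)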

In [ ]:
def solve_huber_cvxpy(X, y, delta=1.0):
    """
    Solve Huber regression using CVXPY.

    Returns
    -------
    beta_cvx : ndarray (d,)  or None if cvxpy unavailable
    """
    if not CVXPY:
        print("cvxpy not available — skipping")
        return None
    # YOUR CODE HERE
    # Hint: call prob.solve() with NO solver argument —
    # CVXPY will automatically select the best installed solver.
    # After solving, check prob.status before returning beta.value.
    raise NotImplementedError


beta_cvx = solve_huber_cvxpy(X, y)
if beta_cvx is not None:
    print(f"CVXPY solution:      {beta_cvx}")
    print(f"True beta:           {beta_star}")
Expected output (reference solution):

CVXPY solution: [ 0.462  0.543  1.338 -1.775  1.732]
True beta:      [ 0.468  0.527  1.375 -1.815  1.739]

5. IRLS for Huber Regression¶

Iteratively Reweighted Least Squares (IRLS) exploits the fact that the gradient of the Huber loss can be written as

$$\nabla \mathcal{L} = -\frac{1}{n} X^\top W(\boldsymbol{\beta})(\mathbf{y} - X\boldsymbol{\beta})$$

where $W = \operatorname{diag}(w_i)$ with

$$w_i = \begin{cases} 1 & |r_i| \le \delta \\ \delta / |r_i| & |r_i| > \delta \end{cases}$$

Note that $w_i r_i = h_\delta'(r_i)$ in both branches, so the reweighted residual reproduces the Huber gradient exactly.

At each iteration, solve the weighted least squares problem:

$$\boldsymbol{\beta}^{(t+1)} = (X^\top W^{(t)} X)^{-1} X^\top W^{(t)} \mathbf{y}$$
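In code, one iteration of this update can be expressed without materializing the diagonal matrix $W$. The following is a sketch, not the required implementation (beta and delta are assumed to hold the current values); note the use of np.linalg.solve rather than an explicit inverse:

abs_r = np.abs(y - X @ beta)                           # current residual magnitudes
w = np.minimum(1.0, delta / np.maximum(abs_r, 1e-12))  # Huber weights, safe at r = 0
XtW = X.T * w                                          # X^T W via broadcasting
beta = np.linalg.solve(XtW @ X, XtW @ y)               # weighted normal equations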

Q5 — Implement IRLS.

In [ ]:
def irls_huber(X, y, delta=1.0, n_iter=50, tol=1e-6):
    """
    IRLS for Huber regression.

    Returns
    -------
    beta      : ndarray (d,)
    loss_hist : list of floats
    """
    # YOUR CODE HERE
    raise NotImplementedError


beta_irls, loss_hist_irls = irls_huber(X, y)
print(f"IRLS solution:   {beta_irls}")
print(f"True beta:       {beta_star}")
print(f"Final Huber loss (IRLS): {loss_hist_irls[-1]:.4f}")
Expected output (reference solution):

IRLS solution:   [ 0.462  0.543  1.338 -1.775  1.732]
True beta:       [ 0.468  0.527  1.375 -1.815  1.739]
Final Huber loss (IRLS): 0.5309

6. Comparison¶

Q6.1 — Fill in the comparison table below.
Q6.2 — On a single figure, plot:

  1. The OLS fit (closed-form)
  2. Huber fit (IRLS or subgradient)
  3. Data points, highlighting outliers

Use the first feature of $X$ for a 2-D scatter plot (fix remaining features at 0).
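One way to highlight the corrupted points in the scatter (a sketch only; mask is our name, and the fitted lines are left to you; with the remaining features fixed at 0, each fit reduces to the line y = beta_hat[0] * x):

mask = np.zeros(len(y), dtype=bool)
mask[outlier_idx] = True                               # boolean outlier mask
plt.scatter(X[~mask, 0], y[~mask], s=18, label='inliers')
plt.scatter(X[mask, 0], y[mask], s=40, marker='x', color='red', label='outliers')
plt.legend()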

Q6.3 (written) — When would you prefer IRLS over the subgradient method, and vice versa?

In [ ]:
# OLS closed-form
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Q6.1 — Compute errors
for name, beta_hat in [('OLS', beta_ols), ('Huber-SGD', beta_hist_sgd[-1]),
                        ('Huber-IRLS', beta_irls)]:
    mse  = np.mean((beta_hat - beta_star)**2)
    loss = ols_loss(beta_hat, X, y)
    print(f"{name:12s} | beta MSE: {mse:.4f} | OLS loss: {loss:.4f}")

# Q6.2 — 2-D plot
# YOUR CODE HERE
Expected output (excerpt of the reference solution; the comparison figure is omitted here):

Method       | beta MSE
----------------------------------------
Huber-SGD    | 0.0197
Huber-IRLS   | 0.0007

Q6.3 — Written answer:

[Your answer here]

7. Effect of Delta¶

Q7 — Train IRLS for $\delta \in \{0.1, 0.5, 1.0, 2.0, 5.0\}$ and plot the final $\|\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}^*\|_2$ vs $\delta$.

Q7 (written) — Explain what happens as $\delta \to 0$ and $\delta \to \infty$.
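A compact sketch of the sweep, assuming your irls_huber from Q5 is complete (deltas and errors are our names; the plot and the written part are left to you):

deltas = [0.1, 0.5, 1.0, 2.0, 5.0]
errors = [np.linalg.norm(irls_huber(X, y, delta=d)[0] - beta_star) for d in deltas]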

In [ ]:
# YOUR CODE HERE

Q7 — Written answer:

[Your answer here]