E2 236 Foundations of ML¶
Lab 4a: Kernel Methods¶
Part A — Dual Representation and the RBF (Gaussian) Kernel¶
Topics covered:
- Dual representation of linear models
- Radial Basis Function (RBF) / Gaussian kernel
- Kernel ridge regression on synthetic data
- Kernel SVM classification on synthetic data
- Visualising decision boundaries and regression fits
Instructions: Fill in all sections marked # YOUR CODE HERE. Do not modify the test assertions.
0. Setup & Imports¶
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_squared_error
np.random.seed(42)
%matplotlib inline
plt.rcParams.update({'figure.dpi': 110, 'font.size': 11})
1. Kernel Functions¶
The Gaussian / RBF kernel between two vectors $\mathbf{x}$ and $\mathbf{x}'$ is defined as
$$k(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\ell^2}\right)$$
where $\ell > 0$ is the length-scale (bandwidth) hyperparameter.
The polynomial kernel of degree $d$ is
$$k(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^\top \mathbf{x}' + c)^d$$
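For orientation: scikit-learn ships the same RBF kernel under a different parameterisation, $k(\mathbf{x}, \mathbf{x}') = \exp(-\gamma\|\mathbf{x} - \mathbf{x}'\|^2)$ with $\gamma = 1/(2\ell^2)$. A minimal sketch of the correspondence, handy later as an independent check of your own implementation:
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel as sk_rbf

# Two arbitrary 2-D points and a length-scale of 0.5.
x, xp = np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])
ell = 0.5

# Direct evaluation of exp(-||x - x'||^2 / (2 * ell^2)) ...
k_direct = np.exp(-np.sum((x - xp) ** 2) / (2 * ell ** 2))

# ... agrees with sklearn's gamma parameterisation, gamma = 1 / (2 * ell^2).
k_sklearn = sk_rbf(x, xp, gamma=1.0 / (2 * ell ** 2))[0, 0]
print(k_direct, k_sklearn)  # both ~= exp(-4) ~= 0.0183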
def rbf_kernel(X1, X2, length_scale=1.0):
    """
    Compute the RBF (Gaussian) kernel matrix between rows of X1 and X2.

    Parameters
    ----------
    X1 : ndarray of shape (n, d)
    X2 : ndarray of shape (m, d)
    length_scale : float

    Returns
    -------
    K : ndarray of shape (n, m)
    """
    # YOUR CODE HERE
    raise NotImplementedError
def polynomial_kernel(X1, X2, degree=3, c=1.0):
    """
    Compute the polynomial kernel matrix.

    Parameters
    ----------
    X1 : ndarray of shape (n, d)
    X2 : ndarray of shape (m, d)
    degree : int
    c : float
        Free parameter.

    Returns
    -------
    K : ndarray of shape (n, m)
    """
    # YOUR CODE HERE
    raise NotImplementedError
# --- Quick sanity checks (do not modify) ---
X_test = np.array([[1.0, 0.0], [0.0, 1.0]])
K_rbf = rbf_kernel(X_test, X_test, length_scale=1.0)
assert K_rbf.shape == (2, 2), "Shape error"
assert np.allclose(np.diag(K_rbf), 1.0), "Diagonal should be 1 for RBF"
K_poly = polynomial_kernel(X_test, X_test, degree=2, c=1.0)
assert K_poly.shape == (2, 2)
print("Kernel sanity checks passed ✓")
2. Visualising the Kernel Matrix¶
Plot the RBF kernel matrix for a 1-D grid of 100 points in $[-3, 3]$ for three different length-scales: $\ell \in \{0.3, 1.0, 3.0\}$.
Hint: Use plt.imshow or plt.pcolormesh.
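If plt.imshow is new to you, here is a minimal, self-contained usage sketch on an arbitrary matrix (not the exercise solution); origin and extent are optional but make the axes line up with the underlying grid:
import numpy as np
import matplotlib.pyplot as plt

# imshow usage sketch on an arbitrary 50x50 matrix.
t = np.linspace(0.0, 1.0, 50)
M = np.exp(-np.abs(np.subtract.outer(t, t)))
fig, ax = plt.subplots()
im = ax.imshow(M, origin='lower', extent=[0, 1, 0, 1], cmap='viridis')
fig.colorbar(im, ax=ax)
plt.show()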
# YOUR CODE HERE
# For each length_scale, compute the kernel matrix on a 1-D grid and plot it.
# Use a 1×3 subplot layout.
3. Kernel Ridge Regression — Dual Representation¶
Given training data $(\mathbf{X}, \mathbf{y})$, kernel ridge regression finds the dual coefficients
$$\boldsymbol{\alpha} = (\mathbf{K} + \lambda \mathbf{I})^{-1} \mathbf{y}$$
and predictions are made via
$$\hat{y}(\mathbf{x}_*) = \sum_{i=1}^{n} \alpha_i\, k(\mathbf{x}_i, \mathbf{x}_*)$$
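A practical note before you implement fit: forming the explicit inverse of $\mathbf{K} + \lambda \mathbf{I}$ is slower and less numerically stable than solving the linear system directly. A tiny sketch of the linear-algebra step only, on a synthetic PSD matrix rather than this lab's kernel:
import numpy as np

# Solve (K + lam*I) alpha = y instead of computing an inverse.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
K = A @ A.T                      # any Gram matrix is symmetric PSD
y = rng.normal(size=5)
lam = 1e-3

alpha = np.linalg.solve(K + lam * np.eye(5), y)
print(np.allclose((K + lam * np.eye(5)) @ alpha, y))  # True: residual is ~0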
Q3.1 — Generate 80 noisy samples of $y = \sin(2x) + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, 0.1^2)$, $x \in [-\pi, \pi]$.
Q3.2 — Implement fit and predict for kernel ridge regression.
Q3.3 — Fit models with $\ell \in \{0.3, 1.0, 3.0\}$ and plot the results.
# Q3.1 — Generate data
# YOUR CODE HERE
# X_train shape: (80, 1), y_train shape: (80,)
class KernelRidgeRegression:
    """Kernel Ridge Regression via the dual representation."""

    def __init__(self, kernel_fn, lam=1e-3):
        """
        Parameters
        ----------
        kernel_fn : callable
            Kernel function k(X1, X2) -> kernel matrix.
        lam : float
            Regularisation strength λ.
        """
        self.kernel_fn = kernel_fn
        self.lam = lam
        self.alpha_ = None
        self.X_train_ = None

    def fit(self, X, y):
        """
        Solve (K + λI)α = y and store α and the training inputs.

        Parameters
        ----------
        X : ndarray (n, d)
        y : ndarray (n,)
        """
        # YOUR CODE HERE
        raise NotImplementedError

    def predict(self, X_star):
        """
        Predict at new inputs X_star.

        Parameters
        ----------
        X_star : ndarray (m, d)

        Returns
        -------
        y_pred : ndarray (m,)
        """
        # YOUR CODE HERE
        raise NotImplementedError
# Q3.3 — Fit and plot for three length-scales
# YOUR CODE HERE
4. Effect of Length-Scale and Regularisation¶
Q4.1 — Using your KernelRidgeRegression class, perform a grid search over
$\ell \in \{0.1, 0.3, 1.0, 3.0\}$ and $\lambda \in \{10^{-4}, 10^{-2}, 1\}$.
Report the test MSE for each combination (use an 80/20 train/test split).
Q4.2 — Plot a heatmap of the test MSE grid.
Q4.3 (written) — In 2–3 sentences, explain the effect of increasing $\ell$ and increasing $\lambda$ on the model's bias and variance.
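If you are unsure how to organise the sweep, the usual pattern is a nested loop (or itertools.product) filling a results matrix. A generic skeleton only; fit_and_score is a hypothetical stand-in for your own fit/evaluate code:
import itertools
import numpy as np

lengths = [0.1, 0.3, 1.0, 3.0]
lams = [1e-4, 1e-2, 1.0]
mse_grid = np.empty((len(lengths), len(lams)))

for (i, ell), (j, lam) in itertools.product(enumerate(lengths), enumerate(lams)):
    # fit_and_score is hypothetical: fit KernelRidgeRegression with (ell, lam)
    # on the train split and return the test MSE. Replace with your own code.
    mse_grid[i, j] = fit_and_score(ell, lam)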
# Q4.1 & Q4.2 — Grid search + heatmap
# YOUR CODE HERE
Q4.3 — Written answer:
[Your answer here]
5. Kernel SVM Classification¶
We now apply the kernel trick to classification using a Support Vector Machine with an RBF kernel.
Q5.1 — Generate two 2-D datasets:
- make_moons(n_samples=300, noise=0.2)
- make_circles(n_samples=300, noise=0.15, factor=0.5)
Q5.2 — Fit sklearn.svm.SVC with kernel='rbf' on each dataset.
Q5.3 — Plot the decision boundary and support vectors for each dataset.
Q5.4 (written) — What happens to the decision boundary when you set C very large or very small?
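For reference, the SVC interface is minimal; a sketch on a toy dataset (the hyperparameter values below are arbitrary, not tuned):
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X_toy, y_toy = make_moons(n_samples=100, noise=0.2, random_state=0)
clf_toy = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X_toy, y_toy)

# The fitted support vectors are exposed as clf_toy.support_vectors_.
print(clf_toy.support_vectors_.shape, clf_toy.score(X_toy, y_toy))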
from sklearn.svm import SVC
def plot_decision_boundary(clf, X, y, ax, title=''):
    """Plot a 2-class decision boundary on ax."""
    # YOUR CODE HERE — build a meshgrid, predict, and plot contourf + scatter
    pass
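The underlying recipe (build a dense grid of points, predict on the grid, then draw a filled contour) is standard. A generic sketch on a stand-in classifier and dataset, to be adapted to the function signature above:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Stand-in data and classifier, just to demonstrate the plotting recipe.
Xb, yb = make_blobs(n_samples=100, centers=2, random_state=0)
clf_demo = LogisticRegression().fit(Xb, yb)

xx, yy = np.meshgrid(np.linspace(Xb[:, 0].min() - 1, Xb[:, 0].max() + 1, 200),
                     np.linspace(Xb[:, 1].min() - 1, Xb[:, 1].max() + 1, 200))
Z = clf_demo.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, ax = plt.subplots()
ax.contourf(xx, yy, Z, alpha=0.3)
ax.scatter(Xb[:, 0], Xb[:, 1], c=yb, edgecolors='k')
plt.show()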
# Q5.1 Generate data
# YOUR CODE HERE
# Q5.2 Fit SVMs
# YOUR CODE HERE
# Q5.3 Plot
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# YOUR CODE HERE
plt.tight_layout()
plt.show()
Q5.4 — Written answer:
[Your answer here]
6. Comparing Kernels on a Nonlinear Dataset¶
Q6.1 — Use make_moons(n_samples=500, noise=0.3) with an 80/20 split.
Q6.2 — Train an SVC with each of the following kernels: 'linear', 'poly' (degree 3), 'rbf'.
Q6.3 — Report accuracy and plot all three decision boundaries side by side.
Q6.4 (written) — Rank the kernels by test accuracy and explain why the best kernel performs well on this dataset.
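One compact way to organise Q6.2 is a loop over kernel specifications; a sketch of the pattern only (splitting, fitting, and plotting omitted):
from sklearn.svm import SVC

kernel_specs = [
    dict(kernel='linear'),
    dict(kernel='poly', degree=3),
    dict(kernel='rbf'),
]
models = {spec['kernel']: SVC(**spec) for spec in kernel_specs}
# Fit each model on the training split, then score it on the test split.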
# Q6.1–6.3 — Compare kernels
# YOUR CODE HERE
Q6.4 — Written answer:
[Your answer here]
7. The Kernel Matrix is Positive Semi-Definite (Conceptual)¶
Q7 (written) — Prove (or give an intuitive argument) that the Gram matrix $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$ constructed with the RBF kernel is always positive semi-definite.
Hint: Consider the feature map $\phi$ such that $k(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^\top \phi(\mathbf{x}')$.
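Before writing the argument, you can sanity-check the claim numerically: the eigenvalues of an RBF Gram matrix should be nonnegative up to floating-point error. The sketch below uses sklearn's pairwise rbf_kernel so it runs independently of your Section 1 code:
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel as sk_rbf

rng = np.random.default_rng(1)
X_rand = rng.normal(size=(50, 3))
K = sk_rbf(X_rand, gamma=0.5)    # Gram matrix of X_rand with itself

eigvals = np.linalg.eigvalsh(K)  # symmetric matrix: real eigenvalues
print(eigvals.min() >= -1e-10)   # True: PSD up to numerical tolerance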
Q7 — Written answer:
[Your answer here]