E2 236 Foundations of ML¶

Lab 4a: Kernel Methods¶

Part A — Dual Representation, RBF & Gaussian Kernels¶

Topics covered:

  • Dual representation of linear models
  • Radial Basis Function (RBF) / Gaussian kernel
  • Kernel ridge regression on synthetic data
  • Kernel SVM classification on synthetic data
  • Visualising decision boundaries and regression fits

Instructions: Fill in all sections marked # YOUR CODE HERE. Do not modify the test assertions.


0. Setup & Imports¶

In [4]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_squared_error


np.random.seed(42)
%matplotlib inline
plt.rcParams.update({'figure.dpi': 110, 'font.size': 11})

1. Kernel Functions¶

The Gaussian / RBF kernel between two vectors $\mathbf{x}$ and $\mathbf{x}'$ is defined as

$$k(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\ell^2}\right)$$

where $\ell > 0$ is the length-scale (bandwidth) hyperparameter.

The polynomial kernel of degree $d$ is

$$k(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^\top \mathbf{x}' + c)^d$$
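Before filling in the stubs below, it may help to see one common vectorized pattern. The sketch that follows is only an illustration (functions are suffixed `_ref` so they don't collide with the stubs you must implement yourself); it uses the expansion $\|\mathbf{a}-\mathbf{b}\|^2 = \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - 2\,\mathbf{a}^\top\mathbf{b}$ to avoid explicit loops:

```python
import numpy as np

def rbf_kernel_ref(X1, X2, length_scale=1.0):
    # Pairwise squared distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    np.maximum(sq, 0.0, out=sq)  # guard against tiny negative values from round-off
    return np.exp(-sq / (2.0 * length_scale**2))

def polynomial_kernel_ref(X1, X2, degree=3, c=1.0):
    # (x^T x' + c)^d, computed for all pairs at once
    return (X1 @ X2.T + c) ** degree
```

The broadcasting trick computes all $n \times m$ distances in one pass, which matters once you build Gram matrices on hundreds of points.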

In [2]:
def rbf_kernel(X1, X2, length_scale=1.0):
    """
    Compute the RBF (Gaussian) kernel matrix between rows of X1 and X2.

    Parameters
    ----------
    X1 : ndarray of shape (n, d)
    X2 : ndarray of shape (m, d)
    length_scale : float

    Returns
    -------
    K : ndarray of shape (n, m)
    """
    # YOUR CODE HERE
    raise NotImplementedError


def polynomial_kernel(X1, X2, degree=3, c=1.0):
    """
    Compute the polynomial kernel matrix.

    Parameters
    ----------
    X1 : ndarray of shape (n, d)
    X2 : ndarray of shape (m, d)
    degree : int
    c : float — free parameter

    Returns
    -------
    K : ndarray of shape (n, m)
    """
    # YOUR CODE HERE
    raise NotImplementedError
In [6]:
# --- Quick sanity checks (do not modify) ---
X_test = np.array([[1.0, 0.0], [0.0, 1.0]])
K_rbf = rbf_kernel(X_test, X_test, length_scale=1.0)
assert K_rbf.shape == (2, 2), "Shape error"
assert np.allclose(np.diag(K_rbf), 1.0), "Diagonal should be 1 for RBF"
K_poly = polynomial_kernel(X_test, X_test, degree=2, c=1.0)
assert K_poly.shape == (2, 2)
print("Kernel sanity checks passed ✓")
Kernel sanity checks passed ✓

2. Visualising the Kernel Matrix¶

Plot the RBF kernel matrix for a 1-D grid of 100 points in $[-3, 3]$ for three different length-scales: $\ell \in \{0.3, 1.0, 3.0\}$.

Hint: Use plt.imshow or plt.pcolormesh.
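One possible layout is sketched below. So that it runs standalone, the sketch computes the 1-D kernel matrix inline by broadcasting rather than calling your `rbf_kernel`; in your own solution you should call your implementation instead:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 100)[:, None]
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, ell in zip(axes, [0.3, 1.0, 3.0]):
    # For 1-D inputs, broadcasting (x - x.T) gives all pairwise differences
    K = np.exp(-(x - x.T) ** 2 / (2 * ell ** 2))
    ax.imshow(K, extent=(-3, 3, -3, 3), origin="lower", cmap="viridis")
    ax.set_title(f"$\\ell = {ell}$")
fig.tight_layout()
```

Notice how larger length-scales produce a wider band of near-1 entries around the diagonal: distant points become more correlated.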

In [ ]:
# YOUR CODE HERE
# For each length_scale, compute the kernel matrix on a 1-D grid and plot it.
# Use a 1×3 subplot layout.

3. Kernel Ridge Regression — Dual Representation¶

Given training data $(\mathbf{X}, \mathbf{y})$, kernel ridge regression finds the dual coefficients

$$\boldsymbol{\alpha} = (\mathbf{K} + \lambda \mathbf{I})^{-1} \mathbf{y}$$

and predictions are made via

$$\hat{y}(\mathbf{x}_*) = \sum_{i=1}^{n} \alpha_i\, k(\mathbf{x}_i, \mathbf{x}_*)$$

Q3.1 — Generate 80 noisy samples of $y = \sin(2x) + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, 0.1^2)$, $x \in [-\pi, \pi]$.
Q3.2 — Implement fit and predict for kernel ridge regression.
Q3.3 — Fit models with $\ell \in \{0.3, 1.0, 3.0\}$ and plot the results.
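For orientation, the two equations above map to just a few lines of linear algebra. The throwaway sketch below (all names local to it; the kernel, $\ell = 0.5$, and $\lambda = 10^{-3}$ are arbitrary choices, not the values your solution must use) shows the fit/predict pattern you will wrap into the class below:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-np.pi, np.pi, size=(80, 1))
y = np.sin(2 * X[:, 0]) + rng.normal(0.0, 0.1, size=80)

def rbf(X1, X2, ell=0.5):
    # Pairwise squared distances via broadcasting
    sq = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * ell ** 2))

lam = 1e-3
K = rbf(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # solve (K + λI)α = y

X_star = np.linspace(-np.pi, np.pi, 200)[:, None]
y_pred = rbf(X_star, X) @ alpha                       # ŷ(x*) = Σ_i α_i k(x_i, x*)
```

Prefer `np.linalg.solve` over explicitly inverting $\mathbf{K} + \lambda\mathbf{I}$: it is cheaper and numerically better behaved.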

In [ ]:
# Q3.1 — Generate data
# YOUR CODE HERE
# X_train shape: (80, 1), y_train shape: (80,)
In [ ]:
class KernelRidgeRegression:
    """Kernel Ridge Regression via the dual representation."""

    def __init__(self, kernel_fn, lam=1e-3):
        """
        Parameters
        ----------
        kernel_fn : callable  k(X1, X2) -> matrix
        lam       : float     regularisation strength λ
        """
        self.kernel_fn = kernel_fn
        self.lam = lam
        self.alpha_ = None
        self.X_train_ = None

    def fit(self, X, y):
        """
        Solve (K + λI)α = y and store α and X.

        Parameters
        ----------
        X : ndarray (n, d)
        y : ndarray (n,)
        """
        # YOUR CODE HERE
        raise NotImplementedError

    def predict(self, X_star):
        """
        Predict at new inputs X_star.

        Parameters
        ----------
        X_star : ndarray (m, d)

        Returns
        -------
        y_pred : ndarray (m,)
        """
        # YOUR CODE HERE
        raise NotImplementedError
In [ ]:
# Q3.3 — Fit and plot for three length-scales
# YOUR CODE HERE


4. Effect of Length-Scale and Regularisation¶

Q4.1 — Using your KernelRidgeRegression class, perform a grid search over $\ell \in \{0.1, 0.3, 1.0, 3.0\}$ and $\lambda \in \{10^{-4}, 10^{-2}, 1\}$. Report the test MSE for each combination (use an 80/20 train/test split).

Q4.2 — Plot a heatmap of the test MSE grid.

Q4.3 (written) — In 2–3 sentences, explain the effect of increasing $\ell$ and increasing $\lambda$ on the model's bias and variance.

In [ ]:
# Q4.1 & Q4.2 — Grid search + heatmap
# YOUR CODE HERE

Q4.3 — Written answer:

[Your answer here]

5. Kernel SVM Classification¶

We now apply the kernel trick to classification using a Support Vector Machine with an RBF kernel.

Q5.1 — Generate two 2-D datasets:

  • make_moons(n_samples=300, noise=0.2)
  • make_circles(n_samples=300, noise=0.15, factor=0.5)

Q5.2 — Fit sklearn.svm.SVC with kernel='rbf' on each dataset.
Q5.3 — Plot the decision boundary and support vectors for each dataset.
Q5.4 (written) — What happens to the decision boundary when you set C very large or very small?
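As a starting point, the basic fit-and-score pattern with `sklearn.svm.SVC` is sketched below. The hyperparameters `C=1.0` and `gamma="scale"` are just the library defaults, not tuned values, and the seed is arbitrary:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
# The fitted support vectors are exposed as clf.support_vectors_ (shape (n_sv, 2)),
# which is what you will overlay on the decision-boundary plot in Q5.3.
```

For Q5.4, vary `C` over several orders of magnitude and watch how the number of support vectors and the boundary smoothness change.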

In [ ]:
from sklearn.svm import SVC

def plot_decision_boundary(clf, X, y, ax, title=''):
    """Plot a 2-class decision boundary on ax."""
    # YOUR CODE HERE — build a meshgrid, predict, and plot contourf + scatter
    pass

# Q5.1 Generate data
# YOUR CODE HERE

# Q5.2 Fit SVMs
# YOUR CODE HERE

# Q5.3 Plot
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# YOUR CODE HERE
plt.tight_layout()
plt.show()

Q5.4 — Written answer:

[Your answer here]

6. Comparing Kernels on a Nonlinear Dataset¶

Q6.1 — Use make_moons(n_samples=500, noise=0.3) with an 80/20 split.
Q6.2 — Train an SVC with each of the following kernels: 'linear', 'poly' (degree 3), 'rbf'.
Q6.3 — Report accuracy and plot all three decision boundaries side by side.
Q6.4 (written) — Rank the kernels by test accuracy and explain why the best kernel performs well on this dataset.

In [ ]:
# Q6.1–6.3 — Compare kernels
# YOUR CODE HERE

Q6.4 — Written answer:

[Your answer here]

7. The Kernel Matrix is Positive Semi-Definite (Conceptual)¶

Q7 (written) — Prove (or give an intuitive argument) that the Gram matrix $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$ constructed with the RBF kernel is always positive semi-definite.
Hint: Consider the feature map $\phi$ such that $k(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^\top \phi(\mathbf{x}')$.
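A numerical check is not a proof, but it can build confidence before you write the argument. The sketch below builds an RBF Gram matrix on random points (with $\ell = 1$, an arbitrary choice) and confirms its eigenvalues are non-negative up to floating-point tolerance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))

# RBF Gram matrix with length-scale 1
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 2.0)

# K is symmetric, so eigvalsh returns real eigenvalues
eigvals = np.linalg.eigvalsh(K)
# PSD means every eigenvalue is >= 0 (up to round-off)
```

Your written answer should explain *why* this holds for any point set, via the feature map in the hint: $\mathbf{v}^\top K \mathbf{v} = \|\sum_i v_i\, \phi(\mathbf{x}_i)\|^2 \ge 0$.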

Q7 — Written answer:

[Your answer here]


End of Part A¶

Proceed to Part B: Gaussian Processes.