Homework 4: Stochastic gradient methods
This homework as two parts. In the first part, there are a question related to theoretical aspects of stochastic gradient descent and averaging iterations. The second part has two questions related to the understanding of gradient descent, stochastic gradient descent and its mini-batch version, and implementing SGD and SVRG for softmax regression for classification of fashion MNIST dataset.
Submission instructions:
# Name:
# SR number:
# Program:
# Registration type (Credit/Audit):
[50 pts] Question 1. We wish to minimize the quadratic function
$$ f({\bf x}) = \frac{1}{2}{\bf x}^T{\bf P}{\bf x} - {\bf q}^T{\bf x} $$where ${\bf P} \in \mathbb{R}^{d \times d}$ and ${\bf q} \in \mathbb{R}^d$ with ${\bf x}^\star$ being the global minimum.
Suppose we have access to a noisy oracle for the gradient $$ \tilde{\bf g}_t = \nabla f({\bf x}_t) + {\bf e}_t $$ where we assume the noise ${\bf e}_t$ is is zero mean and uncorrelated with bounded variance $\mathbb{E}[{\bf e}_t{\bf e}_t^T] \leq \sigma^2{\bf I}$. Here $\sigma^2 \geq 0$.
Hint: You might need the inequality $\frac{(1 - (1-b)^t)^2}{bt} \leq 1$ for all $b \in [0,1]$.
Solution 1.
Enter your answer here
This part has two programming questions on understanding of stochastic gradient descent and its application to softmax regression
[20 pts] Question 2.
The aim of this question is to understand the gradient descent vs. stochastic gradient descent vs. mini-batch for a linear regression problem
$$ \underset{{\bf x}}{\text{minimize}} \quad f({\bf x}) = \frac{1}{m}\sum\limits_{i=1}^m ({\bf a}_i^T{\bf x} - {b}_i)^2. $$Consider a simple setting with ${\bf x} \in \mathbb{R}^2$ with $m=100$.
## Solution 2.
import numpy as np
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(42)
# data generation
A_ = 2 * np.random.rand(100, 1)
y = 4 + 3 * A_ + np.random.randn(100, 1)
A = np.c_[np.ones((100, 1)), A_] # add x0 = 1 to each instance
# plotting data
plt.plot(A_, y, "b.")
plt.xlabel("$x_1$", fontsize=18)
plt.ylabel("$x_2$", rotation=0, fontsize=18)
plt.axis([0, 2, 0, 15])
plt.show()
np.linalg.pinv(A).dot(y)
Display the iterates (e.g., see the picture below) of
# Enter your code here
def stepsize_schedule(eta):
return eta0 / (eta + eta1)
[30 pts] Question 3. Softmax regression.
Implement stochastic gradient descent, stochastic gradient descent with momentum, and stochastic variance reduced gradient (SVRG) for softmax regression (refer softmax regression problem from Homework 1).
We will work with the fashion MNIST dataset.
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
#Split the full training set into a validation set and a (smaller) training set.
#We also scale the pixel intensities down to the 0-1 range and convert them to floats, by dividing by 255.
X_valid, X_train = X_train_full[:5000] / 255., X_train_full[5000:] / 255.
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.
#convert the vector of class indices into a matrix containing a one-hot vector for each instance
num_classes = 10
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
"Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
y_train_onehot = tf.keras.utils.to_categorical(y_train, num_classes)
y_test__onehot = tf.keras.utils.to_categorical(y_test, num_classes)
# Sample dataset
n_rows = 4
n_cols = 10
plt.figure(figsize=(n_cols * 1.2, n_rows * 1.2))
for row in range(n_rows):
for col in range(n_cols):
index = n_cols * row + col
plt.subplot(n_rows, n_cols, index + 1)
plt.imshow(X_train[index], cmap="binary", interpolation="nearest")
plt.axis('off')
plt.title(class_names[y_train[index]], fontsize=12)
plt.subplots_adjust(wspace=0.2, hspace=0.5)
plt.show()
Implementation
Show the loss and accuracy for different iterations for
Choose the stepsize and momentum coefficients to obtain meaningful learning curves.
Also, compare the test accuracy of the above three optimizers with the sklearn.linear_model solver that you have used in Homework 1.
Comment on your observations.
## Solution 3.
import numpy as np
## Generate data
np.random.seed(1)
n = 3000
m = 20
# Enter your code here