Homework 2: Unconstrained optimization (gradient, subgradient, and acceleration methods)
This homework has two parts. In the first part, there are 5 questions on theoretical aspects of unconstrained optimization. In the second part, we apply the theory developed in this module of the course to two popular problems, namely robust regression and piecewise-linear optimization.
Submission instructions:
# Name:
# SR number:
# Program:
# Registration type (Credit/Audit):
This part has five questions.
[10 pts] Question 1. Consider a simple example of minimizing
$$ f(x) = x^2 $$
with gradient descent. This is an $L$-smooth convex function with $L = 2$.
Solution 1.
Enter your answer here
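A minimal numerical sketch (illustrative only, not a substitute for the analysis): for $f(x) = x^2$ the gradient step is $x_{k+1} = x_k - 2\eta x_k = (1 - 2\eta)x_k$, so the iterates contract geometrically whenever $0 < \eta < 1$.

```python
# Gradient descent on f(x) = x^2, whose gradient is f'(x) = 2x.
# Each step multiplies the iterate by (1 - 2*eta), so convergence
# requires |1 - 2*eta| < 1, i.e. 0 < eta < 1.
def gd_quadratic(x0, eta, iters):
    x = x0
    for _ in range(iters):
        x = x - eta * 2 * x
    return x

print(gd_quadratic(1.0, 0.25, 50))  # contracts by a factor of 0.5 per step
```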
[10 pts] Question 2. Prove the following.
Suppose $f$ is $\mu$-strongly convex and $L$-smooth. Then gradient descent with $\eta_t = \eta = \frac{2}{\mu+L}$ satisfies
$$ \|{\bf x}_t - {\bf x}^\star\|_2 \leq \left(\frac{\kappa - 1}{\kappa + 1}\right)^t \|{\bf x}_0 - {\bf x}^\star\|_2, $$
where $\kappa = L/\mu$ is the condition number.
Solution 2.
Enter your answer here
[10 pts] Question 3.
Find the subdifferential of each of the following functions, explaining how the subgradients are obtained.
1. $f({\bf x}) = \underset{i=1,\ldots,m}{\max} ({\bf a}_i^T{\bf x} + b_i)$
2. $f({\bf x}) = \|{\bf A}{\bf x} + {\bf b}\|_1$
Solution 3.
Enter your answer here
[10 pts] Question 4. Use the subgradient method to find a point in the intersection of two closed convex sets $\mathcal{C}_1$ and $\mathcal{C}_2$ by solving
$$ \text{find} \quad {\bf x} \in \mathcal{C}_1 \cap \mathcal{C}_2 \quad \Leftrightarrow \quad {\rm minimize}_{\bf x} \,\, f({\bf x}) = {\max} (f_1({\bf x}),f_2({\bf x})), $$ where $f_i({\bf x}) = {\rm inf}_{{\bf y} \in \mathcal{C}_i} \|{\bf x} - {\bf y}\|_2$ is the Euclidean distance of ${\bf x}$ to $\mathcal{C}_i$.
Show that the subgradient method with Polyak's step size rule boils down to the alternating projection method.
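Hint (an outline, not the full argument): for ${\bf x}_k \notin \mathcal{C}_i$, the distance function $f_i$ has the standard subgradient
$$ {\bf g}_k = \frac{{\bf x}_k - P_{\mathcal{C}_i}({\bf x}_k)}{\|{\bf x}_k - P_{\mathcal{C}_i}({\bf x}_k)\|_2} \in \partial f({\bf x}_k), \qquad \|{\bf g}_k\|_2 = 1, $$
where $i$ attains the max and $P_{\mathcal{C}_i}$ is the Euclidean projection. Since $f^\star = 0$ when the sets intersect, Polyak's rule gives
$$ {\bf x}_{k+1} = {\bf x}_k - \frac{f({\bf x}_k) - f^\star}{\|{\bf g}_k\|_2^2}\,{\bf g}_k = {\bf x}_k - f_i({\bf x}_k)\,{\bf g}_k = P_{\mathcal{C}_i}({\bf x}_k), $$
i.e., each iteration projects onto the farther of the two sets.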
Solution 4.
Enter your answer here
[10 pts] Question 5. Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$ be convex and recall that its conjugate is $f^*({\bf y}) = {\rm sup}_{\bf x} \{{\bf y}^T{\bf x} - f({\bf x})\}$. Prove the following:
Also, show that for an arbitrary norm $f({\bf x}) = \|{\bf x}\|$, we have ${\bf g} \in \partial f({\bf 0})$ for any ${\bf g}$ that satisfies $\|{\bf g}\|_* \leq 1.$ Here $\|\cdot\|_*$ is the dual norm. Determine $\partial f({\bf 0})$ for $f({\bf x}) = \|{\bf x}\|_\infty.$
Solution 5.
Enter your answer here.
This part has three programming questions on gradient descent, Huber regression, and piecewise-linear optimization.
[10 pts] Question 6. Gradient descent.
The aim of this question is to understand the geometric properties of gradient descent and the heavy-ball method for the simple two-dimensional quadratic form
$$f(\mathbf{x}) = 0.1 x_1^2 + 2 x_2^2.$$ Use ${\bf x}_0 = [4,6]^T$ as the initial point.
Comment on your observations.
Optional: check different values of $\eta$ (including exact line search) and $\beta$.
import numpy as np
import matplotlib.pyplot as plt
# Enter your code here
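One possible sketch (illustrative, not the intended solution): the step size $\eta = 1/L$ with $L = 4$ (the largest eigenvalue of the Hessian $\mathrm{diag}(0.2, 4)$) and the momentum value $\beta = 0.5$ are assumed choices.

```python
import numpy as np

# Quadratic f(x) = 0.1*x1^2 + 2*x2^2; its Hessian is diag(0.2, 4), so L = 4.
def f(x):
    return 0.1 * x[0]**2 + 2.0 * x[1]**2

def grad(x):
    return np.array([0.2 * x[0], 4.0 * x[1]])

def gradient_descent(x0, eta=0.25, iters=100):
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(iters):
        xs.append(xs[-1] - eta * grad(xs[-1]))
    return np.array(xs)

def heavy_ball(x0, eta=0.25, beta=0.5, iters=100):
    # x_{k+1} = x_k - eta*grad(x_k) + beta*(x_k - x_{k-1})
    xs = [np.asarray(x0, dtype=float)]
    xs.append(xs[0] - eta * grad(xs[0]))  # first step has no momentum term
    for _ in range(iters - 1):
        xs.append(xs[-1] - eta * grad(xs[-1]) + beta * (xs[-1] - xs[-2]))
    return np.array(xs)

traj_gd = gradient_descent([4.0, 6.0])
traj_hb = heavy_ball([4.0, 6.0])
print(f(traj_gd[-1]), f(traj_hb[-1]))
```

Overlaying both trajectories on the contours of $f$ shows the geometry: with $\eta = 1/L$, plain gradient descent kills the steep $x_2$ coordinate in one step (since $1 - \eta \cdot 4 = 0$) but crawls along the flat $x_1$ direction, while the heavy-ball term accelerates the flat direction at the cost of some oscillation.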
[20 pts] Question 7. Outlier-robust Huber regression.
In a Huber regression problem, we are given data $({\bf x}_i,y_i)\in \mathbb{R}^n \times \mathbb{R}$, $i=1,\ldots, m$, and we solve
$$ \underset{{\boldsymbol \beta}}{\text{minimize}} \sum\limits_{i=1}^m \phi(y_i - {\bf x}_i^T{\boldsymbol \beta}) $$ over the variable ${\boldsymbol \beta}$, where the Huber function is
$$ \phi(u) = \left\{ \begin{array}{ll} u^2 & |u| \leq M,\\ 2M|u| - M^2 & |u| > M, \end{array}\right. $$ and $M>0$ is the Huber threshold.
Implementation
We generate data with $m=500$ measurements and $n=300$ regressors. We randomly choose ${\boldsymbol \beta}$ and ${\bf x}_i \sim \mathcal N(0,I)$. We set $y_i = {\bf x}_i^T{\boldsymbol \beta} + \epsilon_i$, where $\epsilon_i \sim \mathcal N(0,1)$. To introduce outliers, with probability $p$ we replace $y_i$ with $-y_i$. We generate a problem instance with $p= 0.01$.
Implement gradient descent to recover ${\boldsymbol \beta} \in \mathbb{R}^n$ from the measurements ${\bf y}\in \mathbb{R}^m$ by solving the Huber regression problem.
Plot the suboptimality gap $$ f({\boldsymbol \beta}_k) - f^\star $$ versus iteration $k$, where you can use the solution from Homework 1, i.e., `sklearn.linear_model.HuberRegressor`, to compute $f^\star$.
Use a fixed step size of $1/L$.
Suppose now that you have an incorrect estimate of $L$ and use it to compute the step size. Based on the numerical results, comment on how critical it is to use the correct value of $L$.
## Solution 7.
import numpy as np
from sklearn.linear_model import HuberRegressor
## Generate data
np.random.seed(1)
n = 300
m = 500
beta_true = 5*np.random.normal(size=(n,1))
X = np.random.randn(m, n)
Y = np.zeros((m,1))
# Enter your code here
[20 pts] Question 8.
Consider the problem of minimizing a piecewise linear function
$$ {\rm minimize}_{\bf x} \,\, f({\bf x}) = \underset{i=1,\ldots,m}{\max} ({\bf a}_i^T{\bf x} + b_i) $$ with variable ${\bf x} \in \mathbb{R}^d$. We implement the subgradient method with $d=20$ variables and $m=100$ terms. We generate data ${\bf a}_i$ and $b_i$ from a unit normal distribution and initialize with ${\bf x}_0 = {\bf 0}$.
Implement a subgradient method to solve the above problem. Express the problem as a linear program (LP) and use a CVX solver to find $f^{\rm opt}$.
Through simulations, plot the convergence of $f^{\rm best}_k - f^{\rm opt}$ for
## Generate data
import numpy as np
np.random.seed(1)
d, m = 20, 100
A = np.random.randn(m, d)
b = np.random.randn(m, 1)
# Enter your code here
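One possible sketch (illustrative; it uses `scipy.optimize.linprog` for the LP baseline, though CVXPY works equally well, and it regenerates its own data). The epigraph form minimizes $t$ subject to ${\bf A}{\bf x} + {\bf b} \le t\mathbf{1}$; a subgradient of $f$ at ${\bf x}$ is ${\bf a}_i$ for any maximizing index $i$, and Polyak's step $(f({\bf x}_k) - f^{\rm opt})/\|{\bf g}_k\|_2^2$ is used.

```python
import numpy as np
from scipy.optimize import linprog

np.random.seed(1)
d, m = 20, 100
A = np.random.randn(m, d)
b = np.random.randn(m)

# LP baseline (epigraph form): minimize t s.t. A x - t <= -b, variables (x, t).
c = np.r_[np.zeros(d), 1.0]
A_ub = np.c_[A, -np.ones(m)]
res = linprog(c, A_ub=A_ub, b_ub=-b, bounds=[(None, None)] * (d + 1))
f_opt = res.fun

# Subgradient method with Polyak's step size (which requires f_opt).
x = np.zeros(d)
f_best, hist = np.inf, []
for k in range(5000):
    vals = A @ x + b
    i = int(np.argmax(vals))           # active piece; a_i is a subgradient
    fx = vals[i]
    f_best = min(f_best, fx)
    hist.append(f_best - f_opt)
    g = A[i]
    x = x - (fx - f_opt) / (g @ g) * g

print(f_best - f_opt)
```

`hist` holds $f^{\rm best}_k - f^{\rm opt}$ for the convergence plot; a log-scale $y$-axis is the usual choice for seeing the slow subgradient rate.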