E1 260: Optimization for Machine Learning and Data Science

Homework 3: Constrained optimization (Projected, Proximal, FW, and mirror descent methods)

This homework has two parts. The first part contains five questions on theoretical aspects of constrained optimization. In the second part, we apply the theory developed in this module to a toy Frank-Wolfe example and to two popular problems, robust regression and LASSO.

Submission instructions:

  • Make a copy of the Colab .ipynb file and share the link of your Colab file with answers as instructed.
  • Use text cells to answer Part A of the homework. Type equations in LaTeX in a text cell.
  • For the programming part (Part B) of the homework, use code cells and text cells. The submitted code should run without any errors; do not erase the outputs.

In [ ]:
# Name: 
# SR number:
# Program:
# Registration type (Credit/Audit):

Part A

This part has five questions, worth 10 points each.

[10 pts] Question 1. Compute the projection operator $P_{\mathcal{C}}({\bf x})$ for the following constraint sets.

  1. Unit simplex: $\mathcal{C}=\{{\bf x} : {\bf 1}^T{\bf x} = 1, {\bf x} \succeq {\bf 0}\}$.
  2. Intersection of hyperplane and rectangle: $\mathcal{C}=\{{\bf x} : {\bf a}^T{\bf x} = b, l \leq {\bf x} \leq u\}$.
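
(Optional.) As a sanity check on your closed-form answer, you can compute the projection numerically by solving the defining quadratic program. The sketch below assumes CVXPY is available and only covers the unit-simplex case; it does not reveal the closed form.

In [ ]:
# Hypothetical sanity check for Question 1.1: project a random point onto the
# unit simplex by solving the defining QP with CVXPY, then compare the result
# with your analytic formula for P_C(x).
import numpy as np
import cvxpy as cp

np.random.seed(0)
n = 5
x = np.random.randn(n)                     # point to be projected

z = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.sum_squares(z - x)),
                  [cp.sum(z) == 1, z >= 0])
prob.solve()

print("P_C(x) via CVXPY:", np.round(z.value, 4))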

Solution 1.

Enter your answer here

[10 pts] Question 2. Prove the following.

  1. If $f({\bf x}) = g(\|{\bf x}\|_2)$ with ${\rm dom}(g) = [0,\infty)$, then $$ {\rm prox}_f({\bf x}) = {\rm prox}_g(\|{\bf x}\|_2)\frac{{\bf x}}{\|{\bf x}\|_2} \quad \forall{\bf x} \neq {\bf 0}. $$
  2. Suppose $f$ is a closed convex function, and $f^*({\bf x}) = {\rm sup}_{{\bf z}} \{ \langle{\bf x},{\bf z}\rangle - f({\bf z})\}$ is the convex conjugate of $f$. Then we have the Moreau decomposition $$ {\bf x} = {\rm prox}_f({\bf x}) + {\rm prox}_{f^*}({\bf x}). $$

Solution 2.

Enter your answer here

[10 pts] Question 3. Compute the mirror descent updates for minimizing a differentiable function on a spectrahedron $$ \mathcal{S}_n = \{{\bf X} \in \mathbb{S}_+^n : {\rm trace}({\bf X}) = 1\} $$ with $$ \varphi({\bf X}) = \sum_{i=1}^n \lambda_i({\bf X}) \log \lambda_i({\bf X}). $$

Here, $\lambda_1({\bf X}), \ldots, \lambda_n({\bf X})$ are the eigenvalues of ${\bf X}$.

Also, compute the Bregman divergence and the update equation with $$ \varphi({\bf X}) = \sum_{i=1}^n \lambda_i({\bf X}) \log \lambda_i({\bf X}) - \lambda_i({\bf X}). $$ (This results in the so-called von Neumann divergence.)

Solution 3.

Enter your answer here

[10 pts] Question 4. Compute the Frank-Wolfe updates for the following optimization problem

$$ \text{minimize} \quad f({\bf X}) \quad \text{subject to} \quad \|{\bf X}\|_{\rm tr} \leq t $$

where $\|\cdot\|_{\rm tr}$ is the trace norm.

Solution 4.

Enter your answer here

[10 pts] Question 5. Show that the duality gap of the Frank-Wolfe iterations is given by $$ \langle \nabla f({\bf x}_k), {\bf x}_k - {\bf s}_k\rangle, $$ where ${\bf s}_k$ is the solution to the linear minimization problem in the Frank-Wolfe algorithm.

Also, show that $$ f({\bf x}_k) - f^\star \leq \langle \nabla f({\bf x}_k), {\bf x}_k - {\bf s}_k\rangle. $$

Solution 5.

Enter your answer here.



Part B

This part has three programming questions on the Frank-Wolfe method, robust regression, and LASSO.

[10 pts] Question 6. The Frank-Wolfe method.

The aim of this question is to understand the Frank-Wolfe iterations for minimizing the simple two-dimensional quadratic function

$$f(\mathbf{x}) = \frac{1}{2}({\bf x} - {\bf x}^\star)^T({\bf x} - {\bf x}^\star)$$

over a polygon (square) with vertices $(-0.5,-0.5), (0.5,-0.5), (0.5,0.5), (-0.5,0.5)$.

Suppose we initialize at ${\bf x}_0 = [-0.5,0.5]^T$. Display the first 20 Frank-Wolfe iterations on a contour plot (see e.g., the figure below) for the two cases with ${\bf x}^\star = [1,0.3]^T$ and ${\bf x}^\star = [0,0]^T$.

Comment on your observations.

[Figure: fw.jpg, an example contour plot showing Frank-Wolfe iterates]

In [ ]:
## Solution 6.
import numpy as np
import matplotlib.pyplot as plt

# Enter your code here
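
A minimal sketch of the Frank-Wolfe iteration for this problem is given below (the contour plot and the discussion are left to you). It assumes the square is represented by its list of vertices, so the linear minimization oracle reduces to picking the vertex that minimizes the inner product with the gradient, and it uses the standard step size $\gamma_k = 2/(k+2)$.

In [ ]:
# Sketch of the Frank-Wolfe iteration for Question 6 (plotting omitted).
# Assumed setup: the feasible square is given by its vertices, x_star is the
# minimizer of f, and gamma_k = 2/(k+2) is the standard step size.
import numpy as np

vertices = np.array([[-0.5, -0.5], [0.5, -0.5], [0.5, 0.5], [-0.5, 0.5]])
x_star = np.array([1.0, 0.3])              # also try np.array([0.0, 0.0])

x = np.array([-0.5, 0.5])                  # initialization x_0
iterates = [x.copy()]
for k in range(20):
    g = x - x_star                         # gradient of 0.5*||x - x_star||^2
    s = vertices[np.argmin(vertices @ g)]  # LMO: best vertex of the square
    gamma = 2.0 / (k + 2)
    x = x + gamma * (s - x)
    iterates.append(x.copy())

iterates = np.array(iterates)              # overlay these points on a contour plot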

[20 pts] Question 7. Robust regression.

Consider the following robust regression problem

$$ \underset{{\bf x}}{\text{minimize}} \quad f({\bf x}) = \sum\limits_{i=1}^m |{\bf a}_i^T{\bf x} - {b}_i| $$

subject to

$$ {\bf x} \in \mathcal{C} = \{{\bf x} \in \mathbb{R}^n : {\bf 1}^T{\bf x} = 1, {\bf x}\succeq {\bf 0}\}. $$

Implementation

We generate data with $m=20$ measurements and $n=3000$ variables. We randomly choose ${\bf a}_i \sim {\mathcal N}(0,{\bf I}_{n\times n})$ and $$ b_i = \frac{a_{i,1} + a_{i,2}}{2} + e_i, $$ where $e_i \sim {\mathcal N}(0,0.01)$.

  • Implement projected subgradient descent with Polyak stepsize rule.

  • Implement mirror descent update (also known as entropic descent) with $\varphi({\bf x}) = \sum_{i=1}^n x_i \log x_i.$

Through simulations, plot the convergence of $f^{\rm best}_k - f^{\rm opt}$. Use the CVX solver to find $f^{\rm opt}$.

In [ ]:
## Solution 7.
import numpy as np


## Generate data 
np.random.seed(1)
n = 3000
m = 20


# Enter your code here
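
A minimal sketch of the data model and the entropic update is given below. It reads $\mathcal{N}(0,0.01)$ as variance $0.01$ (standard deviation $0.1$), and leaves the step-size choices, the Polyak rule, the projected subgradient variant, and the CVXPY reference solve to you.

In [ ]:
# Sketch for Question 7: data generation and one entropic mirror descent step.
# Assumptions: N(0, 0.01) denotes variance 0.01 (std 0.1); the step size t_k,
# the Polyak rule, and the projected subgradient method are left to you.
import numpy as np

np.random.seed(1)
n, m = 3000, 20
A = np.random.randn(m, n)                                  # rows are a_i ~ N(0, I)
b = 0.5 * (A[:, 0] + A[:, 1]) + 0.1 * np.random.randn(m)   # b_i = (a_i1 + a_i2)/2 + e_i

def subgradient(x):
    # a subgradient of f(x) = sum_i |a_i^T x - b_i|
    return A.T @ np.sign(A @ x - b)

def entropic_step(x, t_k):
    # mirror descent step with phi(x) = sum_i x_i log x_i on the unit simplex
    w = x * np.exp(-t_k * subgradient(x))
    return w / w.sum()                     # renormalization = Bregman projection

x0 = np.ones(n) / n                        # uniform initialization on the simplex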

[20 pts] Question 8.

Consider the following LASSO problem $$ \underset{\bf x}{\rm minimize} \,\, \frac{1}{2}\| {\bf A}{\bf x} - {\bf b}\|_2^2 + \rho \| {\bf x}\|_1 $$ with variable ${\bf x} \in \mathbb{R}^d$.

The above problem can be expressed as an equivalent constrained optimization problem $$ \underset{\bf x}{\rm minimize} \,\, f({\bf x}) = \frac{1}{2}\| {\bf A}{\bf x} - {\bf b}\|_2^2 $$ subject to $$ \| {\bf x}\|_1 \leq t. $$

We generate data with $d = 1000$: an i.i.d. Gaussian matrix ${\bf A} \in \mathbb{R}^{2000 \times 1000}$ with $a_{ij} \sim \mathcal{N}(0,1)$, and ${\bf x}$ drawn from a standard normal distribution.

  1. Vary the tuning parameters $t$ and $\rho$ to obtain the same minimizer for both problems.
  2. Compare the suboptimality gap, $f({\bf x}_k) - f^\star$, of the proximal gradient method to that of the Frank-Wolfe method. Implement the proximal gradient method with step size $\eta_k = 1/L$, where $L = \lambda_{\max}({\bf A}^T{\bf A})$.

Use the CVX solver to find $f^\star$.

In [ ]:
## Solution 8.
import numpy as np


## Generate data 
np.random.seed(1)


# Enter your code here
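
A minimal sketch of the two update rules being compared is given below, assuming the data (`A`, `b`) and the tuning parameters (`rho`, `t`) have already been generated above; the CVXPY reference solve, the parameter matching in item 1, and the plotting are left to you.

In [ ]:
# Sketch for Question 8: one proximal gradient (ISTA) step and one
# Frank-Wolfe step over the l1 ball.  Assumed names: A, b, rho, t.
import numpy as np

def soft_threshold(v, tau):
    # proximal operator of tau * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_grad_step(x, A, b, rho, eta):
    # one step on 0.5*||Ax - b||^2 + rho*||x||_1 with step size eta = 1/L
    grad = A.T @ (A @ x - b)
    return soft_threshold(x - eta * grad, eta * rho)

def fw_step(x, A, b, t, k):
    # LMO over {||x||_1 <= t}: a signed, scaled standard basis vector at the
    # coordinate where the gradient has the largest magnitude
    grad = A.T @ (A @ x - b)
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(x)
    s[i] = -t * np.sign(grad[i])
    return x + (2.0 / (k + 2)) * (s - x)

# Step size for the proximal gradient method:
# eta = 1.0 / np.linalg.eigvalsh(A.T @ A).max()    # L = lambda_max(A^T A)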