Homework 1: Theory of convex functions
This homework as two parts. In the first part, there are 10 questions related to the theory of convex functions. In the second part, we will use scikit-learn API, which an off-the-shelf tool for machine learning and data science to understand and solve three regression problems namely, logistic regression, softmax regression, and robust regression.
Submission instructions:
# Name:
# SR number:
# Program:
# Registration type (Credit/Audit):
This part has 10 questions.
[10 pts] Question 1. Show that the intersection of the solution set of a quadratic inequality
$$ \mathcal{C} = \{{\bf x} \in \mathbb{R}^n: {\bf x}^T{\bf A}{\bf x} + {\bf b}^T{\bf x} + c \leq 0\} $$
and a hyperplane defined by
$${\bf g}^{T}{\bf x} + h = 0$$ is convex if
Here, ${\bf A} \in \mathbb{S}^n$, ${\bf b} \in \mathbb{R}^n$, $c \in \mathbb{R}$, ${\bf g} \neq {\bf 0}$, and $\lambda \in \mathbb{R}$.
Solution 1.
Enter your answer here
[10 pts] Question 2. Suppose $h: \mathbb{R}^k \rightarrow \mathbb{R}$ and $g: \mathbb{R}^n \rightarrow \mathbb{R}^k$. Then show that their composition $f = h \circ g$ is convex when
Show that the function $f({\bf x}) = e^{\beta{\bf x}^T{\bf Q}{\bf x}}$ is convex for $\beta \in \mathbb{R}_+$ and ${\bf Q} \succeq {\bf 0}.$
Solution 2.
Enter your answer here
[10 pts] Question 3. Show that the sum of $k$ largest eigenvalues of ${\bf X} \in \mathbb{S}^n$ is convex on $\mathbb{S}^n$.
Solution 3.
Enter your answer here
[10 pts] Question 4. Show that the solution set of a convex optimization problem
$$ \begin{aligned} \underset{\bf x}{\text{minimize}} \quad \quad &f_0({\bf x})\nonumber\\ \text{subject to} \quad &f_i({\bf x}) \leq 0, \, i=1,2,\cdots,m\nonumber\\ &{\bf a}_i^T{\bf x} = {b}_i, \, i=1,2,\cdots,p \end{aligned} $$is convex.
Solution 4.
Enter your answer here
[10 pts] Question 5. Let $\mathcal{X} \subset \mathbb{R}^n$ be an open convex set. A convex differentiable function $f:\mathcal{X} \rightarrow \mathbb{R}$ is $L$-smooth if and only if $$ g({\bf x}) = f({\bf x}) - \frac{L}{2} \|{\bf x}\|^2 $$ is concave.
Prove the above proposition.
Solution 5.
Enter your answer here
[10 pts] Question 6. Show that there exists at most one global minimum of a strictly convex function $f$.
Solution 6.
Enter your answer here
[10 pts] Question 7. Prove the following proposition.
Strong convexity $\Rightarrow$ Strictly Convexity $\Rightarrow$ Convexity.
$\Rightarrow$ is read as "implies".
Solution 7.
Enter your answer here
[10 pts] Question 8.
A. Give one example of a function that is
B. Is $x^4$ $L$-Lipschitz smooth?
Solution 8.
Enter your answer here
[5 pts] Question 9.: Suppose $f(\cdot)$ is $\alpha$-strongly convex and $g(\cdot)$ is convex. Show that $h = f+g$ is strongly convex.
Solution 9.
Enter your answer here
[15 pts] Question 10. If $f:\mathbb{R}^n \rightarrow \mathbb{R}$ is a closed and $\alpha$ strongly function, then its conjugate function $f^*$ is $\frac{1}{\alpha}$ smooth.
Hint: Assume $f^*$ is differentiable everywhere, with gradient $$ \nabla f^*({\bf y}) = \underset{{\bf x}}{\arg\max}\, ({\bf y}^{T}{\bf x} - f({\bf x})). $$
Find the conjugate function of
$$ f({\bf x})=\begin{cases} - \sqrt{1 - \|{\bf x}\|_2^2}, & \|{\bf x}\|_2 \leq 1\\ \infty & \text{else.} \end{cases} $$Also, compute the strong convexity parameter of $f$ and smoothness parameter of $f^*$ to verify the above claim.
Solution 10.
Enter your answer here
This part has two programming questions on Huber regression and logistic regression
Question 11. Outlier-robust Huber regression
In a linear regression problem, we are given data $({\bf x}_i,y_i)\in \mathbb{R}^n \times \mathbb{R}$, $i=1,\ldots, m$. We fit an affine model
$$\hat{y}_i = {\bf x}_i^T{\boldsymbol \beta}$$for $i=1,\ldots, m$ with ${\boldsymbol \beta} \in \mathbb{R}^n$.
In standard linear regression, we choose ${\boldsymbol \beta}$ by solving $$ \underset{{\boldsymbol \beta}}{\text{minimize}} \quad \sum\limits_{i=1}^m r_i^2 = \sum\limits_{i=1}^m (y_i - {\bf x}_i^T{\boldsymbol \beta})^2. $$
Huber regression is a technique that is robust to outliers. In Huber regression, we replace the square function with the Huber function
$$ \phi(u) = \left\{ \begin{array}{ll} u^2 & |u| \leq M\\ 2M|u| - M^2 & |u| > M \end{array}\right. $$where $M>0$ is the Huber threshold. That is, we solve
$$ \underset{{\boldsymbol \beta}}{\text{minimize}} \sum\limits_{i=1}^m \phi(y_i - {\bf x}_i^T{\boldsymbol \beta}) $$for variable ${\boldsymbol \beta}.$
[10 pts] Question 11a. Is $\phi(\cdot)$ $L$-smooth? If yes, what is the smoothness parameter? Show that $\phi(\cdot)$ is a convex function.
Solution 11a. Enter you answer here
[5 pts] Question 11b. Plot the Huber function $\phi(u)$ for different values of $M \in \{1,5, 1000\}$ with $u\in [-10,10]$. Compare with the plot of the quadratic function $u^2$.
# Write your code here
import numpy as np
from matplotlib import pyplot as plt
# Define the Huber loss
#def Phi(u, M):
[25 pts] Question 11c.
We generate data with $m=500$ measurements and $n=300$ regressors. We randomly choose ${\boldsymbol \beta}$ and ${\bf x}_i \sim \mathcal N(0,I)$. We set $y_i = {\bf x}_i^T{\boldsymbol \beta} + \epsilon_i$, where $\epsilon_i \sim \mathcal N(0,1)$. To introduce outliers, with probability $p$ we replace $y_i$ with $-y_i$. We generate $50$ problem instances, with $p$ varying from $0$ to $0.15$,
Recover ${\boldsymbol \beta} \in {\bf R}^n$ from the measurements $y\in {\bf R}^m$ and compare the three approaches: linear regression, Huber regression and clairvoyant regression. In clairvoyant regression we know which measurements had their sign flipped.
Plot the relative error $$ \frac{\|\hat{\boldsymbol \beta} - {\boldsymbol \beta}\|_2}{\|{\boldsymbol \beta}\|2}, $$ where $\hat{\boldsymbol \beta}$ is the recovered ${\boldsymbol \beta}$.
Comment on your observations
Use sklearn.linear_model for Huber regression and linear regression
## Solution 11c.
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression
## Generate data
np.random.seed(1)
n = 300
m = 450
TESTS = 50
beta_true = 5*np.random.normal(size=(n,1))
X = np.random.randn(n, m)
Y = np.zeros((m,1))
p_vals = np.linspace(0,0.15, num=TESTS)
# MODIFY and ENTER YOUR CODE HERE
for tst, p in enumerate(p_vals):
# Generate the sign changes.
factor = 2*np.random.binomial(1, 1-p, size=(m,1)) - 1
Y = factor*X.T.dot(beta_true)
# Solve standard regression
# Solve clairvoyant regression
# Solve Huber regression with M=1
# Plot the relative error for the three methods
# for p_vals
Question 12. Logistic regression
We work with a famous dataset that contains the sepal and petal length and width of 150 iris flowers of three different species: Iris setosa, Iris versicolor, and Iris virginica.
import numpy as np
from sklearn import datasets
iris = datasets.load_iris()
print(iris.DESCR)
Our aim is to build a classifier to detect the Iris virginica type based on the petal width and length features.
Given data $({\bf x}_i,y_i)$, $i=1,\ldots, m$. The ${\bf x}_i \in {\bf R}^n$ are feature vectors and $y_i \in \{0, 1\}$ indicates the associated boolean classes.
Our goal is to construct a linear classifier determined by the halfspace ${\bf x}^T{\boldsymbol \beta}>0$. Specifically, we predict that the $i$th instance belongs to class 1 if ${\bf x}^T{\boldsymbol \beta} >0$ and class $0$ otherwise. We find ${\boldsymbol \beta}$ by solving the maximizing likelihood problem
$$ \underset{\boldsymbol \beta}{\text{maximize}} \quad f({\boldsymbol \beta}) = \sum_{i=1}^{m} y_i {\bf x}_i^T{\boldsymbol \beta} - \log(1 + \exp ({\bf x}_i^T{\boldsymbol \beta})). $$[10 pts] Question 12a. Show that $-f$ is a convex function of ${\boldsymbol \beta}$.
Suppose we would like to regularize the logistic regression model with an $\ell_2$ penalty to maximize the objective $g({\boldsymbol \beta})=f({\boldsymbol \beta}) - \lambda \|{\boldsymbol \beta}\|_2^2$. Is the regularized problem a convex optimization problem?
Solution 12a.
Enter your solution here
[10 pts] Question 12b. Use sklearn.linear_model to train a logistic regression model.
Show the linear boundary displaying two features petal width and length along with the instances (i.e., Iris virginica in one halfspace and not Iris virginica in the other).
from sklearn.linear_model import LogisticRegression
X = iris["data"][:, (2, 3)] # petal length, petal width
y = (iris["target"] == 2).astype(np.int) # 1 if Iris virginica, else 0
# Enter your code here
[20 pts] Question 12c. Extension of logistic regression to handle multiple classes is referred to as multinomial logistic regression or softmax regression.
Given data $({\bf x}_i, {\bf y}_i)$, $i=1,2,\ldots,m$, where the one-hot vector ${\bf y}_i \in \{0,1\}^K$ indicates the class of the $i$th instance. Formulate the softmax regression problem to support $K$ classes with the parameter matrix ${\boldsymbol \Theta} = [{\boldsymbol \beta}_1,{\boldsymbol \beta}_2,\cdots,{\boldsymbol \beta}_K] \in \mathbb{R}^{n \times K}$.
Is this a convex optimization?
[20 pts] Question 12c. Use sklearn.linear_model to train a softmax regression model.
Shows the resulting decision boundaries with background colors. The decision boundaries between any two classes are linear.
from sklearn.linear_model import LogisticRegression
X = iris["data"][:, (2, 3)] # petal length, petal width
y = iris["target"] # Unlike the previous question, there are three class
# Enter your code here