E1 260 Optimization for Machine Learning and Data Science (OptML)

About E1 260 OptML

The main goal of E1 260 course is cover optimization techniques suitable for problems that frequently appear in the areas of data science, machine learning, communications, and signal processing. This course focusses on the computational, algorithmic, and implementation aspects of such optimization techniques.

This is 3:1 credit course.

Prerequisite

Basic linear algebra, probability, and knowledge of Python to conduct simulation exercises.

Syllabus

Mathematical background, theory of convex functions, gradient methods, accelerated gradient methods, proximal gradient descent, mirror descent, sub gradient methods, stochastic gradient descent and variants, Project gradient descent and Frank-Wolfe, alternating direction method of multipliers, nonconvex and submodular optimization.

Textbooks

A. Beck, First-Order Methods in Optimization, MOS-SIAM Series on Optimization, 2017.
S. Bubeck, Convex Optimization: Algorithms and Complexity, Foundations and Trends in Optimization, 2015.
F. Bach, “Learning with Submodular Functions: A Convex Optimization Perspective”, Foundations and Trends in Machine Learning, Now Publishers Inc.

Other useful resources

S. Boyd, N. Parikh, and E. Chu,“ Distributed optimization and statistical learning via the alternating direction method of multipliers”, Foundations and Trends in Machine Learning, Now Publishers Inc.

Syllabus

Lecture number	Topic	Reading	Materials	Exercises
1	Introduction		slides
2	Mathematical background	Rate of convergence (Chapter 4.2)	lecture notes
3	Mathematical background		lecture notes
4	Theory of convex function		lecture notes
5	Theory of convex function		lecture notes
6	Theory of convex function		lecture notes
7	Theory of convex function		lecture notes
8	Gradient descent		lecture notes
9	Gradient descent		lecture notes
10	Acceleration methods		lecture notes
11	Acceleration methods		lecture notes
12	Projected gradient descent		lecture notes
13	Subgradient method		lecture notes
14	Proximal method		lecture notes
15	Frank Wolfe		lecture notes
16	Mirror descent		lecture notes
17	Mirror descent		lecture notes
18	Stochastic gradient descent		lecture notes
19	Stochastic gradient descent		lecture notes
20	Stochastic variance reduced gradient		lecture notes
21	Duality and KKT conditions		lecture notes
22	ADMM		lecture notes
23	ADMM		lecture notes
24	Submodular optimization		lecture notes