E1 260 Optimization for Machine Learning and Data Science (OptML)

About E1 260 OptML

The main goal of E1 260 course is cover optimization techniques suitable for problems that frequently appear in the areas of data science, machine learning, communications, and signal processing. This course focusses on the computational, algorithmic, and implementation aspects of such optimization techniques.

This is 3:1 credit course.


Basic linear algebra, probability, and knowledge of Python to conduct simulation exercises.



Mathematical background, theory of convex functions, gradient methods, accelerated gradient methods, proximal gradient descent, mirror descent, sub gradient methods, stochastic gradient descent and variants, Project gradient descent and Frank-Wolfe, alternating direction method of multipliers, nonconvex and submodular optimization.


  • A. Beck, First-Order Methods in Optimization, MOS-SIAM Series on Optimization, 2017.

  • S. Bubeck, Convex Optimization: Algorithms and Complexity, Foundations and Trends in Optimization, 2015.

  • F. Bach, “Learning with Submodular Functions: A Convex Optimization Perspective”, Foundations and Trends in Machine Learning, Now Publishers Inc.

  • Other useful resources

  • S. Boyd, N. Parikh, and E. Chu,“ Distributed optimization and statistical learning via the alternating direction method of multipliers”, Foundations and Trends in Machine Learning, Now Publishers Inc.

Course requirements and grading

  • Four assignments (problems and programming): 10% each, i.e., 40% in total

  • Two projects: 10% each, i.e., 20% in total

  • Final assessment: 40%

Schedule (pdf)

Lecture number Topic Reading Materials Exercises
1 Introduction slides
2 Mathematical background Rate of convergence (Chapter 4.2) lecture notes
3 Mathematical background lecture notes
4 Theory of convex function lecture notes
5 Theory of convex function lecture notes
6 Theory of convex function lecture notes
7 Theory of convex function lecture notes
8 Gradient descent lecture notes
9 Gradient descent lecture notes
10 Acceleration methods lecture notes
11 Acceleration methods lecture notes
12 Projected gradient descent lecture notes
13 Subgradient method lecture notes
14 Proximal method lecture notes
15 Frank Wolfe lecture notes
16 Mirror descent lecture notes
17 Mirror descent lecture notes
18 Stochastic gradient descent lecture notes
19 Stochastic gradient descent lecture notes
20 Stochastic variance reduced gradient lecture notes
21 Duality and KKT conditions lecture notes
22 ADMM lecture notes
23 ADMM lecture notes
24 Submodular optimization lecture notes