Fall 2019 (AugDec)
Online Prediction and Learning
Course No.: E1 245
Instructor: Aditya Gopalan, ECE 2.09, Dept. of ECE, Email: firstname AT iisc.ac.in
Time: TTh 17001830 (Note: time changed; First class: Thu 8 Aug 2019)
Place: ECE 1.07
Course Description: The ability to make continual and accurate forecasts and decisions under uncertainty is key in many of today’s datadriven intelligent systems (think Internet recommendation engines, automated trading, resource allocation, etc.). This elective course exposes students to techniques in sequential learning and optimization making under uncertainty. We will explore several frameworks and algorithms for online and reinforcement learning, aiming to develop a rigorous understanding. We will also look at interesting applications of these techniques, such as portfolio optimization (finance), sequential data compression (information theory), etc.
Contents: Online classification; Regret Minimization; Learning with experts; Online convex optimization; Stochastic games, Blackwell approachability; Multiarmed bandits, Reinforcement learning
Prerequisites: A basic course in probability or stochastic processes. Exposure to convexity (convex geometry, convex analysis or convex optimization) will be helpful but is not absolutely necessary. Contact the instructor for clarifications.
Grading policy: Homework assignments — 30%, exam — 40%, miniproject — 30%
Text/References: There is no single reference text for the course. However, a rough basis will be "Prediction, Learning and Games" (PLG). Nicolo CesaBianchi and Gabor Lugosi, Cambridge University Press, 2006 [local PDF copy].
Other useful resources that will be followed as appropriate:
Miniproject:
Description. The miniproject carries 30% of course credit, and should achieve (at least) one of the following goals: (a) a literature survey of a subtopic/area of online learning featuring recent results/stateoftheart developments (about 23 papers' worth of material, see list of topics below), (b) implementation and benchmarking of known algorithms applied to interesting and new problem domains (run on realworld data or environments if possible), (c) original research on an idea that you may propose. The student will have to make a presentation to the class about his/her project at the end of the term (schedule TBD), of about 3040 min. Please discuss your proposed miniproject with the instructor in advance, and submit a short proposal (1 page max.) for what you would like to do — this should include a plan for execution.
Timeline: Deadline for submitting project proposal: Nov 5. Project presentations: December 56.
List of suggested topics, along with representative references, for the mini project (will be continuously updated); feel free to pick among these, or to pursue other topics that excite or intrigue you. Generally, good venues to look for highquality work are the proceedings of the ICML, NeurIPS (formerly NIPS), COLT and AISTATS machine learning conferences (and their associated workshops) in the past few years, and the JMLR and MLJ journals.
Lectures:

Lecture 1 (Aug 8). Introduction to online learning, 1 bit prediction with expert advice, the Halving algorithm

Lecture 2 (Aug 13). 1 bit prediction without the realizability assumption — the Weighted Majority algorithm

Lecture 3 (Aug 17). Prediction with expert advice and convex losses, the Exponential Weights algorithm

Lecture 4 (Aug 20). Randomized prediction with experts, the Randomized Exponential Weights algorithm

Lecture 5 (Aug 22). Minimax regret, and the impossibility of beating Exponential Weights' regret with convex losses

Lecture 6 (Aug 27). Application: Sequential portfolio allocation in arbitrary markets

Lecture 7 (Aug 29). Cover’s Universal Portfolio algorithm, analysis of its worstcase wealth ratio over constantly rebalancing portfolios

Lecture 8 (Sep 3). Online convex optimization — Follow The Leader (FTL) algorithm

Lecture 9 (Sep 5). Online convex optimization — Follow The Regularized Leader (FTRL) algorithm

Lecture 10 (Sep 12). Online convex optimization — FTRL regret with strongly convex regularizers and Lipschitz losses

Lecture 11 (Sep 17). Online convex optimization — FTRL as Online Mirror Descent

Lecture 12 (Sep 19). Online convex optimization — FTRL as Online Mirror Descent — Fenchel duals and Bregman divergences

Lecture 13 (Oct 3). Online convex optimization — Lazy and Active Online Mirror Descent and regret bounds

Lecture 14 (Oct 5). The nonstochastic multi armed bandit and the EXP3 algorithm

Lecture 15 (Oct 10). Stochastic bandits — the epsilongreedy algorithm

Lecture 16 (Oct 15). Stochastic bandits — the Upper Confidence Bound (UCB) algorithm

Lecture 17 (Oct 17). Stochastic bandits — the Upper Confidence Bound (UCB) algorithm

Lecture 18 (Oct 19). Stochastic bandits — the Thompson sampling algorithm

Lecture 19 (Oct 22). Stochastic bandits — the Thompson sampling algorithm

Lecture 20 (Oct 24). The best arm identification (pure exploration) problem in bandits, Median elimination

Lecture 21 (Oct 29). Lower bounds on effort for bandit algorithms using information theory

Lecture 22 (Oct 31). Lower bounds on effort for bandit algorithms using information theory

Lecture 23 (Nov 5). Lower bounds on effort for bandit algorithms using information theory

Lecture 24 (Nov 7). Linear stochastic bandits

Lecture 25 (Nov 14). Learning Markov Decision Processes  1

Lecture 26 (Nov 16). Learning Markov Decision Processes  2

Lecture 27 (Nov 19). Learning Markov Decision Processes  3

Lecture 28 (Nov 21). Learning Markov Decision Processes  4
Homework:

Homework 1 posted Aug 20, due Sep 3

Homework 2 posted Sep 3, due Sep 17

Homework 3 posted Oct 2, due Oct 15

Homework 4 posted Oct 18, due Oct 31

Homework 5 posted Nov 8, due Nov 21
Final Exam: 26 Nov 2019, 5pm8pm, class venue
Last updated: 23Feb2023, 15:23:50 IST