Fall 2019 (Aug-Dec)

Online Prediction and Learning

Course No.: E1 245

Instructor: Aditya Gopalan, ECE 2.09, Dept. of ECE, E-mail: first-name AT iisc.ac.in

Time: TTh 1700-1830 (Note: time changed; First class: Thu 8 Aug 2019)

Place: ECE 1.07

Course Description: The ability to make continual and accurate forecasts and decisions under uncertainty is key in many of today’s data-driven intelligent systems (think Internet recommendation engines, automated trading, resource allocation, etc.). This elective course exposes students to techniques in sequential learning and optimization making under uncertainty. We will explore several frameworks and algorithms for online and reinforcement learning, aiming to develop a rigorous understanding. We will also look at interesting applications of these techniques, such as portfolio optimization (finance), sequential data compression (information theory), etc.

Contents: Online classification; Regret Minimization; Learning with experts; Online convex optimization; Stochastic games, Blackwell approachability; Multi-armed bandits, Reinforcement learning

Prerequisites: A basic course in probability or stochastic processes. Exposure to convexity (convex geometry, convex analysis or convex optimization) will be helpful but is not absolutely necessary. Contact the instructor for clarifications.

Grading policy: Homework assignments — 30%, exam — 40%, mini-project — 30%

Text/References: There is no single reference text for the course. However, a rough basis will be "Prediction, Learning and Games" (PLG). Nicolo Cesa-Bianchi and Gabor Lugosi, Cambridge University Press, 2006 [local PDF copy].

Other useful resources that will be followed as appropriate:

Mini-project:

Description. The mini-project carries 30% of course credit, and should achieve (at least) one of the following goals: (a) a literature survey of a sub-topic/area of online learning featuring recent results/state-of-the-art developments (about 2-3 papers' worth of material, see list of topics below), (b) implementation and benchmarking of known algorithms applied to interesting and new problem domains (run on real-world data or environments if possible), (c) original research on an idea that you may propose. The student will have to make a presentation to the class about his/her project at the end of the term (schedule TBD), of about 30-40 min. Please discuss your proposed mini-project with the instructor in advance, and submit a short proposal (1 page max.) for what you would like to do — this should include a plan for execution.

Timeline: Deadline for submitting project proposal: Nov 5. Project presentations: December 5-6.

List of suggested topics, along with representative references, for the mini project (will be continuously updated); feel free to pick among these, or to pursue other topics that excite or intrigue you. Generally, good venues to look for high-quality work are the proceedings of the ICML, NeurIPS (formerly NIPS), COLT and AISTATS machine learning conferences (and their associated workshops) in the past few years, and the JMLR and MLJ journals.

Lectures:

  • Lecture 1 (Aug 8). Introduction to online learning, 1 bit prediction with expert advice, the Halving algorithm

  • Lecture 2 (Aug 13). 1 bit prediction without the realizability assumption — the Weighted Majority algorithm

  • Lecture 3 (Aug 17). Prediction with expert advice and convex losses, the Exponential Weights algorithm

  • Lecture 4 (Aug 20). Randomized prediction with experts, the Randomized Exponential Weights algorithm

  • Lecture 5 (Aug 22). Minimax regret, and the impossibility of beating Exponential Weights' regret with convex losses

  • Lecture 6 (Aug 27). Application: Sequential portfolio allocation in arbitrary markets

  • Lecture 7 (Aug 29). Cover’s Universal Portfolio algorithm, analysis of its worst-case wealth ratio over constantly rebalancing portfolios

  • Lecture 8 (Sep 3). Online convex optimization — Follow The Leader (FTL) algorithm

  • Lecture 9 (Sep 5). Online convex optimization — Follow The Regularized Leader (FTRL) algorithm

  • Lecture 10 (Sep 12). Online convex optimization — FTRL regret with strongly convex regularizers and Lipschitz losses

  • Lecture 11 (Sep 17). Online convex optimization — FTRL as Online Mirror Descent

  • Lecture 12 (Sep 19). Online convex optimization — FTRL as Online Mirror Descent — Fenchel duals and Bregman divergences

  • Lecture 13 (Oct 3). Online convex optimization — Lazy and Active Online Mirror Descent and regret bounds

  • Lecture 14 (Oct 5). The nonstochastic multi armed bandit and the EXP3 algorithm

  • Lecture 15 (Oct 10). Stochastic bandits — the epsilon-greedy algorithm

  • Lecture 16 (Oct 15). Stochastic bandits — the Upper Confidence Bound (UCB) algorithm

  • Lecture 17 (Oct 17). Stochastic bandits — the Upper Confidence Bound (UCB) algorithm

  • Lecture 18 (Oct 19). Stochastic bandits — the Thompson sampling algorithm

  • Lecture 19 (Oct 22). Stochastic bandits — the Thompson sampling algorithm

  • Lecture 20 (Oct 24). The best arm identification (pure exploration) problem in bandits, Median elimination

  • Lecture 21 (Oct 29). Lower bounds on effort for bandit algorithms using information theory

  • Lecture 22 (Oct 31). Lower bounds on effort for bandit algorithms using information theory

  • Lecture 23 (Nov 5). Lower bounds on effort for bandit algorithms using information theory

  • Lecture 24 (Nov 7). Linear stochastic bandits

  • Lecture 25 (Nov 14). Learning Markov Decision Processes - 1

  • Lecture 26 (Nov 16). Learning Markov Decision Processes - 2

  • Lecture 27 (Nov 19). Learning Markov Decision Processes - 3

  • Lecture 28 (Nov 21). Learning Markov Decision Processes - 4

Homework:

Final Exam: 26 Nov 2019, 5pm-8pm, class venue

[Final exam]


Last updated: 07-Jan-2022, 18:33:27 IST