Fall 2018 (Aug-Dec)

Online Prediction and Learning

Course No.: E1 245

Instructor: Aditya Gopalan, ECE 2.09, Dept. of ECE, E-mail: first-name AT iisc.ac.in

Time: TTh 1400-1530

Place: ECE 1.07

Course Description: The ability to make continual, accurate forecasts and decisions under uncertainty is key to many of today's data-driven intelligent systems (e.g., Internet recommendation engines and automated trading). This elective course aims to expose students to techniques for online, or sequential, learning and decision making under uncertainty. We will explore several frameworks and algorithms for online prediction, along with rigorous analyses of their performance. We will also look at interesting applications of these techniques, such as portfolio optimization (finance) and sequential data compression (information theory).

Contents: Online classification; regret minimization; learning with experts; online convex optimization; bandits; applications: sequential investment/portfolio selection, universal lossless data compression; stochastic games: Blackwell approachability; learning systems with state: online reinforcement learning

Prerequisites: A basic course in probability or stochastic processes. Exposure to convexity (convex geometry, convex analysis or convex optimization) will be helpful but is not absolutely necessary. Contact the instructor for clarifications.

Grading policy: Homework assignments — 30%, exam — 30%, mini-project — 30%, lecture scribing — 10%

Text/References: There is no single reference text for the course; however, a rough basis will be "Prediction, Learning, and Games" (PLG) by Nicolò Cesa-Bianchi and Gábor Lugosi, Cambridge University Press, 2006.

Other useful resources will be followed as appropriate.

Mini-project:

Description. The mini-project carries 30% of the course credit and can consist of one of the following: (a) a literature survey of a sub-topic/area of online learning featuring recent results and state-of-the-art developments (about 2-3 papers' worth of material; see the list of topics below), (b) an implementation and benchmarking of known algorithms applied to interesting, new problem domains (run on real-world data or environments if possible), or (c) original research on an idea that you propose. Each student will give a 30-40 minute presentation to the class about their project at the end of the term (schedule TBD). Please discuss your proposed mini-project with the instructor in advance, and submit a short proposal (1 page max.) for what you would like to do; it must include a structured plan for execution.

Timeline. Deadline for submitting the project proposal: Oct 1. Project presentations: Nov 22-29 (exact dates TBD).

List of potential research papers for project ideas (will be continuously updated) — feel free to pick among these, or to pursue other topics that excite or intrigue you. Generally, good venues to look for high-quality work are the proceedings of the ICML, NIPS, COLT and AISTATS machine learning conferences (and their associated workshops) in the past few years, and the JMLR and MLJ journals.

Scribing:

Students are expected to scribe lectures in pairs, using the LaTeX typesetting system. The goal is to post high-quality scribed notes within a week of each lecture. Please send me a draft of the scribe notes (with LaTeX source) within 4 days of each lecture so that we can meet this goal, iterating if necessary. Please use this LaTeX template for scribe notes. Clearly cite all references you use.

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

Lectures:

  • Lecture 1 (Aug 2). Introduction to online learning, binary sequence prediction, the Halving algorithm, mistake bounds (see the Halving sketch after this list)

  • Lecture 2 (Aug 7). The Weighted Majority algorithm, prediction with expert advice model

  • Lecture 3 (Aug 9). The multiplicative weights/exponential weights algorithm for online learning with convex losses, regret bound, randomized multiplicative weights (see the exponential-weights sketch after this list)

  • Lecture 4 (Aug 14). Doubling trick, regret bounds holding with high probability

  • Lecture 5 (Aug 16). Minimax regret for prediction with experts

  • Lecture 6 (Aug 21). Sequential investment and the universal portfolio algorithm

  • Lecture 7 (Aug 23). Online convex optimization, Follow The Leader (FTL) algorithm

  • Lecture 8 (Aug 28). Follow the Regularized Leader (FTRL) algorithm

  • Lecture 9 (Aug 30). Choosing a good regularizer — FTRL for prediction with experts with Euclidean and entropic regularizers (see the FTRL sketch after this list)

  • Lecture 10 (Sep 4). Online Mirror Descent algorithms

  • Lecture 11 (Sep 6). Online Mirror Descent algorithms, contd.

  • Lecture 12 (Sep 11). Multi-armed bandits, the EXP3 algorithm (see the EXP3 sketch after this list)

  • Lecture 13 (Sep 18). Contextual bandits, bandits with expert advice and the EXP4 algorithm

  • Lecture 14 (Sep 20). Bandit convex optimization, bandit gradient descent

  • Lecture 15 (Sep 25). Stochastic bandits, the Upper Confidence Bound (UCB) algorithm (see the UCB1 sketch after this list)

  • Lecture 16 (Sep 27). UCB regret bound, Thompson sampling

  • Lecture 17 (Oct 4). Thompson sampling regret bound

  • Lecture 18 (Oct 9). Pure exploration in multi-armed bandits

  • Lecture 19 (Oct 11). Pure exploration in multi-armed bandits, contd.

  • Lecture 20 (Oct 16). Fundamental performance limits of bandit algorithms — change-of-measure approach

  • Lecture 21 (Oct 18). Fundamental performance limits of bandit algorithms — change-of-measure approach, contd.

  • Lecture 22 (Oct 23). Stochastic linear bandits, LinUCB algorithm

  • Lecture 23 (Oct 25). Markov Decision Processes

  • Lecture 24 (Nov 1). Optimal planning for MDPs — value iteration (see the value-iteration sketch after this list)

  • Lecture 25 (Nov 6). Optimal planning for MDPs — policy iteration

  • Lecture 26 (Nov 13). Reinforcement learning (RL) — value estimation methods: Monte Carlo, temporal difference (TD)

  • Lecture 27 (Nov 15). Reinforcement learning (RL) — value estimation methods: Monte Carlo, temporal difference (TD), contd.

  • Lecture 28 (Nov 20). Reinforcement learning (RL) — model-free control: Q-learning, SARSA-based control

  • Lecture 29 (Nov 22). Reinforcement learning (RL) — model-based RL: regret minimization, upper confidence RL (UCRL)
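
The sketches below are illustrative companions to the lectures marked above; they are not official course code. First, a minimal sketch of the Halving algorithm from Lecture 1, assuming a finite expert class that contains at least one perfect expert; the input format (per-round lists of 0/1 expert predictions and labels) is an illustrative choice.

    def halving(expert_preds, labels):
        """expert_preds[t][i] is expert i's 0/1 prediction in round t."""
        alive = set(range(len(expert_preds[0])))        # version space of consistent experts
        mistakes = 0
        for preds, y in zip(expert_preds, labels):
            votes = sum(preds[i] for i in alive)
            guess = 1 if 2 * votes >= len(alive) else 0  # majority vote of survivors
            mistakes += int(guess != y)
            alive = {i for i in alive if preds[i] == y}  # drop every expert that erred
        return mistakes  # at most log2(#experts) when a perfect expert exists

Each mistake by the algorithm means the majority of surviving experts erred, so the version space at least halves; this gives the log2(N) mistake bound from Lecture 1.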
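
Next, a minimal sketch of the exponential-weights algorithm from Lectures 2-3, for per-round expert losses in [0, 1]; the fixed learning rate eta is an illustrative assumption (tuning eta on the order of sqrt(ln n / T) recovers the O(sqrt(T log n)) regret bound from lecture).

    import math

    def exp_weights(loss_seq, eta=0.5):
        """loss_seq[t][i] is expert i's loss (in [0, 1]) in round t."""
        n = len(loss_seq[0])
        weights = [1.0] * n
        total = 0.0
        for losses in loss_seq:
            W = sum(weights)
            probs = [w / W for w in weights]   # play the normalized weights
            total += sum(p * l for p, l in zip(probs, losses))
            # multiplicative update: shrink each weight by exp(-eta * loss)
            weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        return total  # the algorithm's cumulative (expected) loss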
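
A minimal sketch of FTRL with the Euclidean regularizer for linear losses over the probability simplex, as in Lectures 8-9. With this regularizer the FTRL iterate reduces to a Euclidean projection of the negative scaled cumulative gradient; the sorting-based simplex projection and the step size eta are illustrative choices.

    import numpy as np

    def project_simplex(v):
        """Euclidean projection of v onto the probability simplex."""
        u = np.sort(v)[::-1]
        css = np.cumsum(u) - 1.0
        rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
        return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

    def ftrl_euclidean(loss_vectors, eta=0.1):
        """Play x_t = argmin over the simplex of eta*<G, x> + ||x||^2 / 2,
        where G is the running gradient sum; this equals the projection
        of -eta * G onto the simplex."""
        grad_sum = np.zeros(len(loss_vectors[0]))
        total = 0.0
        for g in loss_vectors:
            x = project_simplex(-eta * grad_sum)  # FTRL (lazy-projection) iterate
            total += float(np.dot(g, x))
            grad_sum += np.asarray(g, dtype=float)
        return total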
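
A minimal sketch of EXP3 for adversarial multi-armed bandits with losses in [0, 1] (Lecture 12). The environment callback loss_fn and the fixed eta are illustrative assumptions; the key step is the importance-weighted estimate loss / probs[arm], which is unbiased for every arm's loss, and tuned eta gives the O(sqrt(T n log n)) expected regret from lecture.

    import math, random

    def exp3(n_arms, horizon, loss_fn, eta=0.1, seed=0):
        """loss_fn(t, arm) returns the observed loss (in [0, 1]) of the pulled arm."""
        rng = random.Random(seed)
        weights = [1.0] * n_arms
        for t in range(horizon):
            W = sum(weights)
            probs = [w / W for w in weights]
            arm = rng.choices(range(n_arms), weights=probs, k=1)[0]
            loss = loss_fn(t, arm)
            est = loss / probs[arm]              # importance-weighted, unbiased estimate
            weights[arm] *= math.exp(-eta * est)  # exponential-weights update on estimates
        return weights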
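
A minimal sketch of the UCB1 index policy for stochastic bandits (Lectures 15-16), simulated on Bernoulli arms; the arm means in the usage comment are made up for illustration.

    import math
    import numpy as np

    def ucb1(means, horizon, seed=0):
        """Simulate UCB1 on Bernoulli arms with the given means."""
        rng = np.random.default_rng(seed)
        n_arms = len(means)
        counts = np.zeros(n_arms)
        sums = np.zeros(n_arms)
        for t in range(horizon):
            if t < n_arms:
                arm = t                           # initialize: pull each arm once
            else:
                idx = sums / counts + np.sqrt(2.0 * math.log(t) / counts)
                arm = int(np.argmax(idx))         # optimism in the face of uncertainty
            counts[arm] += 1
            sums[arm] += rng.binomial(1, means[arm])
        return sums.sum()                         # total reward; regret grows as O(log T)

    # e.g., ucb1([0.3, 0.5, 0.7], horizon=10000)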
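
Finally, a minimal sketch of value iteration for a finite discounted MDP (Lecture 24); the representation (one transition matrix and expected-reward vector per action) and the toy example in the comments are illustrative assumptions.

    import numpy as np

    def value_iteration(P, R, gamma=0.9, tol=1e-8):
        """P[a] is the |S| x |S| transition matrix and R[a] the expected-reward
        vector for action a; gamma is the discount factor."""
        V = np.zeros(P[0].shape[0])
        while True:
            # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * E[V(s')]
            Q = np.stack([R[a] + gamma * P[a] @ V for a in range(len(P))])
            V_new = Q.max(axis=0)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=0)   # near-optimal values, greedy policy
            V = V_new

    # e.g., a two-state, two-action toy MDP (illustrative):
    # P = [np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([[0.5, 0.5], [0.6, 0.4]])]
    # R = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    # V, pi = value_iteration(P, R)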

Homework:

Final Exam: 30 Oct 2018


Last updated: 12-Feb-2024, 11:46:36 IST