Fall 2018 (AugDec)
Online Prediction and Learning
Course No.: E1 245
Instructor: Aditya Gopalan, ECE 2.09, Dept. of ECE, Email: firstname AT iisc.ac.in
Time: TTh 14001530
Place: ECE 1.07
Course Description: The ability to make continual and accurate forecasts and decisions under uncertainty is key in many of today’s datadriven intelligent systems (e.g. Internet recommendation engines, automated trading, etc.). This elective course is aimed to expose students to techniques for online or sequential learning/decision making under uncertainty. We will explore several frameworks and algorithms for online prediction, along with rigorous analyses of their performance. We will also look at interesting applications of these techniques, such as portfolio optimization (finance), sequential data compression (information theory), etc.
Contents: Online classification; Regret Minimization; Learning with experts; Online convex optimization; Bandits; Applications sequential investment/portfolio selection, universal lossless data compression, Stochastic games Blackwell approachability, Learning systems with state online reinforcement learning
Prerequisites: A basic course in probability or stochastic processes. Exposure to convexity (convex geometry, convex analysis or convex optimization) will be helpful but is not absolutely necessary. Contact the instructor for clarifications.
Grading policy: Homework assignments — 30%, exam — 30%, miniproject — 30%, lecture scribing — 10%
Text/References: There is no single reference text for the course. However, a rough basis will be "Prediction, Learning and Games" (PLG). Nicolo CesaBianchi and Gabor Lugosi, Cambridge University Press, 2006 [local PDF copy].
Other useful resources that will be followed as appropriate:
Miniproject:
Description. The miniproject carries 30% of course credit, and can comprise either one of the following: (a) a literature survey of a subtopic/area of online learning featuring recent results/stateoftheart developments (about 23 papers' worth of material, see list of topics below), (b) implementation and benchmarking of known algorithms applied to interesting and new problem domains (run on realworld data or environments if possible), (c) original research on an idea that you may propose. The student will have to make a presentation to the class about his/her project at the end of the term (schedule TBD), of about 3040 min. Please discuss your proposed miniproject with the instructor in advance, and submit a short proposal (1 page max.) for what you would like to do — this must include a structured plan for execution.
Timeline. Deadline for submitting project proposal: Oct 1. Project presentations: Between Nov 2229 (exact dates TBD).
List of potential research papers for project ideas (will be continuously updated) — feel free to pick among these, or to pursue other topics that excite or intrigue you. Generally, good venues to look for highquality work are the proceedings of the ICML, NIPS, COLT and AISTATS machine learning conferences (and their associated workshops) in the past few years, and the JMLR and MLJ journals.
Scribing:
Students are expected to scribe lectures in pairs, using the LaTeX typesetting system. The goal is to post highquality scribed notes within a week of the lecture. Please send me a draft of the scribe notes (with LaTeX source) within 4 days of each lecture so that we can meet this goal by iterating if necessary. Please use this LaTeX template for scribe notes. Cite clearly all references that you use.
Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.
Lectures:

Lecture 1 (Aug 2). Introduction to online learning, binary sequence prediction, the Halving algorithm, mistake bounds

Lecture 2 (Aug 7). The Weighted Majority algorithm, prediction with expert advice model

Lecture 3 (Aug 9). The multiplicative weights/exponential weights algorithm for online learning with convex losses, regret bound, randomized multiplicative weights

Lecture 4 (Aug 14). Doubling trick, regret bounds holding with high probability

Lecture 5 (Aug 16). Minimax regret for prediction with experts

Lecture 6 (Aug 21). Sequential investment and the universal portfolio algorithm

Lecture 7 (Aug 23). Online convex optimization, Follow The Leader (FTL) algorithm

Lecture 8 (Aug 28). Follow the Regularized Leader (FTRL) algorithm

Lecture 9 (Aug 30). Choosing a good regularizer — FTRL for prediction with experts with Euclidean and entropic regularizers

Lecture 10 (Sep 4). Online Mirror Descent algorithms

Lecture 11 (Sep 6). Online Mirror Descent algorithms, contd.

Lecture 12 (Sep 11). Multiarmed bandits, the EXP3 algorithm

Lecture 13 (Sep 18). Contextual bandits, bandits with expert advice and the EXP4 algorithm

Lecture 14 (Sep 20). Bandit convex optimization, bandit gradient descent

Lecture 15 (Sep 25). Stochastic bandit, Upper Confidence Bound (UCB) algorithm

Lecture 16 (Sep 27). UCB regret bound, Thompson sampling

Lecture 17 (Oct 4). Thompson sampling regret bound

Lecture 18 (Oct 9). Pure exploration in multiarmed bandits

Lecture 19 (Oct 11). Pure exploration in multiarmed bandits

Lecture 20 (Oct 16). Fundamental performance limits of bandit algorithms — changeofmeasure approach

Lecture 21 (Oct 18). Fundamental performance limits of bandit algorithms — changeofmeasure approach

Lecture 22 (Oct 23). Stochastic linear bandits, LinUCB algorithm

Lecture 23 (Oct 25). Markov Decision Processes

Lecture 24 (Nov 1). Optimal planning for MDPs — value iteration

Lecture 25 (Nov 6). Optimal planning for MDPs — policy iteration

Lecture 26 (Nov 13). Reinforcement learning (RL) — value estimation methods  Monte Carlo, temporal difference (TD)

Lecture 27 (Nov 15). Reinforcement learning (RL) — value estimation methods  Monte Carlo, temporal difference (TD)

Lecture 28 (Nov 20). Reinforcement learning (RL) — modelfree control  Qlearning, SARSAbased control

Lecture 29 (Nov 22). Reinforcement learning (RL) — modelbased RL  regret minimization, upper confidence RL (UCRL)
Homework:

Homework 1 posted Aug 10, due Aug 23

Homework 2 posted Aug 23, due Sep 6

Homework 3 posted Sep 13, due Sep 27
Last updated: 23Apr2019, 12:57:52 IST