Aditya Gopalan’s Homepage: Teaching

Fall 2015 (Aug-Dec)

Online Prediction and Learning

Course No.: E1 245

Instructor: Aditya Gopalan, ECE 2.09, Dept. of ECE, E-mail: first-name AT ece.iisc.ernet.in

Time: TTh 17:30-19:00

Place: ECE 1.07

Course Description: The ability to make continual and accurate forecasts is key in many of today’s data-driven intelligent systems (e.g. Internet recommendation engines, automated trading, etc.). This elective course is aimed to expose students to techniques for online or sequential learning/decision making under uncertainty. We will explore several frameworks and algorithms for online prediction along with a rigorous understanding of their performance. We will also look at interesting applications of these techniques, such as portfolio optimization (finance), data compression (information theory), etc.

Contents: Online classification; Regret Minimization; Learning with experts; Online convex optimization; Bandits; Applications- sequential investment/portfolio selection, universal lossless data compression, Stochastic games- Blackwell approachability, Learning systems with state- online reinforcement learning

Prerequisites: Probability/stochastic processes, linear algebra, some exposure to convexity/convex optimization. Contact the instructor for clarifications.

Text/References: We will not follow any standard textbook(s). However, a rough basis will be the excellent and encyclopaedic "Prediction, Learning and Games" (PLG). Nicolo Cesa-Bianchi and Gabor Lugosi, Cambridge University Press, 2006 [local PDF copy].

Other useful resources:

Online Learning course by Bartok et al (GB)
Online Learning course by Abernethy (JA)
Online Learning course by Bubeck (SB)
Online Learning notes by Shai-Shalev Shwartz (SSS)
Online Learning lecture notes by Alexander Rakhlin (AR)
Bandits survey by Bubeck & Cesa-Bianchi (BCB)

Homework:

Homework 1 posted Aug 17, due Sep 1
Homework 2 posted Sep 15, due Oct 1
Homework 3 posted Nov 3, due Dec 9, EXP3-IX paper by Neu, 2015, dataset for Problem 5

Mini-project:

The mini-project carries 15% of course credit, and can comprise either (a) a literature survey of a sub-topic/area of online learning featuring recent results/state-of-the-art developments (about 2 papers' worth of material, see list of topics below), (b) implementation and benchmarking of existing online learning algorithms applied to interesting and new problem domains (with real-world data, for instance), (c) work on a novel research idea that you may have developed, or (d) any other idea for about a month-long venture that you may have. The student will have to make a small presentation to the class about his/her project at the end of the term (schedule TBD), of about 30-40 min. Please discuss your proposed mini-project with the instructor in advance.

List of potential papers for surveys (will be continuously updated) — feel free to pick among these, or to study other topics that excite you. Generally, good venues to look for high-quality work are the proceedings of the ICML, NIPS, COLT and AISTATS machine learning conferences (and their associated workshops) in the past few years.

Scribing: Each student is expected to scribe 2-3 lectures in LaTeX. The goal is to post high-quality scribed notes (below) within a week of the lecture. Please send me a draft of the scribe notes (with LaTeX source) within 4 days of each lecture so that we can meet this goal by iterating if necessary. Please use this LaTeX template for scribe notes. Cite clearly all references that you use.

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

Lectures:

Lecture 1 (Aug 4). Introduction to online learning, Motivation, Examples, Teaser: Batch estimation vs online learning: (a) estimating the bias of a coin by observing (a batch of) iid samples, (b) predicting the next outcome using (a batch of) iid samples, (c) predicting the next outcome of an arbitrary 0-1 sequence
Lecture 2 (Aug 6). Math fundamentals review. Probability spaces, events, random variables, expectation, conditional expectation, variance, independendence, the Strong Law of Large Numbers, the Central Limit theorem, Probability inequalities - Markov’s inequality, Chebyshev’s inequality, Chernoff-Hoeffding bound
Lecture 3 (Aug 11). Math fundamentals review (contd.) - Bernstein’s inequality, Convexity, Convex functions, Strongly convex functions. 1-bit sequence prediction with expert advice, the Halving/Majority algorithm and mistake bound, the Weighted Majority algorithm for 1-bit prediction with 0-1 loss.
Lecture 4 (Aug 13). Mistake bound for the Weighted Majority algorithm, General Prediction with Experts problem, Regret (worst case) for an online learning algorithm
Lecture 5 (Aug 18). Prediction with expert advice and convex loss functions, the Exponential Weights/Multiplicative Weights/Hedge algorithm and regret bound. Reference(s): PLG Chapter 2
Lecture 6 (Aug 20). Inevitability of suffering linear regret in the prediction game with general (nonconvex) losses, overcoming it by randomization, the Randomized Exponential Weights algorithm, expected and tail bounds on regret. Reference(s): GB Chapter 4, PLG Chapter 4
Lecture 7 (Aug 25). Regret bound holding with high probability (tail bound on regret) for the Randomized Exponential Weights algorithm. Minimax regret for prediction with experts and convex experts. Reference(s): GB Chapter 5, PLG Thm. 3.7
Lecture 8 (Aug 27). Tracking or Competing with Shifting Experts. Tracking regret, the Exponential Weights forecaster for classes of switching experts. Reference(s): GB Chapter 6, PLG Chapter 5
Lecture 9 (Sept 1). Tracking regret in the actions game (continued) - Regret of the Exponentially Weighted forecaster against static switching experts that switch a bounded no. of times, Towards an efficient algorithm for low regret against switching experts, Exp-Wts regret with arbitrary (not necessarily) uniform initial weights, the Fixed Share Forecaster. Reference(s): GB Chapter 6, PLG Chapter 5
Lecture 10 (Sept 3). Tracking regret in the actions game (continued) - Regret analysis for the Fixed Share Forecaster. Reference(s): GB Chapter 6, PLG Chapter 5
Lecture 11 (Sept 8). Improved regret in the Experts game with exp-concave losses for the Exponentially Weighted forecaster, Intro. to the Sequential investment problem in an arbitrary market, Buy-and-Hold portfolios, Constantly Rebalancing Portfolios (CRPs). Reference(s): PLG Chapter 10, JA Lectures 11-13
Lecture 12 (Sept 10). Cover’s Universal Portfolio algorithm and its regret with respect to CRPs. Reference(s): PLG Chapter 10, JA Lectures 11-13
Lecture 13 (Sept 15). Intro. to Online Convex Optimization, the Follow-The-Leader (FTL) strategy and its regret. Reference(s): SSS Chapter 2, GB Chapter 8
Lecture 14 (Sept 22). Online Convex Optimization, the Follow-The-Regularized-Leader (FTRL) strategy. Reference(s): SSS Chapter 2, GB Chapter 8
Lecture 15 (Sept 24). Online Convex Optimization - FTRL regret analysis, FTRL regret with strongly convex regularizers & Lipschitz-continuous losses, why the entropic regularizer (Exp-Wts) is better than the Euclidean regularizer for the experts game. Reference(s): SSS Chapter 2, GB Chapter 8
Lecture 16 (Sept 29). Online Convex Optimization - FTRL and the Online Mirror Descent (OMD) framework, regret bounds via duality interpretation of FTRL, Fenchel dual. Reference(s): SSS Chapter 2, GB Chapter 8, Fenchel duality ref. 1, Fenchel duality ref. 2
Lecture 17 (Oct 1). Online Mirror Descent - Bregman divergences and Fenchel duals of Legendre functions. Reference(s): SSS Chapter 2, GB Chapter 8, AR Chapter 2
Lecture 18 (Oct 6). Online Mirror Descent algorithm - Lazy and Active versions. OMD regret bound using Bregman divergences. Intro. to (nonstochastic) Multi-armed Bandits. Reference(s): SSS Chapter 2, GB Chapters 8, 9, 12 & 15, AR Chapter 2
Lecture 19 (Oct 8). Nonstochastic bandits: the EXP3 algorithm and its regret. Reference(s): GB Chapter 15, JA lecture 20
Lecture 20 (Oct 13). Nonstochastic bandits: the EXP3-IX (EXP3 with Implicit eXploration) algorithm and a bound on the tail of the regret (i.e., regret with high probability). Reference(s): Neu, 2015
Lecture 21 (Oct 15). Nonstochastic bandits: Fundamental lower bound on regret for any bandit algorithm. Reference(s): BCB Sec. 3.3
Lecture 22 (Oct 20). Nonstochastic bandits: Fundamental lower bound on regret for any bandit algorithm (continued). Reference(s): BCB Sec. 3.3
Lecture 23 (Oct 27). Stochastic bandits: the Upper Confidence Bound (UCB) algorithm. Reference(s): BCB Chap. 2, Auer et al 2002 UCB paper
Lecture 24 (Oct 29). Stochastic bandits: UCB regret, Thompson sampling algorithm. Reference(s): BCB Chap. 2, Auer et al 2002 UCB paper
Lecture 25 (Nov 3). Stochastic bandits: Thompson sampling regret bound (2 arms). Reference(s): Agrawal-Goyal 2012
Lecture 26 (Nov 5). Stochastic bandits: Thompson sampling regret bound (2 arms, continued). Reference(s): Agrawal-Goyal 2012
Lecture 27 (Nov 10). The best-arm identification/Pure exploration problem in bandits, PAC sample complexity, the Median Elimination algorithm. Reference(s): Even-Dar-Mannor-Mansour '06 paper on Median Elimination
Lecture 28 (Nov 12). PAC sample complexity of the Median Elimination algorithm. Reference(s): Even-Dar-Mannor-Mansour '06 paper on Median Elimination, Mannor-Tsitsiklis '04 — lower bound on sample complexity of PAC bandit arm identification, paper by Bubeck et al, 2009 showing that regret-optimal strategies cannot induce PAC-optimal algorithms

Final Exam:

December 2, 2015, 2pm-5pm, ECE 1.07

[Final exam] [Solutions]

Last updated: 14-Jul-2025, 12:29:32 IST