Fall 2014 (Aug-Dec)
Online Prediction and Learning
Course No.: E1 245 (AUG)
Instructor: Aditya Gopalan, ECE 2.09, Dept. of ECE, E-mail: first-name AT ece.iisc.ernet.in
Time: TTh 9:30-11:00 (first class: Aug 5, 2014)
Place: ECE 1.07
Office Hours: TTh 15:00-16:00, ECE 2.09
Course Description: The ability to make continual and accurate forecasts is key in many of today’s data-driven intelligent systems. This 3-credit elective course is aimed to expose students to techniques for sequential learning and decision making under uncertainty. We will explore several frameworks and algorithms for online learning along with a rigorous understanding of their performance. We will also look at interesting applications of these techniques, such as portfolio optimization (finance), data compression (information theory), etc.
Contents: Online classification; Regret Minimization; Learning with experts; Online convex optimization; Learning under bandit feedback; Calibration; Applications- sequential investment/portfolio selection, universal lossless data compression, Stochastic games- Blackwell approachability, Learning systems with state- Markov Decision Processes, online reinforcement learning
Prerequisites: Basic probability/stochastic processes, linear algebra, familiarity with convexity & mathematical fluency. Contact the instructor for clarifications.
Text: We will not follow any standard textbook(s). However, a rough basis will be "Prediction, Learning and Games" ("PLG"). Nicolo Cesa-Bianchi and Gabor Lugosi, Cambridge University Press, 2006 [local PDF copy]
Homework:
-
Homework 1 posted Aug 18, due Sep 2; Homework 1 solutions
-
Homework 2 posted Sep 23, due Oct 7; Homework 2 solutions
-
Homework 3 posted Nov 2, due Nov 20; Homework 3 solutions
-
Homework 4 posted Nov 26, due Dec 10; Homework 4 solutions
Final Exam: December 9, 2014
Scribing: Each student is expected to scribe 2 lectures in LaTeX (worth 20% of the grade). The goal is to post high-quality scribed notes within a week of the lecture. Please send me a draft of the scribe notes within 4 days of each lecture so that we can meet this goal by iterating if necessary. Please use this LaTeX template for scribe notes. Cite clearly all references that you use.
Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.
Lectures:
-
Lecture 1 (Aug 5). Introduction to online learning, Motivation, Examples, Prediction with expert advice, 1-bit prediction and the Majority algorithm, mistake bound in the presence of an omniscient expert
-
Lecture 2 (Aug 7). Getting rid of the omniscient expert/perfect oracle assumption - the Weighted Majority algorithm and mistake bound, Introduction to general model of prediction with expert advice - general decision & outcome spaces and loss functions, regret minimization
-
Lecture 3 (Aug 12). Convexity review, The Exponential Weights algorithm for convex decision spaces and convex losses (over decisions), regret bound, Negative example for 1-bit prediction and 0-1 loss: Impossible to always achieve sublinear regret with deterministic learning algorithms, Need for randomization in playing actions
Resources: PLG Chapter 2, Arora-Hazan-Kale survey on the Multiplicative Weights method. -
Lecture 4 (Aug 14). Actions setting, Randomized version of Exponential Weights over finite action set, Expected regret bounds, Azuma-Hoeffding to get regret bounds with high probability, Exp-Wts regret bounds for exp-concave losses
Resources: PLG Chapter 4 -
Lecture 5 (Aug 19). Minimax regret for predicting with experts and convex losses, Intro to sequential probability estimation with the log loss, connection to (lossless) sequential source coding/compression, equivalence of regret minimization and pointwise redundancy of a coding scheme
Resources: PLG Chap 2 (Minimax regret), PLG Thm 3.7 (Minimax regret for the absolute loss), PLG Chap 9 (Log loss and sequential data compression), Lecture notes by Aarti Singh @ CMU, Lectures 6-11 give a clear explanation of source coding, Shannon-Fano and arithmetic codes, leading up to universal sequential compression and expected and worst-case redundancy, PPT detailing arithmetic coding and sequential estimation -
Lecture 6 (Aug 21). Minimax regret of sequential probability estimation & Normalized Max-Likelihood, Exp-Wts for iid experts: the Laplace mixture forecaster/add-1 rule, regret bound
Resources: PLG Chap 9, Csiszár-Shields tutorial, Sec 6: max-redundancy bounds for various classes of distributions (iid, Markov, etc), -
Lecture 7 (Aug 26). Sequential investment in arbitrary markets, Buy-and-Hold strategies & Constantly Rebalancing Portfolios, Cover’s Universal Portfolio algorithm, regret bound, Intro to Online convex optimization
Resources: PLG Chap 10, Original Universal Portfolios paper, Portfolio theory for stochastic (ergodic) markets -
Lecture 8 (Aug 28). Online convex optimization framework, Follow-The-Leader (FTL), Regularized FTL (FTRL)
Resources: Online learning notes by Shai Shalev-Shwartz (Chap 2), Elad Hazan’s survey on FTRL for regret minimization -
Lecture 9 (Sep 2). Generic regret bound for FTRL, Regret bounds for Online Gradient Descent (OGD) and Projected OGD (POGD), POGD gives suboptimal regret scaling when applied to the expert selection problem, Intro to strong convexity of loss functions
Resources: Online learning notes by Sebastien Bubeck (Chap 4), Online learning notes by Shai Shalev-Shwartz (Chap 2) -
Lecture 10 (Sep 4). OGD with strongly convex losses: logarithmic regret with time-varying learning rate, Impact of regularizer on FTRL performance: FTRL regret bound with Lipschitz losses + strongly convex regularizer, application: recovering optimal regret for prediction with experts and Exp-Weights
Resources: Online learning notes by Sebastien Bubeck (Chap 4), Hazan-Agarwal-Kale paper on Logarithmic Regret OCO algorithms, Online learning notes by Shai Shalev-Shwartz (Chap 2) -
Lecture 11 (Sep 9). Alternative view of FTRL: Online Mirror Descent (OMD) framework, OMD as gradient descent in the dual space, Fenchel duality, Bregman divergences, Active and Lazy projected variants of OMD, regret for Active OMD in terms of Bregman divergences
Resources: Online learning notes by Sebastien Bubeck (Chap 5), Online learning notes by Shai Shalev-Shwartz (Chap 2), Sasha Rakhlin’s notes on FTRL (Chap 2), short tutorial on the Fenchel conjugate -
Lecture 12 (Sep 11). Partial Information: Bandit feedback, adversarial bandit model, the EXP3 algorithm and regret bound
Resources: J. Abernethy’s lecture notes, Bubeck-CesaBianchi survey on bandits -
Lecture 13 (Sep 16). Adversarial bandits - the EXP3.P algorithm, regret performance holding with high probability
Resources: Bubeck-CesaBianchi survey on bandits, Auer-CesaBianchi-Freund-Schapire EXP3 paper -
Lecture 14 (Sep 18). Minimax lower bound on bandit regret (adversarial)
Resources: Bubeck-CesaBianchi survey on bandits (Thm. 3.5), Auer-CesaBianchi-Freund-Schapire EXP3 paper, PLG book (Sec 6.9) -
Lecture 15 (Sep 23). Contextual bandits (adversarial) and the EXP4 algorithm
Resources: Bubeck-CesaBianchi survey on bandits (Chap 4). -
Lecture 16 (Sep 25). EXP4 regret, Stochastic bandits (iid), futility of Follow-The-Leader, epsilon-First/epsilon-Greedy algorithm, Upper Confidence Bound (UCB) algorithm
Resources: Auer-CesaBianchi-Fischer 2002 paper on UCB. -
Lecture 17 (Sep 30). UCB regret, the Pure Exploration problem in bandits, the Median Elimination algorithm for PAC arm selection, sample complexity of Median Elimination
Resources: Median Elimination paper by EvenDar-Mannor-Mansour, 2006 -
Lecture 18 (Oct 7). Thompson Sampling algorithm, Regret analysis for 2-armed Bernoulli rewards
Resources: TS regret analysis by Agrawal-Goyal, 2011 -
Lecture 19 (Oct 9). Thompson Sampling regret analysis for 2-armed Bernoulli rewards (contd.)
Resources: TS regret analysis by Agrawal-Goyal, 2011 -
Lecture 20 (Oct 14). Online decision making with a large number of decisions: the Follow the Perturbed Leader (FPL) algorithm
Resources: Kalai-Vempala paper, '05 -
Lecture 21 (Oct 16). FPL analysis and regret bounds
Resources: Kleinberg lecture notes on FPL, Kalai-Vempala paper, '05 -
Lecture 22 (Oct 28). Online Convex Optimization with Bandit Feedback - Bandit Gradient Descent with one-point gradient estimates
Resources: Flaxman-Kalai-McMahan paper -
Lecture 23 (Oct 30). Online Convex Optimization with Bandit Feedback - Bandit Gradient Descent with one-point gradient estimates (contd.)
-
Lecture 24 (Nov 11). Online learning with state: Introduction to Planning & Reinforcement Learning (RL)
Resources: Lecture notes (courtesy of Nahum Shimkin, Technion) -
Lecture 25 (Nov 13). Planning: Finite Horizon Dynamic Programming
Resources: Lecture notes (courtesy of Nahum Shimkin, Technion) -
Lecture 26 (Nov 18). Planning: Infinite Horizon Dynamic Programming
Resources: Lecture notes (courtesy of Nahum Shimkin, Technion) -
Lecture 27 (Nov 20). (Model-free) RL — Actor-critic framework, Policy evaluatio - Monte Carlo, Temporal Difference methods - TD(0), SARSA; Policy improvement
Resources: Lecture notes (courtesy of Nahum Shimkin, Technion) -
Lecture 28 (Nov 25). Q-learning, Introduction to Stochastic Approximation
Resources: Lecture notes (courtesy of Nahum Shimkin, Technion) -
Lecture 29 (Nov 27). Convergence of Stochastic Approximation, Application: convergence of Q-learning, TD(0), etc.
Resources: Lecture notes (courtesy of Nahum Shimkin, Technion)
Last updated: 29-Dec-2024, 15:36:27 IST