Fall 2014 (AugDec)
Online Prediction and Learning
Course No.: E1 245 (AUG)
Instructor: Aditya Gopalan, ECE 2.09, Dept. of ECE, Email: firstname AT ece.iisc.ernet.in
Time: TTh 9:3011:00 (first class: Aug 5, 2014)
Place: ECE 1.07
Office Hours: TTh 15:0016:00, ECE 2.09
Course Description: The ability to make continual and accurate forecasts is key in many of today’s datadriven intelligent systems. This 3credit elective course is aimed to expose students to techniques for sequential learning and decision making under uncertainty. We will explore several frameworks and algorithms for online learning along with a rigorous understanding of their performance. We will also look at interesting applications of these techniques, such as portfolio optimization (finance), data compression (information theory), etc.
Contents: Online classification; Regret Minimization; Learning with experts; Online convex optimization; Learning under bandit feedback; Calibration; Applications sequential investment/portfolio selection, universal lossless data compression, Stochastic games Blackwell approachability, Learning systems with state Markov Decision Processes, online reinforcement learning
Prerequisites: Basic probability/stochastic processes, linear algebra, familiarity with convexity & mathematical fluency. Contact the instructor for clarifications.
Text: We will not follow any standard textbook(s). However, a rough basis will be "Prediction, Learning and Games" ("PLG"). Nicolo CesaBianchi and Gabor Lugosi, Cambridge University Press, 2006 [local PDF copy]
Homework:

Homework 1 posted Aug 18, due Sep 2; Homework 1 solutions

Homework 2 posted Sep 23, due Oct 7; Homework 2 solutions

Homework 3 posted Nov 2, due Nov 20; Homework 3 solutions

Homework 4 posted Nov 26, due Dec 10; Homework 4 solutions
Final Exam: December 9, 2014
Scribing: Each student is expected to scribe 2 lectures in LaTeX (worth 20% of the grade). The goal is to post highquality scribed notes within a week of the lecture. Please send me a draft of the scribe notes within 4 days of each lecture so that we can meet this goal by iterating if necessary. Please use this LaTeX template for scribe notes. Cite clearly all references that you use.
Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.
Lectures:

Lecture 1 (Aug 5). Introduction to online learning, Motivation, Examples, Prediction with expert advice, 1bit prediction and the Majority algorithm, mistake bound in the presence of an omniscient expert

Lecture 2 (Aug 7). Getting rid of the omniscient expert/perfect oracle assumption  the Weighted Majority algorithm and mistake bound, Introduction to general model of prediction with expert advice  general decision & outcome spaces and loss functions, regret minimization

Lecture 3 (Aug 12). Convexity review, The Exponential Weights algorithm for convex decision spaces and convex losses (over decisions), regret bound, Negative example for 1bit prediction and 01 loss: Impossible to always achieve sublinear regret with deterministic learning algorithms, Need for randomization in playing actions
Resources: PLG Chapter 2, AroraHazanKale survey on the Multiplicative Weights method. 
Lecture 4 (Aug 14). Actions setting, Randomized version of Exponential Weights over finite action set, Expected regret bounds, AzumaHoeffding to get regret bounds with high probability, ExpWts regret bounds for expconcave losses
Resources: PLG Chapter 4 
Lecture 5 (Aug 19). Minimax regret for predicting with experts and convex losses, Intro to sequential probability estimation with the log loss, connection to (lossless) sequential source coding/compression, equivalence of regret minimization and pointwise redundancy of a coding scheme
Resources: PLG Chap 2 (Minimax regret), PLG Thm 3.7 (Minimax regret for the absolute loss), PLG Chap 9 (Log loss and sequential data compression), Lecture notes by Aarti Singh @ CMU, Lectures 611 give a clear explanation of source coding, ShannonFano and arithmetic codes, leading up to universal sequential compression and expected and worstcase redundancy, PPT detailing arithmetic coding and sequential estimation 
Lecture 6 (Aug 21). Minimax regret of sequential probability estimation & Normalized MaxLikelihood, ExpWts for iid experts: the Laplace mixture forecaster/add1 rule, regret bound
Resources: PLG Chap 9, CsiszárShields tutorial, Sec 6: maxredundancy bounds for various classes of distributions (iid, Markov, etc), 
Lecture 7 (Aug 26). Sequential investment in arbitrary markets, BuyandHold strategies & Constantly Rebalancing Portfolios, Cover’s Universal Portfolio algorithm, regret bound, Intro to Online convex optimization
Resources: PLG Chap 10, Original Universal Portfolios paper, Portfolio theory for stochastic (ergodic) markets 
Lecture 8 (Aug 28). Online convex optimization framework, FollowTheLeader (FTL), Regularized FTL (FTRL)
Resources: Online learning notes by Shai ShalevShwartz (Chap 2), Elad Hazan’s survey on FTRL for regret minimization 
Lecture 9 (Sep 2). Generic regret bound for FTRL, Regret bounds for Online Gradient Descent (OGD) and Projected OGD (POGD), POGD gives suboptimal regret scaling when applied to the expert selection problem, Intro to strong convexity of loss functions
Resources: Online learning notes by Sebastien Bubeck (Chap 4), Online learning notes by Shai ShalevShwartz (Chap 2) 
Lecture 10 (Sep 4). OGD with strongly convex losses: logarithmic regret with timevarying learning rate, Impact of regularizer on FTRL performance: FTRL regret bound with Lipschitz losses + strongly convex regularizer, application: recovering optimal regret for prediction with experts and ExpWeights
Resources: Online learning notes by Sebastien Bubeck (Chap 4), HazanAgarwalKale paper on Logarithmic Regret OCO algorithms, Online learning notes by Shai ShalevShwartz (Chap 2) 
Lecture 11 (Sep 9). Alternative view of FTRL: Online Mirror Descent (OMD) framework, OMD as gradient descent in the dual space, Fenchel duality, Bregman divergences, Active and Lazy projected variants of OMD, regret for Active OMD in terms of Bregman divergences
Resources: Online learning notes by Sebastien Bubeck (Chap 5), Online learning notes by Shai ShalevShwartz (Chap 2), Sasha Rakhlin’s notes on FTRL (Chap 2), short tutorial on the Fenchel conjugate 
Lecture 12 (Sep 11). Partial Information: Bandit feedback, adversarial bandit model, the EXP3 algorithm and regret bound
Resources: J. Abernethy’s lecture notes, BubeckCesaBianchi survey on bandits 
Lecture 13 (Sep 16). Adversarial bandits  the EXP3.P algorithm, regret performance holding with high probability
Resources: BubeckCesaBianchi survey on bandits, AuerCesaBianchiFreundSchapire EXP3 paper 
Lecture 14 (Sep 18). Minimax lower bound on bandit regret (adversarial)
Resources: BubeckCesaBianchi survey on bandits (Thm. 3.5), AuerCesaBianchiFreundSchapire EXP3 paper, PLG book (Sec 6.9) 
Lecture 15 (Sep 23). Contextual bandits (adversarial) and the EXP4 algorithm
Resources: BubeckCesaBianchi survey on bandits (Chap 4). 
Lecture 16 (Sep 25). EXP4 regret, Stochastic bandits (iid), futility of FollowTheLeader, epsilonFirst/epsilonGreedy algorithm, Upper Confidence Bound (UCB) algorithm
Resources: AuerCesaBianchiFischer 2002 paper on UCB. 
Lecture 17 (Sep 30). UCB regret, the Pure Exploration problem in bandits, the Median Elimination algorithm for PAC arm selection, sample complexity of Median Elimination
Resources: Median Elimination paper by EvenDarMannorMansour, 2006 
Lecture 18 (Oct 7). Thompson Sampling algorithm, Regret analysis for 2armed Bernoulli rewards
Resources: TS regret analysis by AgrawalGoyal, 2011 
Lecture 19 (Oct 9). Thompson Sampling regret analysis for 2armed Bernoulli rewards (contd.)
Resources: TS regret analysis by AgrawalGoyal, 2011 
Lecture 20 (Oct 14). Online decision making with a large number of decisions: the Follow the Perturbed Leader (FPL) algorithm
Resources: KalaiVempala paper, '05 
Lecture 21 (Oct 16). FPL analysis and regret bounds
Resources: Kleinberg lecture notes on FPL, KalaiVempala paper, '05 
Lecture 22 (Oct 28). Online Convex Optimization with Bandit Feedback  Bandit Gradient Descent with onepoint gradient estimates
Resources: FlaxmanKalaiMcMahan paper 
Lecture 23 (Oct 30). Online Convex Optimization with Bandit Feedback  Bandit Gradient Descent with onepoint gradient estimates (contd.)

Lecture 24 (Nov 11). Online learning with state: Introduction to Planning & Reinforcement Learning (RL)
Resources: Lecture notes (courtesy of Nahum Shimkin, Technion) 
Lecture 25 (Nov 13). Planning: Finite Horizon Dynamic Programming
Resources: Lecture notes (courtesy of Nahum Shimkin, Technion) 
Lecture 26 (Nov 18). Planning: Infinite Horizon Dynamic Programming
Resources: Lecture notes (courtesy of Nahum Shimkin, Technion) 
Lecture 27 (Nov 20). (Modelfree) RL — Actorcritic framework, Policy evaluatio  Monte Carlo, Temporal Difference methods  TD(0), SARSA; Policy improvement
Resources: Lecture notes (courtesy of Nahum Shimkin, Technion) 
Lecture 28 (Nov 25). Qlearning, Introduction to Stochastic Approximation
Resources: Lecture notes (courtesy of Nahum Shimkin, Technion) 
Lecture 29 (Nov 27). Convergence of Stochastic Approximation, Application: convergence of Qlearning, TD(0), etc.
Resources: Lecture notes (courtesy of Nahum Shimkin, Technion)
Last updated: 07Jan2022, 18:33:24 IST