Fall 2018 (Aug-Dec)

Online Prediction and Learning

Course No.: E1 245

Instructor: Aditya Gopalan, ECE 2.09, Dept. of ECE, E-mail: first-name AT iisc.ac.in

Time: TTh 1400-1530

Place: ECE 1.07

Course Description: The ability to make continual and accurate forecasts and decisions under uncertainty is key in many of today’s data-driven intelligent systems (e.g. Internet recommendation engines, automated trading, etc.). This elective course is aimed to expose students to techniques for online or sequential learning/decision making under uncertainty. We will explore several frameworks and algorithms for online prediction, along with rigorous analyses of their performance. We will also look at interesting applications of these techniques, such as portfolio optimization (finance), sequential data compression (information theory), etc.

Contents: Online classification; Regret Minimization; Learning with experts; Online convex optimization; Bandits; Applications- sequential investment/portfolio selection, universal lossless data compression, Stochastic games- Blackwell approachability, Learning systems with state- online reinforcement learning

Prerequisites: A basic course in probability or stochastic processes. Exposure to convexity (convex geometry, convex analysis or convex optimization) will be helpful but is not absolutely necessary. Contact the instructor for clarifications.

Grading policy: Homework assignments — 30%, exam — 30%, mini-project — 30%, lecture scribing — 10%

Text/References: There is no single reference text for the course. However, a rough basis will be "Prediction, Learning and Games" (PLG). Nicolo Cesa-Bianchi and Gabor Lugosi, Cambridge University Press, 2006 [local PDF copy].

Other useful resources that will be followed as appropriate:

Mini-project:

Description. The mini-project carries 30% of course credit, and can comprise either one of the following: (a) a literature survey of a sub-topic/area of online learning featuring recent results/state-of-the-art developments (about 2-3 papers' worth of material, see list of topics below), (b) implementation and benchmarking of known algorithms applied to interesting and new problem domains (run on real-world data or environments if possible), (c) original research on an idea that you may propose. The student will have to make a presentation to the class about his/her project at the end of the term (schedule TBD), of about 30-40 min. Please discuss your proposed mini-project with the instructor in advance, and submit a short proposal (1 page max.) for what you would like to do — this must include a structured plan for execution.

Timeline. Deadline for submitting project proposal: Oct 1. Project presentations: Between Nov 22-29 (exact dates TBD).

List of potential research papers for project ideas (will be continuously updated) — feel free to pick among these, or to pursue other topics that excite or intrigue you. Generally, good venues to look for high-quality work are the proceedings of the ICML, NIPS, COLT and AISTATS machine learning conferences (and their associated workshops) in the past few years, and the JMLR and MLJ journals.

Scribing:

Students are expected to scribe lectures in pairs, using the LaTeX typesetting system. The goal is to post high-quality scribed notes within a week of the lecture. Please send me a draft of the scribe notes (with LaTeX source) within 4 days of each lecture so that we can meet this goal by iterating if necessary. Please use this LaTeX template for scribe notes. Cite clearly all references that you use.

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

Lectures:

  • Lecture 1 (Aug 2). Introduction to online learning, binary sequence prediction, the Halving algorithm, mistake bounds

  • Lecture 2 (Aug 7). The Weighted Majority algorithm, prediction with expert advice model

  • Lecture 3 (Aug 9). The multiplicative weights/exponential weights algorithm for online learning with convex losses, regret bound, randomized multiplicative weights

  • Lecture 4 (Aug 14). Doubling trick, regret bounds holding with high probability

  • Lecture 5 (Aug 16). Minimax regret for prediction with experts

  • Lecture 6 (Aug 21). Sequential investment and the universal portfolio algorithm

  • Lecture 7 (Aug 23). Online convex optimization, Follow The Leader (FTL) algorithm

  • Lecture 8 (Aug 28). Follow the Regularized Leader (FTRL) algorithm

  • Lecture 9 (Aug 30). Choosing a good regularizer — FTRL for prediction with experts with Euclidean and entropic regularizers

  • Lecture 10 (Sep 4). Online Mirror Descent algorithms

  • Lecture 11 (Sep 6). Online Mirror Descent algorithms, contd.

  • Lecture 12 (Sep 11). Multi-armed bandits, the EXP3 algorithm

  • Lecture 13 (Sep 18). Contextual bandits, bandits with expert advice and the EXP4 algorithm

  • Lecture 14 (Sep 20). Bandit convex optimization, bandit gradient descent

  • Lecture 15 (Sep 25). Stochastic bandit, Upper Confidence Bound (UCB) algorithm

  • Lecture 16 (Sep 27). UCB regret bound, Thompson sampling

  • Lecture 17 (Oct 4). Thompson sampling regret bound

  • Lecture 18 (Oct 9). Pure exploration in multi-armed bandits

  • Lecture 19 (Oct 11). Pure exploration in multi-armed bandits

  • Lecture 20 (Oct 16). Fundamental performance limits of bandit algorithms — change-of-measure approach

  • Lecture 21 (Oct 18). Fundamental performance limits of bandit algorithms — change-of-measure approach

  • Lecture 22 (Oct 23). Stochastic linear bandits, LinUCB algorithm

  • Lecture 23 (Oct 25). Markov Decision Processes

  • Lecture 24 (Nov 1). Optimal planning for MDPs — value iteration

  • Lecture 25 (Nov 6). Optimal planning for MDPs — policy iteration

  • Lecture 26 (Nov 13). Reinforcement learning (RL) — value estimation methods - Monte Carlo, temporal difference (TD)

  • Lecture 27 (Nov 15). Reinforcement learning (RL) — value estimation methods - Monte Carlo, temporal difference (TD)

  • Lecture 28 (Nov 20). Reinforcement learning (RL) — model-free control - Q-learning, SARSA-based control

  • Lecture 29 (Nov 22). Reinforcement learning (RL) — model-based RL - regret minimization, upper confidence RL (UCRL)

Homework:

Final Exam: 30 Oct 2018 [Final exam]


Last updated: 29-Dec-2024, 15:36:29 IST