Aditya Gopalan’s Homepage: Teaching

E1 245 - Online Prediction and Learning - Survey Topics

(with some representative references, by no means exhaustive)

Sequence-dependent regret bounds in online learning
- Second-order regret bounds in the full information setting Cesa-Bianchi-Mansour-Stoltz 2007
- Regret based on variation in input sequence (full information setting): Hazan-Kale 2008, Steinhard-Liang 2014, etc.
- Variation-based regret bounds for bandits: Hazan-Kale 2009
Combinatorial bandit problems
Universal Portfolios in finance:
- How to efficiently implement Cover’s Univ. Portfolio alg: http://www.jmlr.org/papers/volume3/kalai02a/kalai02a.pdf
- A different and efficient Univ. Portfolio alg by Singer et al: http://www.cis.upenn.edu/~mkearns/finread/helmbold98line.pdf
- Much better regret bounds depending only on variation in market vectors: http://www.satyenkale.com/papers/var-exp-concave.pdf
- Univ. Portfolios with trading costs: http://www.cs.cmu.edu/~avrim/Papers/portfolio.ps.gz
Stochastic Linear Bandits
- The ConfidenceBall algorithm paper by Dani et al
- Rusmevichientong & Tsitsiklis linear bandit paper
- The OFUL algorithm - state of the art for linear stochastic bandits
Adversarial Linear Bandits
- Abernethy-Hazan-Rakhlin paper on using self-concordant barrier functions as regularizers to achieve optimal sqrt(T) regret in the bandit setting
Bandit problems with infinitely many/continuous arms ("X-armed" bandits)
- The HOO algorithm
- The HCT algorithm
- GP-UCB: Using Gaussian processes to solve continuum-armed bandits
Thompson Sampling for Complex Bandit settings
Dueling bandits (bandits with pairwise comparison feedback — useful in recommender systems and online ranking applications)
- Original paper on dueling bandits
- Recent paper on using UCB techniques to solve dueling bandits
- Lower bound and optimal regret for dueling bandits, Komiyama et al 2015
(Projection-free) Online Convex Optimization using the Frank-Wolfe optimization algorithm
- http://icml.cc/2012/papers/292.pdf
- http://arxiv.org/abs/1406.1305
Thompson Sampling for the purely Bayesian setting
- Bayesian bandits and Eluder dimension
- Bayesian Reinforcement learning
Contextual bandit algorithms
- The Monster algorithm
Learning permutations/rankings sequentially given only comparisons
- Using Exponential Weights
Selecting the best set of multiple arms in bandits
- Bubeck et al
- Kalyanakrishnan et al
Bandits with side information
- Adversarial rewards: Mannor-Shamir, improvements by Alon et al
- Stochastic rewards: Caron et al, Buccapatnam et al
Partial monitoring problems (generalizes bandits, harder class of online learning problems)
- Szepesvari et al
- Prediction, Learning and Games book, Chapter 6
Regret minimization/Online learning in Markov Decision Processes (stochastic multi-armed bandits with state)
- The UCRL2 algorithm
- The REGAL algorithm
Crowdsourcing and bandit problems
- Abraham et al
Clustered bandits — how to learn if two or more bandit problems appear similar and save effort
- Maillard-Mannor
- Gentile et al
Bandits, auctions and mechanism design
- Truthful bandit mechanisms

Last updated: 23-Jun-2025, 13:57:00 IST