E1 245  Online Prediction and Learning  Survey Topics
(with some representative references, by no means exhaustive)

Sequencedependent regret bounds in online learning

Secondorder regret bounds in the full information setting CesaBianchiMansourStoltz 2007

Regret based on variation in input sequence (full information setting): HazanKale 2008, SteinhardLiang 2014, etc.

Variationbased regret bounds for bandits: HazanKale 2009


Combinatorial bandit problems

Universal Portfolios in finance:

How to efficiently implement Cover’s Univ. Portfolio alg: http://www.jmlr.org/papers/volume3/kalai02a/kalai02a.pdf

A different and efficient Univ. Portfolio alg by Singer et al: http://www.cis.upenn.edu/~mkearns/finread/helmbold98line.pdf

Much better regret bounds depending only on variation in market vectors: http://www.satyenkale.com/papers/varexpconcave.pdf

Univ. Portfolios with trading costs: http://www.cs.cmu.edu/~avrim/Papers/portfolio.ps.gz


Stochastic Linear Bandits

The ConfidenceBall algorithm paper by Dani et al

Rusmevichientong & Tsitsiklis linear bandit paper

The OFUL algorithm  state of the art for linear stochastic bandits


Adversarial Linear Bandits

AbernethyHazanRakhlin paper on using selfconcordant barrier functions as regularizers to achieve optimal sqrt(T) regret in the bandit setting


Bandit problems with infinitely many/continuous arms ("Xarmed" bandits)

Thompson Sampling for Complex Bandit settings

Dueling bandits (bandits with pairwise comparison feedback — useful in recommender systems and online ranking applications)

Original paper on dueling bandits

Recent paper on using UCB techniques to solve dueling bandits

Lower bound and optimal regret for dueling bandits, Komiyama et al 2015


(Projectionfree) Online Convex Optimization using the FrankWolfe optimization algorithm

Thompson Sampling for the purely Bayesian setting

Contextual bandit algorithms

The Monster algorithm


Learning permutations/rankings sequentially given only comparisons

Selecting the best set of multiple arms in bandits

Bandits with side information

Adversarial rewards: MannorShamir, improvements by Alon et al

Stochastic rewards: Caron et al, Buccapatnam et al


Partial monitoring problems (generalizes bandits, harder class of online learning problems)

Prediction, Learning and Games book, Chapter 6

Regret minimization/Online learning in Markov Decision Processes (stochastic multiarmed bandits with state)

Crowdsourcing and bandit problems

Clustered bandits — how to learn if two or more bandit problems appear similar and save effort

Bandits, auctions and mechanism design
Last updated: 07May2024, 14:06:13 IST