E1 245 - Online Prediction and Learning - Survey Topics
(with some representative references, by no means exhaustive)
-
Sequence-dependent regret bounds in online learning
-
Second-order regret bounds in the full information setting Cesa-Bianchi-Mansour-Stoltz 2007
-
Regret based on variation in input sequence (full information setting): Hazan-Kale 2008, Steinhard-Liang 2014, etc.
-
Variation-based regret bounds for bandits: Hazan-Kale 2009
-
-
Combinatorial bandit problems
-
Universal Portfolios in finance:
-
How to efficiently implement Cover’s Univ. Portfolio alg: http://www.jmlr.org/papers/volume3/kalai02a/kalai02a.pdf
-
A different and efficient Univ. Portfolio alg by Singer et al: http://www.cis.upenn.edu/~mkearns/finread/helmbold98line.pdf
-
Much better regret bounds depending only on variation in market vectors: http://www.satyenkale.com/papers/var-exp-concave.pdf
-
Univ. Portfolios with trading costs: http://www.cs.cmu.edu/~avrim/Papers/portfolio.ps.gz
-
-
Stochastic Linear Bandits
-
The ConfidenceBall algorithm paper by Dani et al
-
Rusmevichientong & Tsitsiklis linear bandit paper
-
The OFUL algorithm - state of the art for linear stochastic bandits
-
-
Adversarial Linear Bandits
-
Abernethy-Hazan-Rakhlin paper on using self-concordant barrier functions as regularizers to achieve optimal sqrt(T) regret in the bandit setting
-
-
Bandit problems with infinitely many/continuous arms ("X-armed" bandits)
-
Thompson Sampling for Complex Bandit settings
-
Dueling bandits (bandits with pairwise comparison feedback — useful in recommender systems and online ranking applications)
-
Original paper on dueling bandits
-
Recent paper on using UCB techniques to solve dueling bandits
-
Lower bound and optimal regret for dueling bandits, Komiyama et al 2015
-
-
(Projection-free) Online Convex Optimization using the Frank-Wolfe optimization algorithm
-
Thompson Sampling for the purely Bayesian setting
-
Contextual bandit algorithms
-
The Monster algorithm
-
-
Learning permutations/rankings sequentially given only comparisons
-
Selecting the best set of multiple arms in bandits
-
Bandits with side information
-
Adversarial rewards: Mannor-Shamir, improvements by Alon et al
-
Stochastic rewards: Caron et al, Buccapatnam et al
-
-
Partial monitoring problems (generalizes bandits, harder class of online learning problems)
-
Prediction, Learning and Games book, Chapter 6
-
Regret minimization/Online learning in Markov Decision Processes (stochastic multi-armed bandits with state)
-
Crowdsourcing and bandit problems
-
Clustered bandits — how to learn if two or more bandit problems appear similar and save effort
-
Bandits, auctions and mechanism design
Last updated: 28-Jul-2024, 23:23:52 IST