
Reduced-Variance Payoff Estimation in Adversarial Bandit Problems

Full Text: kocsis_ecml2005.pdf

A natural way to compare learning methods in non-stationary environments is to compare their regret. In this paper we consider the regret of algorithms in adversarial multi-armed bandit problems. We propose several methods to improve the performance of the baseline exponentially weighted average forecaster by changing the payoff-estimation methods. We argue that improved performance can be achieved by constructing payoff estimation methods that produce estimates with low variance. Our arguments are backed up by both theoretical and empirical results. In fact, our empirical results show that significant performance gains are possible over the baseline algorithm.
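The baseline referred to in the abstract is the exponentially weighted average forecaster with importance-weighted payoff estimates (Exp3-style). As a rough illustration of why estimator variance matters, the sketch below implements only that baseline, not the paper's reduced-variance estimators; the function names, the uniform-exploration parameter `gamma`, and the weight normalization are assumptions for this sketch. Note how the importance-weighted estimate `x / probs[arm]` blows up when the chosen arm's probability is small, which is exactly the variance the paper targets.

```python
import math
import random

def exp3(payoff_fn, n_arms, n_rounds, gamma=0.1, rng=None):
    """Minimal Exp3-style baseline: exponential weights over arms,
    with importance-weighted (unbiased, high-variance) payoff estimates.
    This is a sketch of the standard algorithm, not the paper's method."""
    rng = rng or random.Random(0)
    weights = [1.0] * n_arms
    pulls = [0] * n_arms
    for t in range(n_rounds):
        total = sum(weights)
        # Mix exponential weights with uniform exploration of rate gamma.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        pulls[arm] += 1
        x = payoff_fn(arm, t)  # adversarial payoff in [0, 1]
        # Importance-weighted estimate: unbiased, but variance ~ 1/probs[arm].
        x_hat = x / probs[arm]
        weights[arm] *= math.exp(gamma * x_hat / n_arms)
        # Rescale to keep the weights numerically bounded (behavior-preserving).
        m = max(weights)
        weights = [w / m for w in weights]
    return pulls
```

On a toy instance where one arm consistently pays more, the forecaster concentrates its pulls on that arm; reduced-variance estimates, as argued in the paper, speed up this concentration.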

Citation

L. Kocsis, C. Szepesvari. "Reduced-Variance Payoff Estimation in Adversarial Bandit Problems". European Conference on Machine Learning (ECML), Porto, Portugal, October 2005.

Keywords: machine learning
Category: In Conference

BibTeX

@incollection{Kocsis+Szepesvari:ECML05,
  author = {Levente Kocsis and Csaba Szepesvari},
  title = {Reduced-Variance Payoff Estimation in Adversarial Bandit Problems},
  booktitle = {European Conference on Machine Learning (ECML)},
  year = 2005,
}

Last Updated: April 24, 2007
Submitted by William Thorne
