
Continuous Time Associative Bandit Problems

Full Text: cbandit-ijcai07.pdf

In this paper we consider an extension of the multi-armed bandit problem. In this generalized setting, the decision maker receives some side information, performs an action chosen from a finite set, and then receives a reward. Unlike in the standard bandit setting, performing an action takes a random period of time. The environment is assumed to be stationary, stochastic, and memoryless. The goal is to maximize the average reward received per unit time, that is, to maximize the average rate of return. We consider the on-line learning problem where the decision maker initially knows nothing about the environment and must learn about it by trial and error. We propose an "upper confidence bound"-style algorithm that exploits the structure of the problem. We show that the regret of this algorithm, relative to an optimal algorithm with perfect knowledge of the problem, grows at the optimal logarithmic rate in the number of decisions and scales polynomially with the parameters of the problem.
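To make the flavor of the approach concrete, below is a minimal Python sketch of a UCB-style rate-of-return policy. It is not the paper's algorithm: it ignores the side information, and the function name ucb_rate_policy, the confidence constant c, and the exact form of the confidence bonus are illustrative assumptions. It only illustrates the idea described above: form an optimistic estimate of each action's reward per unit time (inflated mean reward divided by deflated mean duration) and play the action with the highest index.

import math
import random

def ucb_rate_policy(arms, n_rounds, c=2.0):
    """Pick arms to maximize long-run reward per unit time (sketch).

    Each arm is a zero-argument callable returning a (reward, duration)
    pair. The index optimistically inflates the empirical mean reward
    and deflates the empirical mean duration, so actions with uncertain
    rates of return still get explored.
    """
    k = len(arms)
    pulls = [0] * k
    rewards = [0.0] * k
    durations = [0.0] * k
    total_reward = total_time = 0.0

    for t in range(1, n_rounds + 1):
        if t <= k:
            i = t - 1  # initialization: pull each arm once
        else:
            def index(j):
                bonus = math.sqrt(c * math.log(t) / pulls[j])
                r_hat = rewards[j] / pulls[j] + bonus  # optimistic (high) reward
                # optimistic (short) duration, bounded away from zero
                d_hat = max(durations[j] / pulls[j] - bonus, 1e-6)
                return r_hat / d_hat
            i = max(range(k), key=index)
        r, d = arms[i]()
        pulls[i] += 1
        rewards[i] += r
        durations[i] += d
        total_reward += r
        total_time += d

    return total_reward / total_time  # realized average rate of return

if __name__ == "__main__":
    random.seed(0)
    # Two hypothetical arms: arm 0 has rate 1.0 / 2.0 = 0.5,
    # arm 1 has rate 0.8 / 1.0 = 0.8 (expovariate(x) has mean 1/x).
    arms = [
        lambda: (random.gauss(1.0, 0.1), random.expovariate(0.5)),
        lambda: (random.gauss(0.8, 0.1), random.expovariate(1.0)),
    ]
    print(ucb_rate_policy(arms, 5000))

Running the example, the realized rate should approach 0.8 as the optimistic indices concentrate on the faster arm, even though that arm's per-pull reward is smaller.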

Citation

A. Gyorgy, L. Kocsis, I. Szabo, C. Szepesvari. "Continuous Time Associative Bandit Problems". International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, January 2007.

Keywords: machine learning
Category: In Conference

BibTeX

@inproceedings{Gyorgy+al:IJCAI07,
  author = {Andras Gyorgy and Levente Kocsis and Ivett Szabo and Csaba
    Szepesvari},
  title = {Continuous Time Associative Bandit Problems},
  booktitle = {International Joint Conference on Artificial Intelligence
    (IJCAI)},
  address = {Hyderabad, India},
  month = jan,
  year = 2007,
}

