Bandit Based Monte-Carlo Planning
- Levente Kocsis
- Csaba Szepesvari, Department of Computing Science; PI of AICML

For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to finding near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent, and finite-sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains UCT is significantly more efficient than its alternatives.
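The idea behind UCT is to treat action selection at each internal node of the look-ahead tree as a separate multi-armed bandit problem and apply the UCB1 rule to it, scoring each action by its average return plus an exploration bonus proportional to sqrt(ln N / n_a). The sketch below is a minimal Python illustration of that loop, not the authors' implementation; the `env` interface (`actions`, `step`, `is_terminal`, `reward`), the class names, and the exploration constant are assumptions made for the example.

```python
import math
import random

class Node:
    """One state in the look-ahead tree; visit counts feed the UCB1 rule."""
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

def ucb1_child(node, c):
    """UCB1 selection: mean return plus exploration bonus for each child."""
    log_n = math.log(node.visits)
    return max(node.children,
               key=lambda ch: ch.total_reward / ch.visits
                              + c * math.sqrt(log_n / ch.visits))

def uct_search(root_state, env, n_rollouts=1000, c=math.sqrt(2)):
    # `env` is a hypothetical interface: actions/step/is_terminal/reward.
    root = Node(root_state)
    for _ in range(n_rollouts):
        node = root
        # 1. Selection: descend via UCB1 while every child has been tried.
        while node.children and all(ch.visits > 0 for ch in node.children):
            node = ucb1_child(node, c)
        # 2. Expansion: add one child per action at an unexpanded leaf.
        if not node.children and not env.is_terminal(node.state):
            node.children = [Node(env.step(node.state, a), node, a)
                             for a in env.actions(node.state)]
        if node.children:
            untried = [ch for ch in node.children if ch.visits == 0]
            node = random.choice(untried or node.children)
        # 3. Rollout: finish the episode with a uniformly random policy.
        state = node.state
        while not env.is_terminal(state):
            state = env.step(state, random.choice(env.actions(state)))
        reward = env.reward(state)
        # 4. Backpropagation: update counts along the visited path.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    # Recommend the most-visited action at the root (assumes root is non-terminal).
    return max(root.children, key=lambda ch: ch.visits).action
```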
Citation
L. Kocsis, C. Szepesvari. "Bandit Based Monte-Carlo Planning". European Conference on Machine Learning (ECML), Berlin, Germany, pp. 282-293, October 2006.

| Keywords: | machine learning |
| Category: | In Conference |
BibTeX
@inproceedings{Kocsis+Szepesvari:ECML06,
  author = {Levente Kocsis and Csaba Szepesvari},
  title = {Bandit Based Monte-Carlo Planning},
  pages = {282--293},
  booktitle = {European Conference on Machine Learning (ECML)},
  year = 2006,
}