Interpolation-based Q-learning

Full Text: szws_icml2004_rlfapp.pdf

We consider a variant of Q-learning for continuous state spaces, under the total expected discounted cost criterion, combined with local function approximation methods. Provided that the function approximator satisfies certain interpolation properties, the resulting algorithm is shown to converge with probability one. The limit function is shown to satisfy a fixed point equation of the Bellman type, where the fixed point operator depends on the stationary distribution of the exploration policy and on the approximation properties of the function approximation method. The basic algorithm is extended in several ways. In particular, a variant of the algorithm is obtained that is shown to converge in probability to the optimal Q function. Preliminary computer simulations confirm the validity of the approach.
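To make the setting concrete, the following is a minimal sketch of Q-learning with a local interpolating approximator. It is not the paper's exact update rule. Everything here is an illustrative assumption: a one-dimensional state space, piecewise-linear ("hat function") interpolation on a fixed grid, a finite action set, and the hypothetical names `hat_weights`, `q_values`, and `q_update`. Because the criterion is a discounted cost, the target takes a minimum over next actions rather than a maximum.

```python
import numpy as np

def hat_weights(x, grid):
    """Piecewise-linear interpolation weights of x on a sorted 1-D grid.

    The weights are non-negative, sum to one, and reduce to a one-hot
    vector when x coincides with a grid point (assumed interpolation scheme).
    """
    x = float(np.clip(x, grid[0], grid[-1]))
    w = np.zeros(len(grid))
    # Index of the left neighbour of x, clamped so that j + 1 is valid.
    j = int(np.searchsorted(grid, x, side="right")) - 1
    j = min(max(j, 0), len(grid) - 2)
    t = (x - grid[j]) / (grid[j + 1] - grid[j])
    w[j] = 1.0 - t
    w[j + 1] = t
    return w

def q_values(theta, x, grid):
    """Interpolated Q(x, a) for every action; theta has shape (n_grid, n_actions)."""
    return hat_weights(x, grid) @ theta

def q_update(theta, x, a, cost, x_next, grid, gamma, alpha):
    """One Q-learning step under a discounted-cost criterion (illustrative).

    Only the parameters of the grid points neighbouring x are changed,
    in proportion to their interpolation weights.
    """
    w = hat_weights(x, grid)
    target = cost + gamma * np.min(q_values(theta, x_next, grid))
    td_error = target - w @ theta[:, a]
    theta[:, a] += alpha * w * td_error
    return theta
```

A typical use would initialise `theta = np.zeros((len(grid), n_actions))` and call `q_update` once per observed transition `(x, a, cost, x_next)` generated by an exploration policy. The step size `alpha` is held constant here for brevity; convergence results of the kind stated in the abstract generally rely on suitably decaying step sizes.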

Citation

C. Szepesvari, W. Smart. "Interpolation-based Q-learning". International Conference on Machine Learning (ICML), pp 791-798, January 2004.

Keywords: machine learning
Category: In Conference

BibTeX

@inproceedings{Szepesvari+Smart:ICML04,
  author = {Csaba Szepesvari and William Smart},
  title = {Interpolation-based Q-learning},
  pages = {791--798},
  booktitle = {International Conference on Machine Learning (ICML)},
  year = 2004,
}

Last Updated: April 24, 2007
Submitted by William Thorne
