View Publication

Finite Time Bounds for Sampling Based Fitted Value Iteration

Csaba Szepesvari, Department of Computing Science; PI of AICML
Remi Munos

Full Text: savi_icml2005.pdf

In this paper we consider sampling based fitted value iteration for discounted, large (possibly infinite) state space, finite action Markovian Decision Problems where only a generative model of the transition probabilities and rewards is available. At each step the image of the current estimate of the optimal value function under a Monte-Carlo approximation to the Bellman-operator is projected onto some function space. PAC-style bounds on the weighted $L^p$-norm approximation error are obtained as a function of the covering number and the approximation power of the function space, the iteration number and the sample size.

Citation

C. Szepesvari, R. Munos. "Finite Time Bounds for Sampling Based Fitted Value Iteration". International Conference on Machine Learning (ICML), Bonn, Germany, pp 881-886, January 2005.

Keywords:	machine learning
Category:	In Conference

BibTeX

@incollection{Szepesvari+Munos:ICML05,
  author = {Csaba Szepesvari and Remi Munos},
  title = {Finite Time Bounds for Sampling Based Fitted Value Iteration},
  Pages = {881-886},
  booktitle = {International Conference on Machine Learning (ICML)},
  year = 2005,
}

Last Updated: April 24, 2007
Submitted by William Thorne

Not Logged In

PapersDB

Finite Time Bounds for Sampling Based Fitted Value Iteration

Citation

BibTeX