iLSTD: Eligibility Traces and Convergence Analysis

Full Text: iLSTDL.pdf

We present new theoretical and empirical results with the iLSTD algorithm for policy evaluation in reinforcement learning with linear function approximation. iLSTD is an incremental method for achieving results similar to LSTD, the data-efficient, least-squares version of temporal difference learning, without incurring the full cost of the LSTD computation. LSTD is O(n²), where n is the number of parameters in the linear function approximator, while iLSTD is O(n). In this paper, we generalize the previous iLSTD algorithm and present three new results: (1) the first convergence proof for an iLSTD algorithm; (2) an extension to incorporate eligibility traces without changing the asymptotic computational complexity; and (3) the first empirical results with an iLSTD algorithm for a problem (mountain car) with feature vectors large enough (n = 10,000) to show substantial computational advantages over LSTD.
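To make the O(n) claim concrete, here is a minimal dense sketch of the basic iLSTD update (without eligibility traces). It maintains the LSTD statistics A and b incrementally along with the vector mu = b - A·theta, and descends theta along the largest components of mu; all names, the step size alpha, and the single-dimension descent (m = 1) are illustrative choices, not the paper's exact pseudocode, and the real algorithm gains its per-step O(n) cost from sparse feature vectors, which this dense version does not exploit:

```python
import numpy as np

def ilstd_step(A, b, mu, theta, phi, reward, phi_next,
               gamma=0.9, alpha=0.01, m=1):
    """One iLSTD update (dense illustrative sketch).

    A and b accumulate the usual LSTD statistics; mu incrementally
    tracks b - A @ theta so it never has to be recomputed in full.
    """
    d = phi - gamma * phi_next
    dA = np.outer(phi, d)      # rank-one increment to A
    db = reward * phi          # increment to b
    A += dA
    b += db
    mu += db - dA @ theta      # keep the invariant mu = b - A @ theta
    for _ in range(m):         # descend along the m largest components of mu
        j = np.argmax(np.abs(mu))
        delta = alpha * mu[j]
        theta[j] += delta
        mu -= delta * A[:, j]  # restore the invariant after changing theta_j
    return A, b, mu, theta
```

The key design point is that changing a single component theta_j only requires subtracting a scaled column of A from mu, so each descent step costs O(n) rather than the O(n²) of a full LSTD solve.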

Citation

A. Geramifard, M. Bowling, M. Zinkevich, R. S. Sutton. "iLSTD: Eligibility Traces and Convergence Analysis". Neural Information Processing Systems (NIPS), pp. to appear (8 pages), March 2007.

Keywords: machine learning
Category: In Conference

BibTeX

@incollection{Geramifard+al:NIPS07,
  author = {Alborz Geramifard and Michael Bowling and Martin Zinkevich and
    Richard S. Sutton},
  title = {iLSTD: Eligibility Traces and Convergence Analysis},
  pages = {To appear (8 pages)},
  booktitle = {Neural Information Processing Systems (NIPS)},
  year = 2007,
}

Last Updated: April 24, 2007
Submitted by Nelson Loyola
