
Incremental least-squares temporal difference learning

Full Text: Incremental.pdf (PDF)

Approximate policy evaluation with linear function approximation is a commonly arising problem in reinforcement learning, usually solved using temporal difference (TD) algorithms. In this paper we introduce a new variant of linear TD learning, called incremental least-squares TD learning, or iLSTD. This method is more data efficient than conventional TD algorithms such as TD(0) and is more computationally efficient than non-incremental least-squares TD methods such as LSTD (Bradtke & Barto 1996; Boyan 1999). In particular, we show that the per-time-step complexities of iLSTD and TD(0) are O(n), where n is the number of features, whereas that of LSTD is O(n²). This difference can be decisive in modern applications of reinforcement learning, where the use of a large number of features has proven to be an effective solution strategy. We present empirical comparisons, using the test problem introduced by Boyan (1999), in which iLSTD converges faster than TD(0) and almost as fast as LSTD.
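The abstract gives the complexity argument but not the update itself. The following is a minimal sketch of the idea, assuming dense NumPy feature vectors and greedy single-dimension descent on the least-squares residual; the class name ILSTD, its parameters, and the method names are illustrative, not the authors' reference code. Note that with dense features the A-matrix update below costs O(n²) per step; the O(n) per-time-step bound quoted above relies on sparse feature vectors.

    import numpy as np

    class ILSTD:
        """Sketch of incremental least-squares TD learning.

        Maintains A ~ sum_t phi_t (phi_t - gamma * phi'_t)^T and
        b ~ sum_t r_t phi_t, plus the residual mu = b - A @ theta,
        and descends along the single dimension with the largest
        residual magnitude each step.
        """

        def __init__(self, n_features, gamma=0.9, alpha=0.01):
            self.gamma = gamma
            self.alpha = alpha
            self.A = np.zeros((n_features, n_features))
            self.b = np.zeros(n_features)
            self.theta = np.zeros(n_features)
            self.mu = np.zeros(n_features)  # invariant: mu = b - A @ theta

        def update(self, phi, reward, phi_next):
            # Fold the new transition (phi, reward, phi_next) into A and b.
            dA = np.outer(phi, phi - self.gamma * phi_next)
            self.A += dA
            self.b += reward * phi
            # Keep the residual consistent with the new A and b.
            self.mu += reward * phi - dA @ self.theta
            # Greedy descent: adjust only the dimension with the
            # largest residual, then patch mu in O(n).
            i = np.argmax(np.abs(self.mu))
            step = self.alpha * self.mu[i]
            self.theta[i] += step
            self.mu -= step * self.A[:, i]

After the A and b sums are updated, changing a single component theta_i only perturbs A @ theta by one column of A, which is why the residual can be repaired in O(n) rather than recomputed from scratch; this is the design choice that separates iLSTD from full LSTD.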

Citation

A. Geramifard, M. Bowling, and R. S. Sutton. "Incremental least-squares temporal difference learning." National Conference on Artificial Intelligence (AAAI), Boston, Massachusetts, USA, pp. 356-361, July 2006.

Keywords: machine learning
Category: In Conference

BibTeX

@inproceedings{Geramifard+al:AAAI06,
  author = {Alborz Geramifard and Michael Bowling and Richard S. Sutton},
  title = {Incremental Least-Squares Temporal Difference Learning},
  booktitle = {National Conference on Artificial Intelligence (AAAI)},
  pages = {356--361},
  year = {2006}
}

Last Updated: May 31, 2007
Submitted by Stuart H. Johnson
