TD(lambda) Networks: Temporal-Difference Networks With Eligibility Traces
- Brian Tanner, Department of Computing Science, University of Alberta
- Richard S. Sutton, Department of Computing Science, University of Alberta
Temporal-difference (TD) networks have been introduced as a formalism for expressing and learning grounded world knowledge in a predictive form (Sutton & Tanner, 2005). Like conventional TD(0) methods, the learning algorithm for TD networks uses 1-step backups to train prediction units about future events. In conventional TD learning, the TD(λ) algorithm is often used to do more general multi-step backups of future predictions. In our work, we introduce a generalization of the 1-step TD network specification that is based on the TD(λ) learning algorithm, creating TD(λ) networks. We present experimental results that show TD(λ) networks can learn solutions in more complex environments than TD networks. We also show that in problems that can be solved by TD networks, TD(λ) networks generally learn solutions much faster than their 1-step counterparts. Finally, we present an analysis of our algorithm that shows that the computational cost of TD(λ) networks is only slightly more than that of TD networks.
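For readers unfamiliar with the trace mechanism the paper builds on, the following is a minimal, illustrative sketch of conventional tabular TD(λ) with accumulating eligibility traces on a simple random-walk task. It is not the TD-network algorithm from the paper; all names and parameter values here are hypothetical choices for the example.

```python
import random

# Illustrative only: tabular TD(lambda) with accumulating eligibility
# traces on a 5-state random walk with a reward of 1 at the right
# terminal. This sketches the multi-step credit assignment that
# TD(lambda) adds over 1-step TD(0).

N_STATES = 5                      # non-terminal states 0..4; terminals at -1 and 5
ALPHA, GAMMA, LAM = 0.1, 1.0, 0.9  # step size, discount, trace decay (assumed values)

def run_episode(values, rng):
    traces = [0.0] * N_STATES
    s = N_STATES // 2                          # start in the middle state
    while True:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        reward = 1.0 if s_next == N_STATES else 0.0
        v_next = 0.0 if s_next in (-1, N_STATES) else values[s_next]
        delta = reward + GAMMA * v_next - values[s]   # 1-step TD error
        traces[s] += 1.0                              # accumulating trace
        for i in range(N_STATES):                     # credit all recent states
            values[i] += ALPHA * delta * traces[i]
            traces[i] *= GAMMA * LAM                  # decay eligibility
        if s_next in (-1, N_STATES):
            return
        s = s_next

rng = random.Random(0)
values = [0.0] * N_STATES
for _ in range(1000):
    run_episode(values, rng)
print([round(v, 2) for v in values])  # approaches the true values 1/6 .. 5/6
```

Setting `LAM = 0` recovers 1-step TD(0); intermediate values blend multi-step backups, which is the generalization the paper carries over to TD networks.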
Citation
B. Tanner, R. Sutton. "TD(lambda) Networks: Temporal-Difference Networks With Eligibility Traces". International Conference on Machine Learning (ICML), Bonn, Germany, August 2005.

Keywords: temporal-difference, formalism, environments, machine learning
Category: In Conference
BibTeX
@incollection{Tanner+Sutton:ICML05,
  author    = {Brian Tanner and Richard S. Sutton},
  title     = {TD(lambda) Networks: Temporal-Difference Networks With Eligibility Traces},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2005}
}

Last Updated: April 24, 2007