TD(lambda) Networks: Temporal-Difference Networks With Eligibility Traces

Full Text: 112_TDLambdaNetworks_TannerSutton.pdf

Temporal-difference (TD) networks have been introduced as a formalism for expressing and learning grounded world knowledge in a predictive form (Sutton & Tanner, 2005). Like conventional TD(0) methods, the learning algorithm for TD networks uses 1-step backups to train prediction units about future events. In conventional TD learning, the TD(λ) algorithm is often used to do more general multi-step backups of future predictions. In our work, we introduce a generalization of the 1-step TD network specification that is based on the TD(λ) learning algorithm, creating TD(λ) networks. We present experimental results that show TD(λ) networks can learn solutions in more complex environments than TD networks. We also show that in problems that can be solved by TD networks, TD(λ) networks generally learn solutions much faster than their 1-step counterparts. Finally, we present an analysis of our algorithm that shows that the computational cost of TD(λ) networks is only slightly more than that of TD networks.
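To illustrate the eligibility-trace mechanism the paper builds on, here is a minimal sketch of tabular TD(λ) with accumulating traces on a simple random-walk chain. This is a generic textbook-style example of the underlying TD(λ) update, not the paper's TD-network algorithm or its experimental domains; the chain environment, step size, and trace parameter are all illustrative assumptions.

```python
import random

def td_lambda(num_states, episodes, alpha=0.1, gamma=1.0, lam=0.8, seed=0):
    """Tabular TD(lambda) with accumulating eligibility traces on a
    random-walk chain (illustrative domain, not from the paper).

    States 0..num_states-1; the walk terminates off either end.
    Stepping off the right end yields reward 1, the left end reward 0,
    so V(s) should approach the probability of exiting to the right.
    """
    rng = random.Random(seed)
    V = [0.0] * num_states
    for _ in range(episodes):
        e = [0.0] * num_states            # eligibility trace per state
        s = num_states // 2               # start in the middle
        while True:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            if s2 < 0:                    # fell off the left end
                r, v2, done = 0.0, 0.0, True
            elif s2 >= num_states:        # fell off the right end
                r, v2, done = 1.0, 0.0, True
            else:
                r, v2, done = 0.0, V[s2], False
            delta = r + gamma * v2 - V[s] # 1-step TD error
            e[s] += 1.0                   # accumulate trace for current state
            for i in range(num_states):
                V[i] += alpha * delta * e[i]  # multi-step credit assignment
                e[i] *= gamma * lam           # decay all traces
            if done:
                break
            s = s2
    return V
```

With λ = 0 this reduces to the 1-step TD(0) backup; intermediate λ spreads each TD error back over recently visited states, which is the same generalization the paper applies to TD network prediction units.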

Citation

B. Tanner, R. Sutton. "TD(lambda) Networks: Temporal-Difference Networks With Eligibility Traces". International Conference on Machine Learning (ICML), Bonn, Germany, August 2005.

Keywords: temporal-difference, formalism, environments, machine learning
Category: In Conference

BibTeX

@inproceedings{Tanner+Sutton:ICML05,
  author = {Brian Tanner and Richard S. Sutton},
  title = {TD(lambda) Networks: Temporal-Difference Networks With Eligibility
    Traces},
  booktitle = {International Conference on Machine Learning (ICML)},
  address = {Bonn, Germany},
  month = aug,
  year = 2005,
}

Last Updated: April 24, 2007
