
Temporal-Difference Networks

We introduce a generalization of temporal-difference (TD) learning to networks of interrelated predictions. Rather than relating a single prediction to itself at a later time, as in conventional TD methods, a TD network relates each prediction in a set of predictions to other predictions in the set at a later time. TD networks can represent and apply TD learning to a much wider class of predictions than has previously been possible. Using a random-walk example, we show that these networks can be used to learn to predict by a fixed interval, which is not possible with conventional TD methods. Secondly, we show that when actions are introduced, and the inter-prediction relationships made contingent on them, the usual learning-efficiency advantage of TD methods over Monte Carlo (supervised learning) methods becomes particularly pronounced. Thirdly, we demonstrate that TD networks can learn predictive state representations that enable exact solution of a non-Markov problem. A very broad range of inter-predictive temporal relationships can be expressed in these networks. Overall we argue that TD networks represent a substantial extension of the abilities of TD methods and bring us closer to the goal of representing world knowledge in entirely predictive, grounded terms.
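To ground the contrast the abstract draws, here is a minimal sketch of *conventional* TD learning, the baseline that TD networks generalize: tabular TD(0) estimating state values on the random-walk task, where each state's single prediction is updated toward its own bootstrapped value one step later. The function name and parameters are illustrative, not from the paper; a TD network would instead maintain a set of interrelated predictions per state.

```python
import random

def td0_random_walk(n_states=5, episodes=5000, alpha=0.05, gamma=1.0, seed=0):
    """Tabular TD(0) on a random walk: states 1..n_states, with terminal
    states at each end; reward 1 on exiting right, 0 otherwise."""
    rng = random.Random(seed)
    v = [0.5] * (n_states + 2)      # value estimates; 0 and n_states+1 are terminal
    v[0] = v[n_states + 1] = 0.0
    for _ in range(episodes):
        s = (n_states + 1) // 2     # start each episode in the middle state
        while 0 < s < n_states + 1:
            s2 = s + rng.choice((-1, 1))            # step left or right
            r = 1.0 if s2 == n_states + 1 else 0.0  # reward only at right exit
            # TD(0) update: move V(s) toward the one-step bootstrapped target,
            # relating the prediction at s to the SAME prediction at s2 later.
            v[s] += alpha * (r + gamma * v[s2] - v[s])
            s = s2
    return v[1:n_states + 1]        # learned values for nonterminal states
```

The learned values approach i/(n_states+1) for state i. The key limitation the paper addresses is visible here: the target is always the same scalar prediction at a later time, so quantities such as "the probability of termination exactly k steps from now" cannot be expressed; TD networks handle them by letting each prediction's target be *other* predictions in the network.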

Citation

R. S. Sutton, B. Tanner. "Temporal-Difference Networks." Neural Information Processing Systems (NIPS), Vancouver, British Columbia, Canada, MIT Press, January 2005.

Keywords: random-walk, Monte Carlo, non-Markov, machine learning
Category: In Conference

BibTeX

@incollection{Sutton+Tanner:NIPS05,
  author    = {Richard S. Sutton and Brian Tanner},
  title     = {Temporal-Difference Networks},
  booktitle = {Neural Information Processing Systems (NIPS)},
  publisher = {MIT Press},
  year      = {2005}
}

Last Updated: April 24, 2007
Submitted by AICML Admin Assistant
