Temporal-Difference Networks With History

Temporal-difference (TD) networks are a formalism for expressing and learning grounded world knowledge in a predictive form [Sutton and Tanner, 2005]. However, not all partially observable Markov decision processes can be efficiently learned with TD networks. In this paper, we extend TD networks by allowing the network-update process (answer network) to depend on the recent history of previous actions and observations rather than only on the most recent action and observation. We show that this extension enables the solution of a larger class of problems than can be solved by the original TD networks or by history-based methods alone. In addition, we apply TD networks to a problem that, while still simple, is significantly larger than has previously been considered. We show that history-extended TD networks can learn much of the common-sense knowledge of an egocentric gridworld domain with a single bit of perception.
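To make the extension concrete: the answer network computes each node's prediction from a feature vector, and the history extension adds the last k action-observation pairs to that vector instead of only the most recent pair. Below is a minimal sketch, assuming sigmoid answer units, a one-hot action encoding, a single observation bit (as in the gridworld domain), and a simple TD-style update. The class name, its parameters, and the targets/condition interface are illustrative assumptions, not the paper's actual implementation.

    import numpy as np

    class HistoryTDNetwork:
        """Sketch of a TD network whose answer network sees a window of
        the last k action-observation pairs (hypothetical interface)."""

        def __init__(self, n_nodes, n_actions, k=2, alpha=0.1):
            self.k = k                  # history window length
            self.alpha = alpha          # step-size parameter
            self.n_nodes = n_nodes
            self.n_actions = n_actions
            # Features: k recent (one-hot action, observation bit) pairs,
            # plus the previous predictions y and a bias unit.
            self.n_feats = k * (n_actions + 1) + n_nodes + 1
            self.W = np.zeros((n_nodes, self.n_feats))
            self.history = []           # recent (action, obs) pairs
            self.y = np.zeros(n_nodes)  # current predictions

        def _features(self):
            recent = self.history[-self.k:]
            # Zero-pad when fewer than k steps have been seen.
            x = [0.0] * ((self.k - len(recent)) * (self.n_actions + 1))
            for a, o in recent:
                onehot = [0.0] * self.n_actions
                onehot[a] = 1.0
                x.extend(onehot)
                x.append(float(o))
            x.extend(self.y)            # previous answers feed forward
            x.append(1.0)               # bias
            return np.asarray(x)

        def step(self, action, obs, targets, condition):
            """One update. targets[i] is node i's target (an observation
            bit or another node's new prediction, per the question
            network); condition[i] is 1 only if node i's question is
            conditioned on the action just taken."""
            self.history.append((action, obs))
            x = self._features()
            y_new = 1.0 / (1.0 + np.exp(-(self.W @ x)))  # sigmoid answers
            for i in range(self.n_nodes):
                if condition[i]:
                    delta = targets[i] - y_new[i]
                    # Gradient of a sigmoid unit: y * (1 - y) * x.
                    self.W[i] += self.alpha * delta * y_new[i] * (1.0 - y_new[i]) * x
            self.y = y_new
            return y_new

A call then looks like, for example, `net = HistoryTDNetwork(n_nodes=4, n_actions=3, k=2)` followed by `net.step(action=0, obs=1, targets=[1, 0, 0, 1], condition=[1, 0, 0, 1])` each time step. With k=1 this reduces to the original TD network update, which is exactly the degenerate case the history extension is meant to generalize.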

Citation

B. Tanner and R. S. Sutton. "Temporal-Difference Networks With History." In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, August 2005.

Keywords: egocentric, gridworld, network update, machine learning
Category: In Conference

BibTeX

@inproceedings{Tanner+Sutton:IJCAI05,
  author    = {Brian Tanner and Richard S. Sutton},
  title     = {Temporal-Difference Networks With History},
  booktitle = {International Joint Conference on Artificial Intelligence
    (IJCAI)},
  address   = {Edinburgh, Scotland},
  month     = {August},
  year      = {2005}
}
