
TD Models: Modeling the World at a Mixture of Time Scales

Full Text: precup-sutton-98(2).ps (PS)

Temporal-difference (TD) learning can be used not just to predict rewards, as is commonly done in reinforcement learning, but also to predict states, i.e., to learn a model of the world's dynamics. We present theory and algorithms for intermixing TD models of the world at different levels of temporal abstraction within a single structure. Such multi-scale TD models can be used in model-based reinforcement-learning architectures and dynamic programming methods in place of conventional Markov models. This enables planning at higher and varied levels of abstraction, and, as such, may prove useful in formulating methods for hierarchical or multi-level planning and reinforcement learning. In this paper we treat only the prediction problem---that of learning a model and value function for the case of fixed agent behavior. Within this context, we establish the theoretical foundations of multi-scale models and derive TD algorithms for learning them. Two small computational experiments are presented to test and illustrate the theory. This work is an extension and generalization of the work of Singh (1992), Dayan (1993), and Sutton & Pinette (1985).
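As context for the prediction setting the abstract restricts to, the following is a minimal sketch of tabular TD(0) value prediction under a fixed policy. The random-walk environment, step size, and episode count are illustrative assumptions, not taken from the paper; the paper's multi-scale TD models generalize this kind of one-step prediction to mixtures of time scales.

```python
import random

random.seed(0)

# A 5-state random walk: nonterminal states 0..4, terminating off
# either end. Reward is 1 for exiting to the right, 0 otherwise.
# The fixed "policy" moves left or right with equal probability.
n = 5
V = [0.5] * n          # tabular value estimates, initialized arbitrarily
alpha = 0.05           # step size (illustrative choice)
gamma = 1.0            # undiscounted, episodic task

for episode in range(10000):
    s = n // 2         # start each episode in the middle state
    while True:
        s2 = s + random.choice([-1, 1])
        if s2 < 0:                       # exit left: reward 0
            r, v2, done = 0.0, 0.0, True
        elif s2 >= n:                    # exit right: reward 1
            r, v2, done = 1.0, 0.0, True
        else:                            # interior transition
            r, v2, done = 0.0, V[s2], False
        # TD(0) update toward the one-step bootstrapped target
        V[s] += alpha * (r + gamma * v2 - V[s])
        if done:
            break
        s = s2
```

For this walk the true values are (i+1)/6 for state i, so the learned estimates should increase from left to right and center near 0.5 in the middle state.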

Citation

R. S. Sutton. "TD Models: Modeling the World at a Mixture of Time Scales". International Conference on Machine Learning (ICML), pp. 531-539, 1995.

Keywords: computational, agent, behaviour, machine learning
Category: In Conference

BibTeX

@inproceedings{Sutton:ICML95,
  author = {Richard S. Sutton},
  title = {TD Models: Modeling the World at a Mixture of Time Scales},
  pages = {531--539},
  booktitle = {International Conference on Machine Learning (ICML)},
  year = 1995,
}

Last Updated: May 31, 2007
Submitted by Stuart H. Johnson
