Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming

Richard S. Sutton, Department of Computing Science, University of Alberta

This paper extends previous work with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned model of the world. In this paper, I present and show results for two Dyna architectures. The Dyna-PI architecture is based on dynamic programming's policy iteration method and can be related to existing AI ideas such as evaluation functions and universal plans (reactive systems). Using a navigation task, results are shown for a simple Dyna-PI system that simultaneously learns by trial and error, learns a world model, and plans optimal routes using the evolving world model. The Dyna-Q architecture is based on Watkins's Q-learning, a new kind of reinforcement learning. Dyna-Q uses a less familiar set of data structures than does Dyna-PI, but is arguably simpler to implement and use. We show that Dyna-Q architectures are easy to adapt for use in changing environments.
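The alternation the abstract describes, with one update rule applied both to real experience and to experience replayed from a learned model, can be illustrated with a small tabular sketch. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`dyna_q`, `corridor_step`), the deterministic one-step model, the epsilon-greedy exploration, the parameter values, and the six-state corridor task are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict


def dyna_q(step, start, actions, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch (hypothetical names and parameters).

    `step(state, action)` must return (reward, next_state, done).
    """
    rng = random.Random(seed)
    q = defaultdict(float)  # action values, keyed by (state, action)
    model = {}              # learned model: (state, action) -> (reward, next_state, done)

    def greedy(s):
        # Pick a highest-valued action, breaking ties at random.
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup, used for both real and simulated experience.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = start, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)        # learn from the real world
            model[(s, a)] = (r, s2, done)    # learn the world model
            for _ in range(planning_steps):  # plan using the learned model
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q


def corridor_step(s, a, goal=5):
    """Hypothetical task: states 0..5 in a row, reward 1.0 on reaching the goal."""
    s2 = min(max(s + a, 0), goal)
    done = s2 == goal
    return (1.0 if done else 0.0), s2, done


q = dyna_q(corridor_step, start=0, actions=[-1, 1])
# Greedy action in each non-goal state after learning.
policy = [max([-1, 1], key=lambda a: q[(s, a)]) for s in range(5)]
```

In this sketch, setting `planning_steps=0` reduces the loop to plain one-step Q-learning, which makes it easy to compare learning with and without the planning updates on the same task.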

Citation

R. Sutton. "Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming". International Conference on Machine Learning (ICML), Austin, Texas, USA, pp 216-224, January 1990.

Keywords:	Dyna-Q, data structures, environments, machine learning
Category:	In Conference

BibTeX

@incollection{Sutton:ICML90,
  author = {Richard S. Sutton},
  title = {Integrated Architectures for Learning, Planning, and Reacting Based
    on Approximating Dynamic Programming},
  pages = {216--224},
  booktitle = {International Conference on Machine Learning (ICML)},
  year = 1990,
}

Last Updated: May 31, 2007
Submitted by Stuart H. Johnson
