Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming

This is a summary of results with Dyna, a class of architectures for intelligent systems based on approximating dynamic programming methods. Dyna architectures integrate trial-and-error (reinforcement) learning and execution-time planning into a single process operating alternately on the world and on a learned forward model of the world. We describe and show results for two Dyna architectures. The Dyna-PI architecture is based on dynamic programming's policy iteration method and can be related to existing AI ideas such as evaluation functions and universal plans (reactive systems). Using a navigation task, results are shown for a simple Dyna-PI system which simultaneously learns by trial and error, learns a world model, and plans optimal routes using the evolving world model. The Dyna-Q architecture is based on Watkins's Q-learning, a new kind of reinforcement learning. Dyna-Q uses a less familiar set of data structures than does Dyna-PI, but is arguably simpler to implement and use. We show that Dyna-Q architectures are easy to adapt for use in changing environments.
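
For concreteness, the following is a minimal sketch of the tabular Dyna-Q loop the abstract describes: one Q-learning update from each real step, a learned transition model, and several simulated planning updates replayed from that model. It is an illustration, not the paper's exact formulation; the env object, with its reset, actions, step, and done methods, is an assumed interface for a small deterministic task such as the navigation example.

import random
from collections import defaultdict

def dyna_q(env, episodes=50, n_planning=10, alpha=0.1, gamma=0.95, epsilon=0.1):
    # Q[(state, action)] -> value estimate; model[(state, action)] -> (reward, next_state).
    # The env interface (reset/actions/step/done) is assumed, not from the paper.
    Q = defaultdict(float)
    model = {}

    def backup(s, a, r, s2):
        # One Q-learning update toward r + gamma * max_b Q(s2, b).
        best = 0.0 if env.done(s2) else max(Q[(s2, b)] for b in env.actions(s2))
        Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])

    for _ in range(episodes):
        s = env.reset()
        while not env.done(s):
            # Epsilon-greedy action selection from the current value estimates.
            acts = env.actions(s)
            a = random.choice(acts) if random.random() < epsilon \
                else max(acts, key=lambda b: Q[(s, b)])

            # Direct reinforcement learning from one step of real experience.
            r, s2 = env.step(s, a)
            backup(s, a, r, s2)

            # Model learning: record the observed (deterministic) transition.
            model[(s, a)] = (r, s2)

            # Planning: replay n_planning transitions sampled from the model.
            for _ in range(n_planning):
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                backup(ps, pa, pr, ps2)

            s = s2
    return Q

Because the real and simulated updates write to the same value function, planning and trial-and-error learning proceed as a single alternating process, which is the integration the Dyna architectures are built around.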

Citation

Richard S. Sutton. "Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming." January 1991.

Keywords: Q-learning, Dyna-PI, trial-and-error learning, machine learning
Category: In Book

BibTeX

@inbook{Sutton:91,
  author = {Richard S. Sutton},
  title = {Integrated Modeling and Control Based on Reinforcement Learning and
    Dynamic Programming},
  year = 1991,
}

