Reinforcement Learning Architectures

Full Text: sutton-92-ISKIT.pdf

Reinforcement learning is the learning of a mapping from situations to actions so as to maximize a scalar reward or reinforcement signal. The learner is not told which action to take, as in most forms of learning, but instead must discover which actions yield the highest reward by trying them. In the most interesting and challenging cases, actions affect not only the immediate reward, but also the next situation, and through that all subsequent rewards. These two characteristics---trial-and-error search and delayed reward---are the two most important distinguishing features of reinforcement learning. In this paper I present a brief overview of the development of reinforcement learning architectures over the past decade, including reinforcement-comparison, actor-critic, and Q-learning architectures. Finally, I present Dyna, a class of architectures based on reinforcement learning but which go beyond trial-and-error learning to include a learned internal model of the world. By intermixing conventional trial and error with hypothetical trial and error using the world model, Dyna systems can plan and learn optimal behavior very rapidly.
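The Dyna idea described in the abstract, interleaving real trial and error with hypothetical trial and error drawn from a learned world model, can be illustrated with a small tabular sketch. The code below is not taken from the paper: it assumes a Q-learning update as the underlying learner (a combination commonly called Dyna-Q), and the corridor environment, the `step` function, and all parameter values are hypothetical choices made only for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy environment (not from the paper): a corridor of states 0..5.
# The agent starts at state 0 and receives reward 1 on reaching state 5.
N_STATES = 6
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Deterministic corridor dynamics; returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_update(q, s, a, r, s2, done, alpha, gamma):
    """One Q-learning backup toward r + gamma * max_b Q(s2, b)."""
    target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def dyna_q(episodes=30, planning_steps=10, alpha=0.5, gamma=0.9,
           epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # action values, initialised to zero
    model = {}              # learned world model: (s, a) -> (s2, r, done)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Real trial and error: act epsilon-greedily in the environment.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = greedy(q, s, rng)
            s2, r, done = step(s, a)
            q_update(q, s, a, r, s2, done, alpha, gamma)
            # Record the observed transition in the world model.
            model[(s, a)] = (s2, r, done)
            # Hypothetical trial and error: replay transitions from the model.
            for _ in range(planning_steps):
                ps, pa = rng.choice(list(model))
                ps2, pr, pdone = model[(ps, pa)]
                q_update(q, ps, pa, pr, ps2, pdone, alpha, gamma)
            s = s2
    return q


if __name__ == "__main__":
    q = dyna_q()
    for s in range(N_STATES - 1):
        print(s, {a: round(q[(s, a)], 3) for a in ACTIONS})
```

Setting `planning_steps=0` reduces the sketch to plain Q-learning, which gives a simple way to compare learning with and without the world model on the same environment.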

Citation

R. Sutton. "Reinforcement Learning Architectures". ISKIT, pp. 211-216, January 1992.

Keywords: mapping, reinforcement, hypothetical, machine learning
Category: In Conference

BibTeX

@incollection{Sutton:ISKIT92,
  author = {Richard S. Sutton},
  title = {Reinforcement Learning Architectures},
  pages = {211--216},
  booktitle = {ISKIT},
  month = jan,
  year = 1992,
}

Last Updated: May 31, 2007
Submitted by Staurt H. Johnson