Reinforcement Learning is Direct Adaptive Optimal Control

Richard S. Sutton, Department of Computing Science, University of Alberta
Andrew Barto, Department of Computer Science, University of Massachusetts at Amherst
R.J. Williams, National Science Foundation

In this paper we present a control-systems perspective on one of the major neural-network approaches to learning control, reinforcement learning. Control problems can be divided into two classes: 1) regulation and tracking problems, in which the objective is to follow a reference trajectory, and 2) optimal control problems, in which the objective is to extremize a functional of the controlled system's behavior that is not necessarily defined in terms of a reference trajectory. Adaptive methods for problems of the first kind are well known, and include self-tuning regulators and model-reference methods, whereas adaptive methods for optimal-control problems have received relatively little attention. Moreover, the adaptive optimal-control methods that have been studied are almost all indirect methods, in which controls are recomputed from an estimated system model at each step. This computation is inherently complex, making adaptive methods in which the optimal controls are estimated directly more attractive. We view reinforcement learning methods as a computationally simple, direct approach to the adaptive optimal control of nonlinear systems. For concreteness, we focus on one reinforcement learning method (Q-learning) and on its analytically proven capabilities for one class of adaptive optimal control problems (Markov decision problems with unknown transition probabilities).
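As an illustration of the "direct" idea in the abstract, the following is a minimal sketch of tabular Q-learning, not the authors' implementation. It assumes a hypothetical four-state chain problem invented for this example: the names `step`, `q_learning`, `greedy_policy`, the `slip` probability, and the reward of 1 at the goal state do not come from the paper. The learner updates its action-value estimates from sampled transitions only and never builds or consults a model of the transition probabilities.

```python
import random

# Hypothetical toy problem (an assumption, not from the paper):
# a chain of states 0..3 where state 3 is the goal.
N_STATES = 4
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action, slip, rng):
    """Simulated plant. The learner sees only sampled outcomes, never `slip`."""
    if rng.random() < slip:
        action = 1 - action                      # the intended move is reversed
    nxt = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def q_learning(n_steps=20000, alpha=0.1, gamma=0.9, slip=0.1, seed=0):
    """Estimate action values directly from experience (no system model)."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    state = 0
    for _ in range(n_steps):
        # Uniform random exploration; Q-learning is off-policy, so the
        # behavior policy need not be the greedy one.
        action = rng.choice(ACTIONS)
        nxt, reward, done = step(state, action, slip, rng)
        # One-step target: reward plus discounted best value at the next state.
        target = reward if done else reward + gamma * max(q[nxt])
        q[state][action] += alpha * (target - q[state][action])
        state = 0 if done else nxt               # restart after reaching the goal
    return q


def greedy_policy(q):
    """Greedy action in each non-goal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]


if __name__ == "__main__":
    print(greedy_policy(q_learning(slip=0.0)))
```

With `slip=0.0` the toy plant is deterministic, and the greedy policy should settle on moving right in every non-goal state. An indirect method would instead estimate the transition probabilities first and recompute the controls from that estimated model at each step.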

Citation

R. Sutton, A. Barto, R. Williams. "Reinforcement Learning is Direct Adaptive Optimal Control". American Control Conference (ACC), January 1991.

Keywords:	computationally, concreteness, complex, machine learning
Category:	In Conference

BibTeX

@inproceedings{Sutton+al:ACC91,
  author = {Richard S. Sutton and Andrew Barto and R.J. Williams},
  title = {Reinforcement Learning is Direct Adaptive Optimal Control},
  booktitle = {American Control Conference (ACC)},
  year = 1991,
}

Last Updated: January 04, 2007
Submitted by William Thorne
