
Hierarchical Optimal Control of MDPs

Full Text: mcgovern98hierarchical.pdf

Fundamental to reinforcement learning, as well as to the theory of systems and control, is the problem of representing knowledge about the environment and about possible courses of action hierarchically, at a multiplicity of interrelated temporal scales. For example, a human traveler must decide which cities to go to, whether to fly, drive, or walk, and the individual muscle contractions involved in each step. In this paper we survey a new approach to reinforcement learning in which each of these decisions is treated uniformly. Each low-level action and high-level course of action is represented as an option: a (sub)controller together with a termination condition. The theory of options is based on the theories of Markov and semi-Markov decision processes, but extends these in significant ways. Options can be used in place of actions in all the planning and learning methods conventionally used in reinforcement learning. Options and models of options can be learned for a wide variety of different subtasks, and then rapidly combined to solve new tasks. Options enable planning and learning simultaneously at a wide variety of time scales, and toward a wide variety of subtasks, substantially increasing the efficiency and abilities of reinforcement learning systems.
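To make the option abstraction described in the abstract concrete, here is a minimal sketch in Python of an option as a sub-controller plus a termination condition, executed in place of a primitive action and backed up with an SMDP-style Q-learning update. The names (Option, run_option, smdp_q_update) and the discrete-state setting are illustrative assumptions, not the paper's exact formulation.

    # Illustrative sketch only; not the authors' implementation.
    import random
    from dataclasses import dataclass
    from typing import Callable, Dict, Set, Tuple

    State = Tuple[int, int]
    Action = str

    @dataclass
    class Option:
        """A temporally extended action: an initiation set, an internal
        policy (the sub-controller), and a termination condition."""
        initiation_set: Set[State]
        policy: Callable[[State], Action]       # primitive action to take in each state
        termination: Callable[[State], float]   # probability of terminating in a state

    def run_option(env_step: Callable[[State, Action], Tuple[State, float]],
                   option: Option, state: State, gamma: float = 0.9):
        """Execute the option until its termination condition fires.
        Returns the resulting state, discounted cumulative reward, and duration."""
        total_reward, discount, k = 0.0, 1.0, 0
        while True:
            action = option.policy(state)
            state, reward = env_step(state, action)
            total_reward += discount * reward
            discount *= gamma
            k += 1
            if random.random() < option.termination(state):
                return state, total_reward, k

    def smdp_q_update(Q: Dict[Tuple[State, str], float], s: State, name: str,
                      s_next: State, reward: float, k: int, option_names,
                      alpha: float = 0.1, gamma: float = 0.9):
        """One SMDP Q-learning backup: the option is treated exactly like an
        action, but the discount is gamma**k because it lasted k time steps."""
        best_next = max(Q.get((s_next, o), 0.0) for o in option_names)
        target = reward + (gamma ** k) * best_next
        old = Q.get((s, name), 0.0)
        Q[(s, name)] = old + alpha * (target - old)

The point of the sketch is that the learning update has the same form as ordinary Q-learning, which is what lets options substitute for actions in conventional planning and learning methods.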

Citation

A. McGovern, D. Precup, B. Ravindran, S. Singh, R. S. Sutton. "Hierarchical Optimal Control of MDPs". In Proceedings of the Yale Workshop on Adaptive and Learning Systems, pp. 186-191, January 1998.

Keywords: subtasks, rapidly, Markov, machine learning
Category: In Workshop

BibTeX

@inproceedings{McGovern+al:YaleWorkshoponAdaptiveandLearningSystems98,
  author    = {Amy McGovern and Doina Precup and Balaraman Ravindran and Satinder Singh and Richard S. Sutton},
  title     = {Hierarchical Optimal Control of {MDPs}},
  booktitle = {Proceedings of the Yale Workshop on Adaptive and Learning Systems},
  pages     = {186--191},
  year      = {1998},
}

Last Updated: May 31, 2007
Submitted by Stuart H. Johnson
