
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning

Full Text: SPS-aij.pdf (PDF)

Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include options---closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Overall, we show that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way. In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning. Formally, a set of options defined over an MDP constitutes a semi-Markov decision process (SMDP), and the theory of SMDPs provides the foundation for the theory of options. However, the most interesting issues concern the interplay between the underlying MDP and the SMDP and are thus beyond SMDP theory. We present results for three such cases: 1) we show that the results of planning with options can be used during execution to interrupt options and thereby perform even better than planned, 2) we introduce new intra-option methods that are able to learn about an option from fragments of its execution, and 3) we propose a notion of subgoal that can be used to improve the options themselves. All of these results have precursors in the existing literature; the contribution of this paper is to establish them in a simpler and more general setting with fewer changes to the existing reinforcement learning framework. In particular, we show that these results can be obtained without committing to (or ruling out) any particular approach to state abstraction, hierarchy, function approximation, or the macro-utility problem.
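The claim that options can be used interchangeably with primitive actions in Q-learning corresponds to SMDP Q-learning, in which an option that runs for k steps is backed up with discount gamma^k applied to the value of the state where the option terminates. Below is a minimal sketch of that idea on a toy corridor domain; the corridor, the "run-to-the-end" options, and all identifiers are illustrative assumptions for this page, not code from the paper.

    # Minimal sketch of SMDP Q-learning with options on a toy corridor MDP.
    # The corridor, the two "run-to-the-end" options, and all names below
    # are illustrative assumptions, not the paper's own code.
    import random
    from collections import defaultdict

    N = 10                # corridor states 0..N-1; reaching state N-1 ends an episode
    GAMMA, ALPHA = 0.9, 0.1

    def run_option(state, direction):
        """Execute a temporally extended option: keep moving in `direction`
        (+1 or -1) until hitting either end of the corridor.
        Returns (discounted reward sum, number of steps k, resulting state)."""
        r_sum, k = 0.0, 0
        while True:
            state = max(0, min(N - 1, state + direction))
            reward = 1.0 if state == N - 1 else 0.0
            r_sum += (GAMMA ** k) * reward
            k += 1
            if state == 0 or state == N - 1:
                return r_sum, k, state

    Q = defaultdict(float)          # Q[(state, option)]
    options = (+1, -1)              # "run right", "run left"

    for episode in range(500):
        s = random.randrange(N - 1)
        while s != N - 1:
            # epsilon-greedy choice among options
            if random.random() < 0.1:
                o = random.choice(options)
            else:
                o = max(options, key=lambda o2: Q[(s, o2)])
            r_sum, k, s_next = run_option(s, o)
            # SMDP Q-learning backup: a k-step option is discounted by gamma**k
            target = r_sum + (GAMMA ** k) * max(Q[(s_next, o2)] for o2 in options)
            Q[(s, o)] += ALPHA * (target - Q[(s, o)])
            s = s_next

    print({s: max(Q[(s, o)] for o in options) for s in range(N)})

Note that the backup has the same shape as the ordinary one-step Q-learning update; the only change is that the reward term is the discounted return accumulated while the option ran and the discount on the bootstrap term is gamma raised to the option's duration.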

Citation

R. Sutton, D. Precup, S. Singh. "Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning". Artificial Intelligence (AIJ), 112, pp. 181-211, January 1999.

Keywords: options, Q-learning, intra-option, fragments, function approximation, machine learning
Category: In Journal

BibTeX

@article{Sutton+al:AIJ99,
  author  = {Richard S. Sutton and Doina Precup and Satinder Singh},
  title   = {Between {MDPs} and Semi-{MDPs}: A Framework for Temporal Abstraction in Reinforcement Learning},
  journal = {Artificial Intelligence (AIJ)},
  volume  = {112},
  pages   = {181--211},
  year    = {1999}
}

