Roles of Macro-Actions in Accelerating Reinforcement Learning
- Amy McGovern, Department of Computer Science, University of Massachusetts at Amherst
- Richard S. Sutton, Department of Computing Science, University of Alberta
- Andrew H. Fagg, Department of Computer Science, University of Massachusetts at Amherst
We analyze the use of built-in policies, or macro-actions, as a form of domain knowledge that can improve the speed and scaling of reinforcement learning algorithms. Such macro-actions are often used in robotics, and macro-operators are also well-known as an aid to state-space search in AI systems. The macro-actions we consider are closed-loop policies with termination conditions. The macro-actions can be chosen at the same level as primitive actions. Macro-actions commit the learning agent to act in a particular, purposeful way for a sustained period of time. Overall, macro-actions may either accelerate or retard learning, depending on the appropriateness of the macro-actions to the particular task. We analyze their effect in a simple example, breaking the acceleration effect into two parts: 1) the effect of the macro-action in changing exploratory behavior, independent of learning, and 2) the effect of the macro-action on learning, independent of its effect on behavior. In our example, both effects are significant, but the latter appears to be larger. Finally, we provide a more complex gridworld illustration of how appropriately chosen macro-actions can accelerate overall learning.
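As a rough illustration of the abstract's definition, the sketch below models a macro-action as a closed-loop policy paired with a termination condition and places it in the same choice set as the primitive actions. It is not the authors' implementation: the 5x5 grid, the goal cell, the discount factor, and the names `MacroAction`, `run_macro`, and `go_right` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

State = Tuple[int, int]  # (row, col) in a hypothetical grid

# Primitive actions: name -> (row delta, column delta)
PRIMITIVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SIZE = 5        # assumed 5x5 grid
GOAL = (0, 4)   # assumed goal cell
GAMMA = 0.9     # assumed discount factor


def step(state: State, action: str) -> Tuple[State, float]:
    """Apply one primitive action; moving into a wall leaves the state unchanged."""
    dr, dc = PRIMITIVES[action]
    row = min(max(state[0] + dr, 0), SIZE - 1)
    col = min(max(state[1] + dc, 0), SIZE - 1)
    nxt = (row, col)
    return nxt, (1.0 if nxt == GOAL else 0.0)


@dataclass
class MacroAction:
    name: str
    policy: Callable[[State], str]       # closed loop: chooses a primitive from the current state
    terminates: Callable[[State], bool]  # termination condition


def run_macro(state: State, macro: MacroAction) -> Tuple[State, float, int]:
    """Follow the macro until it terminates or the goal is reached.

    Returns (final state, discounted reward accumulated, primitive steps taken).
    """
    total, discount, steps = 0.0, 1.0, 0
    while True:
        state, reward = step(state, macro.policy(state))
        total += discount * reward
        discount *= GAMMA
        steps += 1
        if macro.terminates(state) or state == GOAL:
            return state, total, steps


# Hypothetical macro: keep moving right until the right wall is reached.
go_right = MacroAction(
    name="go-right",
    policy=lambda s: "right",
    terminates=lambda s: s[1] == SIZE - 1,
)

# Macros are offered to the agent at the same level as primitive actions.
ACTIONS = list(PRIMITIVES) + [go_right.name]

if __name__ == "__main__":
    print(run_macro((0, 0), go_right))
```

Once selected, the macro commits the agent to its policy for several primitive steps, which is the sustained, purposeful behavior the abstract describes. Whether that commitment helps or hurts depends on how well the macro suits the task.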
Citation
A. McGovern, R. Sutton, A. Fagg. "Roles of Macro-Actions in Accelerating Reinforcement Learning". Grace Hopper Celebration of Women in Computing, pp. 13-17, September 1997.
Keywords: AI, complex, overall learning
Category: In Conference
BibTeX
@incollection{McGovern+al:GraceHopperCelebrationofWomeninComputing97,
  author = {Amy McGovern and Richard S. Sutton and Andrew H. Fagg},
  title = {Roles of Macro-Actions in Accelerating Reinforcement Learning},
  pages = {13--17},
  booktitle = {Grace Hopper Celebration of Women in Computing},
  year = 1997,
}
Last Updated: May 31, 2007
Submitted by Staurt H. Johnson