
Hierarchical Average Reward Reinforcement Learning

Full Text: jmlr06.pdf (PDF)

Hierarchical reinforcement learning (HRL) is the study of mechanisms for exploiting the structure of tasks in order to learn more quickly. By decomposing tasks into subtasks, fully or partially specified subtask solutions can be reused in solving tasks at higher levels of abstraction. The theory of semi-Markov decision processes (SMDPs) provides a theoretical basis for HRL. Several representational schemes based on SMDPs have been studied in previous work, all of which rely on the discrete-time discounted SMDP model, in which policies are learned that maximize the long-term discounted sum of rewards. In this paper we investigate two formulations of HRL based on the average-reward SMDP model, in both discrete and continuous time. In the average-reward model, policies are sought that maximize the expected reward per step. The two formulations correspond to two different notions of optimality that have been explored in previous work on HRL: hierarchical optimality, which corresponds to the set of optimal policies in the space defined by a task hierarchy, and recursive optimality, a weaker, local notion of optimality.
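For orientation, a standard textbook sketch (not an excerpt from the paper) of the two criteria contrasted in the abstract, for a policy \(\pi\) in a discrete-time MDP with reward \(r_t\) at step \(t\):

\[ J^{\pi} \;=\; \mathbb{E}^{\pi}\!\Big[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\Big], \qquad 0 \le \gamma < 1 \quad \text{(discounted criterion)} \]

\[ g^{\pi} \;=\; \lim_{n \to \infty} \frac{1}{n}\, \mathbb{E}^{\pi}\!\Big[\sum_{t=0}^{n-1} r_{t}\Big] \quad \text{(average-reward, or gain, criterion)} \]

In the continuous-time SMDP setting, the average reward is instead the expected total reward divided by the expected total transition time; the paper's exact formulation may differ in notation and detail.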

Citation

M. Ghavamzadeh and S. Mahadevan. "Hierarchical Average Reward Reinforcement Learning". Journal of Machine Learning Research (JMLR), 13(2), pp. 197-229, June 2006.

Category: In Journal

BibTeX

@article{Ghavamzadeh+Mahadevan:JMLR06,
  author  = {Mohammad Ghavamzadeh and Sridhar Mahadevan},
  title   = {Hierarchical Average Reward Reinforcement Learning},
  journal = {Journal of Machine Learning Research (JMLR)},
  volume  = {13},
  number  = {2},
  pages   = {197--229},
  year    = {2006}
}

Last Updated: June 08, 2007
Submitted by Stuart H. Johnson
