
Organizing Experience: a Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous-State Domains

Full Text: 0666.pdf (PDF)

Model-based strategies for control are critical to obtaining sample-efficient learning. Dyna is a planning paradigm that naturally interleaves learning and planning by simulating one-step experience to update the action-value function. This elegant planning strategy has been mostly explored in the tabular setting. The aim of this paper is to revisit sample-based planning in stochastic and continuous domains with learned models. We first highlight the flexibility afforded by a model over Experience Replay (ER). Replay-based methods can be seen as stochastic planning methods that repeatedly sample from a buffer of recent agent-environment interactions and perform updates to improve data efficiency. We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states that propagate information in reverse from a state more quickly. We introduce a semi-parametric model learning approach, called Reweighted Experience Models (REMs), that makes it simple to sample next states or predecessors. We demonstrate that REM-Dyna exhibits similar advantages over replay-based methods in learning in continuous-state problems, and that the performance gap grows when moving to stochastic domains of increasing size.
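To make the Dyna idea in the abstract concrete, here is a minimal sketch of tabular Dyna-Q: each real step is followed by several simulated one-step updates drawn from a learned model. This is the classic tabular variant, not the paper's REM-Dyna, and it does not cover continuous states or predecessor sampling. The function names, the hyperparameters, and the 5-state chain environment are all hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def dyna_q(env_step, start_state, actions, episodes=30, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q sketch: one real step, then simulated one-step updates."""
    rng = random.Random(seed)
    q = defaultdict(float)  # action values keyed by (state, action)
    model = {}              # deterministic model: (s, a) -> (r, s', done)

    def greedy(s):
        # Pick a maximizing action, breaking ties at random.
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning update toward the bootstrapped target.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = start_state, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = env_step(s, a)
            update(s, a, r, s2, done)        # learn from real experience
            model[(s, a)] = (r, s2, done)    # remember the observed transition
            for _ in range(planning_steps):  # plan with simulated experience
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q


def chain_step(s, a):
    """Hypothetical 5-state chain: action 1 moves right, action 0 moves left.

    Entering state 4 gives reward 1 and ends the episode.
    """
    s2 = min(s + 1, 4) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == 4 else 0.0), s2, s2 == 4


q = dyna_q(chain_step, start_state=0, actions=[0, 1])
```

In this sketch the planning loop samples previously seen state-action pairs uniformly, which is close to what a replay buffer does. The abstract's point is that a learned model can instead choose which states to sample from, for example predecessors of a state whose value just changed.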

Citation

Y. Pan, M. Zaheer, A. White, A. Patterson, M. White. "Organizing Experience: a Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous-State Domains". International Joint Conference on Artificial Intelligence (IJCAI), (ed: Jerome Lang), pp 4794--4800, July 2018.

Keywords: Machine Learning, Reinforcement Learning, Planning and Scheduling, Markov Decision Processes
Category: In Conference
Web Links: IJCAI
  DOI

BibTeX

@inproceedings{Pan+al:IJCAI18,
  author = {Yangchen Pan and Muhammad Zaheer and Adam White and Andrew
    Patterson and Martha White},
  title = {Organizing Experience: a Deeper Look at Replay Mechanisms for
    Sample-based Planning in Continuous-State Domains},
  Editor = {Jerome Lang},
  Pages = {4794--4800},
  booktitle = {International Joint Conference on Artificial Intelligence
    (IJCAI)},
  year = 2018,
}

Last Updated: February 24, 2020
Submitted by Sabina P
