
Learning to Combat Compounding-Error in Model-Based Reinforcement Learning

Full Text: 2019-Xiao-combat-compounding-error-NeurIPS-RL.pdf

Despite its potential to improve sample complexity versus model-free approaches, model-based reinforcement learning can fail catastrophically if the model is inaccurate. An algorithm should ideally be able to trust an imperfect model over a reasonably long planning horizon, and only rely on model-free updates when the model errors get infeasibly large. In this paper, we investigate techniques for choosing the planning horizon on a state-dependent basis, where a state’s planning horizon is determined by the maximum cumulative model error around that state. We demonstrate that these state-dependent model errors can be learned with Temporal Difference methods, based on a novel approach of temporally decomposing the cumulative model errors. Experimental results show that the proposed method can successfully adapt the planning horizon to account for state-dependent model accuracy, significantly improving the efficiency of policy learning compared to model-based and model-free baselines.
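The abstract only sketches the approach at a high level. As a rough, hedged illustration (not the authors' implementation), the snippet below shows one way the temporal decomposition described above could look in code: a cumulative model-error estimate U(s) is learned with a TD-style update based on U(s) ≈ e(s) + gamma * U(s'), where e(s) is an observed one-step model error, and U is then used to choose a per-state rollout horizon. All names (U, one_step_error, error_threshold, plan_horizon) and the horizon heuristic are illustrative assumptions, not taken from the paper.

import numpy as np

# Minimal sketch (not the authors' code) of learning a state-dependent
# cumulative model-error estimate U(s) with a TD update, then using it to
# pick a per-state planning horizon. Names and constants are assumptions.

n_states = 50
gamma = 0.95           # discount applied to future model error
alpha = 0.1            # TD step size
error_threshold = 0.5  # cumulative model error we are willing to tolerate
max_horizon = 10

U = np.zeros(n_states)  # learned cumulative model-error estimates

def td_error_update(s, s_next, one_step_error):
    # One TD update using the decomposition U(s) ~ e(s) + gamma * U(s').
    target = one_step_error + gamma * U[s_next]
    U[s] += alpha * (target - U[s])

def plan_horizon(s):
    # Use shorter model rollouts in states where accumulated error is large.
    if U[s] <= 0.0:
        return max_horizon
    h = int(max_horizon * min(1.0, error_threshold / U[s]))
    return max(1, h)

# Example usage with synthetic one-step errors; in practice e(s) would come
# from comparing model predictions against observed environment transitions.
rng = np.random.default_rng(0)
for _ in range(1000):
    s = rng.integers(n_states)
    s_next = rng.integers(n_states)
    e = rng.uniform(0.0, 0.2)
    td_error_update(s, s_next, e)

print(plan_horizon(0), plan_horizon(n_states - 1))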

Citation

C. Xiao, Y. Wu, C. Ma, D. Schuurmans, M. Müller. "Learning to Combat Compounding-Error in Model-Based Reinforcement Learning". NeurIPS Deep RL Workshops, December 2019.

Keywords:  
Category: In Workshop
Web Links: Webdocs

BibTeX

@inproceedings{Xiao+al:19,
  author = {Chenjun Xiao and Yifan Wu and Chen Ma and Dale Schuurmans and
    Martin Müller},
  title = {Learning to Combat Compounding-Error in Model-Based Reinforcement
    Learning},
  booktitle = {NeurIPS Deep RL Workshops},
  year = 2019,
  month = dec,
}

