Incremental Truncated LSTD

Full Text: 216.pdf

Balancing computational efficiency against sample efficiency is an important goal in reinforcement learning. Temporal difference (TD) learning algorithms stochastically update the value function, with time complexity linear in the number of features, whereas least-squares temporal difference (LSTD) algorithms are sample efficient but can be quadratic in the number of features. In this work, we develop an efficient incremental low-rank LSTD(λ) algorithm that progresses towards the goal of better balancing computation and sample efficiency. The algorithm reduces computation and storage to the order of the number of features times the chosen rank parameter, while summarizing past samples compactly enough to nearly match the sample efficiency of LSTD. We derive a simulation bound on the solution given by the truncated low-rank approximation, illustrating a bias-variance trade-off that depends on the choice of rank. We demonstrate that the algorithm effectively balances computational complexity and sample efficiency for policy evaluation, on a benchmark task and a high-dimensional energy allocation domain.
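The abstract's core idea, replacing LSTD's d×d matrix A with a rank-r SVD that is updated incrementally so that per-step cost drops from O(d²) toward O(dr), can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: it uses a Brand-style rank-one SVD update, and the function names, truncation thresholds, and regularization choices here are assumptions for illustration only.

```python
import numpy as np

def svd_rank_one_update(U, s, V, a, b, r):
    """Brand-style rank-one SVD update: given A ~= U @ diag(s) @ V.T,
    return a rank-r truncated SVD of A + outer(a, b)."""
    m = U.T @ a
    p = a - U @ m                      # component of a outside span(U)
    p_norm = np.linalg.norm(p)
    P = p / max(p_norm, 1e-12)
    n = V.T @ b
    q = b - V @ n                      # component of b outside span(V)
    q_norm = np.linalg.norm(q)
    Q = q / max(q_norm, 1e-12)
    k = s.size
    # Small (k+1) x (k+1) core matrix that carries the rank-one update.
    K = np.zeros((k + 1, k + 1))
    K[:k, :k] = np.diag(s)
    K += np.outer(np.append(m, p_norm), np.append(n, q_norm))
    Uk, sk, Vkh = np.linalg.svd(K)
    r = min(r, k + 1)
    U_new = np.hstack([U, P[:, None]]) @ Uk[:, :r]
    V_new = np.hstack([V, Q[:, None]]) @ Vkh.T[:, :r]
    return U_new, sk[:r], V_new

def truncated_lstd(transitions, d, r, gamma=0.99, lam=0.0):
    """Policy evaluation from (features, reward, next_features) triples,
    keeping only a rank-r approximation of the LSTD A matrix."""
    U, s, V = np.zeros((d, r)), np.zeros(r), np.zeros((d, r))
    b = np.zeros(d)
    z = np.zeros(d)                    # eligibility trace
    for x, reward, x_next in transitions:
        z = gamma * lam * z + x
        # LSTD update A += outer(z, x - gamma * x_next), truncated to rank r.
        U, s, V = svd_rank_one_update(U, s, V, z, x - gamma * x_next, r)
        b += reward * z
    # Solve A w ~= b through the truncated pseudoinverse: w = V S^+ U^T b.
    s_inv = np.where(s > 1e-8, 1.0 / np.maximum(s, 1e-12), 0.0)
    return V @ (s_inv * (U.T @ b))
```

Each step touches only the d×r factors and an (r+1)×(r+1) core SVD, rather than a dense d×d matrix; the rank parameter r then controls the bias-variance trade-off that the paper's bound analyzes.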

Citation

C. Gehring, Y. Pan, M. White. "Incremental Truncated LSTD". International Joint Conference on Artificial Intelligence (IJCAI), (ed: Subbarao Kambhampati), pp 1505-1511, July 2016.

Keywords:  
Category: In Conference
Web Links: IJCAI

BibTeX

@inproceedings{Gehring+al:IJCAI16,
  author = {Clement Gehring and Yangchen Pan and Martha White},
  title = {Incremental Truncated LSTD},
  editor = {Subbarao Kambhampati},
  pages = {1505--1511},
  booktitle = {International Joint Conference on Artificial Intelligence (IJCAI)},
  year = {2016},
}

Last Updated: February 25, 2020
Submitted by Sabina P
