Bayesian Policy Gradient Algorithms

Full Text: nips06.pdf

Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte Carlo techniques to estimate this gradient. Since Monte Carlo estimates tend to have high variance, a large number of samples is required, resulting in slow convergence. In this paper, we propose a Bayesian framework that models the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient as well as a measure of the uncertainty in the gradient estimates are provided at little extra cost.
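To make the Gaussian-process idea concrete, below is a minimal Bayesian-quadrature sketch on a one-dimensional toy problem. This is not the paper's algorithm: the score-function integrand of a Gaussian policy is modeled with a GP under a squared-exponential kernel, whose integrals against the policy distribution have closed form. The toy reward, the kernel choice, and all hyperparameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Gaussian policy pi(a) = N(mu, sigma^2); toy quadratic reward (assumed).
mu, sigma = 0.0, 1.0
a_star = 1.5
ell = 1.0  # squared-exponential kernel length-scale (assumed)

def reward(a):
    return -(a - a_star) ** 2

def integrand(a):
    # Score-function integrand: d/dmu log pi(a) * r(a).
    return (a - mu) / sigma ** 2 * reward(a)

n = 20
a = rng.normal(mu, sigma, size=n)  # actions sampled from the policy
f = integrand(a)

# Plain Monte Carlo gradient estimate: the sample mean of the integrand.
g_mc = f.mean()

# Bayesian quadrature: place a GP prior on the integrand; the kernel's
# integrals against the Gaussian policy measure have closed form.
K = np.exp(-(a[:, None] - a[None, :]) ** 2 / (2 * ell ** 2))
z = np.sqrt(ell ** 2 / (ell ** 2 + sigma ** 2)) * np.exp(
    -(a - mu) ** 2 / (2 * (ell ** 2 + sigma ** 2)))
w = np.linalg.solve(K + 1e-6 * np.eye(n), z)  # jitter for stability
g_bq = w @ f  # posterior mean of the gradient integral
var_bq = np.sqrt(ell ** 2 / (ell ** 2 + 2 * sigma ** 2)) - z @ w  # posterior variance

g_true = -2.0 * (mu - a_star)  # analytic gradient of E[r] w.r.t. mu
print(f"true {g_true:.3f}  MC {g_mc:.3f}  "
      f"BQ {g_bq:.3f} +/- {np.sqrt(max(var_bq, 0.0)):.3f}")

The same GP that smooths the estimate also yields a posterior variance, which is the kind of gradient-uncertainty measure the abstract mentions.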

Citation

M. Ghavamzadeh and Y. Engel. "Bayesian Policy Gradient Algorithms". Neural Information Processing Systems (NIPS), December 2006.

Keywords: machine learning
Category: In Conference

BibTeX

@inproceedings{Ghavamzadeh+Engel:NIPS06,
  author = {Mohammad Ghavamzadeh and Yaakov Engel},
  title = {Bayesian Policy Gradient Algorithms},
  booktitle = {Neural Information Processing Systems (NIPS)},
  year = {2006}
}

