Not Logged In

Compact, convex upper bound iteration for approximate POMDP planning

Full Text: 06aaai-qppomdp.pdf PDF

Partially observable Markov decision processes (POMDPs) are an intuitive and general way to model sequential decision making problems under uncertainty. Unfortunately, even approximate planning in POMDPs is known to be hard, and developing heuristic planners that can deliver reasonable results in practice has proved to be a significant challenge. In this paper, we present a new approach to approximate value-iteration for POMDP planning that is based on quadratic rather than piecewise linear function approximators. Specifically, we approximate the optimal value function by a convex upper bound composed of a fixed number of quadratics, and optimize it at each stage by semidefinite programming. We demonstrate that our approach can achieve competitive approximation quality to current techniques while still maintaining a bounded size representation of the function approximator. Moreover, an upper bound on the optimal value function can be preserved if required. Overall, the technique requires computation time and space that is only linear in the number of iterations (horizon time).

Citation

T. Wang, P. Poupart, M. Bowling, D. Schuurmans. "Compact, convex upper bound iteration for approximate POMDP planning". National Conference on Artificial Intelligence (AAAI), Boston, Massachusetts, USA, pp 1245-1251, January 2006.

Keywords: machine learning
Category: In Conference

BibTeX

@incollection{Wang+al:AAAI06,
  author = {Tao Wang and Pascal Poupart and Michael Bowling and Dale
    Schuurmans},
  title = {Compact, convex upper bound iteration for approximate POMDP
    planning},
  Pages = {1245-1251},
  booktitle = {National Conference on Artificial Intelligence (AAAI)},
  year = 2006,
}

Last Updated: April 24, 2007
Submitted by William Thorne

University of Alberta Logo AICML Logo