Model-Based Least-Squares Policy Evaluation

Full Text: cai03a.ps PS

A popular form of policy evaluation for large MDPs is the least-squares temporal differencing (TD) method. Least-squares TD methods handle large MDPs by requiring, as prior knowledge, feature vectors that form a set of basis vectors compressing the system down to tractable levels. Model-based methods have largely been ignored in favour of model-free TD algorithms due to two perceived drawbacks: slower computation time and larger storage requirements. This paper challenges the perceived advantage of the temporal difference method over a model-based method in three distinct ways. First, it provides a new model-based approximate policy estimation method that produces solutions in less computation time than Boyan's least-squares TD method. Second, it introduces a new algorithm to derive basis vectors without any prior knowledge of the system. Third, it introduces an iteratively improving model-based value estimator that can run faster than standard TD methods. All algorithms require model storage but remain computationally competitive, in terms of accuracy, with model-free temporal differencing methods.
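
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of least-squares TD in its simplest form, LSTD(0). It is not taken from the paper and does not reproduce the authors' model-based algorithms. The two-state chain, the one-hot feature vectors, the discount factor, and the helper names `solve` and `lstd0` are all assumptions chosen for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lstd0(transitions, features, gamma):
    """LSTD(0): build A = sum phi (phi - gamma * phi')^T and b = sum phi * r,
    then solve A w = b for the weights of the linear value estimate."""
    k = len(next(iter(features.values())))
    A = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    for s, r, s_next in transitions:
        phi, phi_next = features[s], features[s_next]
        for i in range(k):
            b[i] += phi[i] * r
            for j in range(k):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)


# Hypothetical two-state chain under a fixed policy:
# state 0 -> state 1 with reward 0, state 1 -> state 0 with reward 1.
features = {0: [1.0, 0.0], 1: [0.0, 1.0]}  # one-hot features (assumed)
transitions = [(0, 0.0, 1), (1, 1.0, 0)]   # (state, reward, next state)
w = lstd0(transitions, features, gamma=0.5)
```

Because the assumed features are one-hot, the weights coincide with the state values, and solving the Bellman equations by hand gives V(0) = 2/3 and V(1) = 4/3. With fewer features than states, the same routine would return a compressed approximation instead of the exact values.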

Citation

F. Lu, D. Schuurmans. "Model-Based Least-Squares Policy Evaluation". Canadian Conference on Artificial Intelligence (CAI), Halifax, Nova Scotia, Canada, June 2003.

Keywords: least-squares, machine learning
Category: In Conference

BibTeX

@incollection{Lu+Schuurmans:CAI03,
  author = {Fletcher Lu and Dale Schuurmans},
  title = {Model-Based Least-Squares Policy Evaluation},
  booktitle = {Canadian Conference on Artificial Intelligence (CAI)},
  year = 2003,
}

Last Updated: June 01, 2007
Submitted by Staurt H. Johnson