Model-Based Least-Squares Policy Evaluation
- Fletcher Lu, School of Computer Science, University of Waterloo
- Dale Schuurmans, AICML
A popular form of policy evaluation for large MDPs is the least-squares temporal differencing (TD) method. Least-squares TD methods handle large MDPs by requiring prior-knowledge feature vectors, which form a set of basis vectors that compress the system down to tractable levels. Model-based methods have largely been ignored in favour of model-free TD algorithms due to two perceived drawbacks: slower computation time and larger storage requirements. This paper challenges the perceived advantage of the temporal difference method over a model-based method in three distinct ways. First, it provides a new model-based approximate policy estimation method which produces solutions in a faster computation time than Boyan's least-squares TD method. Second, it introduces a new algorithm to derive basis vectors without any prior knowledge of the system. Third, we introduce an iteratively improving model-based value estimator that can run faster than standard TD methods. All algorithms require model storage but remain computationally competitive in terms of accuracy with model-free temporal differencing methods.
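To make the contrast between the two families concrete, here is a minimal Python sketch. It is not the algorithm from the paper. It assumes a small discrete state space, a fixed feature matrix `phi` (one row per state), and a list of sampled `(state, reward, next_state)` transitions. The function names `lstd0` and `model_based_ls` are hypothetical. The first accumulates the least-squares TD(0) system directly from samples. The second stores an empirical model (transition counts and mean rewards) and then solves the visit-weighted projected system.

```python
import numpy as np

def lstd0(transitions, phi, gamma):
    """Model-free least-squares TD(0): build A and b straight from samples."""
    k = phi.shape[1]
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, r, s_next in transitions:
        A += np.outer(phi[s], phi[s] - gamma * phi[s_next])
        b += phi[s] * r
    # lstsq tolerates a singular A by returning a minimum-norm solution
    return np.linalg.lstsq(A, b, rcond=None)[0]

def model_based_ls(transitions, phi, gamma):
    """Model-based sketch: estimate P and R from counts, then solve in feature space."""
    n = phi.shape[0]
    counts = np.zeros((n, n))   # stored model: transition counts
    reward_sum = np.zeros(n)    # stored model: summed rewards per state
    for s, r, s_next in transitions:
        counts[s, s_next] += 1
        reward_sum[s] += r
    visits = counts.sum(axis=1)
    seen = visits > 0
    P = np.zeros((n, n))
    P[seen] = counts[seen] / visits[seen, None]   # empirical transition matrix
    R = np.zeros(n)
    R[seen] = reward_sum[seen] / visits[seen]     # empirical mean reward
    D = np.diag(visits)                           # weight states by visit count
    A = phi.T @ D @ (phi - gamma * P @ phi)
    b = phi.T @ D @ R
    return np.linalg.lstsq(A, b, rcond=None)[0]
```

With visit-count weighting, the two systems are algebraically the same, so both functions return the same weight vector up to rounding. The practical difference in this sketch is only what gets stored (an n-by-n count table versus a k-by-k matrix) and when the linear solve happens.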
Citation
F. Lu, D. Schuurmans. "Model-Based Least-Squares Policy Evaluation". Canadian Conference on Artificial Intelligence (CAI), Halifax, Nova Scotia, Canada, June 2003.
Keywords: least-squares, machine learning
Category: In Conference
BibTeX
@incollection{Lu+Schuurmans:CAI03,
  author = {Fletcher Lu and Dale Schuurmans},
  title = {Model-Based Least-Squares Policy Evaluation},
  booktitle = {Canadian Conference on Artificial Intelligence (CAI)},
  year = 2003,
}
Last Updated: June 01, 2007
Submitted by Staurt H. Johnson