Investigating the Maximum Likelihood Alternative to TD(λ)
- Fletcher Lu, School of Computer Science, University of Waterloo
- Relu Patrascu, Department of Computer Science, University of Toronto
- Dale Schuurmans, AICML
The study of value estimation in Markov reward processes has been dominated by research on temporal difference methods since the introduction of TD(0) in 1988. Temporal difference methods are often contrasted with a maximum likelihood approach where the transition matrix and reward vector are estimated explicitly and converted into a value estimate by solving a matrix equation. It is often asserted that maximum likelihood estimation yields more accurate values, but the temporal difference approach is far more efficient computationally. In this paper we show that the first assertion is true, but the second can be false in many circumstances. In particular, we show that a reasonable implementation of a sparse matrix solver can yield run times for maximum likelihood that are competitive with TD(λ). In our experiments the maximum likelihood estimator yields more accurate values. This higher accuracy in conjunction with competitive execution time suggests that a model based approach might yet be worth pursuing in scaling up reinforcement learning.
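The matrix equation mentioned in the abstract is the Bellman equation v = r + γPv, which the maximum likelihood approach solves after estimating the transition matrix P and reward vector r. The sketch below is an illustrative assumption, not the paper's code: it builds a hypothetical sparse chain (names `P`, `r`, `gamma`, and the random construction are invented for the example) and computes the value estimate with a sparse linear solver, as the abstract's "sparse matrix solver" approach suggests.

```python
import numpy as np
from scipy.sparse import csr_matrix, identity
from scipy.sparse.linalg import spsolve

rng = np.random.default_rng(0)
n, gamma = 50, 0.9  # hypothetical state count and discount factor

# Hypothetical sparse chain: each state transitions to 3 random successors.
# In the maximum likelihood approach, P and r would instead be empirical
# transition frequencies and mean observed rewards.
rows, cols, vals = [], [], []
for s in range(n):
    succ = rng.choice(n, size=3, replace=False)
    probs = rng.dirichlet(np.ones(3))
    rows.extend([s] * 3)
    cols.extend(succ)
    vals.extend(probs)
P = csr_matrix((vals, (rows, cols)), shape=(n, n))
r = rng.standard_normal(n)

# Solve (I - gamma * P) v = r, i.e. the Bellman equation v = r + gamma * P v,
# exploiting sparsity instead of forming a dense inverse.
v = spsolve(identity(n, format="csc") - gamma * P, r)
```

Because P has only a few nonzeros per row, the sparse solve scales with the number of nonzero transitions rather than n², which is the basis of the paper's claim that maximum likelihood can be competitive with TD(λ) in run time.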
Citation
F. Lu, R. Patrascu, D. Schuurmans. "Investigating the Maximum Likelihood Alternative to TD(λ)". Neural Information Processing Systems (NIPS), Vancouver, British Columbia, Canada, January 2002.
Keywords: likelihood, alternative, machine learning
Category: In Conference
BibTeX
@incollection{Lu+al:NIPS02,
  author    = {Fletcher Lu and Relu Patrascu and Dale Schuurmans},
  title     = {Investigating the Maximum Likelihood Alternative to TD($\lambda$)},
  booktitle = {Neural Information Processing Systems (NIPS)},
  year      = 2002,
}
Last Updated: June 01, 2007
Submitted by Stuart H. Johnson