Investigating the Maximum Likelihood Alternative to TD(λ)
- Fletcher Lu, School of Computer Science, University of Waterloo
- Relu Patrascu, Department of Computer Science, University of Toronto
- Dale Schuurmans, AICML

The study of value estimation in Markov reward processes has been dominated by research on temporal difference methods since the introduction of TD(0) in 1988. Temporal difference methods are often contrasted with a maximum likelihood approach, where the transition matrix and reward vector are estimated explicitly and converted into a value estimate by solving a matrix equation. It is often asserted that maximum likelihood estimation yields more accurate values, but that the temporal difference approach is far more efficient computationally. In this paper we show that the first assertion is true, but the second can be false in many circumstances. In particular, we show that a reasonable implementation of a sparse matrix solver can yield run times for maximum likelihood that are competitive with TD(λ). In our experiments the maximum likelihood estimator yields more accurate values. This higher accuracy, in conjunction with competitive execution time, suggests that a model-based approach might yet be worth pursuing in scaling up reinforcement learning.
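The maximum likelihood approach described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, data layout, and use of SciPy's sparse solver are assumptions. It estimates the transition matrix and reward vector from observed samples and then solves the linear system (I − γP)v = r for the value estimate.

```python
# A minimal sketch (not the authors' implementation) of maximum likelihood
# value estimation for a Markov reward process: count observed transitions
# to estimate P and r, then solve (I - gamma*P) v = r with a sparse solver.
import numpy as np
from scipy.sparse import csr_matrix, identity
from scipy.sparse.linalg import spsolve

def ml_value_estimate(transitions, n_states, gamma=0.9):
    """transitions: iterable of observed (state, reward, next_state) samples."""
    counts = np.zeros((n_states, n_states))
    reward_sum = np.zeros(n_states)
    visits = np.zeros(n_states)
    for s, r, s_next in transitions:
        counts[s, s_next] += 1
        reward_sum[s] += r
        visits[s] += 1
    visits = np.maximum(visits, 1)            # guard unvisited states
    P = csr_matrix(counts / visits[:, None])  # ML transition estimate
    r = reward_sum / visits                   # ML expected-reward estimate
    A = identity(n_states, format="csr") - gamma * P
    return spsolve(A, r)                      # v = (I - gamma*P)^{-1} r
```

For example, a two-state chain where state 0 transitions to an absorbing zero-reward state 1 with reward 1 yields v = [1, 0] for any discount, since the only reward is collected immediately.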
Citation
F. Lu, R. Patrascu, and D. Schuurmans. "Investigating the Maximum Likelihood Alternative to TD(λ)". Neural Information Processing Systems (NIPS), Vancouver, British Columbia, Canada, January 2002.
| Keywords: | likelihood, alternative, machine learning |
| Category: | In Conference | 
BibTeX
@incollection{Lu+al:NIPS02,
  author = {Fletcher Lu and Relu Patrascu and Dale Schuurmans},
  title = {Investigating the Maximum Likelihood Alternative to TD($\lambda$)},
  booktitle = {Neural Information Processing Systems (NIPS)},
  year = 2002,
}

Last Updated: July 01, 2007
Submitted by Stuart H. Johnson