
Investigating the Maximum Likelihood Alternative to TD(λ)

Full Text: investigating-the-maximum-likelihood.pdf

The study of value estimation in Markov reward processes has been dominated by research on temporal difference methods since the introduction of TD(0) in 1988. Temporal difference methods are often contrasted with a maximum likelihood approach where the transition matrix and reward vector are estimated explicitly and converted into a value estimate by solving a matrix equation. It is often asserted that maximum likelihood estimation yields more accurate values, but the temporal difference approach is far more efficient computationally. In this paper we show that the first assertion is true, but the second can be false in many circumstances. In particular, we show that a reasonable implementation of a sparse matrix solver can yield run times for maximum likelihood that are competitive with TD(λ). In our experiments the maximum likelihood estimator yields more accurate values. This higher accuracy in conjunction with competitive execution time suggests that a model-based approach might yet be worth pursuing in scaling up reinforcement learning.
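The two approaches contrasted in the abstract can be illustrated on a toy example. The sketch below, using a made-up 3-state Markov reward process (the chain, rewards, and all parameter values are illustrative assumptions, not taken from the paper), estimates values with TD(0)'s incremental update and with the maximum likelihood approach of building empirical estimates of the transition matrix P and reward vector r from the same trajectory and solving the linear system (I − γP̂)v = r̂. The paper's point about efficiency concerns sparse solvers on large problems; a dense solve is used here purely for clarity.

```python
import numpy as np

# Illustrative 3-state Markov reward process (not from the paper).
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])   # true transition matrix
r = np.array([0.0, 0.0, 1.0])     # true expected reward per state
gamma = 0.9

def ml_values(states, rewards, n_states, gamma):
    """Maximum likelihood: estimate P and r from counts, then solve
    (I - gamma * P_hat) v = r_hat as one matrix equation."""
    counts = np.zeros((n_states, n_states))
    reward_sum = np.zeros(n_states)
    visits = np.zeros(n_states)
    for t in range(len(states) - 1):
        s, s_next = states[t], states[t + 1]
        counts[s, s_next] += 1
        reward_sum[s] += rewards[t]
        visits[s] += 1
    # Guard against division by zero for unvisited states.
    P_hat = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    r_hat = reward_sum / np.maximum(visits, 1)
    return np.linalg.solve(np.eye(n_states) - gamma * P_hat, r_hat)

def td0_values(states, rewards, n_states, gamma, alpha=0.05):
    """TD(0): incremental stochastic approximation of the same values."""
    v = np.zeros(n_states)
    for t in range(len(states) - 1):
        s, s_next = states[t], states[t + 1]
        v[s] += alpha * (rewards[t] + gamma * v[s_next] - v[s])
    return v

# Sample one long trajectory from the true process.
rng = np.random.default_rng(0)
T = 20000
states, rewards = [0], []
for _ in range(T):
    s = states[-1]
    rewards.append(r[s])               # reward for the current state
    states.append(rng.choice(3, p=P[s]))

v_true = np.linalg.solve(np.eye(3) - gamma * P, r)
v_ml = ml_values(states, rewards, 3, gamma)
v_td = td0_values(states, rewards, 3, gamma)
```

For large state spaces, the empirical transition matrix is typically sparse, so the dense `np.linalg.solve` call would be replaced by a sparse solver (e.g. `scipy.sparse.linalg.spsolve`), which is the kind of implementation choice the paper argues makes maximum likelihood competitive in run time.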

Citation

F. Lu, R. Patrascu, D. Schuurmans. "Investigating the Maximum Likelihood Alternative to TD(λ)". Neural Information Processing Systems (NIPS), Vancouver, British Columbia, Canada, January 2002.

Keywords: likelihood, alternative, machine learning
Category: In Conference

BibTeX

@incollection{Lu+al:NIPS02,
  author = {Fletcher Lu and Relu Patrascu and Dale Schuurmans},
  title = {Investigating the Maximum Likelihood Alternative to {TD}($\lambda$)},
  booktitle = {Neural Information Processing Systems (NIPS)},
  year = 2002,
}

Last Updated: June 01, 2007
Submitted by Stuart H. Johnson
