
Monte Carlo matrix inversion policy evaluation

Full Text: uai03a.ps.ps (PS)

Forsythe and Leibler (1950) introduced a statistical technique for finding the inverse of a matrix by characterizing the elements of the matrix inverse as expected values of a sequence of random walks. Barto and Duff (1994) subsequently showed relations between this technique and standard dynamic programming and temporal differencing methods. The advantage of the Monte Carlo matrix inversion (MCMI) approach is that it scales better with respect to state-space size than alternative techniques. In this paper, we introduce an algorithm for performing reinforcement learning policy evaluation using MCMI. We demonstrate that MCMI achieves accuracy similar to a maximum likelihood (ML) model-based policy evaluation approach while avoiding ML's slow execution time. In fact, we show that MCMI runs in time comparable to temporal differencing (TD). We then illustrate a least-squares generalization technique for scaling up MCMI to large state spaces. We compare this least-squares Monte Carlo matrix inversion (LSMCMI) technique to the least-squares temporal differencing (LSTD) approach introduced by Boyan (1999), demonstrating that LSMCMI and LSTD have similar runtimes.
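
The Python sketch below illustrates the random-walk characterization the abstract refers to: each entry N[i, j] of the Neumann-series inverse (I - gamma P)^{-1} is the expected discounted number of visits to state j along a walk started at state i, so the value function is V = N R. The function name, the sampling parameters, and the two-state example are illustrative assumptions for this page, not code or an experiment from the paper.

import numpy as np

def mcmi_policy_evaluation(P, R, gamma=0.9, n_walks=2000, max_len=200, seed=None):
    """Estimate V = (I - gamma P)^{-1} R by Monte Carlo matrix inversion.

    Each entry N[i, j] of the inverse is estimated as the expected
    discounted number of visits to state j on a random walk driven by
    the transition matrix P and started at state i (a sketch of the
    Forsythe-Leibler idea, not the paper's exact estimator).
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    N = np.zeros((n, n))
    for i in range(n):
        for _ in range(n_walks):
            s, discount = i, 1.0
            for _ in range(max_len):
                N[i, s] += discount        # accumulate discounted visit counts
                s = rng.choice(n, p=P[s])  # step the walk under P
                discount *= gamma
    N /= n_walks
    return N @ R                           # V = N R

# Sanity check on a two-state chain against the direct linear solve.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
R = np.array([1.0, 0.0])
print(mcmi_policy_evaluation(P, R, seed=0))
print(np.linalg.solve(np.eye(2) - 0.9 * P, R))

Truncating the walks at max_len introduces a bias on the order of gamma^max_len, which is negligible for the settings above.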

Citation

F. Lu and D. Schuurmans. "Monte Carlo matrix inversion policy evaluation". Conference on Uncertainty in Artificial Intelligence (UAI), Acapulco, Mexico, August 2003.

Keywords: machine learning
Category: In Conference

BibTeX

@inproceedings{Lu+Schuurmans:UAI03,
  author = {Fletcher Lu and Dale Schuurmans},
  title = {Monte Carlo matrix inversion policy evaluation},
  booktitle = {Conference on Uncertainty in Artificial Intelligence (UAI)},
  year = 2003,
}

Last Updated: April 24, 2007
Submitted by William Thorne
