Learning when to stop thinking and do something!

Barnabas Poczos, AICML, University of Alberta
Yasin Abbasi-Yadkori, Department of Computing Science, University of Alberta
Csaba Szepesvari, Department of Computing Science; PI of AICML
Russ Greiner, Department of Computing Science; PI of AICML
Nathan R. Sturtevant

Full Text: ICML2009_paper421.pdf

An anytime algorithm can return a response to its task at essentially any time; typically, the quality of the response improves as more time is spent. Here, we consider the challenge of learning when to terminate such an algorithm on each of a sequence of iid tasks, so as to optimize the expected average reward per unit time. We provide a system for addressing this challenge, which combines a global optimizer, the Cross-Entropy method, with local gradient ascent. This paper theoretically investigates how far the estimated gradient is from the true gradient, then empirically demonstrates that the system is effective by applying it to a toy problem and to a real-world face detection task.
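
To make the setting concrete, here is a minimal Python sketch of a global Cross-Entropy search followed by local gradient ascent on a stopping parameter. It is an illustration under stated assumptions, not the paper's system. Everything in it is hypothetical: the quality curve 1 - exp(-a*t), the task distribution, the fixed per-task overhead SETUP_COST, and the policy "stop every task at a fixed time theta". The local step also uses a plain finite-difference gradient of the empirical objective in place of the gradient estimator analysed in the paper.

```python
import math
import random

SETUP_COST = 1.0  # hypothetical fixed overhead paid once per task


def sample_tasks(n, rng):
    # Assumption: each task is described by one improvement rate a in [0.5, 1.5].
    return [rng.uniform(0.5, 1.5) for _ in range(n)]


def reward_rate(theta, tasks):
    """Empirical reward per unit time when every task is stopped at time theta."""
    theta = max(theta, 0.0)
    # Assumed quality of an anytime algorithm after running for theta: 1 - exp(-a * theta).
    total_reward = sum(1.0 - math.exp(-a * theta) for a in tasks)
    total_time = len(tasks) * (theta + SETUP_COST)
    return total_reward / total_time


def cross_entropy_search(tasks, rng, iters=20, pop=50, elite=10):
    """Global stage: sample candidates, keep the elite, refit a Gaussian."""
    mu, sigma = 5.0, 5.0
    for _ in range(iters):
        # Fold negative draws back onto non-negative stopping times.
        cands = [abs(rng.gauss(mu, sigma)) for _ in range(pop)]
        cands.sort(key=lambda th: reward_rate(th, tasks), reverse=True)
        best = cands[:elite]
        mu = sum(best) / elite
        sigma = math.sqrt(sum((b - mu) ** 2 for b in best) / elite) + 1e-3
    return mu


def gradient_ascent(theta, tasks, steps=200, lr=0.5, eps=1e-4):
    """Local stage: ascend a central finite-difference gradient of the objective."""
    for _ in range(steps):
        grad = (reward_rate(theta + eps, tasks)
                - reward_rate(theta - eps, tasks)) / (2 * eps)
        theta = max(theta + lr * grad, 0.0)
    return theta


rng = random.Random(0)
tasks = sample_tasks(500, rng)
theta_cem = cross_entropy_search(tasks, rng)
theta_star = gradient_ascent(theta_cem, tasks)
print(theta_cem, theta_star, reward_rate(theta_star, tasks))
```

The per-task overhead matters in this toy model: without it, the reward rate is largest for a stopping time of zero, so there would be nothing to learn. The two-stage layout mirrors the abstract's description, with the Cross-Entropy stage locating a promising region and the gradient stage refining the estimate within it.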

Citation

B. Poczos, Y. Abbasi-Yadkori, C. Szepesvari, R. Greiner, N. Sturtevant. "Learning when to stop thinking and do something!". International Conference on Machine Learning (ICML), June 2009.

Keywords:	machine learning, stopping time, anytime algorithms, policy gradient
Category:	In Conference

BibTeX

@inproceedings{Poczos+al:ICML09,
  author = {Barnabas Poczos and Yasin Abbasi-Yadkori and Csaba Szepesvari and
    Russ Greiner and Nathan R. Sturtevant},
  title = {Learning when to stop thinking and do something!},
  booktitle = {International Conference on Machine Learning (ICML)},
  year = 2009,
}

Last Updated: July 03, 2020
Submitted by Sabina P
