Learning when to stop thinking and do something!

Full Text: ICML2009_paper421.pdf
Other Attachments: TimeMgmt_poster.pdf [Poster]

An anytime algorithm can return a response to a given task at essentially any time; typically, the quality of the response improves as more time is allowed. Here, we consider the challenge of learning when to terminate such an algorithm on each of a sequence of iid tasks so as to maximize the expected average reward per unit time. We provide a system for addressing this challenge that combines the Cross-Entropy method, a global optimizer, with local gradient ascent. This paper theoretically investigates how far the estimated gradient is from the true gradient, then empirically demonstrates that the system is effective by applying it to a toy problem and to a real-world face detection task.
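To make the idea in the abstract concrete, here is a minimal sketch of a two-stage search for a stopping time: a coarse Cross-Entropy search followed by local gradient ascent on the average reward per unit time. It is not the paper's system. The toy quality curve 1 - exp(-k t), the per-task overhead, the single fixed stopping time (rather than a learned stopping policy), the finite-difference gradient, and every function name are assumptions made purely for illustration.

```python
import math
import random


def reward_rate(stop_time, rates, overhead=1.0):
    """Average reward per unit time when every task is stopped at stop_time.

    Assumed toy model: a task with rate k reaches quality 1 - exp(-k * t)
    after t time units, and each task also costs a fixed overhead.
    """
    total_reward = sum(1.0 - math.exp(-k * stop_time) for k in rates)
    total_time = len(rates) * (stop_time + overhead)
    return total_reward / total_time


def cross_entropy_search(objective, mean, std, rng,
                         iterations=20, samples=50, elite=10):
    """Global stage: refit a Gaussian to the best-scoring candidates."""
    for _ in range(iterations):
        candidates = [max(1e-3, rng.gauss(mean, std)) for _ in range(samples)]
        candidates.sort(key=objective, reverse=True)
        best = candidates[:elite]
        mean = sum(best) / elite
        # Small floor keeps the sampling distribution from collapsing to a point.
        std = math.sqrt(sum((c - mean) ** 2 for c in best) / elite) + 1e-3
    return mean


def gradient_ascent(objective, x, step=0.5, eps=1e-4, iterations=200):
    """Local stage: ascend a central finite-difference gradient estimate."""
    for _ in range(iterations):
        grad = (objective(x + eps) - objective(x - eps)) / (2 * eps)
        x = max(1e-3, x + step * grad)
    return x


rng = random.Random(0)
# Hypothetical sample of iid tasks, each with its own improvement rate k.
task_rates = [rng.uniform(0.5, 2.0) for _ in range(200)]


def objective(stop_time):
    return reward_rate(stop_time, task_rates)


coarse = cross_entropy_search(objective, mean=5.0, std=3.0, rng=rng)
refined = gradient_ascent(objective, coarse)
print(f"coarse stop time  {coarse:.3f}, reward rate {objective(coarse):.4f}")
print(f"refined stop time {refined:.3f}, reward rate {objective(refined):.4f}")
```

In this sketch the global stage only needs to land near a good region; the local stage then fine-tunes the stopping time, mirroring the division of labor the abstract describes.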

Citation

B. Poczos, Y. Abbasi-Yadkori, C. Szepesvari, R. Greiner, N. Sturtevant. "Learning when to stop thinking and do something!". International Conference on Machine Learning (ICML), June 2009.

Keywords: machine learning, stopping time, anytime algorithms, policy gradient
Category: In Conference

BibTeX

@inproceedings{Poczos+al:ICML09,
  author = {Barnabas Poczos and Yasin Abbasi-Yadkori and Csaba Szepesvari and
    Russ Greiner and Nathan R. Sturtevant},
  title = {Learning when to stop thinking and do something!},
  booktitle = {International Conference on Machine Learning (ICML)},
  year = 2009,
}

Last Updated: July 03, 2020
Submitted by Sabina P
