Learning when to stop thinking and do something!
Full Text: ICML2009_paper421.pdf
An anytime algorithm is capable of returning a response to the given task at essentially any time; typically the quality of the response improves as the time increases. Here, we consider the challenge of learning when we should terminate such algorithms on each of a sequence of iid tasks, to optimize the expected average reward per unit time. We provide a system for addressing this challenge, which combines the Cross-Entropy method, a global optimizer, with local gradient ascent. This paper theoretically investigates how far the estimated gradient is from the true gradient, then empirically demonstrates that this system is effective by applying it to a toy problem and to a real-world face detection task.
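To make the setting in the abstract concrete, here is a minimal sketch of a global Cross-Entropy search followed by local gradient ascent on a toy stopping problem. It is not the paper's system. Everything in it is an assumption made for illustration: the quality curve 1 - exp(-t / tau), the per-task overhead, the restriction to a single fixed thinking time t, the finite-difference gradient (standing in for a policy-gradient estimate), and all function names. For iid tasks, the average reward per unit time of a fixed stopping time is taken to be the expected reward divided by the expected time spent per task.

```python
import math
import random


def reward_rate(t, taus, overhead=1.0):
    """Average reward per unit time if every task is stopped after t time units.

    Toy assumption: task i reaches quality 1 - exp(-t / tau_i) after thinking
    for t, and each task also costs a fixed `overhead` in time.
    """
    if t <= 0:
        return 0.0
    mean_quality = sum(1.0 - math.exp(-t / tau) for tau in taus) / len(taus)
    return mean_quality / (t + overhead)


def cross_entropy_search(objective, mean, std, rng, iters=30, pop=50, n_elite=10):
    """Global step: sample candidates from a Gaussian, refit it to the elites."""
    for _ in range(iters):
        samples = [max(1e-3, rng.gauss(mean, std)) for _ in range(pop)]
        elite = sorted(samples, key=objective, reverse=True)[:n_elite]
        mean = sum(elite) / n_elite
        var = sum((x - mean) ** 2 for x in elite) / n_elite
        std = math.sqrt(var) + 1e-3  # small floor keeps the search from collapsing
    return mean


def gradient_ascent(objective, t, steps=200, lr=1.0, eps=1e-4):
    """Local step: refine t with a central finite-difference gradient."""
    for _ in range(steps):
        grad = (objective(t + eps) - objective(t - eps)) / (2 * eps)
        t = max(1e-3, t + lr * grad)
    return t


rng = random.Random(0)
# Hypothetical task distribution: time constants drawn uniformly from [0.5, 2.0].
taus = [rng.uniform(0.5, 2.0) for _ in range(500)]


def objective(t):
    return reward_rate(t, taus)


t_cem = cross_entropy_search(objective, mean=5.0, std=3.0, rng=rng)
t_star = gradient_ascent(objective, t_cem)
print(f"CEM estimate: {t_cem:.3f}, after gradient ascent: {t_star:.3f}")
print(f"reward per unit time at t_star: {objective(t_star):.4f}")
```

In this toy model the objective is smooth and single-peaked, so the local step only polishes the Cross-Entropy estimate. The two-stage design is more useful when the stopping rule has many parameters and the objective has several local optima.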
Citation
B. Poczos, Y. Abbasi-Yadkori, C. Szepesvari, R. Greiner, N. Sturtevant. "Learning when to stop thinking and do something!". International Conference on Machine Learning (ICML), June 2009.
Keywords: machine learning, stopping time, anytime algorithms, policy gradient
Category: In Conference
BibTeX
@inproceedings{Poczos+al:ICML09,
  author = {Barnabas Poczos and Yasin Abbasi-Yadkori and Csaba Szepesvari and
    Russ Greiner and Nathan R. Sturtevant},
  title = {Learning when to stop thinking and do something!},
  booktitle = {International Conference on Machine Learning (ICML)},
  month = jun,
  year = 2009,
}
Last Updated: July 03, 2020
Submitted by Sabina P