Efficient Exploration for Optimizing Immediate Reward
- Dale Schuurmans, AICML
- Lloyd Greenwald, Department of Mathematics and Computer Science, Drexel University
We consider the problem of learning an effective behavior strategy from reward. Although much studied, the question of how to use prior knowledge to scale optimal behavior learning up to real-world problems remains an important open issue. We investigate the inherent data-complexity of behavior learning when the goal is simply to optimize immediate reward. Although easier than reinforcement learning, where one must also cope with state dynamics, immediate reward learning is still a common problem and is fundamentally harder than supervised learning. For optimizing immediate reward, prior knowledge can be expressed either as a bias on the space of possible reward models, or a bias on the space of possible controllers. We investigate the two paradigmatic learning approaches of indirect (reward-model) learning and direct-control learning, and show that neither uniformly dominates the other in general. Model-based learning has the advantage of generalizing reward experiences across states and actions, but direct-control learning has the advantage of focusing only on potentially optimal actions and avoiding learning irrelevant world details. Both strategies can be strongly advantageous in different circumstances. We introduce hybrid learning strategies that combine the benefits of both approaches and uniformly improve their learning efficiency.
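The contrast between the two strategies can be made concrete with a small experiment. Below is a minimal, illustrative sketch (not the algorithms from the paper): a k-armed bandit whose mean rewards are linear in known arm features, where an indirect learner fits a least-squares reward model and so generalizes its samples across all arms, while a direct-control learner (here, simple successive elimination) keeps sampling only arms that remain potentially optimal. The linear testbed and all names and parameters are hypothetical choices for this demo.

# Illustrative sketch only -- NOT the algorithms from the paper.
# Testbed: k-armed bandit with mean rewards linear in known arm features.
import numpy as np

rng = np.random.default_rng(0)

k, d = 20, 3                                  # hypothetical problem size
features = rng.normal(size=(k, d))            # phi(a) for each arm a
w_true = rng.normal(size=d)                   # unknown reward-model weights
means = features @ w_true                     # true mean immediate reward

def pull(arm):
    # One noisy immediate-reward observation.
    return means[arm] + rng.normal(scale=0.5)

def indirect_learner(budget):
    # Indirect (reward-model) strategy: spend the budget fitting a
    # least-squares reward model, then exploit its argmax.  A few samples
    # generalize across all arms through the shared feature space.
    X, y = [], []
    for _ in range(budget):
        a = rng.integers(k)
        X.append(features[a])
        y.append(pull(a))
    w_hat, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return int(np.argmax(features @ w_hat))

def direct_learner(budget):
    # Direct-control strategy (successive elimination): ignore the reward
    # model and keep sampling only arms that might still be optimal.
    # Assumes budget >= k so every arm gets an initial pull.
    active = list(range(k))
    sums, counts = np.zeros(k), np.zeros(k)
    while budget > 0 and len(active) > 1:
        for a in list(active):
            if budget == 0:
                break
            sums[a] += pull(a)
            counts[a] += 1
            budget -= 1
        est = sums[active] / counts[active]
        radius = np.sqrt(2.0 / counts[active])       # crude confidence width
        best_lower = np.max(est - radius)
        active = [a for a, m, r in zip(active, est, radius)
                  if m + r >= best_lower]            # keep potential optima
    est = sums[active] / counts[active]
    return active[int(np.argmax(est))]

for name, learner in [("indirect", indirect_learner),
                      ("direct  ", direct_learner)]:
    arm = learner(budget=200)
    print(name, "picks arm", arm, "regret %.3f" % (means.max() - means[arm]))

On a feature-structured problem like this one, the indirect learner can identify a near-optimal arm from comparatively few pulls because each sample informs the whole model, while the direct learner spends no samples modeling arms it has already ruled out, mirroring the trade-off described in the abstract.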
Citation
D. Schuurmans and L. Greenwald. "Efficient Exploration for Optimizing Immediate Reward". National Conference on Artificial Intelligence (AAAI), Orlando, Florida, July 1999.
Keywords: efficient exploration, immediate reward, machine learning
Category: In Conference
BibTeX
@inproceedings{Schuurmans+Greenwald:AAAI99,
  author    = {Dale Schuurmans and Lloyd Greenwald},
  title     = {Efficient Exploration for Optimizing Immediate Reward},
  booktitle = {National Conference on Artificial Intelligence (AAAI)},
  year      = {1999}
}
Last Updated: August 16, 2007
Submitted by Russ Greiner