Reusing Learned Policies Between Similar Problems

Full Text: bowling98reusing.pdf

We are interested in leveraging policies learned for one problem when learning policies for similar, more complex problems. This capability is particularly important in robot learning, where gathering data is expensive and time-consuming, which prohibits applying reinforcement learning directly. In this setting, we would like to transfer knowledge from a simulator, which may have an inaccurate or crude model of the robot and its environment. We observed that when a policy learned in a simulator is applied to the real robots, some parts of the policy remain effective while other parts do not. We therefore explored learning a complex problem by reusing only parts of the solutions to similar problems. Empirical experiments in which part of the policy is held fixed show that the complete task is learned faster, but that the resulting policy is suboptimal. One of the main contributions of this paper is a theorem, with proof, stating that the degree of suboptimality of a policy that is fixed over a subproblem can be bounded without optimally solving the complete problem. We formally define a subproblem and build upon the value equivalence of the subproblem's boundary states to prove the bound on suboptimality.
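To make the idea concrete, below is a minimal sketch, not taken from the paper, of tabular Q-learning on a hypothetical gridworld in which a sub-policy reused from a similar problem is held fixed over part of the state space. The gridworld, the subproblem region (the right half of the grid), and the reused policy are all illustrative assumptions; the paper's actual domains and algorithm may differ.

```python
import random
from collections import defaultdict

# Hypothetical 5x5 gridworld: start at (0, 0), goal at (4, 4).
SIZE, START, GOAL = 5, (0, 0), (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Apply an action; small step cost, reward 1 at the goal."""
    dr, dc = ACTIONS[action]
    nxt = (max(0, min(SIZE - 1, state[0] + dr)),
           max(0, min(SIZE - 1, state[1] + dc)))
    return nxt, (1.0 if nxt == GOAL else -0.01), nxt == GOAL

# Assumed subproblem: the right half of the grid, where we trust a
# policy reused from a similar problem (e.g., learned in simulation).
def in_subproblem(state):
    return state[1] >= SIZE // 2

def reused_policy(state):
    return 1 if state[0] < GOAL[0] else 3  # go down, then right

def constrained_value(Q, state):
    # Value of a state under the constrained policy class: the reused
    # action where the sub-policy is fixed, the greedy action elsewhere.
    if in_subproblem(state):
        return Q[state, reused_policy(state)]
    return max(Q[state, a] for a in range(len(ACTIONS)))

def learn(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    Q = defaultdict(float)
    for _ in range(episodes):
        state = START
        for _ in range(200):  # cap steps per episode
            if in_subproblem(state):
                action = reused_policy(state)  # fixed part: no choice
            elif random.random() < eps:
                action = random.randrange(len(ACTIONS))
            else:
                action = max(range(len(ACTIONS)), key=lambda a: Q[state, a])
            nxt, reward, done = step(state, action)
            # Backup with the constrained value, so the fixed region is
            # merely evaluated while the free region is optimized.
            target = reward + (0.0 if done else gamma * constrained_value(Q, nxt))
            Q[state, action] += alpha * (target - Q[state, action])
            state = nxt
            if done:
                break
    return Q
```

In this sketch the learner converges to the best policy that agrees with the reused sub-policy on the subproblem: learning is faster because the fixed region need only be evaluated, not explored, but the result is generally suboptimal for the complete task, which is the trade-off the abstract describes and the theorem quantifies.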

Citation

M. Bowling, M. Veloso. "Reusing Learned Policies Between Similar Problems". AI*IA-98 Workshop on New Trends in Robotics, October 1998.

Category: In Workshop

BibTeX

@inproceedings{bowling98reusing,
  author = {Michael Bowling and Manuela Veloso},
  title = {Reusing Learned Policies Between Similar Problems},
  booktitle = {AI*IA-98 Workshop on New Trends in Robotics},
  month = oct,
  year = 1998,
}

