An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
- Richard S. Sutton, Department of Computing Science, University of Alberta
- Ashique Rupam Mahmood
- Martha White, University of Alberta
Citation
R. Sutton, A. Mahmood, M. White. "An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning". Journal of Machine Learning Research (JMLR), (ed: Shie Mannor), 17(73), pp. 1-29, January 2016.
Keywords: Temporal-difference learning, Off-policy learning, Function approximation, Stability, Convergence
Category: In Journal
Web Links: JMLR
BibTeX
@article{Sutton+al:JMLR16,
  author = {Richard S. Sutton and Ashique Rupam Mahmood and Martha White},
  title = {An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning},
  editor = {Shie Mannor},
  volume = {17},
  number = {73},
  pages = {1-29},
  journal = {Journal of Machine Learning Research (JMLR)},
  year = 2016,
}
Last Updated: February 25, 2020
Submitted by Sabina P