Publications by Sutton, Richard S.
In Journal (refereed)
1. | J. Travnik, K. Mathewson, R. Sutton, P. Pilarski. "Reactive reinforcement learning in asynchronous environments". Frontiers in Robotics and AI, 5, pp n/a, June 2018. |
2. | H. Yu, A. Mahmood, R. Sutton. "On Generalized Bellman Equations and Temporal-Difference Learning". Journal of Machine Learning Research (JMLR), 19(48), pp 1-49, January 2018. |
3. | A. Edwards, M. Dawson, J. Hebert, C. Sherstan, R. Sutton, K. Chan, P. Pilarski. "Application of Real-time Machine Learning to Myoelectric Prosthesis Control: A Case Series in Adaptive Switching". Prosthetics and Orthotics International, 40(5), pp 573–581, October 2016. |
4. | R. Sutton, A. Mahmood, M. White. "An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning". Journal of Machine Learning Research (JMLR), (ed: Shie Mannor), 17(73), pp 1-29, January 2016. |
5. | H. Seijen, A. Mahmood, P. Pilarski, M. Machado, R. Sutton. "True Online Temporal-Difference Learning". Journal of Machine Learning Research (JMLR), 17(145), pp n/a, January 2016. |
6. | J. Modayil, A. White, R. Sutton. "Multi-timescale Nexting in a Reinforcement Learning Robot". Adaptive Behavior, 22(2), pp 146-160, April 2014. |
7. | P. Pilarski, M. Dawson, T. Degris, J. Carey, K. Chan, J. Hebert, R. Sutton. "Adaptive Artificial Limbs: A Real-time Approach to Prediction and Anticipation". IEEE Robotics and Automation Magazine, 20(1), pp 53-64, March 2013. |
8. | P. Stone, R. Sutton, G. Kuhlmann. "Reinforcement Learning for RoboCup-Soccer Keepaway". Adaptive Behavior, 13(3), pp 165-188, March 2005. |
9. | R. Sutton, D. Precup, S. Singh. "Between MDPs and Semi-MDPs: A Framework for Temporal Abstractions in Reinforcement Learning". Artificial Intelligence (AIJ), 112, pp 181-211, January 1999. |
10. | J. Santamaria, R. Sutton, A. Ram. "Experiments With Reinforcement Learning in Problems With Continuous State and Action Spaces". Adaptive Behavior, 2, pp 163-218, January 1998. |
11. | R. Sutton. "On the Significance of Markov Decision Processes". Artificial Neural Networks - ICANN'97, (ed: W. Gerstner, A. Germond, M. Hasler, J.-D. Nicoud), pp 273-282, January 1997. |
12. | S. Singh, R. Sutton. "Reinforcement Learning With Replacing Eligibility Traces". Machine Learning Journal (MLJ), 22, pp 123-158, January 1996. |
13. | R. Sutton. "Introduction: The Challenge of Reinforcement Learning". Machine Learning Journal (MLJ), (ed: R. Sutton), 8(3-4), pp 225-227, January 1992. |
14. | R. Sutton. "Machines That Learn and Mimic the Brain". ACCESS, GTE's Journal of Science and Technology, January 1992. |
15. | R. Sutton. "First Results With Dyna: An Integrated Architecture for Learning, Planning, and Reacting". Neural Networks for Control, (ed: Miller T, Sutton R. S., Werbos P.), pp 179-189, January 1990. |
16. | A. Barto, R. Sutton, C. Watkins. "Learning and Sequential Decision Making". Learning and Computational Neuroscience, (ed: M. Gabriel, J.W. Moore), pp 539-602, January 1990. |
17. | R. Sutton, A. Barto. "Time-Derivative Models of Pavlovian Reinforcement". Learning and Computational Neuroscience, (ed: M. Gabriel, J.W. Moore), pp 497-537, January 1990. |
18. | R. Sutton. "Learning to Predict by the Methods of Temporal Differences". Machine Learning Journal (MLJ), 3(1), pp 9-44, January 1988. |
19. | O. Selfridge, R. Sutton, C. Anderson. "Selected Bibliography on Connectionism". Evolution, Learning and Cognition, (ed: Y.C. Lee), pp 391-403, January 1988. |
20. | J. Moore, J. Desmond, N. Berthier, D. Blazis, R. Sutton, A. Barto. "Simulation of the classically conditioned nictitating membrane response by a neuron-like adaptive element: Response topography, neuronal firing, and interstimulus intervals". Behavioural Brain Research, 21, pp 143-154, May 1986. |
21. | A. Barto, R. Sutton. "Neural problem solving". Synaptic Modification, Neuron Selectivity, and Nervous System Organization, (ed: W.B. Levy, J.A. Anderson), pp 123-152, May 1985. |
22. | A. Barto, R. Sutton, C. Anderson. "Neuron-like adaptive elements that can solve difficult learning control problems". IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(5), pp 834-846, January 1985. |
23. | A. Barto, C. Anderson, R. Sutton. "Simulation of anticipatory responses in classical conditioning by a neuron-like adaptive element". Behavioural Brain Research, pp 221-235, January 1985. |
In Conference (refereed)
24. | Y. Wan, M. Zaheer, R. Sutton, A. White, M. White. "Planning with Expectation Models". International Joint Conference on Artificial Intelligence (IJCAI), (ed: Sarit Kraus), pp 3649-3655, August 2019. |
25. | B. Rafiee, S. Ghiassian, A. White, R. Sutton. "Prediction in Intelligence: An Empirical Comparison of Off-policy Algorithms on Robots". Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), (ed: Edith Elkind, Manuela Veloso, Noa Agmon, Matthew E. Taylor), pp 332-340, May 2019. |
26. | A. Kearney, A. Koop, C. Sherstan, J. Günther, R. Sutton, P. Pilarski, M. Taylor. "Evaluating Predictive Knowledge". AAAI Fall Symposium, pp 43-46, October 2018. |
27. | C. Sherstan, B. Bennett, K. Young, D. Ashley, A. White, M. White, R. Sutton. "Comparing Direct and Indirect Temporal-Difference Methods for Estimating the Variance of the Return". Conference on Uncertainty in Artificial Intelligence (UAI), (ed: Amir Globerson and Ricardo Silva), pp 63-72, August 2018. |
28. | H. Seijen, A. Mahmood, P. Pilarski, R. Sutton. "An empirical evaluation of True Online TD(lambda)". European Workshop on Reinforcement Learning (EWRL), July 2015. |
29. | K. Chan, M. Dawson, A. Edwards, J. Hebert, R. Sutton, P. Pilarski. "Adaptive Switching in Practice: Improving Myoelectric Prosthesis Performance through Reinforcement Learning". Myoelectric Control Symposium, pp 66-70, August 2014. |
30. | A. Edwards, A. Kearney, M. Dawson, R. Sutton, P. Pilarski. "Temporal-Difference Learning to Assist Human Decision Making during the Control of an Artificial Limb". Multidisciplinary Conference on Reinforcement Learning and Decision Making, September 2013. |
31. | A. White, J. Modayil, R. Sutton. "Scaling Life-long Off-policy Learning". Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics (ICDL-Epi), Osaka, Japan, August 2013. |
32. | P. Pilarski, T. Dick, R. Sutton. "Real-time prediction learning for the simultaneous actuation of multiple prosthetic joints". International Conference on Rehabilitation Robotics (ICORR), June 2013. |
33. | D. Silver, R. Sutton, M. Müller. "Temporal-Difference Search in Computer Go". ICAPS, (ed: Daniel Borrajo, Subbarao Kambhampati, Angelo Oddi, Simone Fratini), pp 486-487, June 2013. |
34. | J. Modayil, A. White, P. Pilarski, R. Sutton. "Acquiring a broad range of empirical knowledge in real time by temporal-difference learning". International Conference on Systems, Man, and Cybernetics (SMC), Seoul, South Korea, pp 1903-1910, October 2012. |
35. | J. Modayil, A. White, R. Sutton. "Multi-timescale Nexting in a Reinforcement Learning Robot". International Conference on Simulation of Adaptive Behavior (SAB), Odense, Denmark, (ed: Ziemke T., Balkenius C., Hallam J.), pp 299-309, August 2012. |
36. | T. Degris, M. White, R. Sutton. "Linear Off-Policy Actor-Critic". International Conference on Machine Learning (ICML), pp n/a, June 2012. |
37. | S. Bhatnagar, R. Sutton, M. Ghavamzadeh, M. Lee. "Incremental Natural Actor-Critic Algorithms". Neural Information Processing Systems (NIPS), December 2007. |
38. | D. Silver, R. Sutton, M. Müller. "Reinforcement Learning of Local Shape in the Game of Go". International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, August 2007. |
39. | R. Sutton, A. Koop, D. Silver. "On the Role of Tracking in Stationary Environments". International Conference on Machine Learning (ICML), April 2007. |
40. | A. Geramifard, M. Bowling, M. Zinkevich, R. Sutton. "iLSTD: Eligibility Traces and Convergence Analysis". Neural Information Processing Systems (NIPS), pp To appear (8 pages), March 2007. |
41. | A. Geramifard, M. Bowling, R. Sutton. "Incremental least-squares temporal difference learning". National Conference on Artificial Intelligence (AAAI), Boston, Massachusetts, USA, pp 356-361, January 2006. |
42. | B. Tanner, R. Sutton. "TD(lambda) Networks: Temporal-Difference Networks With Eligibility Traces". International Conference on Machine Learning (ICML), Bonn, Germany, August 2005. |
43. | B. Tanner, R. Sutton. "Temporal-Difference Networks With History". International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, August 2005. |
44. | E. Rafols, M. Ring, R. Sutton, B. Tanner. "Using Predictive Representations to Improve Generalization in Reinforcement Learning". International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland, August 2005. |
45. | D. Precup, R. Sutton, C. Paduraru, A. Koop, S. Singh. "Off-Policy Learning With Recognizers". Neural Information Processing Systems (NIPS), Vancouver, British Columbia, Canada, January 2005. |
46. | R. Sutton, E. Rafols, A. Koop. "Temporal Abstraction in Temporal-Difference Networks". Neural Information Processing Systems (NIPS), Vancouver, British Columbia, Canada, January 2005. |
47. | R. Sutton, B. Tanner. "Temporal-Difference Networks". Neural Information Processing Systems (NIPS), Vancouver, British Columbia, Canada, (ed: MIT Press), January 2005. |
48. | D. Precup, R. Sutton, S. Dasgupta. "Off-Policy Temporal-Difference Learning With Function Approximation". International Conference on Machine Learning (ICML), Williams College, pp 417-424, January 2001. |
49. | M. Littman, R. Sutton, S. Singh. "Predictive Representations of State". Neural Information Processing Systems (NIPS), Vancouver, British Columbia, Canada, January 2001. |
50. | P. Stone, R. Sutton. "Scaling Reinforcement Learning Toward RoboCup Soccer". International Conference on Machine Learning (ICML), Williams College, January 2001. |
51. | D. Precup, R. Sutton, S. Singh. "Eligibility Traces for Off-Policy Policy Evaluation". International Conference on Machine Learning (ICML), Stanford University, pp 759-766, January 2000. |
52. | R. Sutton, S. Singh, D. Precup, B. Ravindran. "Improved Switching Among Temporally Abstract Actions". Neural Information Processing Systems (NIPS), Denver, CO, USA, pp 1066-1072, January 1999. |
53. | R. Moll, A. Barto, T. Perkins, R. Sutton. "Learning Instance-Independent Value Functions to Enhance Local Search". Neural Information Processing Systems (NIPS), Denver, CO, USA, pp 1017-1023, January 1999. |
54. | R. Sutton. "Open Theoretical Questions in Reinforcement Learning". Conference on Learning Theory (COLT), pp 11-17, January 1999. |
55. | R. Sutton, D. McAllester, S. Singh, Y. Mansour. "Policy Gradient Methods for Reinforcement Learning With Function Approximation". Neural Information Processing Systems (NIPS), Denver, CO, USA, pp 1057-1063, January 1999. |
56. | R. Sutton, D. Precup, S. Singh. "Intra-Option Learning About Temporally Abstract Actions". International Conference on Machine Learning (ICML), Madison, Wisconsin USA, pp 556-564, January 1998. |
57. | D. Precup, R. Sutton. "Multi-Time Models for Temporally Abstract Planning". Neural Information Processing Systems (NIPS), Denver, CO, USA, pp 1050-1056, January 1998. |
58. | D. Precup, R. Sutton, S. Singh. "Theoretical Results on Reinforcement Learning With Temporally Abstract Options". European Conference on Machine Learning (ECML), Chemnitz, Germany, pp 382-393, January 1998. |
59. | A. McGovern, R. Sutton, A. Fagg. "Roles of Macro-Actions in Accelerating Reinforcement Learning". Grace Hopper Celebration of Women in Computing, pp 13-17, September 1997. |
60. | D. Precup, R. Sutton. "Exponentiated Gradient Methods for Reinforcement Learning". International Conference on Machine Learning (ICML), Nashville, pp 272-277, July 1997. |
61. | D. Precup, R. Sutton. "Multi-Time Models for Reinforcement Learning". International Conference on Machine Learning (ICML), Nashville, July 1997. |
62. | D. Precup, R. Sutton, S. Singh. "Planning with Closed-Loop Macro Actions". National Conference on Artificial Intelligence (AAAI), Providence, Rhode Island, pp 73-76, May 1997. |
63. | R. Sutton. "Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding". Neural Information Processing Systems (NIPS), pp 1038-1044, January 1996. |
64. | R. Sutton. "TD Models: Modeling the World at a Mixture of Time Scales". International Conference on Machine Learning (ICML), pp 531-539, January 1995. |
65. | R. Sutton, S. Whitehead. "Online Learning With Random Representations". International Conference on Machine Learning (ICML), Amherst, MA, USA, (ed: M. Kaufmann), pp 314-321, January 1993. |
66. | T. Sanger, R. Sutton, C. Matheus. "Iterative Construction of Sparse Polynomial Approximations". Neural Information Processing Systems (NIPS), Denver, CO, USA, December 1992. |
67. | M. Gluck, P. Glauthier, R. Sutton. "Adaptation of Cue-Specific Learning Rates in Network Models of Human Category Learning". Conference of the Cognitive Science Society (CogSci), pp 540-545, July 1992. |
68. | A. Barto, R. Sutton, C. Watkins. "Sequential decision problems and neural networks". Neural Information Processing Systems (NIPS), Denver, CO, USA, May 1992. |
69. | R. Sutton. "Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta". National Conference on Artificial Intelligence (AAAI), January 1992. |
70. | R. Sutton. "Reinforcement Learning Architectures". ISKIT, pp 211-216, January 1992. |
71. | R. Sutton. "Dyna, an Integrated Architecture for Learning, Planning and Reacting". National Conference on Artificial Intelligence (AAAI), pp 160-163, January 1991. |
72. | R. Sutton. "Reinforcement Learning Architectures for Animats". Conference on Simulation of Adaptive Behavior (CSAB), January 1991. |
73. | R. Sutton, A. Barto, R. Williams. "Reinforcement Learning is Direct Adaptive Optimal Control". American Control Conference (ACC), January 1991. |
74. | S. Whitehead, R. Sutton, D. Ballard. "Advances in Reinforcement Learning and Their Implications for Intelligent Control". IEEE, pp 1289-1297, January 1990. |
75. | R. Sutton. "Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming". International Conference on Machine Learning (ICML), Austin, Texas, USA, pp 216-224, January 1990. |
76. | C. Anderson, J. Franklin, R. Sutton. "Learning a Nonlinear Model of a Manufacturing Process Using Multilayer Connectionist Networks". IEEE, pp 404-409, January 1990. |
77. | J. Franklin, R. Sutton, C. Anderson. "Application of Connectionist Learning Methods to Manufacturing Process Monitoring". IEEE, pp 709-712, January 1989. |
78. | R. Sutton. "Artificial Intelligence as a Control Problem: Comments on the Relationship Between Machine Learning and Intelligent Control". IEEE, January 1989. |
79. | R. Sutton, A. Barto. "A Temporal-Difference Model of Classical Conditioning". Conference of the Cognitive Science Society (CogSci), pp 355-378, January 1987. |
80. | R. Sutton. "Two problems with backpropagation and other steepest-descent learning procedures for networks". Conference of the Cognitive Science Society (CogSci), pp 823-831, May 1986. |
81. | J. Moore, J. Desmond, N. Berthier, D. Blazis, R. Sutton, A. Barto. "Connectionist learning in real time: Sutton-Barto adaptive element and classical conditioning of the nictitating membrane response". Conference of the Cognitive Science Society (CogSci), pp 318-322, May 1985. |
82. | R. Sutton, B. Pinette. "The learning of world models by connectionist networks". Conference of the Cognitive Science Society (CogSci), pp 54-64, May 1985. |
83. | O. Selfridge, R. Sutton, A. Barto. "Training and tracking in robotics". International Joint Conference on Artificial Intelligence (IJCAI), pp 670-672, May 1985. |
In Workshop
84. | V. Veeriah, P. Pilarski, R. Sutton. "Face valuing: Training user interfaces with facial expressions and reinforcement learning". Workshop on Interactive Machine Learning, July 2016. |
85. | P. Pilarski, R. Sutton, K. Mathewson. "Prosthetic Devices as Goal-Seeking Agents". Present and Future of Non-invasive Peripheral-Nervous-System Machine Interfaces: Progress in Restori, August 2015. |
86. | R. Sutton. "Learning distributed, searchable, internal models". Distributed Artificial Intelligence Workshop, May 2007. |
87. | P. Stone, R. Sutton, S. Singh. "Reinforcement Learning for 3 vs. 2 Keepaway". RoboCup, January 2001. |
88. | A. McGovern, D. Precup, B. Ravindran, S. Singh, R. Sutton. "Hierarchical Optimal Control of MDPs". Yale Workshop on Adaptive and Learning Systems, pp 186-191, January 1998. |
89. | R. Mehra, B. Ravichandran, R. Sutton. "Adaptive Intelligent Scheduling for ATM Networks". Yale Workshop on Adaptive and Learning Systems, pp 106-111, January 1996. |
90. | L. Kuvayev, R. Sutton. "Model-Based Reinforcement Learning With An Approximate, Learned Model". Yale Workshop on Adaptive and Learning Systems, pp 101-105, January 1996. |
91. | R. Sutton, S. Singh. "On Bias and Step Size in Temporal-Difference Learning". Yale Workshop on Adaptive and Learning Systems, pp 91-96, January 1994. |
92. | A. Barto, R. Sutton. "Gain Adaptation Beats Least Squares?". Yale Workshop on Adaptive and Learning Systems, pp 161-166, January 1992. |
93. | R. Sutton, C. Matheus. "Learning Polynomial Functions by Feature Construction". IWML, January 1991. |
94. | R. Sutton. "Planning by Incremental Dynamic Programming". International Workshop on Machine Learning, pp 353-357, January 1991. |
95. | R. Sutton. "Artificial intelligence by dynamic programming". Yale Workshop on Adaptive and Learning Systems, pp 89-95, May 1990. |
96. | R. Sutton. "Convergence Theory for a New Kind of Prediction Learning". WCLT, pp 421-422, January 1988. |
Other Categories
97. | E. Ludvig, R. Sutton, E. Verbeek, J. Neufeld, E. Kehoe. "Stimulus representation and response timing in a temporal-difference (TD) model of classical conditioning". Pavlovian Society, October 2007. |
98. | M. Littman, R. Sutton, S. Singh. "Predictive Representations of State". Predictive Representations of World Knowledge, January 2002. |
99. | R. Sutton. "Reinforcement learning". MIT Encyclopedia of the Cognitive Sciences, MIT Press, (ed: R. Wilson, F. Keil), pp 715-717, May 1999. |
100. | A. McGovern, R. Sutton. "Macro-Actions in Reinforcement Learning: An Empirical Analysis". Technical Report, January 1998. |
101. | R. Sutton, A. Barto. "Reinforcement Learning: An Introduction". MIT Press, January 1998. |
102. | D. Precup, R. Sutton, S. Singh. "Notes". National Conference on Artificial Intelligence (AAAI), Providence, Rhode Island, January 1997. |
103. | D. Precup, R. Sutton. "Empirical Comparison of Gradient Descent and Exponentiated Gradient Descent in Supervised and Reinforcement Learning". Technical Report, January 1996. |
104. | R. Sutton. "Reinforcement Learning". Reinforcement Learning, Reprinting of a special issue of Machine Learning Journal, Kluwer Academic Press, (ed: Sutton R. S.), May 1992. |
105. | R. Sutton. "Integrated Modeling and Control Based on Reinforcement Learning and Dynamic Programming". January 1991. |
106. | W. Miller, R. Sutton, P. Werbos. "Neural Networks for Control". MIT Press, (ed: W. Miller, R. Sutton, P. Werbos), January 1991. |
107. | J. Franklin, R. Sutton, C. Anderson, O. Selfridge, D. Schwartz. "Connectionist Learning Control at GTE Laboratories". pp 242-253, February 1990. |
108. | R. Sutton. "Implementation Details of the TD(lambda) Procedure for the Case of Vector Predictions and Backpropagation". Technical Report, January 1989. |
109. | R. Sutton. "NADALINE: A Normalized Adaptive Linear Element That Learns Efficiently". Technical Report, January 1988. |