
Temporal-Difference Search in Computer Go

Full Text: 6037-30143-1-PB.pdf

Temporal-difference (TD) learning is one of the most successful and broadly applied solutions to the reinforcement learning problem; it has been used to achieve master-level play in chess, checkers and backgammon. Monte-Carlo tree search is a recent algorithm for simulation-based search, which has been used to achieve master-level play in Go. We introduce a new approach to high-performance planning: TD search, which combines TD learning with simulation-based search. Like Monte-Carlo tree search, it updates value estimates by learning online from simulated experience. Like TD learning, it uses value function approximation and bootstrapping to generalise efficiently between related states. We applied TD search to the game of 9x9 Go, using a million binary features matching simple patterns of stones. Without any explicit search tree, our approach outperformed a vanilla Monte-Carlo tree search given the same number of simulations. When combined with a simple alpha-beta search, our program also outperformed all traditional (pre-Monte-Carlo) search and machine-learning programs on the 9x9 Computer Go Server.
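The core update behind TD search is ordinary TD learning applied to simulated transitions, with a linear value function over binary pattern features. The sketch below illustrates that combination under stated assumptions: the feature indices, step size, and function names are illustrative, not the authors' actual implementation.

```python
# Minimal sketch of a TD(0) update with linear value function approximation
# over binary features, as used in TD search on simulated experience.
# All names and constants here are illustrative assumptions.

import numpy as np

NUM_FEATURES = 1_000_000   # e.g. one weight per local stone pattern
ALPHA = 0.1                # step size (assumed value)
theta = np.zeros(NUM_FEATURES)

def value(active_features: np.ndarray) -> float:
    """Linear value estimate: sum of weights of the active binary features."""
    return theta[active_features].sum()

def td_update(phi_s: np.ndarray, phi_next: np.ndarray,
              reward: float, done: bool) -> None:
    """TD(0): bootstrap from the successor state's estimated value."""
    target = reward if done else value(phi_next)
    delta = target - value(phi_s)   # TD error
    theta[phi_s] += ALPHA * delta   # credit only the active features

# Example: one simulated transition with hypothetical active-feature indices.
phi_s = np.array([3, 42, 777])      # features active in the current position
phi_next = np.array([3, 42, 901])   # features active after the simulated move
td_update(phi_s, phi_next, reward=0.0, done=False)
```

Because only the handful of active features are touched per update, each simulated move costs far less than a full sweep over the million weights, which is what makes online learning during search practical.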

Citation

D. Silver, R. Sutton, and M. Müller. "Temporal-Difference Search in Computer Go". In ICAPS (eds. Daniel Borrajo, Subbarao Kambhampati, Angelo Oddi, and Simone Fratini), pp. 486-487, June 2013.

Keywords: Reinforcement Learning, Simulation-Based Planning
Category: In Conference
Web Links: AAAI

BibTeX

@inproceedings{Silver+al:ICAPS13,
  author    = {David Silver and Richard S. Sutton and Martin Müller},
  title     = {Temporal-Difference Search in Computer Go},
  editor    = {Daniel Borrajo and Subbarao Kambhampati and Angelo Oddi and Simone Fratini},
  pages     = {486--487},
  booktitle = {ICAPS},
  year      = {2013},
}

