Not Logged In

Three-Head Neural Network Architecture for Monte Carlo Tree Search

Full Text: 0523.pdf PDF

AlphaGo Zero pioneered the concept of twohead neural networks in Monte Carlo Tree Search (MCTS), where the policy output is used for prior action probability and the state-value estimate is used for leaf node evaluation. We propose a three-head neural net architecture with policy, state- and action-value outputs, which could lead to more efficient MCTS since neural leaf estimate can still be back-propagated in tree with delayed node expansion and evaluation. To effectively train the newly introduced action-value head on the same game dataset as for two-head nets, we exploit the optimal relations between parent and children nodes for data augmentation and regularization. In our experiments for the game of Hex, the action-value head learning achieves similar error as the state-value prediction of a twohead architecture. The resulting neural net models are then combined with the same Policy Value MCTS (PV-MCTS) implementation. We show that, due to more efficient use of neural net evaluations, PV-MCTS with three-head neural nets consistently performs better than the two-head ones, significantly outplaying the state-of-the-art player MoHex-CNN.

Citation

C. Gao, M. Müller, R. Hayward. "Three-Head Neural Network Architecture for Monte Carlo Tree Search". International Joint Conference on Artificial Intelligence (IJCAI), (ed: Jérôme Lang), pp 3762-3768, July 2018.

Keywords:  
Category: In Conference
Web Links: doi
  IJCAI

BibTeX

@incollection{Gao+al:IJCAI18,
  author = {Chao Gao and Martin Müller and Ryan Hayward},
  title = {Three-Head Neural Network Architecture for Monte Carlo Tree Search},
  Editor = {Jérôme Lang},
  Pages = {3762-3768},
  booktitle = {International Joint Conference on Artificial Intelligence
    (IJCAI)},
  year = 2018,
}

Last Updated: June 30, 2020
Submitted by Sabina P

University of Alberta Logo AICML Logo