
Maximum Entropy Monte-Carlo Planning

We develop a new algorithm for online planning in large-scale sequential decision problems that improves upon the worst-case efficiency of UCT. The idea is to augment Monte-Carlo Tree Search (MCTS) with maximum entropy policy optimization, evaluating each search node by softmax values back-propagated from simulation. To establish the effectiveness of this approach, we first investigate the single-step decision problem, stochastic softmax bandits, and show that softmax values can be estimated at an optimal convergence rate in terms of mean squared error. We then extend this approach to general sequential decision making by developing a general MCTS algorithm, Maximum Entropy for Tree Search (MENTS). We prove that the probability of MENTS failing to identify the best decision at the root decays exponentially, which fundamentally improves upon the polynomial convergence rate of UCT. Our experimental results also demonstrate that MENTS is more sample efficient than UCT in both synthetic problems and Atari 2600 games.
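The softmax value mentioned in the abstract is the log-sum-exp "soft" maximum of the action values. A minimal sketch of this quantity and of a mixed softmax/uniform sampling rule, in the spirit of the exploration scheme the paper builds on, is shown below; the function names and the parameters `tau` (temperature) and `eps` (uniform-mixing weight) are illustrative choices, not the paper's exact notation or algorithm.

```python
import math

def softmax_value(q_values, tau=1.0):
    """Soft state value: V = tau * log(sum_a exp(Q(a) / tau)).

    Uses the max-subtraction trick for numerical stability; as tau -> 0
    this approaches max(q_values).
    """
    m = max(q_values)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))

def sampling_policy(q_values, tau=1.0, eps=0.1):
    """Mix a Boltzmann (softmax) distribution over Q-values with a uniform
    distribution, so every action keeps nonzero sampling probability."""
    m = max(q_values)
    exps = [math.exp((q - m) / tau) for q in q_values]
    z = sum(exps)
    n = len(q_values)
    return [(1.0 - eps) * e / z + eps / n for e in exps]
```

In a tree search, values of this form would be backed up from leaf evaluations toward the root, and actions at each node sampled from the mixed distribution rather than chosen by a UCB rule.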

Citation

C. Xiao, J. Mei, R. Huang, D. Schuurmans, M. Müller. "Maximum Entropy Monte-Carlo Planning". Advances in Neural Information Processing Systems (NeurIPS), pp. 9516-9524, December 2019.

Keywords:  
Category: In Conference
Web Links: NeurIPS

BibTeX

@inproceedings{Xiao+al:NIPS19,
  author    = {Chenjun Xiao and Jincheng Mei and Ruitong Huang and Dale Schuurmans and Martin Müller},
  title     = {Maximum Entropy Monte-Carlo Planning},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  pages     = {9516--9524},
  year      = {2019},
}

Last Updated: June 29, 2020
Submitted by Sabina P
