
Adversarial Policy Gradient for Alternating Markov Games

Full Text: adversarial_policy_gradient_for_alternating_markov_games.pdf

Policy gradient reinforcement learning has been applied to two-player alternate-turn zero-sum games; e.g., in AlphaGo, self-play REINFORCE was used to improve the neural net model after supervised learning. In this paper, we emphasize that two-player zero-sum games with alternating turns, previously formulated as Alternating Markov Games (AMGs), differ from standard MDPs because of their two-agent nature. We exploit the difference in their associated Bellman equations, which leads to different policy iteration algorithms. Since policy gradient methods are a form of generalized policy iteration, we show how these differences in policy iteration are reflected in policy gradient for AMGs. We formulate an adversarial policy gradient and discuss possibilities for developing policy gradient methods beyond self-play REINFORCE. The core idea is to estimate the minimum rather than the mean for the “critic”. Experimental results on the game of Hex show that the modified Monte Carlo policy gradient methods learn better pure neural net policies than the REINFORCE variants. To apply the learned neural weights to Hex on multiple board sizes, we describe a board-size independent neural net architecture. We show that when combined with search, using a single neural net model, the resulting program consistently beats MoHex 2.0, the previous state-of-the-art computer Hex player, on board sizes from 9×9 to 13×13.
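
To make the "minimum rather than the mean" idea concrete, below is a minimal sketch in Python. It is not the authors' code: it uses a toy alternating game (single-pile Nim, take 1 to 3 stones, last stone wins) as a hypothetical stand-in for Hex, with a tabular softmax policy in place of a neural net. All names and hyperparameters (PILE, adversarial_return, k, alpha) are illustrative assumptions. The point it demonstrates: when scoring a chosen action, back up the minimum over several sampled opponent continuations instead of their mean, matching the min in the AMG Bellman equation.

import numpy as np

rng = np.random.default_rng(0)
PILE, ACTIONS = 10, 3                    # take 1..3 stones; last stone wins
theta = np.zeros((PILE + 1, ACTIONS))    # tabular softmax policy parameters

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def legal(pile):
    return [a for a in range(ACTIONS) if a + 1 <= pile]

def sample_action(pile):
    acts = legal(pile)
    probs = softmax(theta[pile, acts])
    return acts[rng.choice(len(acts), p=probs)]

def rollout(pile):
    # Self-play to the end; returns +1 if the player to move at `pile` wins.
    sign = 1
    while pile > 0:
        pile -= sample_action(pile) + 1
        sign = -sign
    return -sign    # the player who just moved took the last stone

def adversarial_return(pile, k=4):
    # Value of the position we hand to the opponent: take the MIN over k
    # sampled opponent continuations. A mean here would be the standard
    # MDP-style estimate used by plain self-play REINFORCE.
    if pile == 0:
        return 1.0  # we just took the last stone and won
    return min(-rollout(pile) for _ in range(k))

def train_step(alpha=0.2):
    # One self-play game; each mover's action is reinforced by the
    # min-based (adversarial) critic estimate.
    pile = PILE
    while pile > 0:
        a = sample_action(pile)
        g = adversarial_return(pile - (a + 1))
        acts = legal(pile)
        probs = softmax(theta[pile, acts])
        grad = -probs                     # d/dtheta of log softmax
        grad[acts.index(a)] += 1.0
        theta[pile, acts] += alpha * g * grad
        pile -= a + 1

for _ in range(3000):
    train_step()
# Optimal play leaves the opponent a multiple of 4; inspect a few states.
for pile in (5, 6, 7, 10):
    print(pile, "-> take", int(np.argmax(theta[pile, legal(pile)])) + 1)

Swapping min for mean trades variance for pessimism: the estimate is biased toward the opponent's strongest sampled reply, which is what the adversarial (minimax) value of an alternating game calls for.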

Citation

C. Gao, M. Müller, and R. Hayward. "Adversarial Policy Gradient for Alternating Markov Games". International Conference on Learning Representations, April 2018.

Category: In Conference
Web Links: OpenReview

BibTeX

@inproceedings{Gao+al:ICLR18,
  author = {Chao Gao and Martin Müller and Ryan Hayward},
  title = {Adversarial Policy Gradient for Alternating Markov Games},
  booktitle = {International Conference on Learning Representations},
  year = {2018},
}

