Actor-Critic Policy Optimization in Partially Observable Multiagent Environments
- Sriram Srinivasan, University of Alberta
- Marc Lanctot
- Vinicius Zambaldi
- Julien Perolat
- Karl Tuyls
- Remi Munos
- Michael Bowling, University of Alberta
Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return. In this paper, we examine the role of these policy gradient and actor-critic algorithms in partially-observable multiagent environments. We show several candidate policy update rules and relate them to a foundation of regret minimization and multiagent learning techniques for the one-shot and tabular cases, leading to previously unknown convergence guarantees. We apply our method to model-free multiagent reinforcement learning in adversarial sequential decision problems (zero-sum imperfect information games), using RL-style function approximation. We evaluate on commonly used benchmark Poker domains, showing performance against fixed policies and empirical convergence to approximate Nash equilibria in self-play with rates similar to or better than a baseline model-free algorithm for zero-sum games, without any domain-specific state space reductions.
Citation
S. Srinivasan, M. Lanctot, V. Zambaldi, J. Perolat, K. Tuyls, R. Munos, M. Bowling. "Actor-Critic Policy Optimization in Partially Observable Multiagent Environments". Neural Information Processing Systems (NIPS), (ed: Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, Roman Garnett), pp 3426-3439, December 2018.
Category: In Conference
Web Links: NeurIPS
BibTeX
@incollection{Sriram+al:NIPS18,
  author = {Sriram Srinivasan and Marc Lanctot and Vinicius Zambaldi and Julien Perolat and Karl Tuyls and Remi Munos and Michael Bowling},
  title = {Actor-Critic Policy Optimization in Partially Observable Multiagent Environments},
  Editor = {Samy Bengio and Hanna M. Wallach and Hugo Larochelle and Kristen Grauman and Nicolò Cesa-Bianchi and Roman Garnett},
  Pages = {3426-3439},
  booktitle = {Neural Information Processing Systems (NIPS)},
  year = 2018,
}
Last Updated: February 21, 2020
Submitted by Sabina P