Solving games with functional regret estimation
- Kevin Waugh
- Dustin Morrill
- J. Andrew Bagnell
- Michael Bowling, University of Alberta
We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation to the regret of the algorithm. A corollary is that the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction and the equilibrium are learned during self-play. We demonstrate empirically that the method achieves higher-quality strategies than state-of-the-art abstraction techniques given the same resources.
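The core idea in the abstract, feeding a no-regret procedure with *estimated* rather than true regrets, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear regret estimator, its feature representation, and the least-mean-squares update are all assumptions for the example; the paper's actual estimator and training procedure are not specified by the abstract. The no-regret rule shown is standard regret matching.

```python
import numpy as np

def regret_matching(regret_estimates):
    """Map (estimated) regrets to a policy: normalize the positive
    regrets; if none are positive, play uniformly at random."""
    positive = np.maximum(regret_estimates, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full(len(regret_estimates), 1.0 / len(regret_estimates))

class LinearRegretEstimator:
    """Illustrative online regret estimator (assumed design): one
    linear model per action, updated toward observed instantaneous
    regrets by a least-mean-squares step."""
    def __init__(self, n_features, n_actions, lr=0.01):
        self.w = np.zeros((n_actions, n_features))
        self.lr = lr

    def predict(self, features):
        # Estimated cumulative regret for each action at this state.
        return self.w @ features

    def update(self, features, observed_regrets):
        # Move predictions toward the regrets observed in self-play.
        error = observed_regrets - self.predict(features)
        self.w += self.lr * np.outer(error, features)

# Usage: the current policy is defined by regret matching over the
# estimator's predictions, not over tabulated true regrets.
est = LinearRegretEstimator(n_features=4, n_actions=3)
policy = regret_matching(est.predict(np.ones(4)))
```

Because the estimator generalizes across states via its features, it plays the role of a learned abstraction: states with similar features receive similar regret estimates, and hence similar policies.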
Citation
K. Waugh, D. Morrill, J. Bagnell, and M. Bowling. "Solving games with functional regret estimation". National Conference on Artificial Intelligence (AAAI), ed. Blai Bonet and Sven Koenig, pp. 2138-2145, January 2015.
Category: In Conference
Web Links: AAAI
BibTeX
@incollection{Waugh+al:AAAI15,
  author    = {Kevin Waugh and Dustin Morrill and J. Andrew Bagnell and Michael Bowling},
  title     = {Solving games with functional regret estimation},
  editor    = {Blai Bonet and Sven Koenig},
  pages     = {2138-2145},
  booktitle = {National Conference on Artificial Intelligence (AAAI)},
  year      = 2015,
}
Last Updated: October 29, 2020