
Solving games with functional regret estimation

Full Text: 15aaai-rcfr.pdf (PDF)

We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation and the regret of the algorithm. A corollary is that the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction and the equilibrium are learned during self-play. We demonstrate empirically that the method achieves higher-quality strategies than state-of-the-art abstraction techniques given the same resources.
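The core mechanism the abstract describes, a no-regret algorithm driven by regret estimates rather than true regrets, can be sketched with regret matching, the standard policy-from-regrets rule; the function name and the illustrative regret values below are our own for demonstration, not from the paper.

```python
import numpy as np

def regret_matching(estimated_regrets):
    """Derive a policy from (estimated) regrets via regret matching:
    each action is played in proportion to its positive regret."""
    positive = np.maximum(estimated_regrets, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    # No action has positive regret: fall back to the uniform policy.
    return np.full(len(estimated_regrets), 1.0 / len(estimated_regrets))

# Hypothetical regret estimates, as might come from a learned
# function approximator evaluated at one information set.
estimates = np.array([2.0, -1.0, 1.0])
policy = regret_matching(estimates)  # proportional to [2, 0, 1]
```

In the paper's setting, the input would come from the learned regret estimator rather than from tabulated true regrets, which is what lets the method scale to games too large to enumerate.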

Citation

K. Waugh, D. Morrill, J. Bagnell, M. Bowling. "Solving games with functional regret estimation". National Conference on Artificial Intelligence (AAAI), (ed: Blai Bonet, Sven Koenig), pp. 2138–2145, January 2015.

Keywords:  
Category: In Conference
Web Links: AAAI

BibTeX

@inproceedings{Waugh+al:AAAI15,
  author = {Kevin Waugh and Dustin Morrill and J. Andrew Bagnell and Michael
    Bowling},
  title = {Solving games with functional regret estimation},
  editor = {Blai Bonet and Sven Koenig},
  pages = {2138--2145},
  booktitle = {National Conference on Artificial Intelligence (AAAI)},
  year = 2015,
}

Last Updated: October 29, 2020
Submitted by Sabina P
