
A Novel Evaluation Methodology for Assessing Off-Policy Learning Methods in Contextual Bandits

Full Text: cai2018-evaluation-methodology.pdf
Other Attachments: CAI_2018_slides.pptx (slides)

We propose a novel evaluation methodology for assessing off-policy learning methods in contextual bandits. In particular, we provide a way to use data from any given Randomized Controlled Trial (RCT) to generate a range of observational studies with synthesized “outcome functions” that match a user-specified degree of sample selection bias, which can then be used to comprehensively assess a given learning method. This is especially important in evaluating methods developed for precision medicine, where deploying a bad policy can have devastating effects. As the outcome function specifies the real-valued quality of every treatment for every instance, we can accurately compute the quality of any proposed treatment policy. This paper uses this evaluation methodology to establish a common ground for comparing the robustness and performance of the off-policy learning methods available in the literature.
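The abstract describes a generate-then-evaluate loop: start from RCT data, synthesize an outcome function, re-assign treatments under a logging policy whose departure from uniform randomization is tunable, and then score any candidate policy exactly against the known outcome function. Below is a minimal Python sketch of that idea only; the function names (synth_outcome, make_observational_study, true_policy_value) and the bias knob alpha are illustrative assumptions, not the paper's actual construction.

    import numpy as np

    rng = np.random.default_rng(0)

    def synth_outcome(X, T):
        """Hypothetical synthesized outcome function mu(x, t): the
        real-valued quality of treatment t for context x. The paper
        fits such a function to RCT data; here we simply posit one."""
        return np.where(T == 1, X[:, 0] - X[:, 1], X[:, 1])

    def make_observational_study(X, alpha, rng):
        """Assign treatments with a logging policy whose dependence on
        the covariates is controlled by alpha: alpha = 0 recovers the
        RCT's uniform randomization, larger alpha means stronger
        sample selection bias (an illustrative mechanism)."""
        score = alpha * (X[:, 0] - X[:, 1])      # covariate-driven preference for t = 1
        p_t1 = 1.0 / (1.0 + np.exp(-score))      # propensity of treatment 1
        T = rng.binomial(1, p_t1)
        Y = synth_outcome(X, T) + rng.normal(0.0, 0.1, len(X))  # noisy observed outcome
        return T, Y, p_t1

    def true_policy_value(X, policy):
        """Because the outcome function is known, the quality of any
        proposed policy is computable (up to Monte Carlo error)."""
        return synth_outcome(X, policy(X)).mean()

    # RCT-like contexts, plus an observational study with moderate bias.
    X = rng.normal(size=(10_000, 2))
    T, Y, p_t1 = make_observational_study(X, alpha=2.0, rng=rng)

    # (X, T, Y, p_t1) would be fed to an off-policy learner; here we just
    # score a toy candidate policy against the known outcome function.
    policy = lambda X: (X[:, 0] > X[:, 1]).astype(int)
    print(f"true value of candidate policy: {true_policy_value(X, policy):.3f}")

Because the same outcome function generates every study in the sweep over alpha, a learning method can be assessed across the whole range of selection-bias levels on directly comparable ground.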

Citation

N. Hassanpour and R. Greiner. "A Novel Evaluation Methodology for Assessing Off-Policy Learning Methods in Contextual Bandits". In Canadian Conference on Artificial Intelligence (eds: Ebrahim Bagheri and Jackie Chi Kit Cheung), pp. 31-44, May 2018.

Keywords: Contextual Bandits, Counterfactual Reasoning, Off-policy Learning, Evaluation Method, Randomized Controlled Trials
Category: In Conference
Web Links: DOI, Springer

BibTeX

@inproceedings{Hassanpour+Greiner:CAIAC18,
  author    = {Negar Hassanpour and Russ Greiner},
  title     = {A Novel Evaluation Methodology for Assessing Off-Policy Learning
    Methods in Contextual Bandits},
  editor    = {Ebrahim Bagheri and Jackie Chi Kit Cheung},
  booktitle = {Canadian Conference on Artificial Intelligence},
  pages     = {31--44},
  year      = {2018},
}

Last Updated: June 28, 2020
Submitted by Russ Greiner
