A Novel Evaluation Methodology for Assessing Off-Policy Learning Methods in Contextual Bandits
Full Text:
cai2018-evaluation-methodology.pdf
We propose a novel evaluation methodology for assessing off-policy learning methods in contextual bandits. In particular, we provide a way to use data from any given Randomized Control Trial (RCT) to generate a range of observational studies with synthesized “outcome functions” that can match the user’s specified degrees of sample selection bias, which can then be used to comprehensively assess a given learning method. This is especially important in evaluating methods developed for precision medicine, where deploying a bad policy can have devastating effects. As the outcome function specifies the real-valued quality of any treatment for any instance, we can accurately compute the quality of any proposed treatment policy. This paper uses this evaluation methodology to establish a common ground for comparing the robustness and performance of the available off-policy learning methods in the literature.
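The core idea — start from RCT data where treatments are randomized, synthesize an outcome function that gives the true quality of every treatment for every instance, induce a chosen degree of sample selection bias by subsampling, and then score any policy exactly against the known outcome function — can be sketched roughly as follows. This is an illustrative toy, not the paper's actual protocol: the data, the outcome function, and the bias mechanism here are all stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an RCT: i.i.d. contexts, treatments assigned uniformly
# at random (as in a real randomized trial). Two treatment arms.
n, d = 5000, 3
X = rng.normal(size=(n, d))
T = rng.integers(0, 2, size=n)

# Synthesized "outcome function": specifies the real-valued quality of ANY
# treatment for ANY instance, so policy values can be computed exactly.
# (Illustrative functional form, not the one used in the paper.)
def outcome(x, t):
    return x[:, 0] * (2 * t - 1) + 0.1 * x[:, 1]

Y = outcome(X, T)

# Turn the RCT into an observational study via biased subsampling: the
# larger `bias`, the more the retained treatment assignments depend on the
# context, i.e. the stronger the sample selection bias.
def biased_subsample(X, T, Y, bias):
    logits = bias * X[:, 0] * (2 * T - 1)
    keep = rng.random(len(T)) < 1.0 / (1.0 + np.exp(-logits))
    return X[keep], T[keep], Y[keep]

Xo, To, Yo = biased_subsample(X, T, Y, bias=2.0)

# Because the outcome function is known, the value of any proposed policy
# is exact -- no off-policy estimation error in the evaluation itself.
def policy_value(pi, X):
    return outcome(X, pi(X)).mean()

always_1 = lambda X: np.ones(len(X), dtype=int)
oracle = lambda X: (X[:, 0] > 0).astype(int)  # best arm per instance
print(policy_value(always_1, X), policy_value(oracle, X))
```

An off-policy learning method would be trained only on the biased sample `(Xo, To, Yo)`, and the policy it produces would then be scored with `policy_value`, allowing robustness to be charted as a function of the bias strength.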
Citation
N. Hassanpour,
R. Greiner.
"A Novel Evaluation Methodology for Assessing Off-Policy Learning Methods in Contextual Bandits".
Canadian Conference on Artificial Intelligence, (ed: Ebrahim Bagheri, Jackie Chi Kit Cheung), pp 31-44, May 2018.
Keywords: Contextual Bandits, Counterfactual Reasoning, Off-policy Learning, Evaluation Method, Randomized Control Trials
Category: In Conference
Web Links: DOI | Springer
BibTeX
@incollection{Hassanpour+Greiner:CAIAC18,
  author    = {Negar Hassanpour and Russ Greiner},
  title     = {A Novel Evaluation Methodology for Assessing Off-Policy Learning Methods in Contextual Bandits},
  editor    = {Ebrahim Bagheri and Jackie Chi Kit Cheung},
  pages     = {31--44},
  booktitle = {Canadian Conference on Artificial Intelligence},
  year      = {2018},
}
Last Updated: June 28, 2020
Submitted by Russ Greiner