A Novel Evaluation Methodology for Assessing Off-Policy Learning Methods in Contextual Bandits [wkshp]
Full Text:
nips2017-evaluation-methodology.pdf
We propose a novel evaluation methodology for assessing off-policy learning methods in contextual bandits. In particular, we provide a way to use any given Randomized Controlled Trial (RCT) to generate a range of observational studies (with synthesized "outcome functions") that match the user's specified degrees of sample selection bias, which can then be used to comprehensively assess a given learning method. This is especially important in developing methods for precision medicine, where deploying a bad policy can have devastating effects. As the outcome function specifies the real-valued quality of any treatment for any instance, we can compute the quality of any proposed treatment policy exactly.
This paper uses this evaluation methodology to establish a common ground for comparing the robustness and performance of the available off-policy learning methods in the literature.
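To illustrate the core idea, here is a minimal sketch (not the authors' implementation; all names, the linear outcome model, and the logistic selection mechanism are illustrative assumptions): starting from RCT-like covariates, we synthesize a known outcome function, generate an observational study whose treatment assignment has a tunable degree of selection bias, and, because the outcome function is known, compute the true value of any policy exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical RCT covariates: n instances, d features.
n, d = 1000, 5
X = rng.normal(size=(n, d))

# Synthesized "outcome function": real-valued quality of each treatment
# (0 or 1) for each instance. A linear model is assumed purely for
# illustration; the methodology allows any outcome function.
w0, w1 = rng.normal(size=d), rng.normal(size=d)

def outcome(X, t):
    """True quality of treatment t for every row of X."""
    return X @ (w1 if t == 1 else w0)

def observational_study(X, bias):
    """Generate an observational dataset with a specified degree of
    sample selection bias: treatment depends on the covariates.
    bias=0 recovers RCT-like uniform assignment; larger values of
    `bias` induce stronger confounding."""
    p_t1 = 1.0 / (1.0 + np.exp(-bias * X[:, 0]))
    t = rng.binomial(1, p_t1)
    y = np.where(t == 1, outcome(X, 1), outcome(X, 0))
    return t, y

def true_policy_value(X, pi):
    """Exact value of policy pi: X -> {0, 1}, computable because the
    outcome function is known (no off-policy estimator needed)."""
    t = pi(X)
    return np.mean(np.where(t == 1, outcome(X, 1), outcome(X, 0)))

# Example policies to compare on the exact metric.
optimal_policy = lambda X: (outcome(X, 1) > outcome(X, 0)).astype(int)
always_control = lambda X: np.zeros(len(X), dtype=int)
```

A learning method can then be trained on `observational_study(X, bias)` for a range of `bias` values, and its learned policy scored with `true_policy_value`, giving a controlled robustness-versus-bias curve.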
Citation
N. Hassanpour,
R. Greiner.
"A Novel Evaluation Methodology for Assessing Off-Policy Learning Methods in Contextual Bandits [wkshp]". NIPS 2017 Workshop on Causal Inference and Machine Learning (WhatIF2017), November 2017.
Keywords: Contextual bandit, RCT, machine learning, precision health
Category: In Workshop
BibTeX
@inproceedings{Hassanpour+Greiner:WhatIf2017,
  author    = {Negar Hassanpour and Russ Greiner},
  title     = {A Novel Evaluation Methodology for Assessing Off-Policy Learning
               Methods in Contextual Bandits [wkshp]},
  booktitle = {NIPS 2017 Workshop on Causal Inference and Machine Learning
               (WhatIF2017)},
  year      = {2017},
}
Last Updated: February 11, 2020