
A Novel Evaluation Methodology for Assessing Off-Policy Learning Methods in Contextual Bandits [wkshp]

Full Text: nips2017-evaluation-methodology.pdf
Other Attachments: NIPS_workshop-poster-final.pdf [Poster]

We propose a novel evaluation methodology for assessing off-policy learning methods in contextual bandits. In particular, we provide a way to use any given Randomized Controlled Trial (RCT) to generate a range of observational studies (with synthesized "outcome functions") that match user-specified degrees of sample selection bias, which can then be used to comprehensively assess a given learning method. This is especially important when developing methods for precision medicine, where deploying a bad policy can have devastating effects. As the outcome function specifies the real-valued quality of any treatment for any instance, we can compute the quality of any proposed treatment policy exactly. This paper uses this evaluation methodology to establish a common ground for comparing the robustness and performance of the off-policy learning methods available in the literature.
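To make the construction concrete, the following is a minimal sketch (not the authors' exact procedure) of how RCT-style data could be turned into an observational study with a controllable degree of selection bias, using a synthesized outcome function. The linear outcome model, the logistic selection rule, and the bias parameter kappa are illustrative assumptions; the key point is that because the outcome function is known, the value of any policy can be computed exactly rather than estimated off-policy. Sweeping kappa would yield a range of observational studies with increasing bias on which a learning method can be assessed.

import numpy as np

rng = np.random.default_rng(0)

# --- Toy stand-in for RCT data: contexts X, randomized binary treatments ---
n, d = 2000, 5
X = rng.normal(size=(n, d))
T_rct = rng.integers(0, 2, size=n)            # randomized assignment: no selection bias

# --- Synthesized outcome function: real-valued quality of each treatment for each x ---
# (illustrative linear form; any function of (x, t) would do)
w0, w1 = rng.normal(size=d), rng.normal(size=d)
def outcome(X, t):
    return np.where(t == 1, X @ w1, X @ w0)

Y_rct = outcome(X, T_rct) + rng.normal(scale=0.1, size=n)   # observed RCT outcomes

# --- Inject a user-specified degree of sample selection bias (kappa) ---
# kappa = 0 keeps assignment random; larger kappa makes the assignment depend
# more strongly on the context, mimicking an observational study.
def make_observational(X, kappa):
    score = X @ (w1 - w0)                     # contexts where treatment 1 looks better
    p_treat = 1.0 / (1.0 + np.exp(-kappa * score))
    T_obs = rng.binomial(1, p_treat)
    Y_obs = outcome(X, T_obs) + rng.normal(scale=0.1, size=len(X))
    return T_obs, Y_obs

T_obs, Y_obs = make_observational(X, kappa=3.0)

# --- Because the outcome function is known, the value of ANY policy is exact ---
def true_policy_value(policy):
    return outcome(X, policy(X)).mean()

greedy = lambda X: (X @ (w1 - w0) > 0).astype(int)   # example treatment policy
print("true value of greedy policy:", true_policy_value(greedy))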

Citation

N. Hassanpour, R. Greiner. "A Novel Evaluation Methodology for Assessing Off-Policy Learning Methods in Contextual Bandits [wkshp]". NIPS 2017 Workshop on Causal Inference and Machine Learning (WhatIF2017), November 2017.

Keywords: Contextual bandit, RCT, machine learning, precision health
Category: In Workshop

BibTeX

@inproceedings{Hassanpour+Greiner:WhatIf2017,
  author = {Negar Hassanpour and Russ Greiner},
  title = {A Novel Evaluation Methodology for Assessing Off-Policy Learning
    Methods in Contextual Bandits [wkshp]},
  booktitle = {NIPS 2017 Workshop on Causal Inference and Machine Learning
    (WhatIF2017)},
  year = 2017,
  month = {November},
}

Last Updated: February 11, 2020
Submitted by Sabina P
