
Reward Augmented Maximum Likelihood for Neural Structured Prediction

Full Text: nips16b.pdf

A key problem in structured output prediction is enabling direct optimization of the task reward function that matters for test evaluation. This paper presents a simple and computationally efficient method for incorporating task reward into maximum likelihood training. We establish a connection between maximum likelihood and regularized expected reward, showing that they are approximately equivalent in the vicinity of the optimal solution. We then show how maximum likelihood can be generalized by optimizing the conditional probability of auxiliary outputs that are sampled in proportion to their exponentiated scaled rewards. We apply this framework to optimize edit distance in the output space by sampling from edited targets. Experiments on speech recognition and machine translation with neural sequence-to-sequence models show notable improvements over a maximum likelihood baseline, obtained simply by sampling from augmentations of the target outputs.
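For intuition, the core training change can be sketched in a few lines of Python. This is a minimal illustrative sketch, not the paper's implementation: the vocabulary, candidate generator, helper names, and temperature value are assumptions, and the paper's actual sampler stratifies edited targets by edit distance rather than proposing random candidates.

import math
import random

# Hypothetical sketch of RAML-style target sampling; VOCAB and the
# candidate-generation scheme are illustrative choices, not the paper's.
VOCAB = list("abcdefghij")

def random_edit(seq):
    """Apply one random substitution, insertion, or deletion."""
    seq = list(seq)
    op = random.choice(["sub", "ins", "del"])
    i = random.randrange(len(seq) + (op == "ins"))
    if op == "sub":
        seq[i] = random.choice(VOCAB)
    elif op == "ins":
        seq.insert(i, random.choice(VOCAB))
    elif len(seq) > 1:  # deletion, keeping the sequence non-empty
        del seq[i]
    return seq

def edit_distance(a, b):
    """Standard Levenshtein distance via dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def sample_augmented_target(target, tau=0.9, n_candidates=64, max_edits=3):
    """Draw one training target y~ with probability proportional to
    exp(r(y~, y*) / tau), where the reward r is negative edit distance
    to the ground truth y*."""
    candidates = [list(target)]  # include the unedited target itself
    for _ in range(n_candidates):
        cand = list(target)
        for _ in range(random.randint(1, max_edits)):
            cand = random_edit(cand)
        candidates.append(cand)
    rewards = [-edit_distance(c, list(target)) for c in candidates]
    weights = [math.exp(r / tau) for r in rewards]
    return random.choices(candidates, weights=weights, k=1)[0]

# The sampled sequence simply replaces the ground-truth target in the
# usual maximum likelihood (cross-entropy) update.
print("".join(sample_augmented_target("abcdef")))

The temperature tau controls the reward scaling: as tau approaches 0 the sampler returns the ground truth and the procedure reduces to ordinary maximum likelihood, while larger tau spreads probability mass over more heavily edited targets.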

Citation

M. Norouzi, S. Bengio, Z. Chen, N. Jaitly, M. Schuster, Y. Wu, D. Schuurmans. "Reward Augmented Maximum Likelihood for Neural Structured Prediction". Neural Information Processing Systems (NIPS), (eds: D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, R. Garnett), pp. 1723-1731, December 2016.

Category: In Conference
Web Links: NeurIPS

BibTeX

@incollection{Norouzi+al:NIPS16,
  author = {Mohammad Norouzi and Samy Bengio and Zhifeng Chen and Navdeep
    Jaitly and Mike Schuster and Yonghui Wu and Dale Schuurmans},
  title = {Reward Augmented Maximum Likelihood for Neural Structured
    Prediction},
  editor = {D. D. Lee and M. Sugiyama and U. V. Luxburg and I. Guyon and
    R. Garnett},
  pages = {1723--1731},
  booktitle = {Neural Information Processing Systems (NIPS)},
  year = 2016,
}

