Exploiting Syntactic, Semantic and Lexical Regularities in Language Modeling via Directed Markov Random Fields

Full Text: SW-language-ICML2005.ps PS

We present a directed Markov random field (MRF) model that combines n-gram models, probabilistic context-free grammars (PCFGs) and probabilistic latent semantic analysis (PLSA) for the purpose of statistical language modeling. Even though the composite directed MRF model potentially has an exponential number of loops and becomes a context-sensitive grammar, we are nevertheless able to estimate its parameters in cubic time using an efficient modified EM method, the generalized inside-outside algorithm, which extends the inside-outside algorithm to incorporate the effects of the n-gram and PLSA language models. We generalize various smoothing techniques to alleviate the sparseness of n-gram counts in cases where there are hidden variables. We also derive analogous algorithms to find the most likely parse of a sentence and to calculate the probability of an initial subsequence of a sentence, all generated by the composite language model. Our experimental results on the Wall Street Journal corpus show that we obtain significant reductions in perplexity compared to the state-of-the-art baseline trigram model with Good-Turing and Kneser-Ney smoothing techniques.
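
The abstract reports results as perplexity reductions. As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the exponential of the negative average log-probability a model assigns to each word. The function name `perplexity` and the probability values are hypothetical and chosen only for illustration; they are not taken from the paper or its experiments.

```python
import math


def perplexity(word_probs):
    """Return exp(-(1/N) * sum(log p_i)) for per-word model probabilities."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# Hypothetical conditional probabilities for a four-word test sentence
# under two models (illustrative values only, not results from the paper).
baseline_probs = [0.25, 0.25, 0.25, 0.25]
sharper_probs = [0.5, 0.5, 0.25, 0.25]

print(perplexity(baseline_probs))
print(perplexity(sharper_probs))
```

A model that assigns higher probability to the observed words gets a lower perplexity, which is the sense in which the composite model is compared against the trigram baseline.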

Citation

S. Wang, S. Wang, R. Greiner, D. Schuurmans, L. Cheng. "Exploiting Syntactic, Semantic and Lexical Regularities in Language Modeling via Directed Markov Random Fields". International Conference on Machine Learning (ICML), Bonn, Germany, pp. 953-960, August 2005.

Keywords: language modeling, random field, PLSA, machine learning
Category: In Conference

BibTeX

@inproceedings{Wang+al:ICML05,
  author = {Shaojun Wang and Shaomin Wang and Russ Greiner and Dale Schuurmans
    and Li Cheng},
  title = {Exploiting Syntactic, Semantic and Lexical Regularities in Language
    Modeling via Directed Markov Random Fields},
  booktitle = {International Conference on Machine Learning (ICML)},
  address = {Bonn, Germany},
  month = aug,
  pages = {953--960},
  year = 2005,
}

Last Updated: June 05, 2007
Submitted by Staurt H. Johnson
