
Exploiting Syntactic, Semantic and Lexical Regularities in Language Modeling via Directed Markov Random Fields

Shaojun Wang, Dept of Computing Science
Shaomin Wang, MIT
Li Cheng
Russ Greiner, Dept of Computing Science; PI of AICML
Dale Schuurmans, AICML

Full Text: Wang_et_al-2013-Computational_Intelligence.pdf

We present a directed Markov random field (MRF) model that combines n-gram models, probabilistic context-free grammars (PCFGs), and probabilistic latent semantic analysis (PLSA) for statistical language modeling. Even though the composite directed MRF model potentially has an exponential number of loops and becomes a context-sensitive grammar, we are nevertheless able to estimate its parameters in cubic time using an efficient modified Expectation-Maximization (EM) method, the generalized inside–outside algorithm, which extends the inside–outside algorithm to incorporate the effects of the n-gram and PLSA language models. We generalize various smoothing techniques to alleviate the sparseness of n-gram counts in cases where there are hidden variables. We also derive analogous algorithms to find the most likely parse of a sentence and to calculate the probability of an initial subsequence of a sentence, both under the composite language model. Our experimental results on the Wall Street Journal corpus show that we obtain significant reductions in perplexity compared with state-of-the-art baseline trigram models using Good–Turing and Kneser–Ney smoothing.
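
To make the cubic-time claim and the perplexity metric concrete, here is a minimal sketch of the standard inside algorithm for a PCFG in Chomsky normal form, which the inside–outside algorithm builds on, together with how sentence probabilities become per-word perplexity. This is not the paper's generalized inside–outside algorithm: it omits the n-gram and PLSA components, the outside pass, and EM re-estimation. The toy grammar, the words, and the function names (`inside_probability`, `perplexity`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
import math

# Hypothetical toy PCFG in Chomsky normal form (illustrative only; not from the paper).
# Binary rules: (parent, left_child, right_child) -> probability
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
# Lexical rules: (parent, word) -> probability
LEXICAL = {
    ("NP", "stocks"): 0.5,
    ("NP", "bonds"): 0.5,
    ("V", "beat"): 1.0,
}


def inside_probability(words, start="S"):
    """Return P(words | grammar) using the standard O(n^3) inside algorithm."""
    n = len(words)
    # beta[(i, j)][A] = probability that nonterminal A derives words[i:j]
    beta = defaultdict(lambda: defaultdict(float))
    # Base case: spans of length 1 are covered by lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if word == w:
                beta[(i, i + 1)][a] += p
    # Longer spans: sum over every split point k and every binary rule A -> B C.
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    left = beta[(i, k)].get(b, 0.0)
                    right = beta[(k, j)].get(c, 0.0)
                    if left and right:
                        beta[(i, j)][a] += p * left * right
    return beta[(0, n)].get(start, 0.0)


def perplexity(sentences):
    """Per-word perplexity: exp(-(1/N) * sum of log P(sentence)) over N words."""
    total_log_prob = 0.0
    total_words = 0
    for words in sentences:
        p = inside_probability(words)
        if p == 0.0:
            return math.inf  # the toy grammar cannot generate this sentence
        total_log_prob += math.log(p)
        total_words += len(words)
    return math.exp(-total_log_prob / total_words)


if __name__ == "__main__":
    sentence = ["stocks", "beat", "bonds"]
    print(inside_probability(sentence))  # 0.25
    print(perplexity([sentence]))        # 4 ** (1/3), about 1.587
```

The three nested loops over span length, start position, and split point give the cubic cost in sentence length (times the number of rules). Per the abstract, the paper's generalized algorithm retains this cubic complexity while also accounting for the n-gram and PLSA models. This sketch assumes perplexity is normalized by word count without an end-of-sentence token; conventions vary between evaluations.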

Citation

S. Wang, S. Wang, L. Cheng, R. Greiner, D. Schuurmans. "Exploiting Syntactic, Semantic and Lexical Regularities in Language Modeling via Directed Markov Random Fields". Computational Intelligence, 29(4), pp. 649–679, November 2013.

Keywords:	machine learning, language modeling, Markov random fields
Category:	In Journal
Web Links:	Journal link
	DOI
Related Publication(s):	Exploiting Syntactic, Semantic and Lexical Regularities in Language Modeling via Directed Markov Random Fields

BibTeX

@article{Wang+al:ComputationalIntelligence13,
  author = {Shaojun Wang and Shaomin Wang and Li Cheng and Russ Greiner and
    Dale Schuurmans},
  title = {Exploiting Syntactic, Semantic and Lexical Regularities in Language
    Modeling via Directed Markov Random Fields},
  volume = {29},
  number = {4},
  pages = {649--679},
  journal = {Computational Intelligence},
  year = 2013,
}

Last Updated: February 10, 2020
Submitted by Sabina P
