
Self-Supervised Chinese Word Segmentation

Fuchun Peng, Department of Computer Science, University of Massachusetts at Amherst
Dale Schuurmans, AICML

We propose a new unsupervised training method for acquiring probability models that accurately segment Chinese character sequences into words. By constructing a core lexicon to guide unsupervised word learning, self-supervised segmentation overcomes the local maxima problems that hamper standard EM training. Our procedure uses successive EM phases to learn a good probability model over character strings, and then prunes this model with a mutual information selection criterion to obtain a more accurate word lexicon. The segmentations produced by these models are more accurate than those produced by training with EM alone.
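
The abstract does not spell out the mutual information selection criterion, so the following is only a minimal sketch of one common form of such a pruning step, not the authors' implementation. It assumes a lexicon given as a dictionary of word probabilities (as an EM phase might produce) and drops any multi-character entry whose probability is lower than the product of the probabilities of its two parts. The function names, the threshold, and the toy probabilities are all hypothetical.

```python
import math


def pointwise_mi(p_whole, p_left, p_right):
    """Pointwise mutual information: log(p(whole) / (p(left) * p(right)))."""
    return math.log(p_whole / (p_left * p_right))


def prune_lexicon(probs, threshold=0.0):
    """Return a copy of `probs` without weakly associated multi-character entries.

    An entry is dropped if any split into two parts that are both in the
    lexicon scores below `threshold`. Single characters, and entries with
    no such split, are always kept. (Assumed criterion, for illustration.)
    """
    kept = {}
    for word, p in probs.items():
        scores = [
            pointwise_mi(p, probs[word[:i]], probs[word[i:]])
            for i in range(1, len(word))
            if word[:i] in probs and word[i:] in probs
        ]
        if not scores or min(scores) >= threshold:
            kept[word] = p
    return kept


# Invented toy probabilities, not taken from the paper.
toy_probs = {
    "的": 0.05,
    "电": 0.01,
    "脑": 0.005,
    "电脑": 0.004,   # far more likely than 电 and 脑 co-occurring by chance
    "的电": 0.0001,  # less likely than chance co-occurrence of 的 and 电
}
kept = prune_lexicon(toy_probs)
```

In this toy run the entry "电脑" survives and "的电" is removed. The sketch does not renormalize the remaining probabilities; a full procedure would presumably re-estimate them in a further EM phase.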

Citation

F. Peng, D. Schuurmans. "Self-Supervised Chinese Word Segmentation". International Joint Conference on Artificial Intelligence (IJCAI), October 2001.

Keywords:	self-supervised
Category:	In Conference

BibTeX

@incollection{Peng+Schuurmans:IJCAI01,
  author = {Fuchun Peng and Dale Schuurmans},
  title = {Self-Supervised Chinese Word Segmentation},
  booktitle = {International Joint Conference on Artificial Intelligence
    (IJCAI)},
  year = 2001,
}

Last Updated: August 13, 2007
Submitted by Russ Greiner
