Self-Supervised Chinese Word Segmentation
- Fuchun Peng, Department of Computer Science, University of Massachusetts at Amherst
- Dale Schuurmans, AICML
We propose a new unsupervised training method for acquiring probability models that accurately segment Chinese character sequences into words. By constructing a core lexicon to guide unsupervised word learning, self-supervised segmentation overcomes the local maxima problems that hamper standard EM training. Our procedure uses successive EM phases to learn a good probability model over character strings, and then prunes this model with a mutual information selection criterion to obtain a more accurate word lexicon. The segmentations produced by these models are more accurate than those produced by training with EM alone.
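The pipeline described in the abstract (a word probability model, EM re-estimation, then mutual-information pruning of the lexicon) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a unigram word model segmented by Viterbi dynamic programming, and it swaps in Viterbi (hard) EM, which re-estimates from the single best segmentation, for the expectation over all segmentations that standard EM computes. The pruning rule (drop a multi-character word if the pointwise mutual information of its weakest binary split falls below a threshold) is an assumed form of the selection criterion. The names `FLOOR`, `max_len`, `threshold`, and the toy lexicon are hypothetical; the toy lexicon stands in for a model seeded from a core lexicon.

```python
import math
from collections import Counter

FLOOR = 1e-8  # hypothetical probability for single characters missing from the model


def viterbi_segment(text, probs, max_len=4):
    """Most probable segmentation of `text` under a unigram word model `probs`."""
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: max log-probability of text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word in text[:i]
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = text[j:i]
            # Unknown single characters get FLOOR so every string stays segmentable.
            p = probs.get(word, FLOOR if len(word) == 1 else 0.0)
            if p > 0 and best[j] + math.log(p) > best[i]:
                best[i] = best[j] + math.log(p)
                back[i] = j
    words, i = [], n
    while i > 0:  # follow back-pointers from the end of the string
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


def hard_em_step(corpus, probs, max_len=4):
    """One Viterbi (hard) EM step: segment every sentence, then re-estimate
    word probabilities as relative frequencies of the chosen words."""
    counts = Counter()
    for sentence in corpus:
        counts.update(viterbi_segment(sentence, probs, max_len))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def prune_by_mi(probs, threshold=0.0):
    """Drop multi-character words whose weakest binary split w = x + y has
    pointwise mutual information log(p(w) / (p(x) * p(y))) below `threshold`,
    then renormalize the surviving probabilities."""
    kept = {}
    for w, p in probs.items():
        scores = [
            math.log(p / (probs[w[:k]] * probs[w[k:]]))
            for k in range(1, len(w))
            if w[:k] in probs and w[k:] in probs
        ]
        # Single characters, and words with no split inside the lexicon, are kept.
        if not scores or min(scores) >= threshold:
            kept[w] = p
    total = sum(kept.values())
    return {w: p / total for w, p in kept.items()}


if __name__ == "__main__":
    # Hypothetical toy lexicon standing in for one seeded from a core lexicon.
    lexicon = {"中国": 0.3, "人民": 0.3, "国人": 0.005,
               "中": 0.1, "国": 0.1, "人": 0.1, "民": 0.1}
    print(viterbi_segment("中国人民", lexicon))  # ['中国', '人民']
    print(hard_em_step(["中国人民", "人民"], lexicon))
    print("国人" in prune_by_mi(lexicon))  # False
```

In this toy lexicon, 国人 is dropped because its probability (0.005) is lower than the 0.01 that independent co-occurrence of 国 and 人 would predict, while 中国 (0.3 against 0.01) is kept. Repeated calls to `hard_em_step` would play the role of the successive EM phases.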
Citation
F. Peng, D. Schuurmans. "Self-Supervised Chinese Word Segmentation". International Joint Conference on Artificial Intelligence (IJCAI), September 2001.
Keywords: self-supervised
Category: In Conference
BibTeX
@incollection{Peng+Schuurmans:IJCAI01,
  author = {Fuchun Peng and Dale Schuurmans},
  title = {Self-Supervised Chinese Word Segmentation},
  booktitle = {International Joint Conference on Artificial Intelligence (IJCAI)},
  year = 2001,
}
Last Updated: August 13, 2007
Submitted by Russ Greiner