Self-Supervised Chinese Word Segmentation
- Fuchun Peng, Department of Computer Science, University of Massachusetts at Amherst
- Dale Schuurmans, AICML

We propose a new unsupervised training method for acquiring probability models that accurately segment Chinese character sequences into words. By constructing a core lexicon to guide unsupervised word learning, self-supervised segmentation overcomes the local-maxima problems that hamper standard EM training. Our procedure uses successive EM phases to learn a good probability model over character strings, and then prunes this model with a mutual information selection criterion to obtain a more accurate word lexicon. The segmentations produced by these models are more accurate than those produced by training with EM alone.
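The abstract combines two steps: EM training of a probability model over character strings, then pruning of the lexicon with a mutual information criterion. The Python below is a minimal sketch of that general pipeline under stated assumptions, not the authors' implementation. It assumes a unigram word model with a hypothetical maximum word length of 4 characters, computes expected word counts with a forward-backward pass, and segments with Viterbi decoding. The pruning rule is a pointwise mutual information test of our own choosing: a multi-character word w is dropped when log(p(w) / (p(x) p(y))) for its most probable two-part split (x, y) falls below a threshold. The sketch does not reproduce the core-lexicon construction that distinguishes self-supervised segmentation, and every function name (init_lexicon, forward_backward, em_step, mi_prune, viterbi_segment) is hypothetical.

```python
import math
from collections import defaultdict

MAX_LEN = 4  # assumed maximum word length in characters (hypothetical choice)


def init_lexicon(corpus, max_len=MAX_LEN):
    """Start EM from every substring up to max_len, weighted by raw frequency."""
    counts = defaultdict(float)
    for s in corpus:
        for i in range(len(s)):
            for j in range(i + 1, min(len(s), i + max_len) + 1):
                counts[s[i:j]] += 1.0
    z = sum(counts.values())
    return {w: c / z for w, c in counts.items()}


def forward_backward(text, probs, max_len=MAX_LEN):
    """Expected word counts over all segmentations of text (unigram model)."""
    n = len(text)
    alpha = [0.0] * (n + 1)  # alpha[j]: total probability of text[:j]
    alpha[0] = 1.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            if text[i:j] in probs:
                alpha[j] += alpha[i] * probs[text[i:j]]
    beta = [0.0] * (n + 1)  # beta[i]: total probability of text[i:]
    beta[n] = 1.0
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, min(n, i + max_len) + 1):
            if text[i:j] in probs:
                beta[i] += probs[text[i:j]] * beta[j]
    counts = defaultdict(float)
    if alpha[n] == 0.0:
        return counts  # no segmentation possible under this lexicon
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            w = text[i:j]
            if w in probs:
                # Posterior probability that w occupies span [i, j).
                counts[w] += alpha[i] * probs[w] * beta[j] / alpha[n]
    return counts


def em_step(corpus, probs):
    """One EM iteration: E-step via forward-backward, M-step by renormalizing."""
    totals = defaultdict(float)
    for sentence in corpus:
        for w, c in forward_backward(sentence, probs).items():
            totals[w] += c
    z = sum(totals.values())
    return {w: c / z for w, c in totals.items() if c > 0}


def mi_prune(probs, threshold=0.0):
    """Drop multi-character words whose best two-part split explains them well.

    Assumed criterion: keep w only if log(p(w) / max p(x) p(y)) >= threshold.
    Single characters have no split and are always kept.
    """
    kept = {}
    for w, p in probs.items():
        splits = [(w[:k], w[k:]) for k in range(1, len(w))
                  if w[:k] in probs and w[k:] in probs]
        if splits:
            best_alt = max(probs[x] * probs[y] for x, y in splits)
            if math.log(p / best_alt) < threshold:
                continue
        kept[w] = p
    z = sum(kept.values())
    return {w: p / z for w, p in kept.items()}


def viterbi_segment(text, probs, max_len=MAX_LEN):
    """Most probable segmentation of text under the unigram model."""
    n = len(text)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            w = text[i:j]
            if w in probs and best[i] > -math.inf:
                score = best[i] + math.log(probs[w])
                if score > best[j]:
                    best[j], back[j] = score, i
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this lexicon")
    words, j = [], n
    while j > 0:
        words.append(text[back[j]:j])
        j = back[j]
    return words[::-1]


if __name__ == "__main__":
    corpus = ["我们学习中文", "我们喜欢中文", "学习中文很有意思"]
    lexicon = init_lexicon(corpus)
    for _ in range(5):
        lexicon = em_step(corpus, lexicon)
    lexicon = mi_prune(lexicon, threshold=0.0)
    for sentence in corpus:
        print(" / ".join(viterbi_segment(sentence, lexicon)))
```

Probabilities are kept in linear space for readability, so long sentences would underflow; a practical version would work in log space or rescale alpha and beta at each position.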
Citation
F. Peng, D. Schuurmans. "Self-Supervised Chinese Word Segmentation". International Joint Conference on Artificial Intelligence (IJCAI), October 2001.
Keywords: self-supervised
Category: In Conference
BibTeX
@incollection{Peng+Schuurmans:IJCAI01,
  author = {Fuchun Peng and Dale Schuurmans},
  title = {Self-Supervised Chinese Word Segmentation},
  booktitle = {International Joint Conference on Artificial Intelligence
    (IJCAI)},
  year = 2001,
}
Last Updated: August 13, 2007
Submitted by Russ Greiner