Using Self-Supervised Word Segmentation in Chinese Information Retrieval

Fuchun Peng, Department of Computer Science, University of Massachusetts at Amherst
Xiangji Huang, School of Computer Science, University of Waterloo
Dale Schuurmans, AICML
Nick Cercone, School of Computer Science, University of Waterloo
Stephen E. Robertson, Microsoft Research Ltd., UK and City University, London, UK

We propose a self-supervised word-segmentation technique for Chinese information retrieval. The method combines the advantages of traditional dictionary-based and character-based approaches while overcoming many of their shortcomings. Experiments on TREC data show performance comparable to both the dictionary-based and the character-based approaches. However, our method is completely language-independent and unsupervised, which provides a promising avenue for constructing accurate multilingual or cross-lingual information retrieval systems that are adaptive.

Citation

F. Peng, X. Huang, D. Schuurmans, N. Cercone, S. Robertson. "Using Self-Supervised Word Segmentation in Chinese Information Retrieval". SIGIR, January 2002.

Keywords:	Chinese, retrieval, machine learning
Category:	In Conference

BibTeX

@incollection{Peng+al:SIGIR02,
  author = {Fuchun Peng and Xiangji Huang and Dale Schuurmans and Nick Cercone
    and Stephen E. Robertson},
  title = {Using Self-Supervised Word Segmentation in Chinese Information
    Retrieval},
  booktitle = {SIGIR},
  year = 2002,
}

Last Updated: June 01, 2007
Submitted by Staurt H. Johnson
