Using Self-Supervised Word Segmentation in Chinese Information Retrieval

Full Text: peng01using.pdf

We propose a self-supervised word-segmentation technique for Chinese information retrieval. This method combines the advantages of traditional dictionary-based approaches with those of character-based approaches, while overcoming many of their shortcomings. Experiments on TREC data show performance comparable to both the dictionary-based and the character-based approaches. However, our method is completely language-independent and unsupervised, which provides a promising avenue for constructing accurate multilingual or cross-lingual information retrieval systems that are adaptive.

Citation

F. Peng, X. Huang, D. Schuurmans, N. Cercone, S. Robertson. "Using Self-Supervised Word Segmentation in Chinese Information Retrieval". SIGIR, January 2002.

Keywords: chinese, retrieval, machine learning
Category: In Conference

BibTeX

@incollection{Peng+al:SIGIR02,
  author = {Fuchun Peng and Xiangji Huang and Dale Schuurmans and Nick Cercone
    and Stephen E. Robertson},
  title = {Using Self-Supervised Word Segmentation in Chinese Information
    Retrieval},
  booktitle = {SIGIR},
  year = 2002,
}

Last Updated: June 01, 2007
Submitted by Staurt H. Johnson
