Using Self-Supervised Word Segmentation in Chinese Information Retrieval
- Fuchun Peng, Department of Computer Science, University of Massachusetts at Amherst
- Xiangji Huang, School of Computer Science, University of Waterloo
- Dale Schuurmans, AICML
- Nick Cercone, School of Computer Science, University of Waterloo
- Stephen E. Robertson, Microsoft Research Ltd., UK and City University, London, UK
We propose a self-supervised word-segmentation technique for Chinese information retrieval. This method combines the advantages of traditional dictionary-based approaches with those of character-based approaches, while overcoming many of their shortcomings. Experiments on TREC data show performance comparable to that of both the dictionary-based and the character-based approaches. However, our method is completely language-independent and unsupervised, which provides a promising avenue for constructing accurate multi-lingual or cross-lingual information retrieval systems that are adaptive.
Citation
F. Peng, X. Huang, D. Schuurmans, N. Cercone, S. Robertson. "Using Self-Supervised Word Segmentation in Chinese Information Retrieval". SIGIR, January 2002.
Keywords: chinese, retrieval, machine learning
Category: In Conference
BibTeX
@incollection{Peng+al:SIGIR02,
  author = {Fuchun Peng and Xiangji Huang and Dale Schuurmans and Nick Cercone and Stephen E. Robertson},
  title = {Using Self-Supervised Word Segmentation in Chinese Information Retrieval},
  booktitle = {},
  year = 2002,
}
Last Updated: June 01, 2007
Submitted by Staurt H. Johnson