Language and Task Independent Text Categorization With Simple Language Models

Full Text: N03-1025.pdf

We present a simple method for language independent and task independent text categorization learning, based on character-level n-gram language models. Our approach uses simple information theoretic principles and achieves effective performance across a variety of languages and tasks without requiring feature selection or extensive pre-processing. To demonstrate the language and task independence of the proposed technique, we present experimental results on several languages (Greek, English, Chinese and Japanese) in several text categorization problems (language identification, authorship attribution, text genre classification, and topic detection). Our experimental results show that the simple approach achieves state of the art performance in each case.
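The general idea in the abstract can be illustrated with a minimal sketch: train one character-level n-gram model per category, then assign a new text to the category whose model gives it the lowest cross-entropy. This is not the authors' implementation. The abstract does not specify a smoothing scheme, so add-one smoothing is used here purely as an assumption, and the names `CharNgramLM`, `train_models` and `classify` are hypothetical.

```python
import math
from collections import Counter


class CharNgramLM:
    """Character-level n-gram model with add-one smoothing (an assumption)."""

    def __init__(self, n=3):
        self.n = n
        self.ngrams = Counter()    # counts of (context + next character)
        self.contexts = Counter()  # counts of each (n-1)-character context
        self.vocab = set()         # characters seen during training

    def _pad(self, text):
        # Prefix with n-1 start markers so the first characters have a context.
        return "\x02" * (self.n - 1) + text

    def train(self, text):
        padded = self._pad(text)
        self.vocab.update(text)
        for i in range(self.n - 1, len(padded)):
            ctx = padded[i - self.n + 1:i]
            self.ngrams[ctx + padded[i]] += 1
            self.contexts[ctx] += 1

    def cross_entropy(self, text):
        """Average negative log2 probability per character of `text`."""
        padded = self._pad(text)
        v = len(self.vocab) + 1  # one extra slot for any unseen character
        total = 0.0
        for i in range(self.n - 1, len(padded)):
            ctx = padded[i - self.n + 1:i]
            p = (self.ngrams[ctx + padded[i]] + 1) / (self.contexts[ctx] + v)
            total -= math.log2(p)
        return total / max(len(text), 1)


def train_models(labeled_texts, n=3):
    """Build one model per label from (label, text) pairs."""
    models = {}
    for label, text in labeled_texts:
        if label not in models:
            models[label] = CharNgramLM(n)
        models[label].train(text)
    return models


def classify(models, text):
    """Return the label whose model gives `text` the lowest cross-entropy."""
    return min(models, key=lambda label: models[label].cross_entropy(text))
```

Because the model works on raw characters, the same code applies unchanged to any language or labeling task: only the `(label, text)` training pairs differ, for example language names for language identification or author names for authorship attribution.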

Citation

F. Peng, D. Schuurmans, S. Wang. "Language and Task Independent Text Categorization With Simple Language Models". HLT-NAACL, May 2003.

Keywords: categorization, machine learning
Category: In Conference

BibTeX

@incollection{Peng+al:HLT-NAACL03,
  author = {Fuchun Peng and Dale Schuurmans and Shaojun Wang},
  title = {Language and Task Independent Text Categorization With Simple
    Language Models},
  booktitle = {HLT-NAACL},
  year = 2003,
}

Last Updated: June 01, 2007
Submitted by Stuart H. Johnson
