
Text Document Categorization by Term Association

Full Text: icdm02-1.pdf

A good text classifier efficiently categorizes large sets of text documents in a reasonable time frame and with acceptable accuracy, and provides human-readable classification rules that can be fine-tuned. In some application domains, fast training is an additional asset. Many techniques and algorithms for automatic text categorization have been devised. According to the published literature, some are more accurate than others, and some produce more interpretable classification models than others. However, none combines all of the beneficial properties listed above. In this paper we present a novel approach to automatic text categorization that borrows from market basket analysis, using association rule mining from the data-mining field. We focus on two major problems: (1) finding the best term association rules in a textual database by generating and pruning them; and (2) using these rules to build a text classifier. Our text categorization method proves to be efficient and effective, and experiments on well-known collections show that the classifier performs well. In addition, both training and classification are fast, and the generated rules are human readable.
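
The abstract names two steps: mining term association rules of the form "term set → category" and using them to classify new documents. The sketch below illustrates that general idea under stated assumptions; it is not the paper's exact algorithm. It assumes documents are already tokenized into sets of terms, mines frequent term sets separately inside each category with a textbook Apriori-style search, computes rule confidence over the whole training set, and then applies two hypothetical choices: a simple redundancy filter as the pruning step, and a "sum of matching rule confidences" score for classification. All function names, thresholds (`min_support`, `min_confidence`, `max_len`) and the toy data are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def mine_frequent_termsets(docs, min_support, max_len=3):
    """Apriori-style level-wise search: return {termset: support} for every
    term set (up to max_len terms) occurring in at least a min_support
    fraction of docs. Each doc is a set of terms."""
    n = len(docs)

    def support(termset):
        return sum(1 for d in docs if termset <= d) / n

    frequent, current = {}, set()
    for term in {t for d in docs for t in d}:
        single = frozenset([term])
        sup = support(single)
        if sup >= min_support:
            frequent[single] = sup
            current.add(single)

    k = 1
    while current and k < max_len:
        k += 1
        # Join step: combine frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = set()
        for c in candidates:
            sup = support(c)
            if sup >= min_support:
                frequent[c] = sup
                current.add(c)
    return frequent


def build_rules(training, min_support=0.5, min_confidence=0.8):
    """training: list of (set_of_terms, category) pairs.
    Returns pruned rules as (termset, category, confidence) tuples.
    Assumption: term sets are mined per category, and
    confidence = (#docs of the category containing the termset)
               / (#training docs containing the termset)."""
    by_cat = defaultdict(list)
    for terms, cat in training:
        by_cat[cat].append(terms)
    all_docs = [terms for terms, _ in training]

    rules = []
    for cat, docs in by_cat.items():
        for termset in mine_frequent_termsets(docs, min_support):
            in_cat = sum(1 for d in docs if termset <= d)
            covering = sum(1 for d in all_docs if termset <= d)
            if in_cat and in_cat / covering >= min_confidence:
                rules.append((termset, cat, in_cat / covering))

    # Hypothetical pruning: drop a rule when a rule with a strictly smaller
    # antecedent predicts the same category with at least the same confidence.
    return [
        (ts, cat, conf) for ts, cat, conf in rules
        if not any(o < ts and oc == cat and oconf >= conf for o, oc, oconf in rules)
    ]


def classify(rules, doc_terms):
    """Hypothetical scoring: sum the confidences of all rules whose antecedent
    occurs in the document; return the best category, or None if none match."""
    scores = defaultdict(float)
    for termset, cat, conf in rules:
        if termset <= doc_terms:
            scores[cat] += conf
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    training = [
        ({"wheat", "grain", "export"}, "grain"),
        ({"wheat", "crop", "grain"}, "grain"),
        ({"grain", "corn", "harvest"}, "grain"),
        ({"oil", "crude", "barrel"}, "crude"),
        ({"oil", "opec", "price"}, "crude"),
        ({"crude", "price", "barrel"}, "crude"),
    ]
    rules = build_rules(training)
    print(classify(rules, {"wheat", "price", "grain"}))  # grain
```

Mining inside each category means support is measured relative to that category's size, so term sets that characterize a small category are not discarded merely because the category is rare overall. The rules remain plain "term set → category" statements, which is what keeps such a model human readable.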

Citation

M. Antonie, O. Zaiane. "Text Document Categorization by Term Association". IEEE International Conference on Data Mining (ICDM), pp. 19-26, December 2002.

Category: In Conference
Web Links: IEEE

BibTeX

@inproceedings{Antonie+Zaiane:ICDM02,
  author = {Maria-Luiza Antonie and Osmar R. Zaiane},
  title = {Text Document Categorization by Term Association},
  pages = {19--26},
  booktitle = {IEEE International Conference on Data Mining (ICDM)},
  month = dec,
  year = 2002,
}

Last Updated: March 03, 2020
Submitted by Sabina P
