Not Logged In

Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining

Full Text: kdd03.pdf PDF

Existing association rule mining algorithms suffer from many problems when mining massive transactional datasets. One major problem is the high memory dependency: either the gigantic data structure built is assumed to fit in main memory, or the recursive mining process is too voracious in memory resources. Another major impediment is the repetitive and interactive nature of any knowledge discovery process. To tune parameters, many runs of the same algorithms are necessary leading to the building of these huge data structures time and again. This paper proposes a new disk-based association rule mining algorithm called Inverted Matrix, which achieves its efficiency by applying three new ideas. First, transactional data is converted into a new database layout called Inverted Matrix that prevents multiple scanning of the database during the mining phase, in which finding frequent patterns could be achieved in less than a full scan with random access. Second, for each frequent item, a relatively small independent tree is built summarizing co-occurrences. Finally, a simple and non-recursive mining process reduces the memory requirements as minimum candidacy generation and counting is needed. Experimental studies reveal that our Inverted Matrix approach outperform FP-Tree especially in mining very large transactional databases with a very large number of unique items. Our random access disk-based approach is particularly advantageous in a repetitive and interactive setting.

Citation

M. El-Hajj, O. Zaiane. "Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining". International Conference on Knowledge Discovery and Data Mining, Washington, USA, (ed: Lise Getoor, Ted E. Senator, Pedro M. Domingos, Christos Faloutsos), pp 109-118, August 2003.

Keywords: Association Rules, Frequent Patterns Mining, COFI-tree, Inverted Matrix
Category: In Conference
Web Links: ACM Digital Library

BibTeX

@incollection{El-Hajj+Zaiane:ACMSIGKDD03,
  author = {Mohammad El-Hajj and Osmar R. Zaiane},
  title = {Inverted Matrix: Efficient Discovery of Frequent Items in Large
    Datasets in the Context of Interactive Mining},
  Editor = {Lise Getoor, Ted E. Senator, Pedro M. Domingos, Christos Faloutsos},
  Pages = {109-118},
  booktitle = {International Conference on Knowledge Discovery and Data Mining},
  year = 2003,
}

Last Updated: September 10, 2020
Submitted by Sabina P

University of Alberta Logo AICML Logo