
An Efficient Reference-based Approach to Outlier Detection in Large Dataset

Yaling Pei
Osmar R. Zaiane, University of Alberta (Database)
Yong Gao

A bottleneck in detecting distance-based and density-based outliers is that a nearest-neighbor search is required for each data point, resulting in a quadratic number of pairwise distance evaluations. In this paper, we propose a new method that uses the relative degree of density with respect to a fixed set of reference points to approximate the degree of density defined in terms of the nearest neighbors of a data point. The running time of our algorithm based on this approximation is O(Rn log n), where n is the size of the dataset and R is the number of reference points. Candidate outliers are ranked by the outlier score assigned to each data point. Theoretical analysis and empirical studies show that our method is effective, efficient, and highly scalable to very large datasets.
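
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `reference_based_scores`, the parameter `k`, the inverse-mean-gap density, and the final normalization are all assumptions made for illustration. For each reference point, every data point is reduced to its distance from that reference, the distances are sorted, and a point's density is estimated from its k closest neighbors in that one-dimensional ordering. Sorting costs O(n log n) per reference point, which is where a total cost of roughly O(Rn log n) comes from when k is treated as a constant.

```python
import math


def reference_based_scores(points, references, k=3):
    """Return one outlier score in [0, 1] per point (higher = more outlying).

    Illustrative sketch only; the density and score definitions are assumptions.
    """
    n = len(points)
    if n <= k:
        raise ValueError("need more than k points")
    min_density = [math.inf] * n
    for ref in references:
        # Distance of every point to this reference point: O(n).
        dist = [math.dist(p, ref) for p in points]
        # Sort indices by that distance so 1-D neighbours are adjacent: O(n log n).
        order = sorted(range(n), key=dist.__getitem__)
        sorted_d = [dist[i] for i in order]
        for pos, idx in enumerate(order):
            # Walk outwards from pos to pick the k closest values in the 1-D view.
            lo, hi = pos - 1, pos + 1
            total = 0.0
            for _ in range(k):
                left = sorted_d[pos] - sorted_d[lo] if lo >= 0 else math.inf
                right = sorted_d[hi] - sorted_d[pos] if hi < n else math.inf
                if left <= right:
                    total += left
                    lo -= 1
                else:
                    total += right
                    hi += 1
            mean_gap = total / k
            density = 1.0 / mean_gap if mean_gap > 0 else math.inf
            # Keep the lowest density seen across all reference points.
            min_density[idx] = min(min_density[idx], density)
    finite = [d for d in min_density if math.isfinite(d)]
    max_density = max(finite) if finite else 1.0
    # Normalise so that the densest point scores 0 and sparse points approach 1.
    return [1.0 - min(d, max_density) / max_density for d in min_density]
```

Ranking candidates then amounts to sorting the points by descending score. Taking the minimum density over the reference points is a deliberate choice in this sketch: a point that merely happens to be equidistant from one reference as a dense cluster is still exposed by another reference.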

Citation

Y. Pei, O. Zaiane, Y. Gao. "An Efficient Reference-based Approach to Outlier Detection in Large Dataset". IEEE International Conference on Data Mining (ICDM), pp. 478-487, December 2006.

Category:	In Conference
Web Links:	IEEE

BibTeX

@inproceedings{Pei+al:ICDM06,
  author = {Yaling Pei and Osmar R. Zaiane and Yong Gao},
  title = {An Efficient Reference-based Approach to Outlier Detection in Large
    Dataset},
  pages = {478--487},
  booktitle = {IEEE International Conference on Data Mining (ICDM)},
  year = 2006,
}

Last Updated: January 30, 2020
Submitted by Sabina P
