Estimating True and False Positive Rates in Higher Dimensional Problems and its Data Mining Applications
- Andrew Foss, University of Alberta
- Osmar R. Zaiane, University of Alberta (Database)
If we can estimate the accuracy of our observations, then we can estimate the true and false positive rates over a series of samples in high dimensional data mining problems. To date such issues have been largely neglected, and no algorithm has previously been provided to facilitate the computations involved. In high dimensional data mining tasks, increasing sparsity leads to decreasing true positive rates. Estimating this effect allows us to estimate the true membership size of a class or cluster and to identify the top candidates for the resulting false negatives, while tracking the likelihood of false positives. These estimates of true and false positive rates can also help researchers avoid unnecessary costs by collecting only as many samples as are really needed. We propose an algorithm for these computations, designated the Statistical Error Rate Algorithm (SERA), and give an example of its use.
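The abstract does not spell out SERA itself, so the following is only a generic illustration of the kind of computation it alludes to, not the authors' algorithm. It is a minimal sketch under assumptions that are not in the source: observations are independent, each sample flags a true member with probability `per_sample_tpr` and a non-member with probability `per_sample_fpr`, and an object is accepted once it is flagged in at least `threshold` of `n_samples` samples. All function and parameter names are hypothetical.

```python
from math import comb


def tail_at_least(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def aggregate_rates(n_samples, threshold, per_sample_tpr, per_sample_fpr):
    """Aggregate true/false positive rates over a series of samples.

    Assumption (not from the source): an object is accepted when it is
    flagged in at least `threshold` of `n_samples` independent samples.
    """
    tpr = tail_at_least(n_samples, threshold, per_sample_tpr)
    fpr = tail_at_least(n_samples, threshold, per_sample_fpr)
    return tpr, fpr


def estimate_true_size(observed_positives, tpr, fpr, population):
    """Estimate the true membership size m of a class or cluster.

    Solves observed = tpr * m + fpr * (population - m) for m and
    clamps the result to the range [0, population].
    """
    if tpr <= fpr:
        raise ValueError("tpr must exceed fpr for the estimate to be meaningful")
    m = (observed_positives - fpr * population) / (tpr - fpr)
    return min(max(m, 0.0), float(population))


if __name__ == "__main__":
    # Hypothetical numbers chosen only to exercise the functions.
    tpr, fpr = aggregate_rates(n_samples=10, threshold=4,
                               per_sample_tpr=0.6, per_sample_fpr=0.05)
    size = estimate_true_size(observed_positives=120, tpr=tpr, fpr=fpr,
                              population=5000)
    print(f"aggregate TPR={tpr:.3f}, FPR={fpr:.5f}, estimated size={size:.1f}")
```

Under these assumptions, raising `n_samples` at a fixed threshold raises both aggregate rates, which gives one way to see how an estimate of this kind could inform how many samples are worth collecting.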
Citation
A. Foss, O. Zaiane. "Estimating True and False Positive Rates in Higher Dimensional Problems and its Data Mining Applications". Foundations of Data Mining Workshop, pp 673-681, December 2008.
Category: In Workshop
Web Links: IEEE
BibTeX
@inproceedings{Foss+Zaiane:08,
  author = {Andrew Foss and Osmar R. Zaiane},
  title = {Estimating True and False Positive Rates in Higher Dimensional Problems and its Data Mining Applications},
  booktitle = {Foundations of Data Mining Workshop},
  pages = {673-681},
  year = 2008,
}
Last Updated: January 15, 2020
Submitted by Sabina P