
Framework for Extreme Imbalance Classification: SWIM: Sampling With the Majority Class

The class imbalance problem is a pervasive issue in many real-world domains. Oversampling methods that inflate the rare class by generating synthetic data are amongst the most popular techniques for resolving class imbalance. However, they concentrate on the characteristics of the minority class and use them to guide the oversampling process. By completely overlooking the majority class, they lose a global view of the classification problem and, while alleviating the class imbalance, may negatively impact learnability by generating borderline or overlapping instances. This becomes even more critical when facing extreme class imbalance, where the minority class is strongly underrepresented and on its own does not contain enough information to conduct the oversampling process. We propose a framework for synthetic oversampling that, unlike existing resampling methods, is robust in cases of extreme imbalance. The key feature of the framework is that it uses the density of the well-sampled majority class to guide the generation process. We demonstrate implementations of the framework using the Mahalanobis distance and a radial basis function. We evaluate over 25 benchmark datasets, and show that the framework offers a distinct performance improvement over existing state-of-the-art oversampling techniques.
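To make the central idea concrete, the following is a minimal Python sketch of oversampling guided by the majority class through the Mahalanobis distance. It is an illustration written for this page, not the authors' implementation: the function names, the `spread` parameter, the covariance regularization, and the choice to place each synthetic point at the same Mahalanobis distance from the majority mean as a randomly chosen minority seed are all assumptions.

```python
import numpy as np

REG = 1e-6  # assumed ridge term that keeps the covariance invertible


def _majority_stats(X_maj):
    """Mean and regularized covariance estimated from the majority class."""
    mu = X_maj.mean(axis=0)
    cov = np.atleast_2d(np.cov(X_maj, rowvar=False))
    return mu, cov + REG * np.eye(cov.shape[0])


def mahalanobis_to_majority(X_maj, X):
    """Mahalanobis distance of each row of X from the majority distribution."""
    mu, cov = _majority_stats(X_maj)
    diff = X - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff))


def mahalanobis_oversample(X_maj, X_min, n_samples, spread=0.25, seed=0):
    """Generate synthetic minority points (hypothetical helper).

    Each synthetic point keeps the Mahalanobis distance (relative to the
    majority class) of the minority seed it was derived from.
    """
    rng = np.random.default_rng(seed)
    mu, cov = _majority_stats(X_maj)

    # Whitening transform W = cov^(-1/2): in whitened space the Euclidean
    # norm equals the Mahalanobis distance from the majority mean.
    lam, V = np.linalg.eigh(cov)
    W = V @ np.diag(lam ** -0.5) @ V.T
    W_inv = V @ np.diag(lam ** 0.5) @ V.T

    Z_min = (X_min - mu) @ W
    synthetic = np.empty((n_samples, X_min.shape[1]))
    for i in range(n_samples):
        z = Z_min[rng.integers(len(Z_min))]      # pick a minority seed
        r = np.linalg.norm(z)
        cand = z + rng.normal(scale=spread * r, size=z.shape)  # perturb
        norm = np.linalg.norm(cand)
        if norm > 0:
            cand *= r / norm                     # project back to radius r
        synthetic[i] = cand @ W_inv + mu         # undo the whitening
    return synthetic
```

Under these assumptions, a call such as `mahalanobis_oversample(X_maj, X_min, n_samples=100)` returns points that lie on the same majority-density contours as the minority seeds, so they are no deeper inside the majority class than the observed minority instances.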

Citation

C. Bellinger, S. Sharma, N. Japkowicz, O. Zaiane. "Framework for Extreme Imbalance Classification: SWIM: Sampling With the Majority Class". Knowledge and Information Systems, 62(3), pp. 841-866, May 2019.

Keywords: Machine learning, Imbalanced classification, Extreme imbalance, Synthetic oversampling, SMOTE
Category: In Journal
Web Links: Springer

BibTeX

@article{Bellinger+al:KAIS19,
  author = {Colin Bellinger and Shiven Sharma and Nathalie Japkowicz and Osmar
    R. Zaiane},
  title = {Framework for Extreme Imbalance Classification: SWIM: Sampling With
    the Majority Class},
  volume = {62},
  number = {3},
  pages = {841--866},
  journal = {Knowledge and Information Systems},
  year = 2019,
}

Last Updated: September 15, 2020
Submitted by Sabina P
