Ensemble Approaches for Class Imbalance Problem: A Review
Published: 2019
Author(s) Name: Anjana Gosain and Arushi Gupta
Author(s) Affiliation: Department of Information Technology, USICT, GGSIP University, Dwarka, Delhi, India.
Abstract
In data mining, classifying data with a skewed class distribution is a challenging problem. Traditional Classification Techniques (TCT) work efficiently on data with a symmetric (balanced) class distribution, because their internal design favors balanced datasets. The Class Imbalance Problem (CIP) occurs when the number of instances of one class greatly exceeds the number of instances of the other classes. Factors that compound the difficulty of this problem include noisy data, borderline samples, the degree of class overlap, and small disjuncts. In machine learning, ensembles are built to improve on the performance and accuracy of a single classifier: several classifiers are trained, and their outputs are combined into a single predicted class label. In this paper, we review ensemble learning methods for the two-class problem. We propose a categorization of these methods at three levels: the data level, the algorithm level, and the level of the base classifier.
Keywords: Bagging, Boosting, Classification, Class imbalance problem, Oversampling, Skewed data distribution, Undersampling.
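To make the combination of "bagging" and "undersampling" from the keywords concrete, the following is a minimal sketch of a data-level ensemble in the spirit of UnderBagging, written under stated assumptions: it is not the authors' method, the names `train_stump` and `under_bagging` are hypothetical, the base learner is a brute-force decision stump chosen only for brevity, and label `1` is assumed to be the minority class. Each ensemble member is trained on a balanced sample (a bootstrap of the minority class plus an equally sized random subset of the majority class), and the members' outputs are combined by majority vote into a single class label.

```python
import random
from collections import Counter


def train_stump(X, y):
    """Hypothetical base learner: a one-feature threshold classifier found by brute force."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for pos_side in (True, False):
                # Predict class 1 when (x[f] >= t) matches pos_side.
                preds = [1 if (row[f] >= t) == pos_side else 0 for row in X]
                err = sum(p != yi for p, yi in zip(preds, y))
                if best is None or err < best[0]:
                    best = (err, f, t, pos_side)
    _, f, t, pos_side = best
    return lambda row: 1 if (row[f] >= t) == pos_side else 0


def under_bagging(X, y, n_estimators=11, seed=0):
    """Sketch of bagging with random undersampling; assumes label 1 is the minority class."""
    rng = random.Random(seed)
    minority = [i for i, yi in enumerate(y) if yi == 1]
    majority = [i for i, yi in enumerate(y) if yi == 0]
    models = []
    for _ in range(n_estimators):
        # Bootstrap the minority class and undersample the majority class to the same size,
        # so every base classifier sees a balanced training set.
        idx = [rng.choice(minority) for _ in minority]
        idx += rng.sample(majority, len(minority))
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        # Majority vote over the base classifiers (an odd n_estimators avoids ties).
        votes = Counter(m(row) for m in models)
        return votes.most_common(1)[0][0]

    return predict


if __name__ == "__main__":
    # Toy imbalanced data: 18 majority samples (label 0) and 3 minority samples (label 1).
    X = [[float(v)] for v in range(18)] + [[20.0], [21.0], [22.0]]
    y = [0] * 18 + [1] * 3
    predict = under_bagging(X, y)
    print(predict([25.0]), predict([5.0]))  # 1 0
```

Undersampling inside each bag, rather than once before training, lets different members see different majority-class instances, so the ensemble as a whole still draws on much of the majority data while each base classifier is trained on a balanced set.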