An under-sampling method based on clustering for imbalanced data set. (Chinese. English summary) Zbl 1438.68092

Summary: To address the classification problem for imbalanced data, we propose an under-sampling method based on clustering. The majority-class samples in the training set are clustered several times, each time with a different number of clusters, and the resulting cluster centers are used to represent the majority class. Each set of cluster centers is then combined with the minority-class samples to form a new training set. A classifier is trained on each of these training sets, and classifiers with a tendency toward false classification are eliminated. Finally, the remaining classifiers vote on the classification result. In the experiments, extensive simulations are conducted on 16 imbalanced data sets, and the proposed algorithm is compared with several other under-sampling algorithms. The theoretical analysis and experimental results show that the algorithm effectively improves classification performance on imbalanced data sets.
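
The pipeline described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not name the clustering algorithm, the base classifier, or the rule for eliminating classifiers, so the sketch assumes a plain k-means, a 1-nearest-neighbour base classifier, and a hypothetical filter (`min_recall`) that drops any classifier whose recall on either class of the original training data falls below a threshold. All function names are illustrative.

```python
import random
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (assumed here; the summary names no algorithm)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group; keep the old
        # center if a group happens to be empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def predict_one(train, x):
    """1-nearest-neighbour stand-in for the unspecified base classifier."""
    return min(train, key=lambda item: dist2(item[0], x))[1]


def build_ensemble(majority, minority, ks, min_recall=0.5):
    """One training set per cluster count: cluster centers + minority samples.

    Label 0 is the majority class, label 1 the minority class.
    The recall-based filter is an assumption, not the paper's criterion.
    """
    candidates, kept = [], []
    for k in ks:
        centers = kmeans(majority, k)
        train = [(c, 0) for c in centers] + [(m, 1) for m in minority]
        candidates.append(train)
        maj_recall = sum(predict_one(train, p) == 0 for p in majority) / len(majority)
        min_rec = sum(predict_one(train, p) == 1 for p in minority) / len(minority)
        if min(maj_recall, min_rec) >= min_recall:
            kept.append(train)
    # If every classifier was filtered out, fall back to all of them.
    return kept or candidates


def vote(ensemble, x):
    """Majority vote over the retained classifiers."""
    votes = Counter(predict_one(train, x) for train in ensemble)
    return votes.most_common(1)[0][0]


# Toy data: 49 majority points on a grid around the origin, 5 minority points.
majority = [(0.5 * i, 0.5 * j) for i in range(-3, 4) for j in range(-3, 4)]
minority = [(5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (4.5, 5.0), (5.0, 4.5)]
ensemble = build_ensemble(majority, minority, ks=[3, 5, 7])
label_near_majority = vote(ensemble, (0.0, 0.0))
label_near_minority = vote(ensemble, (5.0, 5.0))
```

Choosing cluster counts close to the number of minority samples keeps each reduced training set roughly balanced; that choice, like the odd number of classifiers used to avoid tied votes, is a design assumption of this sketch rather than something stated in the summary.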

MSC:

68T05 Learning and adaptive systems in artificial intelligence