
Class Imbalance Learning


[1] M. Kubat, R. C. Holte, and S. Matwin, “Machine Learning for the Detection of Oil Spills in Satellite Radar Images,” Machine Learning, vol. 30, no. 2-3, pp. 195–215, 1998.

[2] G. M. Weiss, “Mining with Rare Cases,” in The Data Mining and Knowledge Discovery Handbook (O. Maimon and L. Rokach, eds.), pp. 765–776, Springer, 2005.

[3] G. M. Weiss, “Mining with rarity: a unifying framework,” SIGKDD Explorations, vol. 6, no. 1, pp. 7–19, 2004.

[4] R. Prati, G. Batista, and M. Monard, “Class imbalances versus class overlapping: an analysis of a learning system behavior,” in MICAI 2004: Advances in Artificial Intelligence, pp. 312–321, 2004.

[5] T. Jo and N. Japkowicz, “Class imbalances versus small disjuncts,” SIGKDD Explorations, vol. 6, no. 1, pp. 40–49, 2004.

[6] R. C. Prati, G. E. A. P. A. Batista, and M. C. Monard, “Learning with Class Skews and Small Disjuncts,” in SBIA (A. L. C. Bazzan and S. Labidi, eds.), vol. 3171 of Lecture Notes in Computer Science, pp. 296–306, Springer, 2004.

[7] J. R. Quinlan, “Induction of Decision Trees,” Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[8] W.-H. Yang, D.-Q. Dai, and H. Yan, “Feature Extraction and Uncorrelated Discriminant Analysis for High-Dimensional Data,” IEEE Trans. Knowl. Data Eng., vol. 20, no. 5, pp. 601–614, 2008.

[9] H. He and E. Garcia, “Learning from Imbalanced Data,” IEEE Transactions on Knowledge and Data Engineering, vol. 21, pp. 1263–1284, Sept. 2009.

[10] H. He and Y. Ma, Imbalanced Learning: Foundations, Algorithms, and Applications. Wiley-IEEE Press, 1st ed., 2013.

[11] P. Branco, L. Torgo, and R. P. Ribeiro, “A Survey of Predictive Modeling on Imbalanced Domains,” ACM Comput. Surv., vol. 49, no. 2, pp. 31:1–31:50, 2016.

[12] D. Mease, A. Wyner, and A. Buja, “Boosted classification trees and class probability/quantile estimation,” The Journal of Machine Learning Research, vol. 8, pp. 409–439, 2007.

[13] C. Drummond and R. Holte, “C4.5, class imbalance, and cost sensitivity: why under-sampling beats oversampling,” Workshop on Learning from Imbalanced Datasets II, pp. 1–8, 2003.

[14] R. C. Holte, L. Acker, and B. W. Porter, “Concept Learning and the Problem of Small Disjuncts,” in IJCAI (N. S. Sridharan, ed.), pp. 813–818, Morgan Kaufmann, 1989.

[15] X.-Y. Liu and Z.-H. Zhou, “The Influence of Class Imbalance on Cost-Sensitive Learning: An Empirical Study,” in ICDM, pp. 970–974, IEEE Computer Society, 2006.

[16] J. Zhang and I. Mani, “KNN Approach to Unbalanced Data Distributions: A Case Study Involving Information Extraction,” in Proceedings of the ICML’2003 Workshop on Learning from Imbalanced Datasets, 2003.

[17] M. Kubat and S. Matwin, “Addressing the Curse of Imbalanced Training Sets: One-Sided Selection,” in Proceedings of the Fourteenth International Conference on Machine Learning, pp. 179–186, Morgan Kaufmann, 1997.

[18] N. Chawla, K. Bowyer, L. Hall, and W. Kegelmeyer, “SMOTE: Synthetic Minority Over-sampling Technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.

[19] B. X. Wang and N. Japkowicz, “Imbalanced Data Set Learning with Synthetic Samples,” 2004.

[20] H. Han, W. Wang, and B. Mao, “Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning,” in ICIC (1) (D.-S. Huang, X.-P. Zhang, and G.-B. Huang, eds.), vol. 3644 of Lecture Notes in Computer Science, pp. 878–887, Springer, 2005.

[21] S. Barua, M. M. Islam, X. Yao, and K. Murase, “MWMOTE-Majority Weighted Minority Oversampling Technique for Imbalanced Data Set Learning,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 2, pp. 405–425, 2014.

[22] A. Agrawal, H. L. Viktor, and E. Paquet, “SCUT: Multi-Class Imbalanced Data Classification using SMOTE and Cluster-based Undersampling,” in KDIR (A. L. N. Fred, J. L. G. Dietz, D. Aveiro, K. Liu, and J. Filipe, eds.), pp. 226–234, SciTePress, 2015.

[23] H. He, Y. Bai, E. A. Garcia, and S. Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” in IJCNN, pp. 1322–1328, IEEE, 2008.

[24] S. Barua, M. M. Islam, and K. Murase, “ProWSyn: Proximity Weighted Synthetic Oversampling Technique for Imbalanced Data Set Learning,” in PAKDD (2) (J. Pei, V. S. Tseng, L. Cao, H. Motoda, and G. Xu, eds.), vol. 7819 of Lecture Notes in Computer Science, pp. 317–328, Springer, 2013.

[25] Y. Dong and X. Wang, “A New Over-Sampling Approach: Random-SMOTE for Learning from Imbalanced Data Sets,” in KSEM (H. Xiong and W. B. Lee, eds.), vol. 7091 of Lecture Notes in Computer Science, pp. 343–352, Springer, 2011.

[26] I. Tomek, “Two Modifications of CNN,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 6, no. 11, pp. 769–772, 1976.

[27] G. E. A. P. A. Batista, R. C. Prati, and M. C. Monard, “A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data,” ACM SIGKDD Explorations Newsletter – Special issue on learning from imbalanced datasets, vol. 6, no. 1, pp. 20–29, 2004.

[28] J. Laurikkala, “Improving Identification of Difficult Small Classes by Balancing Class Distribution,” in AIME (S. Quaglini, P. Barahona, and S. Andreassen, eds.), vol. 2101 of Lecture Notes in Computer Science, pp. 63–66, Springer, 2001.

[29] N. V. Chawla, A. Lazarevic, L. O. Hall, and K. W. Bowyer, “SMOTEBoost: Improving Prediction of the Minority Class in Boosting,” in PKDD (N. Lavrac, D. Gamberger, H. Blockeel, and L. Todorovski, eds.), vol. 2838 of Lecture Notes in Computer Science, pp. 107–119, Springer, 2003.

[30] H. Guo and H. L. Viktor, “Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach,” SIGKDD Explorations, vol. 6, no. 1, pp. 30–39, 2004.

[31] H. Guo and H. L. Viktor, “Boosting with Data Generation: Improving the Classification of Hard to Learn Examples,” in IEA/AIE (R. Orchard, C. Yang, and M. Ali, eds.), vol. 3029 of Lecture Notes in Computer Science, pp. 1082–1091, Springer, 2004.

[32] S. Chen, H. He, and E. A. Garcia, “RAMOBoost: Ranked Minority Oversampling in Boosting,” IEEE Trans. Neural Netw., vol. 21, pp. 1624–1642, Oct. 2010.

[33] C. Seiffert, T. M. Khoshgoftaar, J. V. Hulse, and A. Napolitano, “RUSBoost: A Hybrid Approach to Alleviating Class Imbalance,” IEEE Trans. Systems, Man, and Cybernetics, Part A, vol. 40, no. 1, pp. 185–197, 2010.

[34] X. Zhang and B.-G. Hu, “A New Strategy of Cost-Free Learning in the Class Imbalance Problem,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 12, pp. 2872–2885, 2014.

[35] Y. Peng, “Adaptive Sampling with Optimal Cost for Class-Imbalance Learning,” in AAAI (B. Bonet and S. Koenig, eds.), pp. 2921–2927, AAAI Press, 2015.

[36] W. Zong, G.-B. Huang, and Y. Chen, “Weighted extreme learning machine for imbalance learning,” Neurocomputing, vol. 101, pp. 229–242, 2013.

[37] X. Gao, Z. Chen, S. Tang, Y. Zhang, and J. Li, “Adaptive weighted imbalance learning with application to abnormal activity recognition,” Neurocomputing, vol. 173, pp. 1927–1935, 2016.

[38] I. Nekooeimehr and S. K. Lai-Yuen, “Adaptive semi-unsupervised weighted oversampling (A-SUWO) for imbalanced datasets,” Expert Syst. Appl., vol. 46, pp. 405–416, 2016.

[39] M. A. Tahir, J. Kittler, and F. Yan, “Inverse random under sampling for class imbalance problem and its application to multi-label classification,” Pattern Recognition, vol. 45, no. 10, pp. 3738–3750, 2012.

[40] C. Elkan, “The Foundations of Cost-Sensitive Learning,” in IJCAI, pp. 973–978, 2001.

[41] N. V. Chawla, N. Japkowicz, and A. Kotcz, “Editorial: special issue on learning from imbalanced data sets,” SIGKDD Explorations, vol. 6, no. 1, pp. 1–6, 2004.

[42] B. Zadrozny, J. Langford, and N. Abe, “Cost-Sensitive Learning by Cost-Proportionate Example Weighting,” in ICDM, pp. 435–442, IEEE Computer Society, 2003.

[43] P. M. Domingos, “MetaCost: A General Method for Making Classifiers Cost-Sensitive,” in KDD (U. M. Fayyad, S. Chaudhuri, and D. Madigan, eds.), pp. 155–164, ACM, 1999.

[44] Y. Freund and R. E. Schapire, “Experiments with a New Boosting Algorithm,” in International Conference on Machine Learning, pp. 148–156, 1996.

[45] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, “Cost-sensitive boosting for classification of imbalanced data,” Pattern Recognition, vol. 40, no. 12, pp. 3358–3378, 2007.

[46] W. Fan, S. J. Stolfo, J. Zhang, and P. K. Chan, “AdaCost: Misclassification Cost-Sensitive Boosting,” in ICML (I. Bratko and S. Dzeroski, eds.), pp. 97–105, Morgan Kaufmann, 1999.

[47] M. A. Maloof, “Learning When Data Sets are Imbalanced and When Costs are Unequal and Unknown,” in Workshop on Learning from Imbalanced Datasets II, 2003.

[48] K. M. Ting, “An Instance-Weighting Method to Induce Cost-Sensitive Trees,” IEEE Trans. Knowl. Data Eng., vol. 14, no. 3, pp. 659–665, 2002.

[49] C. Drummond and R. C. Holte, “Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria,” in ICML (P. Langley, ed.), pp. 239–246, Morgan Kaufmann, 2000.

[50] M. Kukar and I. Kononenko, “Cost-Sensitive Learning with Neural Networks,” in ECAI, pp. 445–449, 1998.

[51] B. Krawczyk and M. Wozniak, “Cost-Sensitive Neural Network with ROC-Based Moving Threshold for Imbalanced Classification,” in IDEAL (K. Jackowski, R. Burduk, K. Walkowiak, M. Wozniak, and H. Yin, eds.), vol. 9375 of Lecture Notes in Computer Science, pp. 45–52, Springer, 2015.

[52] R. Akbani, S. Kwek, and N. Japkowicz, “Applying Support Vector Machines to Imbalanced Datasets,” in ECML (J.-F. Boulicaut, F. Esposito, F. Giannotti, and D. Pedreschi, eds.), vol. 3201 of Lecture Notes in Computer Science, pp. 39–50, Springer, 2004.

[53] F. Vilariño, P. Spyridonos, J. Vitrià, and P. Radeva, “Experiments with SVM and Stratified Sampling with an Imbalanced Problem: Detection of Intestinal Contractions,” in ICAPR (2) (S. Singh, M. Singh, C. Apté, and P. Perner, eds.), vol. 3687 of Lecture Notes in Computer Science, pp. 783–791, Springer, 2005.

[54] P. Kang and S. Cho, “EUS SVMs: Ensemble of Under-Sampled SVMs for Data Imbalance Problems,” in ICONIP (1) (I. King, J. Wang, L. Chan, and D. L. Wang, eds.), vol. 4232 of Lecture Notes in Computer Science, pp. 837–846, Springer, 2006.

[55] Y. Liu, A. An, and X. Huang, “Boosting Prediction Accuracy on Imbalanced Datasets with SVM Ensembles,” in PAKDD (W. K. Ng, M. Kitsuregawa, J. Li, and K. Chang, eds.), vol. 3918 of Lecture Notes in Computer Science, pp. 107–118, Springer, 2006.

[56] B. X. Wang and N. Japkowicz, “Boosting support vector machines for imbalanced data sets,” Knowl. Inf. Syst., vol. 25, no. 1, pp. 1–20, 2010.

[57] Y. Tang and Y.-Q. Zhang, “Granular SVM with Repetitive Undersampling for Highly Imbalanced Protein Homology Prediction,” in GrC, pp. 457–460, IEEE, 2006.

[58] G. Wu and E. Y. Chang, “Aligning Boundary in Kernel Space for Learning Imbalanced Dataset,” in Proceedings of the Fourth IEEE International Conference on Data Mining, ICDM ’04, (Washington, DC, USA), pp. 265–272, IEEE Computer Society, 2004.

[59] X. Hong, S. Chen, and C. J. Harris, “A Kernel-Based Two-Class Classifier for Imbalanced Data Sets,” IEEE Transactions on Neural Networks, vol. 18, no. 1, pp. 28–41, 2007.

[60] Y. Xu, Y. Zhang, Z. Yang, X. Pan, and G. Li, “Imbalanced and semi-supervised classification for prognosis of ACLF,” Journal of Intelligent and Fuzzy Systems, vol. 28, no. 2, pp. 737–745, 2015.

[61] M. Wu and J. Ye, “A Small Sphere and Large Margin Approach for Novelty Detection Using Training Data with Outliers,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 11, pp. 2088–2092, 2009.

[62] Jayadeva, R. Khemchandani, and S. Chandra, “Twin Support Vector Machines for Pattern Classification,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 5, pp. 905–910, 2007.

[63] F. Li, C. Yu, N. Yang, F. Xia, G. Li, and F. Kaveh-Yazdy, “Iterative Nearest Neighborhood Oversampling in Semisupervised Learning from Imbalanced Data,” The Scientific World Journal, vol. 2013, Article ID 875450, Dec. 2013.

[64] M. Lichman, “UCI Machine Learning Repository,” 2013.

[65] Y. LeCun and C. Cortes, “MNIST handwritten digit database,” 2010.

[66] A. E. Ghoul and H. Sahbi, “Semi-supervised learning using a graph-based phase field model for imbalanced data set classification,” in ICASSP, pp. 2942–2946, IEEE, 2014.

[67] M. Rochery, I. Jermyn, and J. Zerubia, “Phase Field Models and Higher-Order Active Contours,” in ICCV, pp. 970–976, IEEE Computer Society, 2005.

[68] A. Stanescu and D. Caragea, “Ensemble-based semi-supervised learning approaches for imbalanced splice site datasets,” in BIBM (H. J. Zheng, W. Dubitzky, X. Hu, J.-K. Hao, D. P. Berrar, K.-H. Cho, Y. Wang, and D. R. Gilbert, eds.), pp. 432–437, IEEE Computer Society, 2014.

[69] J. Tanha, M. van Someren, and H. Afsarmanesh, “Semi-supervised self-training for decision tree classifiers,” International Journal of Machine Learning and Cybernetics, 2015.

[70] J. Xie and T. Xiong, “Stochastic Semi-supervised Learning,” in Active Learning and Experimental Design @ AISTATS (I. Guyon, G. C. Cawley, G. Dror, V. Lemaire, and A. R. Statnikov, eds.), vol. 16 of JMLR Proceedings, pp. 85–98, JMLR.org, 2011.

[71] B. A. Almogahed and I. A. Kakadiaris, “Empowering Imbalanced Data in Supervised Learning: A Semisupervised Learning Approach,” in ICANN (S. Wermter, C. Weber, W. Duch, T. Honkela, P. D. Koprinkova-Hristova, S. Magg, G. Palm, and A. E. P. Villa, eds.), vol. 8681 of Lecture Notes in Computer Science, pp. 523–530, Springer, 2014.

[72] S. Li, Z. Wang, G. Zhou, and S. Y. M. Lee, “Semi-Supervised Learning for Imbalanced Sentiment Classification,” in IJCAI (T. Walsh, ed.), pp. 1826–1831, IJCAI/AAAI, 2011.

[73] A. Estabrooks, T. Jo, and N. Japkowicz, “A Multiple Resampling Method for Learning from Imbalanced Data Sets,” Computational Intelligence, vol. 20, no. 1, pp. 18–36, 2004.

[74] F. J. Provost and G. M. Weiss, “Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction,” CoRR, vol. abs/1106.4557, pp. 315–354, 2011.

[75] X. Zhu, “Semi-Supervised Learning Literature Survey,” Tech. Rep. 1530, Computer Sciences, University of Wisconsin-Madison, 2005.
