Knowledge discovery in software defect datasets using learning algorithms
This paper studies how learning is affected in classification models built from binary class-imbalanced data. Before learning, preprocessing techniques were applied to the training datasets to remove redundancy. Feature selection and sampling have become essential tools for many data mining tasks because learning algorithms perform poorly on flawed datasets, where high dimensionality also becomes a problem. Sampling is additionally used to reduce the harmful effects of imbalanced data on prediction models. Two experiments are presented: (1) training on the original data with selected features, using AdaBoost and SVM (support vector machines) as classifiers; and (2) training on balanced data with selected features, using AdaBoost and random forest as classifiers. The classification models were compared across these two schemes. The results show that models built on selected features from balanced data outperform models built without balancing (classification over imbalanced data).
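The balancing step mentioned above can be illustrated with a simplified SMOTE-style oversampler (SMOTE appears among the keywords). This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy data are all hypothetical, and the abstract does not say which settings or library were used.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a minority point and one of its k nearest minority neighbours.
    Simplified illustration; names and defaults are assumptions."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up data: 8 non-defective modules vs 3 defective ones.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

# Generate just enough synthetic defective samples to match the majority class.
new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

After oversampling, both classes have the same number of samples, and every synthetic point lies on a line segment between two real minority samples. A classifier such as AdaBoost or random forest would then be trained on the balanced set.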
Keywords: SDP, SMOTE, AdaBoost, SVM, RF
Cite this Article
Meetesh Nevendra, Pradeep Singh. Knowledge Discovery in Software Defect Datasets Using Learning Algorithms. Journal of Software Engineering Tools & Technology Trends. 2018; 5(2): 18–26p.