
A Review of Code Defect Likelihood Using ML Methods

UKPE KUFRE, AMANNAH *, Constance Izuchukwu



Defects are inevitable in software design and coding. This paper focuses on five machine learning analysis models (MLAM): Support Vector Machine, Random Forest, K-Nearest Neighbor, Classification and Regression Tree, and Linear Discriminant Analysis. The models are trained to detect software defects using the software defect dataset from the PROMISE data repository. The collected dataset is preprocessed with Principal Component Analysis, a dimension reduction algorithm, to reduce the number of redundant features. The transformed dataset is then used to train the five MLAM to predict software defects as a classification task. Real-life software packages are analyzed, and code attribute values such as lines of code, cyclomatic complexity, lines of comments, number of operators, and number of operands are extracted for use in testing the trained models. To ascertain the efficiency of the system and to select the best algorithm for software defect prediction, the performance of the trained models is evaluated using four metrics: accuracy, error rate, precision, and recall. The models are ranked by performance, and the best model (the one with the highest accuracy) is selected and recommended for the task of software defect prediction.
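The evaluation and ranking step described in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = defective, 0 = clean) and treats "defective" as the positive class. The function names `evaluate` and `rank_by_accuracy`, the model names, and the label vectors are hypothetical placeholders for illustration only; they are not the paper's code or results.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    accuracy: float
    error_rate: float
    precision: float
    recall: float


def evaluate(y_true, y_pred):
    """Compute accuracy, error rate, precision, and recall for binary labels.

    Assumption: 1 marks a defective module and 0 a clean one.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defects caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean, predicted clean
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed defects

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a model predicts no positives
    # or the data contains no positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Scores(accuracy, 1.0 - accuracy, precision, recall)


def rank_by_accuracy(results):
    """Return model names ordered from highest to lowest accuracy."""
    return sorted(results, key=lambda name: results[name].accuracy, reverse=True)


# Made-up labels and predictions, used only to exercise the functions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = {
    "model_a": [1, 1, 0, 0, 0, 0, 1, 1],
    "model_b": [1, 0, 0, 0, 0, 0, 1, 1],
}
results = {name: evaluate(y_true, y_pred) for name, y_pred in predictions.items()}
for name in rank_by_accuracy(results):
    print(name, results[name])
```

Ranking on accuracy alone follows the selection rule stated in the abstract. On imbalanced defect datasets, precision and recall for the defective class can differ sharply between models with similar accuracy, which is why all four metrics are reported side by side.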


Review, code defect, likelihood, method
