Abstract
Structured Query Language Injection Attack (SQLIA) is one of the most prevalent cyber attacks against web-based
applications. It exploits application vulnerabilities through injection techniques to gain access to restricted data, bypass authentication
mechanisms, and execute unauthorized data manipulation language (DML) statements. There are several solutions and approaches for the identification and
prevention of SQLIA, such as Cryptography, Extensible Markup Language (XML), Pattern Matching, Parsing and Machine Learning.
The Machine Learning (ML) approach has been found to be effective for SQLIA mitigation and is implemented through a defensive
coding approach. It requires a large amount of data for efficient model training and is capable of learning from several attack
patterns. The ML approach can also be used to mitigate blind SQL injection attacks, which are very hard to counter. An experimental analysis was performed in
the Waikato Environment for Knowledge Analysis (WEKA) on Logistic Regression (LRN), Stochastic Gradient Descent (SGD), Sequential
Minimal Optimization (SMO), Bayes Network (BNK), Instance Based Learner (IBK), Multilayer Perceptron (MLP), Naive Bayes
(NBS), and J48. Hold-Out (70%) and 10-fold Cross Validation evaluation techniques were used to evaluate the performance of the
supervised learning classification algorithms to choose the best algorithm. The results of the Cross Validation technique showed that
SMO, IBK and J48 had Accuracy of 99.982%, 99.995% and 99.999% respectively, while the Hold-Out technique showed that SMO, IBK
and J48 had Accuracy of 99.986%, 99.989% and 100% respectively. In terms of time to build the model, SMO, IBK and
J48 took 10.15 sec, 0.06 sec and 14.12 sec respectively under the Cross Validation technique, and 9.71 sec, 0.16 sec and 14.28 sec
respectively under the Hold-Out technique. From the findings, IBK had the minimum time to build the
model in the Cross Validation technique, in addition to better performance in Accuracy, Sensitivity and Specificity, and was chosen
as the classifier for SQLIA detection and prevention. Therefore, beyond Accuracy, other performance evaluation metrics are critical
for optimal algorithm selection for predictive analytics.
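
The point that Accuracy alone is not sufficient can be illustrated with a minimal sketch of the three metrics named above, computed from a binary confusion matrix. The class name `ConfusionMatrix` and the counts used here are hypothetical and are not taken from the experiments reported in this paper; the "positive" class is assumed to be an injection attempt.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary classifier; 'positive' is assumed to mean an attack."""
    tp: int  # attacks correctly flagged
    fn: int  # attacks missed
    fp: int  # benign queries wrongly flagged
    tn: int  # benign queries correctly passed

    def accuracy(self) -> float:
        # Fraction of all queries classified correctly.
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total

    def sensitivity(self) -> float:
        # True positive rate: share of attacks that were detected.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # True negative rate: share of benign queries that were passed.
        return self.tn / (self.tn + self.fp)


# Hypothetical, deliberately imbalanced counts (NOT results from the paper).
skewed = ConfusionMatrix(tp=10, fn=40, fp=0, tn=950)
print(f"accuracy={skewed.accuracy():.3f}")
print(f"sensitivity={skewed.sensitivity():.3f}")
print(f"specificity={skewed.specificity():.3f}")
```

With these illustrative counts the classifier reaches 96% Accuracy and 100% Specificity while detecting only 20% of the attacks, which is why Sensitivity and Specificity are worth reporting alongside Accuracy when comparing classifiers.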