ISSN:2582-5208

www.irjmets.com

Paper Key : IRJ************822
Author: Samaila Kasimu Ahmad, Babagana Ali Dapshima, Yasmin Chuupa Essa
Date Published: 08 Jul 2024
Abstract
This study evaluated several machine learning techniques for detecting phishing attacks: Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Decision Tree (DT), and Logistic Regression (LR). Two datasets were used, one from PhishTank and one from the UCI Machine Learning Repository. The Random Forest model achieved the highest accuracy across multiple metrics. On the PhishTank dataset, RF had the best K-fold cross-validation accuracy at 99.55%, feature selection accuracy at 99.00%, and hyperparameter tuning accuracy at 99.45%. XGBoost also performed well, with 99.16% K-fold accuracy on PhishTank. On the UCI dataset, XGBoost had the highest K-fold accuracy at 97.16%, while RF still achieved the highest accuracy for feature selection and hyperparameter tuning. Logistic Regression consistently showed the lowest accuracy across datasets and metrics. The proposed approach was validated against other researchers' work on PhishTank, achieving 98.80% accuracy, which compared favorably. ROC curves further illustrated the strong performance, especially for the top-performing models. The study demonstrated that using selected features and hyperparameter tuning could enhance detection accuracy. The machine learning algorithms, particularly Random Forest, outperformed other state-of-the-art techniques in accurately identifying phishing attacks. The high accuracy metrics indicate the proposed framework's effectiveness in detecting phishing attempts.
Keywords: Detection, Phishing Attack, PhishTank, Hyper-Parameter, Machine Learning.
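The K-fold cross-validation accuracy reported in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the nearest-class-mean classifier, the single URL-length feature, and the toy data are hypothetical stand-ins chosen only to keep the example self-contained, whereas the study itself used SVM, XGBoost, RF, DT, and LR on the PhishTank and UCI datasets.

```python
import random
from typing import Callable, List, Sequence

Features = Sequence[float]
Model = Callable[[Features], int]  # returns 0 (legitimate) or 1 (phishing)


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle range(n) and split it into k nearly equal folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(
    X: Sequence[Features],
    y: Sequence[int],
    fit: Callable[[List[Features], List[int]], Model],
    k: int = 5,
    seed: int = 0,
) -> float:
    """Train on k-1 folds, score on the held-out fold, and average over folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(model(X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


def fit_nearest_mean(X: List[Features], y: List[int]) -> Model:
    """Hypothetical stand-in classifier: pick the class whose mean of
    feature 0 is closer to the sample."""
    pos = [x[0] for x, label in zip(X, y) if label == 1]
    neg = [x[0] for x, label in zip(X, y) if label == 0]
    if not pos or not neg:
        raise ValueError("training fold must contain both classes")
    mu_pos = sum(pos) / len(pos)
    mu_neg = sum(neg) / len(neg)
    return lambda x: int(abs(x[0] - mu_pos) < abs(x[0] - mu_neg))


# Toy data (assumption): one feature, URL length; longer URLs labelled phishing.
X = [[20.0 + i] for i in range(10)] + [[80.0 + i] for i in range(10)]
y = [0] * 10 + [1] * 10

score = cross_val_accuracy(X, y, fit_nearest_mean, k=5)
print(f"mean 5-fold accuracy: {score:.4f}")
```

The toy data is cleanly separable, so the score says nothing about real-world performance; the point is only that each sample is scored exactly once by a model that never saw it during training, which is what makes a K-fold figure more trustworthy than accuracy on the training data.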