Based on URL Feature Extraction Identify Malicious Website Using Machine Learning Techniques

Abstract

A phishing attack is one of the simplest ways to obtain sensitive information from users. Phishers aim to acquire critical information such as usernames, passwords, bank account details and other personal data. As Internet technology develops, network security faces a growing range of threats; in particular, attackers can spread malicious uniform resource locators (URLs) to carry out attacks such as phishing and spam. Research on malicious URL detection is therefore important for defending against these attacks. Some existing detection methods are easy for attackers to evade, and cybersecurity professionals are looking for reliable and stable techniques for detecting phishing websites. To address these problems, we design a malicious URL detection model based on machine learning techniques. The proposed system detects phishing URLs by extracting and analysing various features of legitimate and phishing URLs. Decision tree, random forest and support vector machine algorithms are used to detect phishing or otherwise unsafe websites. The aim of the paper is to detect phishing URLs and to identify the best machine learning algorithm by comparing the accuracy, false positive rate and false negative rate of each algorithm. This paper analyses the structural features of phishing-website URLs, extracts 12 kinds of features, trains four machine learning algorithms on them, and uses the best-performing algorithm as our model for classifying unknown URLs.
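The pipeline described above (extract structural URL features, then compare classifiers by accuracy, false positive rate and false negative rate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature names, the `SUSPICIOUS_TOKENS` list and the helpers `extract_url_features`, `rates` and `pick_best` are hypothetical, the ten features shown are common lexical URL features rather than the paper's 12, and the rule of ranking by accuracy with ties broken by the lower FPR + FNR is an assumption. Label 1 means phishing, 0 means legitimate.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical token list; the paper's actual feature definitions are not given here.
SUSPICIOUS_TOKENS = ("login", "verify", "account", "secure", "update", "bank")


def has_ip_host(host: str) -> bool:
    """Return True if the hostname is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_url_features(url: str) -> dict:
    """Map a URL string to a dict of numeric lexical features (illustrative set)."""
    # Add a scheme if missing so urlparse puts the host into netloc.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""  # hostname drops any "user@" part and the port
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": int("@" in url),
        "has_ip_host": int(has_ip_host(host)),
        "uses_https": int(parsed.scheme == "https"),
        "num_path_segments": len([seg for seg in parsed.path.split("/") if seg]),
        "num_digits": sum(ch.isdigit() for ch in url),
        "suspicious_token_count": sum(tok in url.lower() for tok in SUSPICIOUS_TOKENS),
    }


def rates(y_true, y_pred) -> dict:
    """Accuracy, false positive rate and false negative rate (1 = phishing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,  # legitimate URLs flagged as phishing
        "fnr": fn / (fn + tp) if fn + tp else 0.0,  # phishing URLs that were missed
    }


def pick_best(results: dict) -> str:
    """Pick the model with the highest accuracy; ties go to the lower FPR + FNR (assumed rule)."""
    return max(
        results,
        key=lambda name: (results[name]["accuracy"], -(results[name]["fpr"] + results[name]["fnr"])),
    )


if __name__ == "__main__":
    print(extract_url_features("http://192.168.0.1/secure-login/verify.php"))
    toy_truth = [1, 1, 0, 0, 1, 0]  # toy labels for illustration, not data from the paper
    scores = {
        "model_a": rates(toy_truth, [1, 0, 0, 0, 1, 1]),
        "model_b": rates(toy_truth, [1, 1, 0, 0, 1, 0]),
    }
    print(pick_best(scores))  # -> model_b
```

In a full experiment, the feature dicts would be converted to numeric vectors and used to train real classifiers (for example scikit-learn's `DecisionTreeClassifier`, `RandomForestClassifier` and `SVC`), with `rates` applied to each model's predictions on a held-out test set. Reporting FPR and FNR alongside accuracy matters here because a detector can score well on accuracy while still blocking many legitimate sites or missing many phishing ones.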

Country: India

Khushbu Digesh Vara¹, Vaibhav Sudhir Dimble², Mansi Mohan Yadav³, Aarti Ashok Thorat⁴

  1. Navsahyadri Education Society’s Group of Institutions, Pune, India
  2. Navsahyadri Education Society’s Group of Institutions, Pune, India
  3. Navsahyadri Education Society’s Group of Institutions, Pune, India
  4. Navsahyadri Education Society’s Group of Institutions, Pune, India

IRJIET, Volume 6, Issue 3, March 2022, pp. 144–148

doi.org/10.47001/IRJIET/2022.603019
