Literature Survey

With the rise of phishing webpages, researchers have investigated both supervised and unsupervised learning models for detecting them. For instance, M. Moghimi and A. Y. Varjani [14] used a support vector machine (SVM) to classify webpages; their experiments indicate that the proposed model detects phishing pages in internet banking with a true-positive rate of 99.14% and a false-negative rate of only 0.86%. Afroz and Greenstadt [15] developed PhishZoo, which constructs a profile of each website using fuzzy hashing. In this profile, a website is represented by several criteria that distinguish it from other websites, including its images, HTML source code, URL, and SSL certificate. A. Desai, J. Jatakia, R. Naik, and N. Raul [16] created a Google Chrome extension that detects phishing website content with machine learning algorithms. S. Parekh, D. Parikh, S. Kotak, and P. S. Sankhe [17] proposed a model that recognizes phishing sites from their URLs using the Random Forest algorithm.
X. Zhang, Y. Zeng, X. Jin, Z. Yan, and G. Geng [18] proposed a phishing detection model for Chinese web pages that mines semantic features from word embeddings together with multi-scale statistical features. As in Ma et al. [19], Zhang et al. wrote Python scripts to automatically download the URLs of confirmed phishing websites from PhishTank, a collaborative clearing house for data and information about phishing on the Internet. Jeeva and Rajsingh [20] extracted features related to transport layer security together with URL-based features such as the URL length, the number of slashes, the number and positions of dots in the URL, and subdomain names. They then applied rule mining with the Apriori algorithm to the extracted features to establish detection rules; in their experiments, 93% of phishing URLs were detected.
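To make the URL-based features described in [20] more concrete, the following is a minimal Python sketch that computes a few lexical features from a single URL using only the standard library. The function name `url_features`, the exact feature definitions, and the use of the HTTPS scheme as a simple stand-in for transport-layer-security features are assumptions for illustration; they are not taken from [20]. The subdomain split also naively assumes that the registered domain is the last two labels of the host name.

```python
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Extract simple lexical URL features of the kind described in [20].

    Hypothetical helper: the feature definitions are illustrative and are
    not copied from the cited paper.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    labels = host.split(".") if host else []
    # Naive assumption: the registered domain is the last two labels
    # (this is wrong for multi-label suffixes such as .co.uk).
    subdomains = labels[:-2] if len(labels) > 2 else []
    return {
        "length": len(url),
        "num_slashes": url.count("/"),
        "num_dots": url.count("."),
        # 0-based character offsets of every '.' in the full URL string
        "dot_positions": [i for i, ch in enumerate(url) if ch == "."],
        "num_subdomains": len(subdomains),
        "subdomains": subdomains,
        # Crude transport-layer hint: does the URL use the HTTPS scheme?
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    print(url_features("http://secure.login.example.com/account/verify.php"))
```

In a complete pipeline, feature dictionaries like these would still have to be discretized (for rule mining such as Apriori) or turned into numeric vectors (for a classifier such as SVM or Random Forest); that step is not shown here.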
Jain and Gupta [21] presented an anti-phishing approach that applies machine learning to 19 features extracted on the client side to distinguish phishing websites from legitimate ones. Peng, Harris, and Sawa [22] applied natural language processing (NLP) to detect phishing emails: their method performs a semantic analysis of the email content (as plain text) to detect malicious intent. Prakash, Kumar, Kompella, and Gupta [23] (2010) described blacklist-based systems that use an approximate matching algorithm to check whether a suspicious URL matches an entry in the blacklist. S. Aonzo, A. Merlo, G. Tavella, and Y. Fratantonio [24] discussed multifactor authentication, which requires two or more authentication factors to log in to an account or system: typically a password plus a code generated by an app or delivered by SMS, phone call, or email, so that, in principle, only the authenticated user can log in. Tech5 (Machine Learning Approach, 60%) was identified as one of the most effective anti-phishing techniques.
One of the earliest whitelists was proposed by Chen and Guo [25] and was based on the trusted websites that users browse. It monitors the user's login attempts and, if a repeated login succeeds, prompts the user to add that website to the whitelist. One clear limitation of Chen and Guo's method is that it assumes users are dealing with trustworthy websites, which unfortunately is not always the case. H. Zhang, G. Liu, T. W. S. Chow, and W. Liu [26] presented a new framework for content-based phishing detection using a Bayesian approach. Lee and Kim [27] proposed WarningBird, a suspicious URL detection system for Twitter. Li et al. [28] proposed combining linear and nonlinear space transformation methods to represent the core problem more clearly and to improve the performance of classifiers in identifying malicious URLs.
L. Yang, J. Zhang, X. Wang, Z. Li, and Z. Li [29] presented a phishing detection approach based on an inverse-matrix online sequential extreme learning machine that uses three types of features to characterize a website. They used the Sherman-Morrison-Woodbury identity to avoid repeated full matrix inversions and introduced an online sequential extreme learning machine to update the trained model incrementally. De La Torre Parra et al. [30] proposed a cloud-based distributed deep learning framework for phishing attack detection.
Wu et al. [36] empirically evaluated three simulated anti-phishing toolbars to determine how effectively they prevented participants from visiting fraudulent websites. Visual Similarity Based Phishing Detection (VSBPD) [38] compares the text and its styling across two websites and warns the user whenever they try to enter credentials on an untrusted website; BaitAlarm [37] is comparatively more efficient than VSBPD. The Google Safe Browsing API [39] allows client-side applications to check whether a URL appears on blacklists that Google continuously updates. Juan Chen and Chuanxiong Guo designed the LinkGuard algorithm [40] to detect spoofed hyperlinks in phishing emails.
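A central idea behind detecting spoofed hyperlinks, as in LinkGuard [40], is to compare the host that a link displays to the reader with the host it actually points to. The Python sketch below implements only that single check with the standard library; it is not the full LinkGuard algorithm. The names `LinkCollector`, `host_of`, and `spoofed_links`, and the heuristic for deciding when link text "looks like" a URL (it contains a dot and no spaces), are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (href, text) tuples
        self._href = None  # href of the <a> element currently open, if any
        self._text = []    # text fragments seen inside that <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(s: str) -> str:
    """Return the lowercase host of a URL-like string ('' if none)."""
    if "://" not in s:
        s = "http://" + s  # so urlparse treats 'bank.example/x' as a host
    return (urlparse(s).hostname or "").lower()


def spoofed_links(html: str) -> list:
    """Flag links whose visible text names a different host than the href."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    flagged = []
    for href, text in parser.links:
        # Heuristic: only treat dotted, space-free link text as a URL.
        visible = host_of(text) if "." in text and " " not in text else ""
        actual = host_of(href)
        if visible and actual and visible != actual:
            flagged.append((text, href))
    return flagged


if __name__ == "__main__":
    email = ('<p><a href="http://198.51.100.7/login">www.bank.example</a> '
             '<a href="https://example.com/a">example.com</a></p>')
    print(spoofed_links(email))
```

Because the heuristic treats any dotted, space-free link text as a host name, text such as `report.pdf` could produce false positives; a practical detector would need further checks (for example, handling IP-address links and encoded URLs) beyond this single comparison.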
Methodology
Many methods have been used in the past to duplicate websites such as Facebook, Instagram, and GitHub, but these methods were effective only on websites that lacked form validation and had weak security. In this project, however, we implemented a new method that can replicate websites of all types, including those that use form validation and anti-clickjacking protections.