Combating phishing attacks via brand identity and authorization features



Phishing, also called brand spoofing, has become the most troubling scam on the Internet and seriously threatens Web security. The essence of phishing is that "robbers" use fake sites that look like a trustworthy brand's site; on such sites, the favicon, logo and copyright notice are important brand identities. We analyzed 78 days of phishing data from PhishTank and the Anti-Phishing Working Group (APWG). The statistics show that more than 98.93% of phishing sites contain at least one brand entity: a favicon, a logo or a copyright notice. Indeed, only a few of the lowest-quality phishing campaigns do not use such brand elements. Brand entities are thus powerful weapons that phishers use to trick users. By analyzing the characteristics of brand entities in phishing sites, we extract several brand identity features. However, brand entities alone cannot tell whether a Web page that displays them belongs to the corresponding brand or is authorized to use them. To solve this problem, we further extract brand authorization features based on redirection, incoming links and Domain Name System (DNS) information to discriminate sites with branding rights from phishing sites. Based on the extracted features, we train statistical anti-phishing classification models. We collected a diverse spectrum of corpora containing 3863 phishing cases from PhishTank and APWG, and 17 571 legitimate samples from DMOZ, Google and a DNS resolution log. Experimental evaluations show that the model achieves a 98.8% true positive rate and a 0.09% false positive rate, which demonstrates that the extracted features are competitive for statistical anti-phishing in practice. Copyright © 2014 John Wiley & Sons, Ltd.
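As a rough illustration of the brand identity side of this approach, the sketch below scans an HTML page for the three brand entities named above (favicon, logo, copyright notice) and emits binary features. The abstract does not say how the entities are detected, so everything here is an assumption: the function name `brand_identity_features`, the "logo" keyword heuristic for images, the copyright regular expression, and the `external_favicon`/`external_logo` flags (which only compare hostnames and are not the paper's redirection, incoming-link or DNS authorization features) are hypothetical choices made for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical heuristic: a "©" sign or the word "copyright" marks a notice.
COPYRIGHT_RE = re.compile(r"©|\bcopyright\b", re.IGNORECASE)


class BrandEntityParser(HTMLParser):
    """Collects favicon links, logo-like images and raw text from a page."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns "&copy;" into "©"
        self.favicons = []  # href of <link rel="... icon ...">
        self.logos = []     # src of <img> whose src/alt/id/class mentions "logo"
        self.text = []      # raw text chunks (script/style text included)

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}  # attribute values may be None
        if tag == "link" and "icon" in a.get("rel", "").lower().split():
            self.favicons.append(a.get("href", ""))
        elif tag == "img":
            hint = " ".join(a.get(k, "") for k in ("src", "alt", "id", "class"))
            if "logo" in hint.lower():
                self.logos.append(a.get("src", ""))

    def handle_data(self, data):
        self.text.append(data)


def brand_identity_features(html, page_url):
    """Return hypothetical 0/1 brand identity features for one page."""
    p = BrandEntityParser()
    p.feed(html)
    page_host = urlparse(page_url).hostname or ""
    # Resolve relative URLs against the page URL before comparing hosts.
    fav_hosts = {urlparse(urljoin(page_url, h)).hostname or "" for h in p.favicons}
    logo_hosts = {urlparse(urljoin(page_url, s)).hostname or "" for s in p.logos}
    return {
        "has_favicon": int(bool(p.favicons)),
        "has_logo": int(bool(p.logos)),
        "has_copyright": int(bool(COPYRIGHT_RE.search(" ".join(p.text)))),
        # Resource loaded from a host other than the page's own host.
        "external_favicon": int(any(h != page_host for h in fav_hosts)),
        "external_logo": int(any(h != page_host for h in logo_hosts)),
    }
```

For example, a page hosted at `http://secure-login.example.net/bank/` that hotlinks a favicon from `https://www.examplebank.com/favicon.ico`, shows `<img src="/img/examplebank-logo.png">` and carries a "Copyright ©" footer yields 1 for the three presence features and for `external_favicon`, but 0 for `external_logo`, because the relative logo path resolves to the page's own host. Presence flags like these only say that brand entities appear; deciding whether the page is entitled to them is what the authorization features are for.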