Combining Text and Visual Features to Improve the Identification of Cloned Webpages for Early Phishing Detection

Bram van Dooremaal, Pavlo Burda, Luca Allodi, Nicola Zannone

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review

Abstract

Phishing attacks arrive in high numbers and often spread quickly, meaning that after-the-fact countermeasures such as domain blacklisting are limited in efficacy. Visual similarity-based approaches have the potential of detecting previously unseen phishing webpages. These approaches, however, require identifying the legitimate webpage(s) they reproduce. Existing approaches rely on textual feature analysis for target identification, with misclassification rates of approximately 1%; however, as most websites a user might visit are legitimate, additional research is needed to further reduce classification errors. In this work, we propose a novel method for target identification that relies on both visual features (extracted from a screenshot of the web page) and textual features (extracted from the DOM of the web page) to identify which website a phishing web page is replicating, and assess its effectiveness in detecting phishing websites using data from phishing aggregators such as OpenPhish, PhishTank and PhishStats. Compared to state-of-the-art text-based classifiers, our method reduces the phishing misclassification rate by 67% (from 1.02% to 0.34%), for an accuracy of 99.66%. This work provides a further step forwards toward semi-automated decision support systems for phishing detection.
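
The headline figures in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper name `relative_reduction` is hypothetical, and it assumes that the reported accuracy is simply the complement of the misclassification rate, which is consistent with the numbers quoted above.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after` (0.5 means a 50% reduction)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Misclassification rates (in percent) as quoted in the abstract.
text_only_error = 1.02   # state-of-the-art text-based classifiers
combined_error = 0.34    # combined text and visual features

reduction = relative_reduction(text_only_error, combined_error)

# Assumption: accuracy is the complement of the misclassification rate.
accuracy = 100.0 - combined_error

print(f"relative reduction: {reduction:.0%}")
print(f"accuracy: {accuracy:.2f}%")
```

The 67% figure is therefore a relative reduction (about two thirds of the remaining errors), not a drop of 67 percentage points.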

Original language: English
Title of host publication: 16th International Conference on Availability, Reliability and Security, ARES 2021
Publisher: Association for Computing Machinery, Inc
ISBN (electronic): 9781450390514
Status: Published - 17 Aug 2021
Event: 16th International Conference on Availability, Reliability and Security, ARES 2021 - Virtual, Online, Austria
Duration: 17 Aug 2021 to 20 Aug 2021

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 16th International Conference on Availability, Reliability and Security, ARES 2021
Country/Territory: Austria
City: Virtual, Online
Period: 17/08/21 to 20/08/21

Bibliographical note

Publisher Copyright:
© 2021 ACM.
