CARONTE: crawling adversarial resources over non-trusted, high-profile environments

Michele Campobasso, Pavlo Burda, Luca Allodi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Academic > peer-review

Abstract

The monitoring of underground criminal activities is often automated to maximize data collection and to train ML models to automatically adapt data collection tools to different communities. However, sophisticated adversaries may adopt crawling-detection capabilities that significantly jeopardize researchers' ability to collect data, for example by drawing attention to their accounts and getting them expelled from the community. This is particularly undesirable in prominent, high-profile criminal communities where entry costs are significant (whether monetary or, for example, in the form of background checks or other trust-building mechanisms). This paper presents CARONTE, a tool to semi-automatically learn virtually any forum structure for parsing and data extraction, while maintaining a low profile during data collection and avoiding the need to collect massive datasets to keep the tool scalable. We showcase the tool against four underground forums and compare the network traffic it generates (as seen from the adversary's position, i.e., the underground community's server) against state-of-the-art web-crawling tools as well as human users.
Original language: English
Title of host publication: Proceedings - 4th IEEE European Symposium on Security and Privacy Workshops, EUROS and PW 2019
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers
Pages: 433-442
Number of pages: 10
ISBN (Electronic): 9781728130262
DOI: https://doi.org/10.1109/EuroSPW.2019.00055
Publication status: Published - 1 Jun 2019
Event: 4th IEEE European Symposium on Security and Privacy Workshops, EUROS and PW 2019 - Stockholm, Sweden
Duration: 17 Jun 2019 to 19 Jun 2019

Conference

Conference: 4th IEEE European Symposium on Security and Privacy Workshops, EUROS and PW 2019
Country: Sweden
City: Stockholm
Period: 17/06/19 to 19/06/19

Fingerprint

Scalability
Servers
Resources
Data collection
Monitoring
Costs
World Wide Web
Entry costs
Data extraction
Train

Keywords

  • Data collection
  • High-profile communities
  • Stealth monitoring
  • Underground

Cite this

Campobasso, M., Burda, P., & Allodi, L. (2019). CARONTE: crawling adversarial resources over non-trusted, high-profile environments. In Proceedings - 4th IEEE European Symposium on Security and Privacy Workshops, EUROS and PW 2019 (pp. 433-442). [8802484] Piscataway: Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/EuroSPW.2019.00055
@inproceedings{eb4a02af985944909943ac6924043794,
title = "CARONTE: crawling adversarial resources over non-trusted, high-profile environments",
abstract = "The monitoring of underground criminal activities is often automated to maximize the data collection and to train ML models to automatically adapt data collection tools to different communities. On the other hand, sophisticated adversaries may adopt crawling-detection capabilities that may significantly jeopardize researchers' opportunities to perform the data collection, for example by putting their accounts under the spotlight and being expelled from the community. This is particularly undesirable in prominent and high-profile criminal communities where entry costs are significant (either monetarily or for example for background checking or other trust-building mechanisms). This paper presents CARONTE, a tool to semi-automatically learn virtually any forum structure for parsing and data-extraction, while maintaining a low profile for the data collection and avoiding the requirement of collecting massive datasets to maintain tool scalability. We showcase the tool against four underground forums, and compare the network traffic it generates (as seen from the adversary's position, i.e. the underground community’s server) against state-of-the-art tools for web-crawling as well as human users.",
keywords = "Data collection, High-profile communities, Stealth monitoring, Underground",
author = "Michele Campobasso and Pavlo Burda and Luca Allodi",
year = "2019",
month = "6",
day = "1",
doi = "10.1109/EuroSPW.2019.00055",
language = "English",
pages = "433--442",
booktitle = "Proceedings - 4th IEEE European Symposium on Security and Privacy Workshops, EUROS and PW 2019",
publisher = "Institute of Electrical and Electronics Engineers",
address = "United States",

}
