Human, bot or both? A study on the capabilities of classification models on mixed accounts

Nathan W. Cassee, Christos Kitsanelis, Eleni Constantinou, Alexander Serebrenik

Research output: Contribution to conference, Paper, Academic

Abstract

Several bot detection algorithms have recently been discussed in the literature, as software bots that perform maintenance tasks have become more popular. State-of-the-art techniques detect bots based on a binary classification, where a GitHub account is either a human or a bot. However, this conceptualisation of bot detection as an account-level binary classification problem fails to account for 'mixed accounts': accounts that are shared between a human and a bot, and that therefore exhibit both bot and human activity. By using binary classification models for bot detection, researchers might hence mischaracterize both human and bot behavior in software maintenance. This calls for a conceptualisation of bot detection as comment-level classification. However, the only such approach investigates just a small number of mixed-account comments. The nature of mixed accounts on GitHub is thus still unknown, and the absence of appropriate datasets makes this a difficult problem to study. In this paper, we investigate three comment-level classification models and evaluate these classifiers on a manually labeled dataset of mixed accounts. We find that the best classifiers based on these classification models achieve a precision and recall between 88% and 96%. However, even the most accurate comment-level classifier cannot accurately detect mixed accounts; rather, we find that textual content alone, or textual content combined with templates used by bots, are very effective features for the detection of both bot and mixed accounts. Our study calls for more accurate bot detection techniques capable of identifying mixed accounts, and as such supporting more refined insights into software maintenance activities performed by humans and bots on social coding sites.
Original language: English
Pages: 654-658
Publication status: Accepted/In press - 4 Aug 2021
Event: 37th IEEE International Conference on Software Maintenance and Evolution, ICSME 2021 - Luxembourg, Luxembourg
Duration: 27 Sept 2021 to 1 Oct 2021
Conference number: 37

Conference

Conference: 37th IEEE International Conference on Software Maintenance and Evolution, ICSME 2021
Abbreviated title: ICSME 2021
Country/Territory: Luxembourg
City: Luxembourg
Period: 27/09/21 to 1/10/21
