Abstract
Software development remains a predominantly male activity, despite coordinated efforts from research, industry, and policy makers. This gender imbalance is most visible in social programming, on platforms such as Stack Overflow.
To better understand the reasons behind this disparity, and to support (corrective) decision making, we and others have been conducting large-scale empirical studies of activity on these online platforms, in which gender is one of the variables of interest. However, since gender is not explicitly recorded, it is typically inferred by automatic "gender guessers" from cues in an individual's online presence, such as their name and profile picture. Compared with the self-reporting used in earlier studies, gender guessers scale better, but their accuracy depends on the quantity and quality of the data available in a user's online profile.
In this paper we evaluate the applicability of different gender-guessing approaches on several datasets derived from Stack Overflow. Our results suggest that approaches combining different data sources perform best.
Original language | English |
---|---|
Title of host publication | Proceedings of the 13th International Conference on Mining Software Repositories, MSR 2016, Austin, Texas, USA, May 14-22, 2016 |
Editors | M. Kim, R. Robbes, C. Bird |
Place of Publication | New York |
Publisher | Association for Computing Machinery, Inc |
Pages | 425-429 |
Number of pages | 5 |
ISBN (Print) | 978-1-4503-4186-8 |
Publication status | Published - 2016 |