Who's who in Gnome: using LSA to merge software repository identities

E.T.M. Kouters, B.N. Vasilescu, A. Serebrenik, M.G.J. van den Brand

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

    63 Citations (Scopus)

    Abstract

    Understanding an individual’s contribution to an ecosystem often necessitates integrating information from multiple repositories corresponding to different projects within the ecosystem or different kinds of repositories (e.g., mail archives and version control systems). However, recognising that different contributions belong to the same contributor is challenging, since developers may use different aliases. It is known that existing identity merging algorithms are sensitive to large discrepancies between the aliases used by the same individual: the noisier the data, the worse their performance. To assess the scale of the problem for a large software ecosystem, we study all GNOME Git repositories, classify the differences in aliases, and discuss the robustness of existing algorithms with respect to these types of differences. We then propose a new identity merging algorithm based on Latent Semantic Analysis (LSA), designed to be robust against more types of differences in aliases, and evaluate it empirically by means of cross-validation on GNOME Git authors. Our results show a clear improvement over existing algorithms in terms of precision and recall on worst-case input data.
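The abstract does not spell out how the LSA-based merging works, so the following is only a minimal sketch of the general LSA idea applied to aliases, not the authors' algorithm. It assumes that each alias ("Name <email>") is a document whose terms are the lowercase words of the display name and of the e-mail local part, uses raw term counts (no tf-idf weighting), computes the rank-k decomposition by power iteration instead of a library SVD, and merges aliases whose latent vectors have cosine similarity at or above a threshold using union-find. The helper names (`alias_terms`, `top_eigenpairs`, `merge_identities`) and the defaults `k=2` and `threshold=0.9` are hypothetical choices for illustration.

```python
import math
import random
import re
from collections import Counter


def alias_terms(alias):
    """Hypothetical term extraction: lowercase words from the display name
    and from the e-mail local part (the domain is ignored)."""
    name, _, email = alias.partition("<")
    local = email.split("@")[0]
    return re.findall(r"[a-z0-9]+", f"{name} {local}".lower())


def top_eigenpairs(gram, k, iterations=500, seed=0):
    """Top-k eigenpairs of a symmetric positive semidefinite matrix,
    found by power iteration with deflation (a stand-in for a library SVD)."""
    n = len(gram)
    g = [row[:] for row in gram]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iterations):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return pairs  # the remaining spectrum is numerically zero
            v = [x / norm for x in w]
        # The Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * g[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Deflate so that the next pass converges to the next eigenpair.
        for i in range(n):
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def merge_identities(aliases, k=2, threshold=0.9):
    """Group aliases whose LSA vectors have cosine similarity >= threshold."""
    counts = [Counter(alias_terms(a)) for a in aliases]
    n = len(aliases)
    # Gram matrix A^T A of the term-by-alias count matrix A.
    gram = [[float(sum(ci[t] * cj[t] for t in ci)) for cj in counts] for ci in counts]
    # A = U S V^T, so A^T A = V S^2 V^T; alias j maps to (s_1 V[j,1], ..., s_k V[j,k]).
    pairs = top_eigenpairs(gram, k)
    vecs = [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs] for j in range(n)]

    parent = list(range(n))  # union-find over alias indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)

    # Groups come out in order of their first member's position in `aliases`.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(aliases[i])
    return list(groups.values())


if __name__ == "__main__":
    toy = [
        "John Smith <john.smith@gnome.org>",
        "jsmith <jsmith@example.com>",
        "John Smith <jsmith@example.com>",
        "Alice Jones <alice@gnome.org>",
        "alice <alice@example.com>",
    ]
    for group in merge_identities(toy):
        print(group)
```

On these five made-up aliases the sketch places the three John Smith variants in one group, even though "John Smith <john.smith@gnome.org>" and "jsmith <jsmith@example.com>" share no term, because both overlap with the terms of the third alias and collapse onto the same latent dimension; the two Alice aliases form a second group. This is a toy: a realistic setup would need term weighting, a tuned k and threshold, and an evaluation of precision and recall such as the cross-validation described in the abstract.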
    Original language: English
    Title: Proceedings of the Early Research Achievements (ERA) track of the 28th IEEE International Conference on Software Maintenance (ICSM 2012, Trento, Italy, September 23-30, 2012)
    Place of publication: Piscataway
    Publisher: Institute of Electrical and Electronics Engineers
    Pages: 592-595
    ISBN (Print): 978-1-4673-2312-3
    Status: Published - 2012
    Event: conference; 28th IEEE International Conference on Software Maintenance; 2012-09-23; 2012-09-30
    Duration: 23 Sep 2012 to 30 Sep 2012