Lean GHTorrent: GitHub data on demand

G. Gousios, B.N. Vasilescu, A. Serebrenik, A. Zaidman

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-reviewed

    106 Citations (Scopus)

    Abstract

    In recent years, GitHub has become the largest code host in the world, with more than 5M developers collaborating across 10M repositories. Numerous popular open source projects (such as Ruby on Rails, Homebrew, Bootstrap, Django or jQuery) have chosen GitHub as their host and have migrated their code base to it. GitHub offers a tremendous research potential. For instance, it is a flagship for current open source development, a place for developers to showcase their expertise to peers or potential recruiters, and the platform where social coding features or pull requests emerged. However, GitHub data is, to date, largely underexplored. To facilitate studies of GitHub, we have created GHTorrent, a scalable, queryable, offline mirror of the data offered through the GitHub REST API. In this paper we present a novel feature of GHTorrent designed to offer customisable data dumps on demand. The new GHTorrent data-on-demand service offers users the possibility to request via a web form up-to-date GHTorrent data dumps for any collection of GitHub repositories. We hope that by offering customisable GHTorrent data dumps we will not only lower the "barrier for entry" even further for researchers interested in mining GitHub data (thus encouraging researchers to intensify their mining efforts), but also enhance the replicability of GitHub studies (since a snapshot of the data on which the results were obtained can now easily accompany each study).
    Original language: English
    Title of host publication: 11th Working Conference on Mining Software Repositories (MSR '14, Hyderabad, India, May 31-June 1, 2014)
    Place of publication: New York, NY
    Publisher: Association for Computing Machinery, Inc
    Pages: 384-387
    ISBN (print): 978-1-4503-2863-0
    Status: Published - 2014
    Event: 11th Working Conference on Mining Software Repositories (MSR 2014) - Hyderabad International Convention Centre (HICC), Hyderabad, India
    Duration: 31 May 2014 to 1 Jun 2014
    Conference number: 11
    http://2014.msrconf.org/

    Conference

    Conference: 11th Working Conference on Mining Software Repositories (MSR 2014)
    Abbreviated title: MSR 2014
    Country/Territory: India
    City: Hyderabad
    Period: 31/05/14 to 1/06/14
    Other: 11th Working Conference on Mining Software Repositories