Multi-source entity resolution for genealogical data

I. Efremova, B. Ranjbar-Sahraei, H. Rahmani, F.A. Oliehoek, T.G.K. Calders, K.P. Tuyls, G. Weiss

    Research output: Chapter in Book/Report/Conference proceeding > Chapter > Academic

    13 Citations (Scopus)
    1 Download (Pure)

    Abstract

    In this chapter we study the application of existing entity resolution (ER) techniques to a real-world multi-source genealogical dataset. Our goal is to identify all persons involved in various notary acts and link them to their birth, marriage and death certificates. We analyze the influence of additional ER features, such as name popularity, geographical distance and co-reference information, on overall ER performance. We study two prediction models: regression trees and logistic regression. To evaluate the performance of the applied algorithms and to obtain a training set for learning the models, we developed an interactive interface for collecting feedback from human experts. We perform an empirical evaluation on the manually annotated dataset in terms of precision, recall and F-score. We show that using name popularity and geographical distance together with co-reference information significantly improves ER results.
    Original language: English
    Title: Population Reconstruction
    Editors: G. Bloothooft, P. Christen, K. Mandemakers, M. Schraagen
    Place of publication: Cham
    Publisher: Springer
    Pages: 129-154
    ISBN (print): 978-3-319-19883-5
    Status: Published - 2015
