Towards population reconstruction: extraction of family relationships from historical documents

I. Efremova, Alejandro Montes Garcia, J. Zhang, T.G.K. Calders

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Abstract

In this paper we present an approach for the automatic extraction of family relationships from a real-world collection of historical notary acts. We retrieve relationships such as husband-wife, parent-child and widow-of. We study two ways to deal with this problem. In our first approach, we identify all person names in a document, generate all potential candidate pairs of names and predict whether each pair is related using classification techniques, with the text fragments that occur around and between the two names used as features. In the second approach, we train and apply a Hidden Markov Model to annotate every word in a document with a tag indicating whether it is a name, a specific relationship descriptor, or neither. We then look for names connected to each other via relationship descriptors. We discuss challenges such as processing raw data, obtaining a sufficient number of training examples, and dealing with an imbalanced and noisy collection. We evaluate our results for each relationship type in terms of precision, recall and F-score.
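The first, pair-based approach and the per-type evaluation can be illustrated with a small Python sketch. It is not the authors' implementation: the function names (`candidate_pairs`, `precision_recall_f1`, `scores_by_type`), the token-offset representation of names, the `window` of three context tokens and the `(name_a, name_b, type)` relation triples are assumptions made for illustration, and the classifier that decides whether a pair is related is omitted.

```python
from itertools import combinations


def candidate_pairs(tokens, name_spans, window=3):
    """Pair every two person names in one document and collect the text
    between them plus a few tokens on either side as raw features.

    name_spans holds hypothetical (start, end) token offsets, end exclusive,
    for non-overlapping names found by an upstream name recogniser.
    """
    pairs = []
    for (s1, e1), (s2, e2) in combinations(sorted(name_spans), 2):
        features = {
            "before": tokens[max(0, s1 - window):s1],  # left context of first name
            "between": tokens[e1:s2],                   # text linking the two names
            "after": tokens[e2:e2 + window],            # right context of second name
        }
        pairs.append(((s1, e1), (s2, e2), features))
    return pairs


def precision_recall_f1(predicted, gold):
    """Score predicted relations against gold relations (sets of tuples)."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # relations that were predicted and are correct
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def scores_by_type(predicted, gold):
    """Per-relationship-type scores; relations are hypothetical
    (name_a, name_b, type) triples."""
    types = {rel[2] for rel in predicted} | {rel[2] for rel in gold}
    return {
        t: precision_recall_f1(
            {rel for rel in predicted if rel[2] == t},
            {rel for rel in gold if rel[2] == t},
        )
        for t in sorted(types)
    }


if __name__ == "__main__":
    tokens = "Jan Smit husband of Maria Berg and son of Pieter Smit".split()
    names = [(0, 2), (4, 6), (9, 11)]
    for first, second, feats in candidate_pairs(tokens, names):
        print(first, second, feats["between"])
```

With three names the sketch yields three candidate pairs; each pair's `before`, `between` and `after` fragments would then be turned into a feature vector (for example a bag of words) and passed to a classifier. Scoring per relationship type simply filters both the predicted and the gold sets by type before computing precision, recall and F-score.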
Original language: English
Title of host publication: First International Workshop on Population Informatics for Big Data (21st ACM-SIGKDD PopInfo'15), 10-13 August 2015, Sydney, Australia
Publication status: Published, 2015
Event: 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Duration: 10 Aug 2015 to 13 Aug 2015
