TY - JOUR

T1 - A probabilistic perspective on re-identifiability

AU - Koot, M.R.

AU - Mandjes, M.R.H.

AU - Noordende, van 't, G.

AU - Laat, de, C.T.A.M.

PY - 2013

Y1 - 2013

N2 - A quasi-identifier is a set of attributes that can be used to re-identify entries in anonymized data sets. A group of individuals is considered about whom quasi-identifying numerical information is disclosed such as date of birth, age, weight, and height. The fraction of individuals is determined whose information is unique in that group and hence is identifiable unambiguously. Nonuniformity can be captured well by a single number, the Kullback-Leibler distance. For example sets of real microdata, given approximations based on Kullback-Leibler distances are accurate. Second, the effect of disclosing more specific or less specific information is analyzed experimentally. Third, the effect of correlation between numerical attributes is measured. A formula gives the re-identifiability level. The approximations are validated using publicly available demographic data sets. © 2013 Copyright Taylor and Francis Group, LLC.
KW - data anonymity

KW - demographic data

KW - Kullback-Leibler distance

KW - privacy

KW - probability theory

KW - security

AB - A quasi-identifier is a set of attributes that can be used to re-identify entries in anonymized data sets. A group of individuals is considered about whom quasi-identifying numerical information is disclosed such as date of birth, age, weight, and height. The fraction of individuals is determined whose information is unique in that group and hence is identifiable unambiguously. Nonuniformity can be captured well by a single number, the Kullback-Leibler distance. For example sets of real microdata, given approximations based on Kullback-Leibler distances are accurate. Second, the effect of disclosing more specific or less specific information is analyzed experimentally. Third, the effect of correlation between numerical attributes is measured. A formula gives the re-identifiability level. The approximations are validated using publicly available demographic data sets. © 2013 Copyright Taylor and Francis Group, LLC.

U2 - 10.1080/08898480.2013.816222

DO - 10.1080/08898480.2013.816222

M3 - Article

VL - 20

SP - 155

EP - 171

JO - Mathematical Population Studies

JF - Mathematical Population Studies

SN - 0889-8480

IS - 3

ER -