The problem of protecting datasets against the disclosure of confidential information, while keeping the published data useful for analysis, has recently gained momentum. To address it, anonymization techniques such as k-anonymity, ℓ-diversity, and t-closeness have been used to generate anonymized datasets for training classifiers. While these techniques provide an effective means to generate anonymized datasets, an understanding of how their application affects classifier performance is currently missing. Such knowledge would enable data owners and analysts to select the most appropriate classification algorithm and training parameters, guaranteeing strong privacy requirements while minimizing the loss of accuracy. In this study, we perform extensive experiments to assess how classifier performance changes when training on an anonymized dataset rather than the original one, and we evaluate the impact of classification algorithms, dataset properties, and anonymization parameters on classifiers' performance.
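To illustrate the k-anonymity notion mentioned in the abstract (this sketch is not the paper's implementation, just a minimal example of the privacy model): a dataset is k-anonymous with respect to a set of quasi-identifier attributes if every combination of quasi-identifier values is shared by at least k records. The toy records and attribute names below are hypothetical.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the k-anonymity level of a dataset: the size of the
    smallest group of records sharing the same quasi-identifier
    values (k >= 2 means no record is unique on those attributes)."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical toy dataset with generalized (starred) quasi-identifiers.
records = [
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "130**", "age": "20-29", "disease": "cold"},
    {"zip": "148**", "age": "30-39", "disease": "flu"},
    {"zip": "148**", "age": "30-39", "disease": "asthma"},
]
print(k_anonymity(records, ["zip", "age"]))  # -> 2: each group holds 2 records
```

The same grouping step is the starting point for checking ℓ-diversity, which additionally requires each group to contain at least ℓ distinct values of the sensitive attribute (here, `disease`).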
Title of host publication: Data and Applications Security and Privacy XXXV - 35th Annual IFIP WG 11.3 Conference, DBSec 2021, Proceedings
Editors: Ken Barker, Kambiz Ghazinour
Publisher: Springer Science and Business Media B.V.
Number of pages: 22
Publication status: Published - 2021
Event: 35th Annual IFIP WG 11.3 Conference on Data and Applications Security and Privacy, DBSec 2021 - Virtual, Online
Duration: 19 Jul 2021 → 20 Jul 2021
Series: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Conference: 35th Annual IFIP WG 11.3 Conference on Data and Applications Security and Privacy, DBSec 2021
Period: 19/07/21 → 20/07/21
Bibliographical note: Publisher Copyright: © 2021, IFIP International Federation for Information Processing.
- Classifiers comparison
- ℓ-diversity