The ABC of data: A classifying framework for data readiness

Laurens A. Castelijns, Yuri Maas, Joaquin Vanschoren

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

9 Citations (Scopus)
208 Downloads (Pure)

Abstract

In order to (semi)automate data cleaning and preprocessing, we need a clear and measurable definition of data quality. Data readiness levels have been proposed to fit this need, but they require a more detailed and measurable definition than is given in prior works. We present a practical framework focused on machine learning that encapsulates data cleaning and (pre)processing procedures. In our framework, datasets are classified within bands, and each band introduces more fine-grained terminology and processing steps. Scores are assigned to each step, resulting in a data quality score. This allows teams of people, as well as automated processes, to track and reason about the cleaning process, and communicate the current status and deficiencies in a more structured, well-documented manner.

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - International Workshops of ECML PKDD 2019, Proceedings
Editors: Peggy Cellier, Kurt Driessens
Publisher: Springer
Pages: 3-16
Number of pages: 14
ISBN (Print): 9783030438227
Publication status: Published - 2020
Event: 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019) - Wurzburg, Germany
Duration: 16 Sept 2019 - 20 Sept 2019
Conference number: 19
http://ecmlpkdd2019.org/

Publication series

Name: Communications in Computer and Information Science
Volume: 1167 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019)
Abbreviated title: ECML PKDD 2019
Country/Territory: Germany
City: Wurzburg
Period: 16/09/19 - 20/09/19

Keywords

  • Automated data science
  • Data cleaning
  • Data quality
  • Data readiness levels
  • Preprocessing
