Automating the discovery of mappings between structured data sources is a long standing and important problem in data management. We discuss the rich history of the problem and the variety of technical solutions advanced in the database community over the previous four decades. Based on this discussion, we develop a basic statement of the data mapping problem and a general framework for reasoning about the design space of system solutions to the problem. We then concretely illustrate the framework with the Tupelo system for data mapping discovery, focusing on the important common case of relational data sources. Treating mapping discovery as example-driven search in a space of transformations, Tupelo generates queries encompassing the full range of structural and semantic heterogeneities encountered in relational data mapping. Hence, Tupelo is applicable in a wide range of data mapping scenarios. Finally, we present the results of extensive empirical validation, both on synthetic and real world datasets, indicating that the system is both viable and effective.
|Name||Lecture Notes in Computer Science|