Abstract
Consumers are increasingly using the Web to find product information and make online purchases. This is reflected by the ongoing growth of worldwide e-commerce sales figures. Entity resolution is an important task that supports many services that have arisen from this growth, such as Web shop aggregators. In this paper, we propose a scalable framework for multi-source entity resolution. Our blocking approach employs model words to produce blocks that make our solution highly effective and efficient for the considered domains. An in-depth evaluation, performed using millions of experiments and three large datasets (on consumer electronics and software products), shows that our model words-based approach outperforms other approaches in most cases. Furthermore, we also evaluate our approach with an imperfect similarity function and find that model words-based blocking schemes provide the best blocks with respect to the F1-measure.
| Original language | English |
|---|---|
| Pages (from-to) | 103-111 |
| Number of pages | 9 |
| Journal | Information Fusion |
| Volume | 53 |
| Publication status | Published - 1 Jan 2020 |
Funding
Sophie Eulderink, Sylvie Kok, Dorenda Slof, and Corinca Zwijsen contributed to the early stages of this work. The experiments were run on a Hadoop cluster of the Dutch national e-infrastructure, with the support of the SURF Foundation. Damir Vandic is supported by an NWO Mosaic scholarship for project 017.007.142: Semantic Web Enhanced Product Search (SWEPS).
| Funders |
|---|
| Nederlandse Organisatie voor Wetenschappelijk Onderzoek |
Keywords
- Blocking schemes
- E-commerce
- Entity resolution
- Web shop aggregators