Scalable entity resolution for Web product descriptions

Damir Vandic, Flavius Frasincar (Corresponding author), Uzay Kaymak, Mark Riezebos

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Consumers are increasingly using the Web to find product information and make online purchases, as reflected by the ongoing growth of worldwide e-commerce sales figures. Entity resolution is an important task that supports many services that have arisen from this growth, such as Web shop aggregators. In this paper, we propose a scalable framework for multi-source entity resolution. Our blocking approach employs model words to produce blocks that make our solution highly effective and efficient for the considered domains. An in-depth evaluation, comprising millions of experiments on three large datasets (consumer electronics and software products), shows that our model words-based approach outperforms other approaches in most cases. We also evaluate our approach with an imperfect similarity function and find that model words-based blocking schemes provide the best blocks with respect to the F1-measure.
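To make the idea of model words-based blocking concrete, here is a minimal sketch, not the authors' implementation. It assumes that a "model word" is any title token containing at least one letter and at least one digit (e.g. a model number such as "UN46ES6580"); the abstract does not give the exact definition. Every model word defines a block, and only products that share a block are compared. The record layout `(product_id, source, title)`, the helper names `model_words`, `build_blocks`, `candidate_pairs` and `f1`, and the rule that pairs from the same source are skipped are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict
from itertools import combinations


def model_words(title: str) -> set[str]:
    """Assumed definition: lowercase tokens that contain both a letter and a digit."""
    # Tokens are alphanumeric runs, optionally joined by '-', '/' or '.' (e.g. "46-inch").
    tokens = re.findall(r"[a-z0-9]+(?:[-/.][a-z0-9]+)*", title.lower())
    return {t for t in tokens if re.search(r"[a-z]", t) and re.search(r"\d", t)}


def build_blocks(products: list[tuple[str, str, str]]) -> dict[str, set[str]]:
    """Map each model word to the set of product ids whose title contains it."""
    blocks: dict[str, set[str]] = defaultdict(set)
    for pid, _source, title in products:
        for mw in model_words(title):
            blocks[mw].add(pid)
    return blocks


def candidate_pairs(products: list[tuple[str, str, str]],
                    cross_source_only: bool = True) -> set[tuple[str, str]]:
    """Collect the pairs that share at least one block; these are the only pairs compared."""
    source = {pid: src for pid, src, _title in products}
    pairs: set[tuple[str, str]] = set()
    for members in build_blocks(products).values():
        for a, b in combinations(sorted(members), 2):
            # Hypothetical multi-source assumption: a shop does not list duplicates of itself.
            if cross_source_only and source[a] == source[b]:
                continue
            pairs.add((a, b))
    return pairs


def f1(predicted: set[tuple[str, str]], gold: set[tuple[str, str]]) -> float:
    """F1-measure of a set of predicted duplicate pairs against the gold pairs."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration): two shops listing two TVs each.
products = [
    ("a1", "shopA", "Samsung UN46ES6580 46-Inch 1080p LED TV"),
    ("b1", "shopB", 'Samsung 46" UN46ES6580 Smart LED HDTV'),
    ("b2", "shopB", "LG 47LM7600 47-Inch 3D LED TV"),
    ("a2", "shopA", "LG 47LM7600 Cinema 3D TV"),
]

if __name__ == "__main__":
    pairs = candidate_pairs(products)
    print(sorted(pairs))  # [('a1', 'b1'), ('a2', 'b2')]
    print(f1(pairs, {("a1", "b1"), ("a2", "b2")}))  # 1.0
```

In this toy example only 2 of the 6 possible pairs survive blocking, which illustrates why blocking on model words keeps the number of expensive similarity comparisons small. In a full pipeline a similarity function would then be applied to the candidate pairs, and `f1` would score its output rather than the raw blocks.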

Original language: English
Pages (from-to): 103-111
Number of pages: 9
Journal: Information Fusion
Volume: 53
Publication status: Published - 1 Jan 2020

Keywords

  • Blocking schemes
  • E-commerce
  • Entity resolution
  • Web shop aggregators
