Scalable entity resolution for Web product descriptions

Damir Vandic, Flavius Frasincar (Corresponding author), Uzay Kaymak, Mark Riezebos

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Consumers are increasingly using the Web to find product information and make online purchases. This is reflected by the ongoing growth of worldwide e-commerce sales figures. Entity resolution is an important task that supports many services that have arisen from this growth, such as Web shop aggregators. In this paper, we propose a scalable framework for multi-source entity resolution. Our blocking approach employs model words to produce blocks that make our solution highly effective and efficient for the considered domains. An in-depth evaluation, performed using millions of experiments and three large datasets (on consumer electronics and software products), shows that our model-word-based approach outperforms other approaches in most cases. We also evaluate our approach with an imperfect similarity function and find that model-word-based blocking schemes provide the best blocks with respect to the F1-measure.
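A minimal sketch of the general idea behind model-word blocking follows, under assumptions that are not taken from the paper. A *model word* is treated, as a simplification, as any title token that mixes letters and digits (such as a manufacturer part number); each distinct model word defines one block; and only products that share at least one block become candidate pairs. The function names (`model_words`, `build_blocks`, `candidate_pairs`, `pair_f1`) and the sample titles are hypothetical. The block F1 computed here (the harmonic mean of pair quality and pair completeness, two common blocking metrics) is one standard definition and not necessarily the exact measure used in the paper.

```python
import re
from collections import defaultdict
from itertools import combinations

# Assumption (simplified): a "model word" is a token that contains both
# letters and digits, e.g. "KDL-40EX500". The paper's definition may differ.
TOKEN = re.compile(r"[A-Za-z0-9-]+")


def model_words(title: str) -> set[str]:
    """Return the lowercased tokens of `title` that mix letters and digits."""
    return {
        tok.lower()
        for tok in TOKEN.findall(title)
        if re.search(r"[A-Za-z]", tok) and re.search(r"[0-9]", tok)
    }


def build_blocks(products: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: each model word maps to the ids of products containing it."""
    blocks: dict[str, set[str]] = defaultdict(set)
    for pid, title in products.items():
        for mw in model_words(title):
            blocks[mw].add(pid)
    return dict(blocks)


def candidate_pairs(blocks: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Only products that share at least one block are compared."""
    pairs: set[tuple[str, str]] = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def pair_f1(candidates: set[tuple[str, str]], gold: set[tuple[str, str]]) -> float:
    """Harmonic mean of pair quality (precision) and pair completeness (recall)."""
    found = len(candidates & gold)
    if found == 0:
        return 0.0
    quality = found / len(candidates)
    completeness = found / len(gold)
    return 2 * quality * completeness / (quality + completeness)


# Hypothetical sample listings from different Web shops.
products = {
    "a": "Sony Bravia KDL-40EX500 40in LCD TV",
    "b": "Sony KDL-40EX500 LCD HDTV",
    "c": "Samsung UN46C6500 46in LED TV",
    "d": "Samsung 46 inch UN46C6500 LED",
}
gold = {("a", "b"), ("c", "d")}

blocks = build_blocks(products)
pairs = candidate_pairs(blocks)
print(sorted(pairs))          # [('a', 'b'), ('c', 'd')]
print(pair_f1(pairs, gold))   # 1.0
```

In this toy example the two Sony listings are compared with each other but never with the Samsung listings, which is where blocking saves work; a full entity resolution pipeline would still apply a similarity function to each candidate pair before declaring a match.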

Language: English
Pages: 103-111
Number of pages: 9
Journal: Information Fusion
Volume: 53
DOI: 10.1016/j.inffus.2019.06.002
State: Published - 1 Jan 2020

Keywords

  • Blocking schemes
  • E-commerce
  • Entity resolution
  • Web shop aggregators

Cite this

Vandic, Damir ; Frasincar, Flavius ; Kaymak, Uzay ; Riezebos, Mark. / Scalable entity resolution for Web product descriptions. In: Information Fusion. 2020 ; Vol. 53. pp. 103-111
@article{5ef62da404be4167b5e0b5b0479fba45,
title = "Scalable entity resolution for Web product descriptions",
keywords = "Blocking schemes, E-commerce, Entity resolution, Web shop aggregators",
author = "Damir Vandic and Flavius Frasincar and Uzay Kaymak and Mark Riezebos",
year = "2020",
month = "1",
day = "1",
doi = "10.1016/j.inffus.2019.06.002",
language = "English",
volume = "53",
pages = "103--111",
journal = "Information Fusion",
issn = "1566-2535",
publisher = "Elsevier",

}

Vandic, D, Frasincar, F, Kaymak, U & Riezebos, M 2020, 'Scalable entity resolution for Web product descriptions', Information Fusion, vol. 53, pp. 103-111. DOI: 10.1016/j.inffus.2019.06.002

Scalable entity resolution for Web product descriptions. / Vandic, Damir; Frasincar, Flavius (Corresponding author); Kaymak, Uzay; Riezebos, Mark.

In: Information Fusion, Vol. 53, 01.01.2020, pp. 103-111.

TY - JOUR

T1 - Scalable entity resolution for Web product descriptions

AU - Vandic,Damir

AU - Frasincar,Flavius

AU - Kaymak,Uzay

AU - Riezebos,Mark

PY - 2020/1/1

Y1 - 2020/1/1

AB - Consumers are increasingly using the Web to find product information and make online purchases. This is reflected by the ongoing growth of worldwide e-commerce sales figures. Entity resolution is an important task that supports many services that have arisen from this growth, such as Web shop aggregators. In this paper, we propose a scalable framework for multi-source entity resolution. Our blocking approach employs model words to produce blocks that make our solution highly effective and efficient for the considered domains. An in-depth evaluation, performed using millions of experiments and three large datasets (on consumer electronics and software products), shows that our model words-based approach outperforms other approaches in most cases. Furthermore, we also evaluate our approach with an imperfect similarity function and find that model words-based blocking schemes provide the best blocks with respect to the F1-measure.

KW - Blocking schemes

KW - E-commerce

KW - Entity resolution

KW - Web shop aggregators

UR - http://www.scopus.com/inward/record.url?scp=85067202468&partnerID=8YFLogxK

U2 - 10.1016/j.inffus.2019.06.002

DO - 10.1016/j.inffus.2019.06.002

M3 - Article

VL - 53

SP - 103

EP - 111

JO - Information Fusion

T2 - Information Fusion

JF - Information Fusion

SN - 1566-2535

ER -

Vandic D, Frasincar F, Kaymak U, Riezebos M. Scalable entity resolution for Web product descriptions. Information Fusion. 2020 Jan 1;53:103-111. doi: 10.1016/j.inffus.2019.06.002