Scalable entity resolution for Web product descriptions

Damir Vandic, Flavius Frasincar (Corresponding author), Uzay Kaymak, Mark Riezebos

Research output: Contribution to journal, Article, Academic, peer-reviewed

Abstract

Consumers are increasingly using the Web to find product information and make online purchases. This is reflected by the ongoing growth of worldwide e-commerce sales figures. Entity resolution is an important task that supports many services that have arisen from this growth, such as Web shop aggregators. In this paper, we propose a scalable framework for multi-source entity resolution. Our blocking approach employs model words to produce blocks that make our solution highly effective and efficient for the considered domains. An in-depth evaluation, performed using millions of experiments and three large datasets (on consumer electronics and software products), shows that our model words-based approach outperforms other approaches in most cases. Furthermore, we also evaluate our approach with an imperfect similarity function and find that model words-based blocking schemes provide the best blocks with respect to the F1-measure.
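The abstract says that blocking is driven by model words but does not define them here. As a hedged illustration only, the sketch below assumes that a model word is any title token mixing letters and digits (such as a product code like KDL-40EX500). Products that share a model word go into the same block, and only pairs inside a block become candidate comparisons. The candidates are then scored with pair completeness, pair quality, and their harmonic mean, which are measures commonly used to evaluate blocking. The model-word definition, the function names (`model_words`, `build_blocks`, `candidate_pairs`, `blocking_f1`), and the toy titles are hypothetical and are not taken from the paper.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical tokenizer: runs of letters, digits, '-', '/' and '.'.
TOKEN_RE = re.compile(r"[A-Za-z0-9\-/.]+")


def model_words(title: str) -> set[str]:
    """Assumed definition: lower-cased tokens containing both a letter and a digit."""
    words = set()
    for token in TOKEN_RE.findall(title):
        if any(c.isalpha() for c in token) and any(c.isdigit() for c in token):
            words.add(token.lower())
    return words


def build_blocks(products: dict[str, str]) -> dict[str, set[str]]:
    """Map each model word to the ids of all products whose title contains it."""
    blocks: dict[str, set[str]] = defaultdict(set)
    for pid, title in products.items():
        for word in model_words(title):
            blocks[word].add(pid)
    # A block holding a single product generates no comparisons, so drop it.
    return {word: ids for word, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Distinct (sorted) product-id pairs that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def pair_completeness(cands, true_dups) -> float:
    """Fraction of true duplicate pairs that survive blocking (recall-like)."""
    return len(cands & true_dups) / len(true_dups) if true_dups else 0.0


def pair_quality(cands, true_dups) -> float:
    """Fraction of candidate pairs that are true duplicates (precision-like)."""
    return len(cands & true_dups) / len(cands) if cands else 0.0


def blocking_f1(cands, true_dups) -> float:
    """Harmonic mean of pair completeness and pair quality."""
    pc, pq = pair_completeness(cands, true_dups), pair_quality(cands, true_dups)
    return 2 * pc * pq / (pc + pq) if pc + pq else 0.0


# Toy example with made-up titles; only a and b describe the same product.
# True duplicate pairs are stored as sorted id tuples to match candidate_pairs.
products = {
    "a": 'Sony KDL-40EX500 40" LED TV',
    "b": "Sony Bravia KDL-40EX500 TV",
    "c": "Samsung UN46C6300 46-Inch LED TV",
    "d": "LG 46LD450 46-Inch LCD TV",
}
truth = {("a", "b")}
pairs = candidate_pairs(build_blocks(products))
print(sorted(pairs))                    # [('a', 'b'), ('c', 'd')]
print(pair_completeness(pairs, truth))  # 1.0
print(pair_quality(pairs, truth))       # 0.5
```

Products c and d end up in the same block only because both titles contain "46-Inch". This shows why blocks are normally followed by a (possibly imperfect) similarity function that decides which candidate pairs are actual duplicates.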

Language: English
Pages: 103-111
Number of pages: 9
Journal: Information Fusion
Volume: 53
DOI: 10.1016/j.inffus.2019.06.002
Status: Published - 1 Jan 2020

Fingerprint

Consumer electronics
Sales
Experiments

Keywords

Blocking schemes
E-commerce
Entity resolution
Web shop aggregators

    Cite this

    Vandic, Damir; Frasincar, Flavius; Kaymak, Uzay; Riezebos, Mark. / Scalable entity resolution for Web product descriptions. In: Information Fusion. 2020; Vol. 53. pp. 103-111
    @article{5ef62da404be4167b5e0b5b0479fba45,
    title = "Scalable entity resolution for Web product descriptions",
    keywords = "Blocking schemes, E-commerce, Entity resolution, Web shop aggregators",
    author = "Damir Vandic and Flavius Frasincar and Uzay Kaymak and Mark Riezebos",
    year = "2020",
    month = "1",
    day = "1",
    doi = "10.1016/j.inffus.2019.06.002",
    language = "English",
    volume = "53",
    pages = "103--111",
    journal = "Information Fusion",
    issn = "1566-2535",
    publisher = "Elsevier",

    }

    Vandic, D, Frasincar, F, Kaymak, U & Riezebos, M 2020, 'Scalable entity resolution for Web product descriptions', Information Fusion, vol. 53, pp. 103-111. DOI: 10.1016/j.inffus.2019.06.002

    Scalable entity resolution for Web product descriptions. / Vandic, Damir; Frasincar, Flavius (Corresponding author); Kaymak, Uzay; Riezebos, Mark.

    In: Information Fusion, Vol. 53, 01.01.2020, pp. 103-111.

    TY - JOUR

    T1 - Scalable entity resolution for Web product descriptions

    AU - Vandic,Damir

    AU - Frasincar,Flavius

    AU - Kaymak,Uzay

    AU - Riezebos,Mark

    PY - 2020/1/1

    Y1 - 2020/1/1

    KW - Blocking schemes

    KW - E-commerce

    KW - Entity resolution

    KW - Web shop aggregators

    UR - http://www.scopus.com/inward/record.url?scp=85067202468&partnerID=8YFLogxK

    U2 - 10.1016/j.inffus.2019.06.002

    DO - 10.1016/j.inffus.2019.06.002

    M3 - Article

    VL - 53

    SP - 103

    EP - 111

    JO - Information Fusion

    T2 - Information Fusion

    JF - Information Fusion

    SN - 1566-2535

    ER -

    Vandic D, Frasincar F, Kaymak U, Riezebos M. Scalable entity resolution for Web product descriptions. Information Fusion. 2020 Jan 1;53:103-111. Available from: DOI: 10.1016/j.inffus.2019.06.002