Adaptive Distributed Streaming Similarity Joins

George Siachamis, Kyriakos Psarakis, Marios Fragkoulis, Odysseas Papapetrou, Arie Van Deursen, Asterios Katsifodimos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

1 Citation (Scopus)

Abstract

How can we perform similarity joins of multi-dimensional streams in a distributed fashion, achieving low latency? Can we adaptively repartition those streams in order to retain high performance under concept drifts? Current approaches to similarity joins are either restricted to single-node deployments or focus on set-similarity joins, failing to cover the ubiquitous case of metric-space similarity joins. In this paper, we propose the first adaptive distributed streaming similarity join approach that gracefully scales with variable velocity and distribution of multi-dimensional data streams. Our approach can adaptively rebalance the load of nodes in the case of concept drifts, allowing for similarity computations in the general metric space. We implement our approach on top of Apache Flink and evaluate its data partitioning and load balancing schemes on a set of synthetic datasets in terms of latency, comparisons ratio, and data duplication ratio.
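The abstract does not spell out the partitioning scheme itself. For orientation only, the following Python sketch shows one textbook way to split a metric-space similarity self-join across partitions. Each element is homed at its nearest pivot. It is replicated to a higher-numbered partition j only when the generalized hyperplane bound (d(q, p_j) - d(q, p_home)) / 2 cannot rule out a match within eps. This bound relies only on the triangle inequality, so it holds in any metric space. This is not the authors' algorithm. The fixed pivots, the Euclidean distance, the absence of window eviction and adaptive rebalancing, and all names (`PivotPartitionedSimilarityJoin`, `insert`) are hypothetical choices made for illustration. The `comparisons` and `replicas` counters are only loosely related to the comparison and duplication ratios mentioned in the abstract; the paper's exact definitions may differ.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


class PivotPartitionedSimilarityJoin:
    """Hypothetical incremental eps-similarity self-join over an unbounded stream.

    Assumptions: fixed pivots, Euclidean distance, no window eviction,
    no adaptive rebalancing. Each partition could run on its own worker.
    """

    def __init__(self, pivots: Sequence[Point], eps: float) -> None:
        self.pivots = list(pivots)
        self.eps = eps
        # "inner": points whose home (nearest pivot) is this partition.
        # "outer": replicas of points homed in lower-numbered partitions.
        self.inner: Dict[int, List[Tuple[int, Point]]] = defaultdict(list)
        self.outer: Dict[int, List[Tuple[int, Point]]] = defaultdict(list)
        self.comparisons = 0  # distance computations performed
        self.replicas = 0     # extra copies sent beyond the home partition
        self._next_id = 0

    def _targets(self, p: Point) -> Tuple[int, List[int]]:
        d = [math.dist(p, c) for c in self.pivots]
        home = min(range(len(d)), key=d.__getitem__)  # ties -> lowest index
        # Any point homed in partition j lies at distance >= (d[j] - d[home]) / 2
        # from p, so a match within eps is only possible when that bound <= eps.
        # Replicating only to j > home ensures each matching pair is found once.
        extra = [j for j in range(home + 1, len(d))
                 if (d[j] - d[home]) / 2 <= self.eps]
        return home, extra

    def _match(self, pid: int, p: Point,
               candidates: List[Tuple[int, Point]],
               out: List[Tuple[int, int]]) -> None:
        for qid, q in candidates:
            self.comparisons += 1
            if math.dist(p, q) <= self.eps:
                out.append((min(pid, qid), max(pid, qid)))

    def insert(self, p: Point) -> List[Tuple[int, int]]:
        """Add one stream element; return the new matching (id, id) pairs."""
        pid = self._next_id
        self._next_id += 1
        home, extra = self._targets(p)
        out: List[Tuple[int, int]] = []
        # At its home partition, p meets other home points and stored replicas.
        self._match(pid, p, self.inner[home] + self.outer[home], out)
        self.inner[home].append((pid, p))
        for j in extra:
            # As a replica, p is compared only with points homed in partition j.
            self.replicas += 1
            self._match(pid, p, self.inner[j], out)
            self.outer[j].append((pid, p))
        return out
```

For example, with pivots (0, 0) and (10, 0) and eps = 1.0, inserting (4.6, 0) and then (5.4, 0) reports the pair (0, 1) exactly once. The first point is replicated to partition 1 because its bound is 0.4, and the second point meets that replica at its own home partition. Choosing pivots well determines both the replication overhead and the balance of work across partitions, which is the kind of trade-off the abstract's load-balancing discussion concerns.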

Original language: English
Title of host publication: DEBS 2023 - Proceedings of the 17th ACM International Conference on Distributed and Event-based Systems
Publisher: Association for Computing Machinery, Inc
Pages: 25-36
Number of pages: 12
ISBN (Electronic): 9798400701221
DOIs
Publication status: Published - 27 Jun 2023
Event: 17th ACM International Conference on Distributed and Event-based Systems, DEBS 2023 - Neuchatel, Switzerland
Duration: 27 Jun 2023 to 30 Jun 2023

Conference

Conference: 17th ACM International Conference on Distributed and Event-based Systems, DEBS 2023
Country/Territory: Switzerland
City: Neuchatel
Period: 27/06/23 to 30/06/23

Bibliographical note

Publisher Copyright:
© 2023 Owner/Author(s).

Funding

This publication is part of project number 19708 of the Vidi research programme, which is partly financed by the Dutch Research Council (NWO). It is also partially funded by the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101070122, and by the ICAI AI for Fintech Research Lab.

Funders (funder number)
European Union's Horizon 2020 - Research and Innovation Framework Programme (101070122)
Nederlandse Organisatie voor Wetenschappelijk Onderzoek

Keywords

• data partitioning
• data streams
• distributed computations
• load balancing
• similarity joins
