Minimizing network traffic for distributed joins using lightweight locality-aware scheduling

Long Cheng, John Murphy, Qingzhi Liu, Chunliang Hao, Georgios Theodoropoulos

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

Abstract

Large computing systems such as data centers are becoming the mainstream infrastructure for big data processing. As one of the key data operators in such scenarios, the distributed join remains challenging for current techniques because it incurs a significant network communication cost. Various advanced approaches have been proposed to improve performance; however, most of them focus only on data skew handling, and algorithms designed specifically for communication reduction have received less attention. Moreover, although the state-of-the-art technique can minimize network traffic, it computes fine-grained optimal schedules for every individual join key, which can result in considerable scheduling overhead. In this paper, we propose a new approach called LAS (Lightweight Locality-Aware Scheduling), which targets reducing network communication for large distributed joins in an efficient and effective manner. We present the detailed design and implementation of LAS and conduct an experimental evaluation using large data joins. Our results show that LAS effectively reduces scheduling overhead while achieving network traffic reduction comparable to the state of the art.

Original language: English
Title of host publication: Euro-Par 2018
Subtitle of host publication: Parallel Processing - 24th International Conference on Parallel and Distributed Computing, Proceedings
Editors: Massimo Torquati, Marco Aldinucci, Luca Padovani
Place of Publication: Cham
Publisher: Springer
Pages: 293-305
Number of pages: 13
ISBN (Electronic): 978-3-319-96983-1
ISBN (Print): 978-3-319-96982-4
Publication status: Published - 10 Aug 2018
Event: 24th International Conference on Parallel and Distributed Computing, Euro-Par 2018 - Turin, Italy
Duration: 27 Aug 2018 to 28 Aug 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11014 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 24th International Conference on Parallel and Distributed Computing, Euro-Par 2018
Country/Territory: Italy
City: Turin
Period: 27/08/18 to 28/08/18
