TY - GEN
T1 - Minimizing network traffic for distributed joins using lightweight locality-aware scheduling
AU - Cheng, Long
AU - Murphy, John
AU - Liu, Qingzhi
AU - Hao, Chunliang
AU - Theodoropoulos, Georgios
PY - 2018/8/10
Y1 - 2018/8/10
N2 - Large computing systems such as data centers are becoming the mainstream infrastructures for big data processing. As one of the key data operators in such scenarios, distributed joins is still challenging current techniques since it always incurs a significant cost on network communication. Various advanced approaches have been proposed to improve the performance, however, most of them just focus on data skew handling, and algorithms designed specifically for communication reduction have received less attention. Moreover, although the state-of-the-art technique can minimize network traffic, it provides fine-grained optimal schedules for all individual join keys, which could result in obvious overhead. In this paper, we propose a new approach called LAS (Lightweight Locality-Aware Scheduling), which targets reducing network communication for large distributed joins in an efficient and effective manner. We present the detailed design and implementation of LAS, and conduct an experimental evaluation using large data joins. Our results show that LAS can effectively reduce scheduling overhead and achieve comparable performance on network reduction compared to the state-of-the-art.
AB - Large computing systems such as data centers are becoming the mainstream infrastructures for big data processing. As one of the key data operators in such scenarios, distributed joins is still challenging current techniques since it always incurs a significant cost on network communication. Various advanced approaches have been proposed to improve the performance, however, most of them just focus on data skew handling, and algorithms designed specifically for communication reduction have received less attention. Moreover, although the state-of-the-art technique can minimize network traffic, it provides fine-grained optimal schedules for all individual join keys, which could result in obvious overhead. In this paper, we propose a new approach called LAS (Lightweight Locality-Aware Scheduling), which targets reducing network communication for large distributed joins in an efficient and effective manner. We present the detailed design and implementation of LAS, and conduct an experimental evaluation using large data joins. Our results show that LAS can effectively reduce scheduling overhead and achieve comparable performance on network reduction compared to the state-of-the-art.
UR - http://www.scopus.com/inward/record.url?scp=85052927546&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-96983-1_21
DO - 10.1007/978-3-319-96983-1_21
M3 - Conference contribution
AN - SCOPUS:85052927546
SN - 978-3-319-96982-4
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 293
EP - 305
BT - Euro-Par 2018
A2 - Torquati, Massimo
A2 - Aldinucci, Marco
A2 - Padovani, Luca
PB - Springer
CY - Cham
T2 - 24th International Conference on Parallel and Distributed Computing, Euro-Par 2018
Y2 - 27 August 2018 through 28 August 2018
ER -