A coflow-based co-optimization framework for high-performance data analytics

L. Cheng, Y. Wang, Y. Pei, D.H.J. Epema

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

12 Citations (Scopus)

Abstract

Efficient execution of distributed database operators such as joining and aggregating is critical for the performance of big data analytics. With the increase of the compute speedup of modern CPUs, reducing the network communication time of these operators in large systems is becoming increasingly important, and also challenging current techniques. Significant performance improvements have been achieved by using state-of-the-art methods, such as reducing network traffic designed in the data management domain, and data flow scheduling in the data communications domain. However, the proposed techniques in both fields just view each other as a black box, and performance gains from a co-optimization perspective have not yet been explored. In this paper, based on current research in coflow scheduling, we propose a novel Coflow-based Co-optimization Framework (CCF), which can co-optimize application-level data movement and network-level data communications for distributed operators, and consequently contribute to their performance in large distributed environments. We present the detailed design and implementation of CCF, and conduct an experimental evaluation of CCF using large-scale simulations on large data joins. Our results demonstrate that CCF can always perform faster than current approaches on network communications in large-scale distributed scenarios.

Original language: English
Title of host publication: 2017 46th International Conference on Parallel Processing (ICPP), 14-17 August 2017, Bristol, United Kingdom
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers
Pages: 392-401
Number of pages: 10
ISBN (Electronic): 978-1-5386-1042-8
ISBN (Print): 978-1-5386-1043-5
Publication status: Published - 1 Sept 2017
Event: 46th International Conference on Parallel Processing, ICPP 2017 - Bristol, United Kingdom
Duration: 14 Aug 2017 to 17 Aug 2017

Conference

Conference: 46th International Conference on Parallel Processing, ICPP 2017
Country/Territory: United Kingdom
City: Bristol
Period: 14/08/17 to 17/08/17

Keywords

  • Big data
  • Coflow scheduling
  • Data-intensive applications
  • Distributed joins
  • Network communications
