Achieving performance balance among Spark frameworks with two-level schedulers

Aleksandra Kuzmanovska, Hans van den Bogert, Rudolf Mak, Dick Epema

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-reviewed

Abstract

When multiple data-processing frameworks with time-varying workloads are simultaneously present in a single cluster or data-center, an apparent goal is to have them experience equal performance, expressed in whatever performance metrics are applicable. In modern data-center environments, Two-Level Schedulers (TLSs) that leave the scheduling of individual jobs to the schedulers within the data-processing frameworks are typically used for managing the resources of data-processing frameworks. Two such TLSs with opposite designs are Mesos and Koala-F. Mesos employs fine-grained resource allocation and aims at Dominant Resource Fairness (DRF) among framework instances by offering resources to them for the duration of a single task. In contrast, Koala-F aims at performance fairness among framework instances by employing dynamic coarse-grained resource allocation of sets of complete nodes based on performance feedback from individual instances. The goal of this paper is to explore the trade-offs between these two TLS designs when trying to achieve performance balance among frameworks. We select Apache Spark as a representative of data-processing frameworks, and perform experiments on a modest-sized cluster, using jobs chosen from commonly used data-processing benchmarks. Our results reveal that achieving performance balance among framework instances is a challenge for both TLS designs, despite their opposite design choices. Moreover, we exhibit design flaws in the DRF allocation policy that prevent Mesos from achieving performance balance. Finally, to remedy these flaws, we propose a feedback controller for Mesos that dynamically adapts framework weights, as used in Weighted DRF (W-DRF), based on their performance.
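To make the two mechanisms named in the abstract concrete, the following is a minimal sketch of a weighted dominant-share computation, as used in W-DRF, together with a simple proportional feedback step that adapts framework weights. It is not the controller proposed in the paper, whose details are not given in this record. The function names, the `gain` and `min_weight` parameters, and the use of job slowdown as the performance signal are all assumptions made for illustration.

```python
from typing import Dict

Resources = Dict[str, float]


def dominant_share(usage: Resources, capacity: Resources, weight: float = 1.0) -> float:
    """Largest fraction of any single resource held by a framework, divided by its weight."""
    return max(usage.get(r, 0.0) / capacity[r] for r in capacity) / weight


def next_framework(usages: Dict[str, Resources], capacity: Resources,
                   weights: Dict[str, float]) -> str:
    """Pick the framework with the lowest weighted dominant share to receive the next offer."""
    return min(usages, key=lambda f: dominant_share(usages[f], capacity, weights[f]))


def adapt_weights(weights: Dict[str, float], slowdowns: Dict[str, float],
                  gain: float = 0.5, min_weight: float = 0.1) -> Dict[str, float]:
    """Hypothetical proportional controller (an assumption, not the paper's design):
    raise the weight of frameworks whose slowdown is above the mean, lower the others."""
    mean = sum(slowdowns.values()) / len(slowdowns)
    return {
        f: max(min_weight, w * (1.0 + gain * (slowdowns[f] - mean) / mean))
        for f, w in weights.items()
    }


if __name__ == "__main__":
    capacity = {"cpu": 100.0, "mem": 400.0}
    usages = {
        "A": {"cpu": 20.0, "mem": 40.0},   # dominant resource: cpu
        "B": {"cpu": 10.0, "mem": 120.0},  # dominant resource: mem
    }
    weights = {"A": 1.0, "B": 1.0}
    print(next_framework(usages, capacity, weights))
    # Framework A reports a higher slowdown, so its weight is increased.
    print(adapt_weights(weights, {"A": 3.0, "B": 1.0}))
```

In this sketch a higher weight lowers a framework's weighted dominant share, so the allocator favors it in subsequent offers; the feedback step uses that lever to push resources toward frameworks that report worse performance.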

Language: English
Title of host publication: Proceedings - 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018
Place of publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers
Pages: 133-142
Number of pages: 10
ISBN (electronic): 9781538658154
DOI: 10.1109/CCGRID.2018.00028
Status: Published - 13 Jul 2018
Event: 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018 - Washington, United States
Duration: 1 May 2018 to 4 May 2018

Conference

Conference: 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018
Country: United States
City: Washington
Period: 1/05/18 to 4/05/18

Fingerprint

Electric sparks
Resource allocation
Feedback
Defects
Scheduling
Controllers
Experiments

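Keywords

Data processing framework, DRF, Job slowdown, Koala-F, Mesos, Performance balance, Resource allocation policy, Spark, Two-level schedulers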

    Cite this

    Kuzmanovska, A., van den Bogert, H., Mak, R., & Epema, D. (2018). Achieving performance balance among Spark frameworks with two-level schedulers. In Proceedings - 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018 (pp. 133-142). Piscataway: Institute of Electrical and Electronics Engineers. DOI: 10.1109/CCGRID.2018.00028
    @inproceedings{a70eacf634684459bfeaf4beb8154fae,
    title = "Achieving performance balance among spark frameworks with two-level schedulers",
    keywords = "Data processing framework, DRF, Job slowdown, Koala F, Mesos, Performance balance, Resource allocation policy, Spark, Two level schedulers",
    author = "Aleksandra Kuzmanovska and {van den Bogert}, Hans and Rudolf Mak and Dick Epema",
    year = "2018",
    month = "7",
    day = "13",
    doi = "10.1109/CCGRID.2018.00028",
    language = "English",
    pages = "133--142",
    booktitle = "Proceedings - 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018",
    publisher = "Institute of Electrical and Electronics Engineers",
    address = "United States",

    }

    TY - GEN

    T1 - Achieving performance balance among spark frameworks with two-level schedulers

    AU - Kuzmanovska,Aleksandra

    AU - van den Bogert,Hans

    AU - Mak,Rudolf

    AU - Epema,Dick

    PY - 2018/7/13

    Y1 - 2018/7/13

    KW - Data processing framework

    KW - DRF

    KW - Job slowdown

    KW - Koala F

    KW - Mesos

    KW - Performance balance

    KW - Resource allocation policy

    KW - Spark

    KW - Two level schedulers

    UR - http://www.scopus.com/inward/record.url?scp=85050976605&partnerID=8YFLogxK

    U2 - 10.1109/CCGRID.2018.00028

    DO - 10.1109/CCGRID.2018.00028

    M3 - Conference contribution

    SP - 133

    EP - 142

    BT - Proceedings - 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018

    PB - Institute of Electrical and Electronics Engineers

    CY - Piscataway

    ER -
