Abstract
Large-scale distributed computing systems such as grids serve a growing number of scientists. These environments bring not only the advantages of an economy of scale, but also the challenges of resource and workload heterogeneity. A consequence of these two forms of heterogeneity is that job runtimes and queue wait times are highly variable, which generally reduces system performance and makes grids difficult for the common scientist to use. Predicting job runtimes and queue wait times has been widely studied for parallel environments. However, there has been no detailed investigation of how the proposed prediction methods perform in grids, whose resource structure and workload characteristics are very different from those of parallel systems. In this paper, we assess the performance and benefit of predicting job runtimes and queue wait times in grids, based on traces gathered from various research and production grid environments. First, we evaluate the performance of simple yet widely used time series prediction methods and the effect of applying them to different types of job classes (e.g., all jobs submitted by a single user or to a single site). Then, we investigate the performance of two kinds of queue wait time prediction methods for grids. Last, we investigate whether prediction-based grid-level scheduling policies can perform better than policies that do not use predictions.
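To make the idea of applying time series predictors per job class concrete, here is a minimal, hypothetical sketch. The abstract does not name the specific methods used, so the three predictors here (last value, moving average over a window, running mean), the trace format (dicts with `user`, `site`, and `runtime` keys), the names `PerClassPredictor` and `evaluate`, and the mean-absolute-error metric are all illustrative assumptions, not the paper's implementation. Each job's runtime is predicted from earlier jobs in the same class (grouped by user or by site) before that job's actual runtime is recorded.

```python
from collections import defaultdict, deque
from statistics import mean


class PerClassPredictor:
    """Hypothetical sketch: simple time series runtime predictors kept per job class.

    A job class is identified by a key such as a user name or a site name.
    """

    def __init__(self, window=5):
        # Most recent observed runtimes per class, bounded to `window` entries.
        self.recent = defaultdict(lambda: deque(maxlen=window))
        # Running sum and count per class, used for the running mean.
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def observe(self, key, runtime):
        self.recent[key].append(runtime)
        self.total[key] += runtime
        self.count[key] += 1

    def last_value(self, key):
        r = self.recent.get(key)  # .get avoids creating empty entries
        return r[-1] if r else None

    def moving_average(self, key):
        r = self.recent.get(key)
        return mean(r) if r else None

    def running_mean(self, key):
        n = self.count.get(key, 0)
        return self.total[key] / n if n else None


def evaluate(trace, method, key_fn, window=5):
    """Mean absolute error of one predictor over a trace (assumed metric).

    Each job is predicted from earlier jobs in its class, then observed.
    Jobs with no history in their class are skipped.
    """
    p = PerClassPredictor(window)
    errors = []
    for job in trace:
        key = key_fn(job)
        pred = getattr(p, method)(key)
        if pred is not None:
            errors.append(abs(pred - job["runtime"]))
        p.observe(key, job["runtime"])
    return mean(errors) if errors else None
```

For example, `evaluate(trace, "running_mean", lambda j: j["site"])` scores the running mean with jobs grouped by site, and passing `lambda j: j["user"]` groups them by user instead. Keeping the class key as a function makes it easy to compare groupings on the same trace.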
Field | Value |
---|---|
Original language | English |
Title of host publication | Proceedings of the 18th International Symposium on High Performance Distributed Computing (HPDC'09, Munich, Germany, June 11-13, 2009) |
Place of Publication | New York, NY |
Publisher | Association for Computing Machinery, Inc |
Pages | 111-120 |
ISBN (Print) | 978-1-60558-587-1 |
Publication status | Published - 2009 |