Workload characterization, modeling, and prediction in Grid Computing

Leiden Repository

Type: Doctoral Thesis
Title: Workload characterization, modeling, and prediction in Grid Computing
Author: Li, Hui
Publisher: LIACS, Computer Systems Group, Faculty of Science, Leiden University
Issue Date: 2008-01-24
Keywords: Grid computing
Performance prediction
Workload modeling
Workload characterization
Abstract: Workloads play an important role in experimental performance studies of computer systems. This thesis presents a comprehensive characterization of real workloads on production clusters and Grids. A variety of correlation structures and rich scaling behavior are identified in workload attributes such as job arrivals and run times, including pseudo-periodicity, long-range dependence, and strong temporal locality. Based on these analytic results, workload models are developed to fit the real data. For job arrivals, three different kinds of autocorrelation are investigated. For short- to medium-range dependent data, Markov modulated Poisson processes (MMPPs) are good models because they can capture correlations between interarrival times while remaining analytically tractable. For long-range dependent and multifractal processes, the multifractal wavelet model (MWM) is able to reconstruct the scaling behavior, and it provides a coherent wavelet framework for both analysis and synthesis. Pseudo-periodicity is a special kind of autocorrelation and can be modeled by a matching pursuit approach. For workload attributes such as run time, a new model is proposed that fits not only the marginal distribution but also second-order statistics such as the autocorrelation function (ACF). These workload models enable simulation studies of Grid scheduling strategies. Using the synthetic traces, the performance impact of workload correlations on Grid scheduling is evaluated quantitatively. The results indicate that autocorrelations in workload attributes can cause performance degradation; in some situations the difference can be up to several orders of magnitude. The larger the autocorrelation, the worse the performance, and this is shown at both the cluster and the Grid level. This study shows the importance of realistic workload models in performance evaluation studies.
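To illustrate why an MMPP can capture correlations between interarrival times, the following minimal Python sketch (not taken from the thesis) simulates a two-state MMPP and computes the sample ACF of the resulting interarrival times. The function names `simulate_mmpp2` and `acf`, and all parameter values, are hypothetical choices made for this illustration.

```python
import random


def simulate_mmpp2(lam, q, n_arrivals, seed=0):
    """Simulate a 2-state MMPP and return n_arrivals interarrival times.

    Hypothetical illustration, not the thesis implementation.
    lam[i]: Poisson arrival rate while the modulating chain is in state i.
    q[i]:   rate of leaving state i (switching to the other state).
    """
    rng = random.Random(seed)
    state = 0
    t = 0.0
    last_arrival = 0.0
    gaps = []
    while len(gaps) < n_arrivals:
        total = lam[state] + q[state]
        # Next event (arrival or state switch): competing exponential clocks.
        t += rng.expovariate(total)
        if rng.random() < lam[state] / total:
            gaps.append(t - last_arrival)  # an arrival occurred
            last_arrival = t
        else:
            state = 1 - state  # the modulating chain switched state
    return gaps


def acf(x, max_lag):
    """Sample autocorrelation function of x for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [
        sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k)) / (n * var)
        for k in range(1, max_lag + 1)
    ]


if __name__ == "__main__":
    # One fast and one slow state with slow switching (hypothetical values).
    bursty = simulate_mmpp2(lam=(10.0, 0.5), q=(0.05, 0.05), n_arrivals=20000)
    # Equal rates in both states reduce the MMPP to a plain Poisson process.
    poisson = simulate_mmpp2(lam=(2.0, 2.0), q=(1.0, 1.0), n_arrivals=20000)
    print("MMPP ACF:", acf(bursty, 3))
    print("Poisson ACF:", acf(poisson, 3))
```

With slow switching, consecutive gaps tend to come from the same state, so the lag-1 ACF of the MMPP gaps is clearly positive, whereas the Poisson gaps are independent and their sample ACF stays close to zero.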
Regarding performance prediction, this thesis treats the targeted resources as a "black box" and takes a statistical approach. It shows that statistical learning based methods, when carefully designed and fine-tuned, can deliver good accuracy and performance.
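The abstract does not name a specific learning method. As one hedged example of a black-box statistical predictor, the sketch below uses distance-weighted k-nearest-neighbor regression over historical jobs to predict a job's run time. The function name `knn_predict` and the feature choice (processor count, requested hours) are assumptions for illustration, not necessarily the method used in the thesis.

```python
import math


def knn_predict(history, query, k=3):
    """Predict a run time as the distance-weighted mean of the k nearest jobs.

    Hypothetical illustration, not the thesis implementation.
    history: list of (features, run_time) pairs; features is a tuple of floats.
    query:   feature tuple of the job whose run time is predicted.
    """
    # Euclidean distance from the query to every historical job, nearest first.
    scored = sorted(
        (math.dist(features, query), run_time) for features, run_time in history
    )
    neighbors = scored[:k]
    # If some neighbors match the query exactly, average only those.
    exact = [rt for d, rt in neighbors if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    # Otherwise weight each neighbor by inverse distance.
    weights = [1.0 / d for d, _ in neighbors]
    return sum(w * rt for w, (_, rt) in zip(weights, neighbors)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical history: ((processors, requested_hours), actual_hours).
    history = [((4, 1.0), 0.8), ((4, 1.0), 0.9), ((32, 12.0), 10.5), ((8, 2.0), 1.7)]
    print(knn_predict(history, (4, 1.0)))
    print(knn_predict(history, (6, 1.5)))
```

Because raw Euclidean distance is sensitive to feature scale, features with very different ranges would normally be normalized before use; the choice of k and of the distance metric are the kind of design parameters that need tuning.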
Description: Promotor: H.A.G. Wijshoff, Co-promotor: A.A. Wolters
With summary in Dutch
Faculty: Faculteit der Wiskunde en Natuurwetenschappen
Citation: Li, H., 2008, Doctoral thesis, Leiden University
Series/Report no.: ASCI dissertation series no. 159
ISBN: 9789090226743
Handle: http://hdl.handle.net/1887/12574
 

Files in this item: Full text (application/pdf, 4.678 MB)
