×

Heuristics for periodical batch job scheduling in a MapReduce computing framework. (English) Zbl 1390.68137

Summary: Task scheduling has a significant impact on the performance of the MapReduce computing framework. In this paper, a scheduling problem of periodical batch jobs with makespan minimization is considered. The problem is modeled as a general two-stage hybrid flow shop scheduling problem with schedule-dependent setup times. The new model incorporates the data locality of tasks and is formulated as an integer program. Three heuristics are developed to solve the problem and an improvement policy based on data locality is presented to enhance the methods. A lower bound of the makespan is derived. 150 instances are randomly generated from data distributions drawn from a real cluster. The parameters involved in the methods are set according to different cluster setups. The proposed heuristics are compared over different numbers of jobs and cluster setups. Computational results show that the performance of the methods is highly dependent on both the number of jobs and the cluster setups. The proposed improvement policy is effective and the impact of the input data distribution on the policy is analyzed and tested.

MSC:

68M20 Performance evaluation, queueing, and scheduling in the context of computer systems
90C59 Approximation methods and heuristics in mathematical programming

Software:

SPOT; Hadoop; MapReduce

References:

[1] Asahara, M.; Nakadai, S.; Araki, T., LoadAtomizer: a locality and I/O load aware task scheduler for MapReduce, Proceedings of the IEEE 4th International Conference on Cloud Computing Technology and Science (CloudCom), 317-324 (2012), IEEE
[2] (Bartz-Beielstein, T.; Chiarandini, M.; Paquete, L.; Preuss, M. (2010), Springer: Springer New York)
[3] Berlińska, J.; Drozdowski, M., Scheduling divisible MapReduce computations, J. Parallel Distrib. Comput., 71, 3, 450-459 (2011)
[4] Brucker, P., Scheduling Algorithms, vol. 3 (2007), Springer · Zbl 1126.90001
[5] Chang, H.; Kodialam, M.; Kompella, R. R.; Lakshman, T.; Lee, M.; Mukherjee, S., Scheduling in MapReduce-like systems for fast completion time, Proceedings of Annual Joint Conference of the IEEE Computer and Communications, INFOCOM, 3074-3082 (2011), IEEE
[6] Chen, Q.; Zhang, D.; Guo, M.; Deng, Q.; Guo, S., SAMR: a self-adaptive MapReduce scheduling algorithm in heterogeneous environment, Proceedings of the IEEE 10th International Conference on Computer and Information Technology (CIT), 2736-2743 (2010), IEEE
[7] Chen, Y.; Ganapathi, A.; Griffith, R.; Katz, R., The case for evaluating MapReduce performance using workload suites, Proceedings of the IEEE 19th International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS), 390-399 (2011), IEEE
[8] Czyżewski, A.; Bratoszewski, P.; Ciarkowski, A.; Cichowski, J.; Lisowski, K.; Szczodrak, M.; Szwoch, G.; Krawczyk, H., Massive surveillance data processing with supercomputing cluster, Inf. Sci., 296, 322-344 (2015)
[9] Dean, J.; Ghemawat, S., MapReduce: simplified data processing on large clusters, Commun. ACM, 51, 1, 107-113 (2008)
[10] Fischer, M. J.; Su, X.; Yin, Y., Assigning tasks for efficiency in Hadoop, Proceedings of the 22nd ACM Symposium on Parallelism in Algorithms and Architectures, 30-39 (2010), ACM
[11] Gupta, J. N.D., Two-stage, hybrid flowshop scheduling problem, J. Oper. Res. Soc., 39, 4, 359-364 (1988) · Zbl 0639.90049
[12] Haouari, M.; M’Hallah, R., Heuristic algorithms for the two-stage hybrid flowshop problem, Oper. Res. Lett., 21, 1, 43-53 (1997) · Zbl 0885.90055
[13] Huang, W.; Li, S., A two-stage hybrid flowshop with uniform machines and setup times, Math. Comput. Model., 27, 2, 27-45 (1998) · Zbl 1185.90078
[14] Johnson, S. M., Optimal two-and three-stage production schedules with setup times included, Nav. Res. Logist. Q., 1, 1, 61-68 (1954) · Zbl 1349.90359
[15] Jungwattanakit, J.; Reodecha, M.; Chaovalitwongse, P.; Werner, F., Algorithms for flexible flow shop problems with unrelated parallel machines, setup times, and dual criteria, Int. J. Adv. Manuf. Technol., 37, 3-4, 354-370 (2008)
[16] Kavulya, S.; Tan, J.; Gandhi, R.; Narasimhan, P., An analysis of traces from a production MapReduce cluster, Proceedings of the 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid), 94-103 (2010), IEEE
[17] Kurz, M. E.; Askin, R. G., Scheduling flexible flow lines with sequence-dependent setup times, Eur. J. Oper. Res., 159, 1, 66-82 (2004) · Zbl 1067.90045
[18] Lee, C.-Y.; Vairaktarakis, G. L., Minimizing makespan in hybrid flowshops, Oper. Res. Lett., 16, 3, 149-158 (1994) · Zbl 0812.90066
[19] Lu, P.; Lee, Y. C.; Wang, C.; Zhou, B. B.; Chen, J.; Zomaya, A. Y., Workload characteristic oriented scheduler for MapReduce, Proceedings of the 2012 IEEE 18th International Conference on Parallel and Distributed Systems, 156-163 (2012), IEEE Computer Society
[20] Mika, M.; Waligóra, G.; Wȩglarz, J., Tabu search for multi-mode resource-constrained project scheduling with schedule-dependent setup times, Eur. J. Oper. Res., 187, 3, 1238-1250 (2008) · Zbl 1137.90508
[21] Moseley, B.; Dasgupta, A.; Kumar, R.; Sarlós, T., On scheduling in Map-Reduce and flow-shops, Proceedings of the 23rd ACM Symposium on Parallelism in Algorithms and Architectures, 289-298 (2011), ACM
[22] Oğuz, C.; Fikret Ercan, M.; Edwin Cheng, T. C.; Fung, Y.-F., Heuristic algorithms for multiprocessor task scheduling in a two-stage hybrid flow-shop, Eur. J. Oper. Res., 149, 2, 390-403 (2003) · Zbl 1059.90072
[23] Phan, L. T.; Zhang, Z.; Loo, B. T.; Lee, I., Real-time MapReduce Scheduling, Technical Report (2010), UPenn
[24] Philip Chen, C.; Zhang, C.-Y., Data-intensive applications, challenges, techniques and technologies: a survey on big data, Inf. Sci., 275, 314-347 (2014)
[25] Pinedo, M., Scheduling: theory, algorithms, and systems (2012), Springer · Zbl 1239.90002
[26] Polo, J.; Carrera, D.; Becerra, Y.; Torres, J.; Ayguadé, E.; Steinder, M.; Whalley, I., Performance-driven task co-scheduling for MapReduce environments, Proceedings of the IEEE Network Operations and Management Symposium (NOMS), 373-380 (2010), IEEE
[27] Radenski, A.; Ehwerhemuepha, L., Speeding-up codon analysis on the cloud with local MapReduce aggregation, Inf. Sci., 263, 175-185 (2014)
[28] Rasch, D.; Guiard, V., The robustness of parametric statistical methods, Psychol. Sci., 46, 2, 175-208 (2004)
[29] Riane, F.; Artiba, A.; Elmaghraby, S. E., A hybrid three-stage flowshop problem: efficient heuristics to minimize makespan, Eur. J. Oper. Res., 109, 2, 321-329 (1998) · Zbl 0937.90035
[30] Ribas, I.; Leisten, R.; Framinan, J. M., Review and classification of hybrid flow shop scheduling problems from a production system and a solutions procedure perspective, Comput. Oper. Res., 37, 8, 1439-1454 (2010) · Zbl 1183.90189
[31] Ridge, E.; Kudenko, D., Tuning an algorithm using design of experiments, (Bartz-Beielstein, T.; Chiarandini, M.; Paquete, L.; Preuss, M., Experimental Methods for the Analysis of Optimization Algorithms (2010), Springer: Springer New York), 265-286 · Zbl 1206.68380
[32] Ruiz, R.; Vázquez-Rodríguez, J. A., The hybrid flow shop scheduling problem, Eur. J. Oper. Res., 205, 1, 1-18 (2010) · Zbl 1188.90110
[33] Shih, H.-Y.; Huang, J.-J.; Leu, J.-S., Dynamic slot-based task scheduling based on node workload in a MapReducecomputation model, Proceedings of the International Conference on Anti-Counterfeiting, Security and Identification (ASID), 1-5 (2012), IEEE
[34] Thusoo, A.; Shao, Z.; Anthony, S.; Borthakur, D.; Jain, N.; Sen Sarma, J.; Murthy, R.; Liu, H., Data warehousing and analytics infrastructure at facebook, Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, 1013-1020 (2010), ACM
[35] Tian, C.; Zhou, H.; He, Y.; Zha, L., A dynamic MapReduce scheduler for heterogeneous workloads, Proceedings of the Eighth International Conference on Grid and Cooperative Computing, GCC’09, 218-224 (2009), IEEE
[36] Verma, A.; Cherkasova, L.; Campbell, R. H., Play it again, SimMR!, Proceedings of the IEEE International Conference on Cluster Computing, CLUSTER, 253-261 (2011), IEEE
[37] Verma, A.; Cherkasova, L.; Campbell, R. H., Resource provisioning framework for MapReduce jobs with performance goals, (Kon, F.; Kermarrec, A.-M., Lecture Notes in Computer Science, vol. 7049 (2011), Springer), 165-186
[38] Verma, A.; Cherkasova, L.; Campbell, R. H., Orchestrating an ensemble of MapReduce jobs for minimizing their makespan, IEEE Trans. Dependable Secure Comput., 10, 5, 314-327 (2013)
[39] Wang, Y.; Shi, W., Budget-driven scheduling algorithms for batches of MapReduce jobs in heterogeneous clouds, IEEE Trans. Cloud Comput., 2, 3, 306-319 (2014)
[40] White, T., Hadoop: The Definitive Guide (2009), O’Reilly Media
[41] Wolf, J.; Rajan, D.; Hildrum, K.; Khandekar, R.; Kumar, V.; Parekh, S.; Wu, K.-L.; Balmin, A., FLEX: a slot allocation scheduling optimizer for MapReduce workloads, Proceedings of Middleware Conference, 1-20 (2010), Springer
[42] Zaharia, M.; Borthakur, D.; Sarma, J. S.; Elmeleegy, K.; Shenker, S.; Stoica, I., Job Scheduling for Multi-user MapReduce Clusters, Technical Report UCB/EECS-2009-55 (2009), EECS Department, University of California, Berkeley
[43] Zaharia, M.; Borthakur, D.; Sen Sarma, J.; Elmeleegy, K.; Shenker, S.; Stoica, I., Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling, Proceedings of the EuroSys 2010 Conference, EuroSys’10, 265-278 (2010)
[44] Zaharia, M.; Konwinski, A.; Joseph, A. D.; Katz, R. H.; Stoica, I., Improving MapReduce performance in heterogeneous environments., Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI’08, 29-42 (2008)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.