Empirical Study of Job Scheduling Algorithms in Hadoop MapReduce

Jyoti V. Gautam; Harshadkumar B. Prajapati; Vipul K. Dabhi; Sanjay Chaudhary

doi:10.1515/cait-2017-0012

Cybernetics and Information Technologies

Volume 17 (2017): Issue 1 (March 2017)

By: Jyoti V. Gautam, Harshadkumar B. Prajapati, Vipul K. Dabhi and Sanjay Chaudhary

Open Access

Apr 2017

Abstract

Several job scheduling algorithms have been developed for the Hadoop MapReduce model. They vary widely in design and behavior and address different issues such as data locality, user share fairness, and resource awareness. This article empirically evaluates the performance of three schedulers: First In First Out (FIFO), the Fair scheduler, and the Capacity scheduler. For the experimental evaluation, we built our own Hadoop cluster testbed of four machines, in which one machine works as the master node and all four machines work as slave nodes. The experiments vary the input data size, use two different data processing applications, and vary the number of nodes used in processing. The article analyzes the performance of the job scheduling algorithms using several relevant performance measures. The experimental results show that performance is affected by the job scheduling parameters, the type of application, the number of nodes in the cluster, and the size of the input data.
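
To make the difference between FIFO and fair sharing concrete, the following toy simulation may help. It is only an illustrative sketch and not Hadoop's implementation or the testbed used in the article: the function `simulate`, the job names, the task counts, and the slot count are all hypothetical, every task is assumed to take one time step, and data locality, map and reduce phases, and the Capacity scheduler's queues are ignored.

```python
def simulate(jobs, slots, policy):
    """Return the time step at which each job finishes (toy model).

    jobs:   dict mapping job name -> number of unit-length tasks; all jobs
            are submitted at t=0 and dict order is the submission order.
    slots:  number of tasks the cluster can run per time step.
    policy: "fifo" or "fair".
    """
    remaining = dict(jobs)
    finish = {}
    t = 0
    while remaining:
        t += 1
        free = slots
        if policy == "fifo":
            # Earlier jobs take every slot they can use before later jobs get any.
            for name in list(remaining):
                run = min(free, remaining[name])
                remaining[name] -= run
                free -= run
        elif policy == "fair":
            # Hand out slots one at a time, cycling over jobs that still have tasks.
            alloc = {name: 0 for name in remaining}
            while free > 0 and any(alloc[n] < remaining[n] for n in remaining):
                for name in remaining:
                    if free > 0 and alloc[name] < remaining[name]:
                        alloc[name] += 1
                        free -= 1
            for name, run in alloc.items():
                remaining[name] -= run
        else:
            raise ValueError("policy must be 'fifo' or 'fair'")
        # Record and drop the jobs that completed during this time step.
        for name in [n for n, r in remaining.items() if r == 0]:
            finish[name] = t
            del remaining[name]
    return finish


# Hypothetical workload: one large job submitted just before one small job.
workload = {"big": 8, "small": 2}
print(simulate(workload, slots=2, policy="fifo"))
print(simulate(workload, slots=2, policy="fair"))
```

With this hypothetical workload on two slots, both policies finish all work at step 5, but the small job completes at step 5 under FIFO and at step 2 under fair sharing, while the large job slips from step 4 to step 5. This is the kind of trade-off that makes scheduler choice visible in per-job measures even when total cluster throughput is unchanged.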

DOI: https://doi.org/10.1515/cait-2017-0012 | Journal eISSN: 1314-4081 | Journal ISSN: 1311-9702

Language: English

Page range: 146 - 163

Published on: Apr 6, 2017

Published by: Bulgarian Academy of Sciences, Institute of Information and Communication Technologies

In partnership with: Paradigm Publishing Services

Keywords: Hadoop, experimental evaluation

Related subjects: Computer sciences, Information technology

© 2017 Jyoti V. Gautam, Harshadkumar B. Prajapati, Vipul K. Dabhi, Sanjay Chaudhary, published by Bulgarian Academy of Sciences, Institute of Information and Communication Technologies
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
