
Performance Optimization System for Hadoop and Spark Frameworks

Open Access | Dec 2020

Abstract

The optimization of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented on Apache Hadoop or Spark, splits large data sets into blocks distributed across several machines. Data compression reduces data size and the transfer time between disk and memory, but requires additional processing. Finding an optimal tradeoff is therefore a challenge: a high compression factor may underload Input/Output while overloading the processor. The paper presents a system for selecting compression tools and tuning the compression factor to reach the best performance in Apache Hadoop and Spark infrastructures, based on simulation analyses.
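
As a concrete illustration of the tradeoff the abstract describes, the minimal sketch below shows how a compression codec and level might be tuned for a Spark job. The codec choice (zstd) and the specific level are illustrative assumptions for this example, not the selection produced by the paper's system.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative tuning: a zstd codec with a moderate compression level,
// trading extra CPU time for smaller shuffle and on-disk data volumes.
// A higher level shrinks I/O further but increases processor load.
val spark = SparkSession.builder()
  .appName("compression-tradeoff-sketch")
  .config("spark.io.compression.codec", "zstd")          // shuffle/broadcast compression
  .config("spark.io.compression.zstd.level", "3")        // higher = smaller data, more CPU
  .config("spark.sql.parquet.compression.codec", "zstd") // on-disk Parquet compression
  .getOrCreate()
```

In practice, the right settings depend on the ratio of I/O bandwidth to available CPU in the cluster, which is precisely the balance the proposed system explores through simulation.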

DOI: https://doi.org/10.2478/cait-2020-0056 | Journal eISSN: 1314-4081 | Journal ISSN: 1311-9702
Language: English
Page range: 5 - 17
Submitted on: Jul 6, 2020
Accepted on: Sep 25, 2020
Published on: Dec 31, 2020
Published by: Bulgarian Academy of Sciences, Institute of Information and Communication Technologies
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2020 Hrachya Astsatryan, Aram Kocharyan, Daniel Hagimont, Arthur Lalayan, published by Bulgarian Academy of Sciences, Institute of Information and Communication Technologies
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.