Parameter Settings Optimization in MapReduce Big Data processing using the MOPSO Algorithm


  • Lennah Etyang Jomo Kenyatta University of Agriculture and Technology, Kenya
  • Lawrence Nderu Jomo Kenyatta University of Agriculture and Technology, Kenya
  • Waweru Mwangi Jomo Kenyatta University of Agriculture and Technology



Multi Objective Problem, MOPSOPITCH, MapReduce, Pareto Optimality, Parallel Computing Toolbox


Big data is a highly valued commodity worldwide. It is regarded not merely as data but as a source from which experts can derive intelligence. Because of its characteristics (Variety, Value, Volume, and Velocity) and the growing need to handle it, organizations face difficulties in ensuring optimal and affordable processing and storage of large datasets. One existing model used for rapid processing and storage of big data is Hadoop MapReduce. MapReduce is used for large-scale data processing in a parallel and distributed computing environment, while Hadoop is used for running applications and storing data on clusters of commodity hardware. Furthermore, the Hadoop MapReduce framework has more than 190 configuration parameters to tune, and this tuning is mostly done manually. Because of the complex interactions between parameters and the large parameter space, manual tuning is not effective. Even worse, these parameters must be tuned every time a Hadoop MapReduce application is run. The main goal of this research is to create an algorithm that improves efficiency by automatically optimizing parameter settings while MapReduce jobs are running. The algorithm employs the Multi-Objective Particle Swarm Optimization (MOPSO) technique, which uses two objective functions to search for a Pareto optimal solution while optimizing the parameters. The experimental results show that the algorithm markedly improves MapReduce job performance compared with the default settings.
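The notion of Pareto optimality over two objective functions can be illustrated with a minimal sketch. It assumes both objectives are minimized; the objective values and the names `dominates`, `pareto_front`, and `candidates` are hypothetical and are not taken from the paper. It shows only the non-dominated filtering step that a MOPSO-style search relies on, not the full algorithm.

```python
from typing import List, Tuple

# A candidate's two objective values; both are assumed to be minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates, in input order."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q != p)
    ]


# Hypothetical objective values for four candidate parameter settings.
candidates = [(120.0, 8.0), (100.0, 10.0), (130.0, 9.0), (100.0, 12.0)]
front = pareto_front(candidates)
print(front)
```

In this sketch a candidate stays on the front only if no other candidate is at least as good in both objectives and better in one, which is the trade-off set from which a single parameter setting would then be chosen.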





How to Cite

Etyang, L., Nderu, L., & Mwangi, W. (2021). Parameter Settings Optimization in MapReduce Big Data processing using the MOPSO Algorithm. International Journal of Advances in Scientific Research and Engineering, IJASRE (ISSN: 2454 - 8006), 7(4), 31-43.