Peer-Reviewed

Simplified Data Processing for Large Cluster: A MapReduce and Hadoop Based Study

Received: 29 May 2021    Accepted: 21 June 2021    Published: 9 July 2021
Abstract

With the rapid development of computing technologies, data volumes are growing at an ever-increasing rate. Data scientists are overwhelmed by this flood of data, which demands ever more processing capacity, and the central concern for large-scale data is supporting the decision-making process. This study applies the MapReduce programming model and its associated implementation introduced by Google. The model expresses a computation as two functions, Map and Reduce; the MapReduce library automatically parallelizes the computation and handles complex tasks such as data distribution, load balancing, and fault tolerance. Google's MapReduce implementation and its open-source counterpart, Hadoop, are designed to run such computations on large clusters of commodity machines. Our discussion of the MapReduce and Hadoop frameworks covers terabytes and petabytes of data stored and processed in parallel across thousands of machines at the same time, so that large-scale processing and manipulation of big data can be sustained with effective results. The study presents the basics of MapReduce programming and of the open-source Hadoop framework, and shows that a Hadoop system can speed up the handling of big data and respond very quickly.
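To make the two-function model concrete, the sketch below follows the canonical word-count example from the Hadoop MapReduce documentation: Map emits a (word, 1) pair for every token in its input split, and Reduce sums the counts collected for each distinct word. This is a minimal illustration of the programming model rather than the implementation evaluated in this study; the class names and the HDFS input/output paths taken from the command line are illustrative.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map: emit (word, 1) for every token in the input split.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce: sum the counts gathered for each distinct word.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // local pre-aggregation cuts shuffle traffic
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Everything else promised by the model, splitting the input across machines, shuffling intermediate pairs to the reducers, and re-executing failed tasks, is handled by the framework; the programmer supplies only the two functions.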

Published in Advances in Applied Sciences (Volume 6, Issue 3)
DOI 10.11648/j.aas.20210603.11
Page(s) 43-48
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2021. Published by Science Publishing Group

Keywords

Google MapReduce Processes, Hadoop, Parallel Data Processing, HDFS, Cloud Computing, Large Cluster Data Processing

Cite This Article
  • APA Style

    Abdiaziz Omar Hassan, Abdulkadir Abdulahi Hasan. (2021). Simplified Data Processing for Large Cluster: A MapReduce and Hadoop Based Study. Advances in Applied Sciences, 6(3), 43-48. https://doi.org/10.11648/j.aas.20210603.11


    ACS Style

    Abdiaziz Omar Hassan; Abdulkadir Abdulahi Hasan. Simplified Data Processing for Large Cluster: A MapReduce and Hadoop Based Study. Adv. Appl. Sci. 2021, 6(3), 43-48. doi: 10.11648/j.aas.20210603.11


    AMA Style

    Abdiaziz Omar Hassan, Abdulkadir Abdulahi Hasan. Simplified Data Processing for Large Cluster: A MapReduce and Hadoop Based Study. Adv Appl Sci. 2021;6(3):43-48. doi: 10.11648/j.aas.20210603.11


  • @article{10.11648/j.aas.20210603.11,
      author = {Abdiaziz Omar Hassan and Abdulkadir Abdulahi Hasan},
      title = {Simplified Data Processing for Large Cluster: A MapReduce and Hadoop Based Study},
      journal = {Advances in Applied Sciences},
      volume = {6},
      number = {3},
      pages = {43-48},
      doi = {10.11648/j.aas.20210603.11},
      url = {https://doi.org/10.11648/j.aas.20210603.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.aas.20210603.11},
      abstract = {With the rapid development of computing technologies, data volumes are growing at an ever-increasing rate. Data scientists are overwhelmed by this flood of data, which demands ever more processing capacity, and the central concern for large-scale data is supporting the decision-making process. This study applies the MapReduce programming model and its associated implementation introduced by Google. The model expresses a computation as two functions, Map and Reduce; the MapReduce library automatically parallelizes the computation and handles complex tasks such as data distribution, load balancing, and fault tolerance. Google's MapReduce implementation and its open-source counterpart, Hadoop, are designed to run such computations on large clusters of commodity machines. Our discussion of the MapReduce and Hadoop frameworks covers terabytes and petabytes of data stored and processed in parallel across thousands of machines at the same time, so that large-scale processing and manipulation of big data can be sustained with effective results. The study presents the basics of MapReduce programming and of the open-source Hadoop framework, and shows that a Hadoop system can speed up the handling of big data and respond very quickly.},
      year = {2021}
    }
    


  • TY  - JOUR
    T1  - Simplified Data Processing for Large Cluster: A MapReduce and Hadoop Based Study
    AU  - Abdiaziz Omar Hassan
    AU  - Abdulkadir Abdulahi Hasan
    Y1  - 2021/07/09
    PY  - 2021
    N1  - https://doi.org/10.11648/j.aas.20210603.11
    DO  - 10.11648/j.aas.20210603.11
    T2  - Advances in Applied Sciences
    JF  - Advances in Applied Sciences
    JO  - Advances in Applied Sciences
    SP  - 43
    EP  - 48
    PB  - Science Publishing Group
    SN  - 2575-1514
    UR  - https://doi.org/10.11648/j.aas.20210603.11
    AB  - With the rapid development of computing technologies, data volumes are growing at an ever-increasing rate. Data scientists are overwhelmed by this flood of data, which demands ever more processing capacity, and the central concern for large-scale data is supporting the decision-making process. This study applies the MapReduce programming model and its associated implementation introduced by Google. The model expresses a computation as two functions, Map and Reduce; the MapReduce library automatically parallelizes the computation and handles complex tasks such as data distribution, load balancing, and fault tolerance. Google's MapReduce implementation and its open-source counterpart, Hadoop, are designed to run such computations on large clusters of commodity machines. Our discussion of the MapReduce and Hadoop frameworks covers terabytes and petabytes of data stored and processed in parallel across thousands of machines at the same time, so that large-scale processing and manipulation of big data can be sustained with effective results. The study presents the basics of MapReduce programming and of the open-source Hadoop framework, and shows that a Hadoop system can speed up the handling of big data and respond very quickly.
    VL  - 6
    IS  - 3
    ER  - 


Author Information
  • Abdiaziz Omar Hassan, College of Mathematics and Big Data, Anhui University of Science and Technology, Huainan, China

  • Abdulkadir Abdulahi Hasan, College of Mathematics and Big Data, Anhui University of Science and Technology, Huainan, China
