Development of a Knowledge Discovery System in Big Data Mining Environment

Abstract

This paper presents the development of a knowledge discovery system for a big data mining environment. The system was designed to sift through large amounts of data to find previously hidden patterns, discover valuable new insights, and support decision making; to apply big data technologies, in particular the distributed data storage and analysis architecture of Hadoop MapReduce; and to conduct performance benchmarking on both a Relational Database Management System (RDBMS) and a Hadoop cluster, with the aim of creating value and improving performance. The analytic environment provided powerful in-database algorithms and open source algorithms for predictive analytics, data mining, statistical analysis, advanced numerical computation, and interactive graphics. Automated analysis of historical data was performed by employing Knowledge Discovery and Data Mining (KDD) using the MapReduce methodology and a predictive analytics methodology. Evaluation using the Euclidean distance and the pseudo F-statistic validated Hadoop’s high scalability and performance in the real-time applications domain. Data movement was minimized, thereby ensuring inherent security and better performance. The result was a model for a big data mining environment that provides an open source framework for cloud computing and a distributed file system for fast data loading.

Country: Nigeria

1 Ihekeremma A. U. Ejimofor, 2 Obi O. R. Okonkwo

  1. Department of Computer Science, Nnamdi Azikiwe University, Awka, Nigeria
  2. Department of Computer Science, Nnamdi Azikiwe University, Awka, Nigeria

IRJIET, Volume 5, Issue 8, August 2021 pp. 65-70

doi.org/10.47001/IRJIET/2021.508011
