Paper Key : IRJ************021
Authors: Apurva Kumar, Shilpa Priyadarshini
Date Published: 27 Oct 2023
Abstract
Apache Spark is a leading open-source data processing engine used for batch processing, machine learning, stream processing, and large-scale SQL (Structured Query Language) workloads. It is designed to make big data processing faster and easier. Since its inception, Spark has gained wide popularity as a big data processing framework and is used extensively across industries and businesses that deal with large volumes of data. This paper presents actionable techniques for reducing the computation time of Spark jobs through optimization. The strategy is laid out as a sequence of run stages, where each stage builds on the previous one and improves computation time through new enhancements and recommendations.
DOI: 10.56726/IRJMETS45567 (https://www.doi.org/10.56726/IRJMETS45567)
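To make the abstract's subject concrete, the sketch below shows a minimal Spark SQL job of the kind whose computation time such optimizations target. It is an illustrative baseline only, not the paper's pipeline; the application name, the toy dataset, and the `spark.sql.shuffle.partitions` setting are assumptions chosen for the example.

```scala
import org.apache.spark.sql.SparkSession

object SparkJobSketch {
  def main(args: Array[String]): Unit = {
    // Build a SparkSession; the config value here is an illustrative
    // assumption, not a recommendation from the paper.
    val spark = SparkSession.builder()
      .appName("optimization-baseline")
      // Hypothetical tuning knob: fewer shuffle partitions for a small input.
      .config("spark.sql.shuffle.partitions", "64")
      .getOrCreate()

    import spark.implicits._

    // Toy dataset standing in for a large input; a real job would read from storage.
    val sales = Seq(("east", 10.0), ("west", 20.0), ("east", 5.0))
      .toDF("region", "amount")

    // A simple aggregation whose cost is dominated by the shuffle stage --
    // the kind of step that successive run stages would measure and tune.
    val totals = sales.groupBy("region").sum("amount")
    totals.show()

    spark.stop()
  }
}
```

A job like this provides a measurable baseline: each subsequent run stage would re-execute it with new enhancements and compare wall-clock times against the previous stage.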