Paper Key : IRJ************021
Authors: Apurva Kumar, Shilpa Priyadarshini
Date Published: 27 Oct 2023
Abstract
Apache Spark is a leading open-source data processing engine used for batch processing, machine learning, stream processing, and large-scale SQL (Structured Query Language) workloads. It is designed to make big data processing faster and easier. Since its inception, Spark has gained wide popularity as a big data processing framework and is used extensively across industries and businesses that deal with large volumes of data. This paper presents actionable techniques for reducing the computation time of Spark jobs through optimization. The strategy is laid out as a sequence of run stages, where each stage builds on the previous one and improves computation time through new enhancements and recommendations.
DOI: 10.56726/IRJMETS45567 (https://www.doi.org/10.56726/IRJMETS45567)
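To make the abstract's subject concrete, the sketch below shows a minimal Spark SQL job of the kind whose computation time such optimizations target. It is an illustrative baseline only, not the paper's pipeline; the application name, the toy dataset, and the `spark.sql.shuffle.partitions` setting are assumptions chosen for the example.

```scala
import org.apache.spark.sql.SparkSession

object SparkJobSketch {
  def main(args: Array[String]): Unit = {
    // Build a SparkSession; the config value here is an illustrative
    // assumption, not a recommendation from the paper.
    val spark = SparkSession.builder()
      .appName("optimization-baseline")
      // Hypothetical tuning knob: fewer shuffle partitions for a small input.
      .config("spark.sql.shuffle.partitions", "64")
      .getOrCreate()

    import spark.implicits._

    // Toy dataset standing in for a large input; a real job would read from storage.
    val sales = Seq(("east", 10.0), ("west", 20.0), ("east", 5.0))
      .toDF("region", "amount")

    // A simple aggregation whose cost is dominated by the shuffle stage --
    // the kind of step that successive run stages would measure and tune.
    val totals = sales.groupBy("region").sum("amount")
    totals.show()

    spark.stop()
  }
}
```

A job like this provides a measurable baseline: each subsequent run stage would re-execute it with new enhancements and compare wall-clock times against the previous stage.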