Paper Key: IRJ************957
Authors: Premsai Paruchuri, Gudaboina Lalith, Saganti Akhila
Date Published: 01 Apr 2025
Abstract
Voice recognition technology has transformed how we interact with digital devices, enabling virtual assistants, transcription services, and security systems. Traditional voice recognition relied on hand-engineered features and classical machine learning, approaches that struggled with variations in audio quality and speaker characteristics. Transformer-based models such as Wav2Vec2 address these limitations by learning representations directly from raw audio and capturing long-range dependencies in sequential data. The proposed system uses Wav2Vec2 for audio feature extraction and a custom transformer-based classifier to identify the singer in an audio recording. Data augmentation improves the model's robustness across diverse audio conditions. The system achieved high accuracy, highlighting potential applications in music streaming, copyright management, and audio content organization. This work demonstrates the capabilities of transformers in audio processing and points toward future advances in voice-based technologies.
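The abstract does not specify the classifier's layers, so the following is only a minimal PyTorch sketch, under stated assumptions, of how a transformer classifier might sit on top of Wav2Vec2 features. It assumes the frame features have already been extracted, for example as `last_hidden_state` from the Hugging Face `Wav2Vec2Model` (768-dimensional for `facebook/wav2vec2-base`). The class name `SingerClassifier`, the projection width, head and layer counts, and masked mean pooling are hypothetical choices, not the authors' configuration.

```python
from typing import Optional

import torch
from torch import nn


class SingerClassifier(nn.Module):
    """Hypothetical transformer classifier over precomputed Wav2Vec2 features.

    Input:  (batch, frames, feature_dim) frame features, e.g. the
            last_hidden_state of a Wav2Vec2 model (768 dims for wav2vec2-base).
    Output: (batch, num_singers) unnormalized logits, one score per singer.
    """

    def __init__(self, num_singers: int, feature_dim: int = 768,
                 d_model: int = 256, nhead: int = 4, num_layers: int = 2,
                 dropout: float = 0.1) -> None:
        super().__init__()
        # Assumed design: project Wav2Vec2 features to a smaller width first.
        self.proj = nn.Linear(feature_dim, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, dim_feedforward=4 * d_model,
            dropout=dropout, batch_first=True)
        # Stack of self-attention layers that mixes information across frames.
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, num_singers)

    def forward(self, features: torch.Tensor,
                padding_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        # padding_mask: (batch, frames) bool, True marks padded frames.
        x = self.encoder(self.proj(features), src_key_padding_mask=padding_mask)
        if padding_mask is None:
            pooled = x.mean(dim=1)
        else:
            # Mean-pool over real frames only, so padding cannot shift the result.
            keep = (~padding_mask).unsqueeze(-1).to(x.dtype)
            pooled = (x * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
        return self.head(pooled)


if __name__ == "__main__":
    torch.manual_seed(0)
    model = SingerClassifier(num_singers=10).eval()
    # Random stand-ins for Wav2Vec2 features: 1 s of 16 kHz audio gives 49 frames.
    feats = torch.randn(2, 49, 768)
    mask = torch.zeros(2, 49, dtype=torch.bool)
    mask[1, 30:] = True  # second clip is shorter and padded after frame 30
    with torch.no_grad():
        logits = model(feats, mask)
    print(logits.shape)  # torch.Size([2, 10])
    print(logits.argmax(dim=-1))  # predicted singer index per clip
```

Masked mean pooling turns a variable-length clip into one fixed-size vector per recording; a learned classification token or attention pooling would be equally plausible alternatives.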
DOI Requested