Paper Key : IRJ************141
Author: Hanagal Atharva Anil, Sudha V Pareddy, Kundile Prasad Nivarthi, Manoj Kumar
Date Published: 07 Nov 2024
Abstract
Speech processing is the study and manipulation of speech signals, transmitted as a form of communication, in order to extract meaningful information from them. The field applies dedicated algorithms, particularly neural networks, to tasks such as translating one language into another while preserving the local context that makes the translation meaningful. Text-to-speech (TTS) is a technology that converts written text into spoken words. It works with machine-generated voices that imitate human speech with high fidelity, which involves complex linguistic models of pronunciation, intonation, and rhythm so that the output sounds like a human voice. In contrast, speech-to-text (STT) models convert spoken language into written text and must represent a range of accents and dialects accurately. This requires intricate acoustic modeling and natural language processing to ensure accurate transcription. The proposed system is a multilingual voice translation and synthesis system built on speech recognition technologies such as Hidden Markov Models (HMMs), Recurrent Neural Networks (RNNs), and Deep Neural Networks (DNNs), which allows conversation in multiple languages to be translated into another language simultaneously. HMMs model sequences of speech sounds probabilistically, while RNNs (and similarly DNNs) can capture long-term dependencies and learn from large datasets, which is useful for parsing the structure and complexity of human language. This integration provides accurate and flexible speech recognition and synthesis across many languages and dialects, handling subtleties of pronunciation, grammar, and context.
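As an illustration of how an HMM models a sequence of speech sounds probabilistically, the following is a minimal sketch of the forward algorithm on a toy model. It is not the system described in the paper: the two hidden states, the three observation symbols, and all probability values are hypothetical, chosen only to show the computation.

```python
# Hypothetical toy HMM: two hidden phoneme-like states and three
# quantised acoustic symbols. All probabilities are made-up values.
STATES = ("s0", "s1")
START = {"s0": 0.6, "s1": 0.4}
TRANS = {
    "s0": {"s0": 0.7, "s1": 0.3},
    "s1": {"s0": 0.4, "s1": 0.6},
}
EMIT = {
    "s0": {"a": 0.5, "b": 0.4, "c": 0.1},
    "s1": {"a": 0.1, "b": 0.3, "c": 0.6},
}


def forward_likelihood(observations):
    """Return P(observations) under the toy HMM via the forward algorithm."""
    if not observations:
        return 1.0  # the empty sequence has probability 1 by convention

    # Initialisation: start in state s and emit the first symbol.
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}

    # Recursion: sum over all previous states, then emit the next symbol.
    for obs in observations[1:]:
        alpha = {
            s: EMIT[s][obs] * sum(alpha[prev] * TRANS[prev][s] for prev in STATES)
            for s in STATES
        }

    # Termination: marginalise over the final hidden state.
    return sum(alpha.values())


if __name__ == "__main__":
    print(forward_likelihood(["a", "b", "c"]))
```

In a recogniser of this kind, one such likelihood would typically be computed per candidate word or phoneme model, and the candidate with the highest score would be selected. Real systems work with continuous acoustic features and log-probabilities rather than the small discrete tables assumed here.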
DOI Requested