Speech-to-text system for multiple languages - Orisys Academy for Skill Development & Research

By harish hv on 19th January 2024

Problem statement

Language barriers pose a challenge in communication, especially in contexts where diverse languages are spoken. Existing speech-to-text systems often specialize in a single language, limiting their applicability in multilingual environments.

Abstract

The Speech-to-Text System for Multiple Languages project addresses language diversity by developing a system capable of converting spoken words into text across various languages. Through advanced natural language processing (NLP) techniques, the system aims to provide accurate and real-time transcriptions, fostering effective communication in multilingual settings.

Outcome

The outcome of this project is a versatile and adaptive speech-to-text system that supports multiple languages. Users can seamlessly convert spoken words into text, facilitating communication across language barriers. The system’s accuracy and flexibility make it valuable in diverse contexts, such as international conferences, language learning platforms, and accessibility tools, contributing to improved cross-cultural communication.

Reference

The current work presents a multilingual speech-to-text conversion system. Conversion is based on the information in the speech signal. Speech is the natural and most important form of communication for human beings. A Speech-To-Text (STT) system takes a human speech utterance as input and produces a string of words as output. The objective of this system is to extract, characterize and recognize the information in speech. The proposed system is implemented using the Mel-Frequency Cepstral Coefficient (MFCC) feature extraction technique, with Minimum Distance Classifier and Support Vector Machine (SVM) methods for speech classification. Speech utterances are pre-recorded and stored in a database. The database is divided into two main parts: training and testing. Samples from the training database are passed through the training phase and their features are extracted. Combining the features of each sample forms a feature vector, which is stored as a reference. A sample from the testing part is then given to the system and its features are extracted. The similarity between these features and the reference feature vectors is computed, and the words with maximum similarity are given as output. The system is developed in the MATLAB (R2010a) environment.
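
The pipeline described above (MFCC features, stored reference vectors, minimum-distance matching) can be illustrated with a small sketch. The referenced system was built in MATLAB, so the Python/NumPy code below is not that implementation. It is a simplified illustration under stated assumptions: the frame length, hop size, filter count and number of coefficients are arbitrary choices, the "words" are synthetic tones rather than recorded speech, the helper names (mfcc, train, classify, tone) are hypothetical, and only the Minimum Distance Classifier is shown, not the SVM.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def dct_matrix(n_coeffs, n_filters):
    """Unnormalized DCT-II basis, used to decorrelate log filterbank energies."""
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_fft=512,
         n_filters=20, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs). Parameter values are assumptions."""
    # Pre-emphasis boosts high frequencies before framing.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))  # floor avoids log(0)
    return log_energies @ dct_matrix(n_coeffs, n_filters).T

def feature_vector(signal, sample_rate):
    """One vector per utterance: the MFCCs averaged over all frames."""
    return mfcc(signal, sample_rate).mean(axis=0)

def train(samples, sample_rate):
    """samples maps label -> list of signals; returns label -> reference vector."""
    return {label: np.mean([feature_vector(s, sample_rate) for s in signals], axis=0)
            for label, signals in samples.items()}

def classify(signal, references, sample_rate):
    """Minimum distance classifier: pick the label with the nearest reference."""
    feats = feature_vector(signal, sample_rate)
    return min(references, key=lambda label: np.linalg.norm(feats - references[label]))

# Synthetic stand-ins for recorded words (assumption: real data would be speech).
SR = 8000
rng = np.random.default_rng(0)

def tone(freq):
    t = np.arange(SR // 2) / SR
    return np.sin(2 * np.pi * freq * t) + 0.01 * rng.standard_normal(t.size)

references = train({"low": [tone(300) for _ in range(3)],
                    "high": [tone(1500) for _ in range(3)]}, SR)
print(classify(tone(300), references, SR))
```

Averaging MFCCs over frames discards timing information, which is acceptable for this toy example but would likely be too coarse for connected speech; a real system would typically keep the frame sequence or use a richer classifier.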

Priyanka P. Patil and Sanjay A. Pardeshi, “Marathi Connected Word Speech Recognition System”, IEEE First International Conference on Networks & Soft Computing, pp. 314-318, Aug. 2014.
M.A. Anusuya and S.K. Katti, “Speech Recognition by Machine: A Review”, International Journal of Computer Science and Information Security, vol. 6, no. 3, 2009.
Mathias De Wachter, Mike Matton, Kris Demuynck and Patrick Wambacq, “Template Based Continuous Speech Recognition”, IEEE Trans. on Audio, Speech & Language Processing, vol. 15, no. 4, pp. 1377-1390, May 2007.
C.M. Vikram and K. Umarani, “Phoneme Independent Pathological Voice Detection Using Wavelet Based MFCCs GMM-SVM Hybrid Classifier”, IEEE International Conference on Advances in Computing Communications and Informatics, pp. 929-934, Aug. 2013.
V. Naresh, B. Venkataramani, Abhishek Karan and J. Manikandan, “PSOC based isolated speech recognition system”, IEEE International Conference on Communication and Signal Processing, pp. 693-697, 3–5 April 2013.

https://ieeexplore.ieee.org/document/7754130/references#references