Deep learning for spoken language identification: Can we visualize speech signal patterns?
Article Type
Research Article
Publication Title
Neural Computing and Applications
Abstract
Speech recognition-based applications are widely used in Western countries, but adoption has not reached a similar magnitude in East Asia. Language complexity could be one of the primary reasons behind this lag. Moreover, multilingual countries such as India need language identification (of words and phrases) from speech signals. Unlike previous works, in this paper we propose to use speech signal patterns for spoken language identification, where image-based features are employed. The concept is primarily inspired by the fact that a speech signal can be read/visualized. In our experiment, we use spectrograms (as image data) and deep learning for spoken language classification. Using the IIIT-H Indic speech database for Indic languages, we achieve a highest accuracy of 99.96%, which outperforms the state-of-the-art reported results. Furthermore, for a relative decrease of 4018.60% in the signal-to-noise ratio, accuracy drops by only 0.50%, which shows that our approach is fairly robust.
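The abstract's core idea is converting a speech signal into a spectrogram image that a deep network can classify. As a minimal sketch of that first step, the following computes a magnitude spectrogram with a short-time FFT in NumPy; the frame length, hop size, and synthetic two-tone signal are illustrative assumptions, not the authors' actual pipeline or parameters:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time FFT.

    Hypothetical helper for illustration; the paper's feature
    extraction settings may differ.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft along each frame -> (n_frames, frame_len//2 + 1); transpose
    # so rows are frequency bins and columns are time frames.
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Synthetic 1-second "utterance": two tones standing in for speech.
sr = 8000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

spec = spectrogram(sig)
print(spec.shape)  # (freq_bins, time_frames) = (129, 61)
```

The resulting 2-D array can be saved as an image and fed to a convolutional network for classification, which is the general approach the paper describes.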
First Page
8483
Last Page
8501
DOI
10.1007/s00521-019-04468-3
Publication Date
12-1-2019
Recommended Citation
Mukherjee, Himadri; Ghosh, Subhankar; Sen, Shibaprasad; Sk Md, Obaidullah; Santosh, K. C.; Phadikar, Santanu; and Roy, Kaushik, "Deep learning for spoken language identification: Can we visualize speech signal patterns?" (2019). Journal Articles. 591.
https://digitalcommons.isical.ac.in/journal-articles/591