Deep learning for spoken language identification: Can we visualize speech signal patterns?
Neural Computing and Applications
Speech recognition-based applications are widely used in Western countries, but adoption in East Asia lags well behind; language complexity is potentially one of the primary reasons for this gap. Moreover, multilingual countries such as India need language identification (of words and phrases) to be possible through speech signals. Unlike previous works, in this paper we propose to use speech signal patterns for spoken language identification, where image-based features are employed. The concept is primarily inspired by the fact that a speech signal can be read/visualized. In our experiments, we use spectrograms (as image data) and deep learning for spoken language classification. Using the IIIT-H Indic speech database for Indic languages, we achieve a highest accuracy of 99.96%, which outperforms the state-of-the-art reported results. Furthermore, for a relative decrease of 4018.60% in the signal-to-noise ratio, a decrease of only 0.50% in accuracy indicates that our approach is fairly robust.
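The core idea above, converting a speech signal into a spectrogram image that a deep classifier can consume, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it uses a synthetic tone in place of an utterance from the IIIT-H Indic speech database, and `scipy.signal.spectrogram` with assumed window parameters.

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic 1-second signal at 16 kHz (a stand-in for a real utterance;
# the paper works with recordings from the IIIT-H Indic speech database).
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

# Short-time Fourier analysis yields a (frequency bins x time frames)
# power matrix; window length and overlap here are illustrative choices.
freqs, times, Sxx = spectrogram(signal, fs=sr, nperseg=512, noverlap=256)

# Log-scale and normalize to [0, 1] so the spectrogram can be treated as
# a grayscale image and fed to an image-based deep learning classifier.
log_S = np.log1p(Sxx)
img = (log_S - log_S.min()) / (log_S.max() - log_S.min())

print(img.shape)
```

Each utterance thus becomes a fixed-format 2-D array, which is what allows image-oriented deep networks to "visualize" speech signal patterns for language classification.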
Mukherjee, Himadri; Ghosh, Subhankar; Sen, Shibaprasad; Sk Md, Obaidullah; Santosh, K. C.; Phadikar, Santanu; and Roy, Kaushik, "Deep learning for spoken language identification: Can we visualize speech signal patterns?" (2019). Journal Articles. 591.