Date of Submission
2-22-2006
Date of Award
2-22-2007
Institute Name (Publisher)
Indian Statistical Institute
Document Type
Doctoral Thesis
Degree Name
Doctor of Philosophy
Subject Name
Computer Science
Department
Machine Intelligence Unit (MIU-Kolkata)
Supervisor
Dutta, Asoke Kumar; Murthy, C. A.
Abstract (Summary of the Work)
Speech is the primary means of communication between human beings. Speech synthesis is the automatic, artificial generation of a speech signal by a machine. A TTS (Text-To-Speech) synthesis system is one that can generate a speech signal from a string of text in a given language. Speech synthesis systems for various languages have been under development for several decades. With the unprecedented expansion of IT (Information Technology) into the life of the common man, it is highly desirable that at least information dissemination be carried out in the speech mode, which is the most natural mode of human communication. A speech synthesizer should be able to synthesize any arbitrary word sequence with proper intelligibility and naturalness. While recent developments in synthesizer technology achieve, to some degree, intelligibility and naturalness for some applications, much remains to be done to improve sound quality and naturalness in unlimited speech. There is an urgent need for TTS systems for all major Indian languages.
TTS systems should be capable of synthesizing an unlimited number of sentences from unrestricted text input [5, 6, 83, 149]. The simplest approach, storing each spoken word of a language much as a dictionary does for the written language, is not adequate, because in continuous speech adjacent words blend together due to co-articulation effects, and because suprasegmentals play a vital role in conveying the semantic, emphatic, mood, and emotional content of spoken language. These effects carry a significant part of the intelligence load of a spoken communication. Word or sentence concatenation is, of course, used in some task-specific IVRS (Interactive Voice Response Systems). With due regard to their usefulness in specific cases, these are never seriously considered speech synthesis systems.
Figure 1.1 schematically presents a TTS system.
A TTS system can be broadly divided into two units: the high-level unit and the low-level unit. The high-level unit is essentially a text analyzer; it converts the input text string into a phonetic and linguistic representation. The second, low-level unit is the actual speech synthesizer, which generates speech. As will be discussed later in detail, synthesizers are broadly classified into two groups: those that generate the speech wave directly from chosen physical properties, and those that use segments of speech waves, rather than basic physical properties, to generate continuous speech. The first type is usually referred to as a parametric synthesizer and the second as a concatenative synthesizer.
1.1 History and Development of Speech Synthesis
To understand how present-day synthesis systems work and how they developed into their present form, a historical review may be useful. In this chapter, a brief history of man’s endeavor to synthesize speech is presented, from the early mechanical efforts to the systems that form the basis of today’s high-quality synthesizers.
1.1.1 From Mechanical to Electrical
The first efforts to produce artificial speech can be traced back more than two hundred years [99, 100, 232]. In 1779, Russian Professor Christian Kratzenstein, in St. Petersburg, described the production mechanism of five long vowels (/a/, /e/, /i/, /o/, and /u/) and developed apparatus that could produce them artificially. The apparatus consisted of acoustic resonators similar to the human vocal tract, with a different resonator for each vowel sound. The resonators were activated with vibrating reeds, as in musical instruments. Figure 1.2 shows the basic structure of those resonators. The sound /i/ is produced by blowing into the lower pipe without a reed, which gives a flute-like sound.
Control Number
ISILib-TH352
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
DOI
http://dspace.isical.ac.in:8080/jspui/handle/10263/2146
Recommended Citation
Chowdhury, Soumen Dr., "Concatenative Text-To-Speech Synthesis: a Study on Standard Colloquial Bengali." (2007). Doctoral Theses. 306.
https://digitalcommons.isical.ac.in/doctoral-theses/306
Comments
ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843363