Indian Statistical Institute

Doctoral Thesis

Doctor of Philosophy

Computer Science


Mahalanobis, Prasanta Chandra (RTS-Kolkata; ISI)

The investigations reported in the present thesis were started towards the end of 1957 and carried on intermittently amidst teaching and professional work on sample surveys and econometrics. Curiosity provided the major impulse. The author's ambition was, initially, to throw up some numerical tables for Bengali, Sanskrit, Prakrit and Pali, such as are found for many western languages in Herdan's (1956) Language as Choice and Chance. Gradually, as the work progressed, the view changed and more analytical studies suggested themselves. Some of these latter have been completed in a provisional manner and are reported in the present thesis.

Chapter 1 attempts a broad survey of previous researches classified under three heads: (i) statistical studies on literary style, based on word-length, sentence-length, size and composition of vocabulary etc.; (ii) studies on statistical properties of languages, i.e., the relative frequencies of letters or phonemes, and the Zipf law of word-frequencies; and (iii) information-theoretic analyses of languages carried out by Shannon and others. One section is devoted to Indian work along various lines.

Chapters 2 to 7 report on studies relating to word-length, almost entirely confined to Bengali, with emphasis on prose fiction. Word-length has been measured in syllables. Chapters 8 and 9 describe the corresponding investigations on sentence-length, measuring sentence-length in terms of the number of words; but the scale of investigation is more modest here, and poetry has been excluded for obvious reasons.

Chapter 2 gives an account of the samples of Bengali words analysed in the different studies.
Probability samples of words were selected from many prose works, but as the study progressed, it became apparent that non-probabilistic systematic samples could be conveniently used as approximations to probability samples.

Chapter 2 describes these methods of sampling, establishes the "validity" of the systematic samples and examines the sampling properties of the estimates thrown up by the two types of samples. Probability sampling has seldom been used for statistical studies on languages, and the uses of non-probabilistic samples have hardly been rigorously justified. Yet statistical methods valid for strictly random samples have been used in a few cases, without due reserve. A major objective of the present study is to use probability sampling for investigations on word-length, sentence-length etc., and also to justify the non-probabilistic systematic samples as approximations to probability samples. The question of sampling error has been constantly kept in view. Standard errors were usually not calculated in the detailed way. The technique of independent and interpenetrating networks of sub-samples (IPNS) introduced by Mahalanobis (1946) was found to be extremely serviceable.
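The idea behind interpenetrating sub-samples can be illustrated with a minimal sketch. It assumes k sub-samples drawn independently by the same design (here simple random sampling with replacement, an assumption made only for illustration) from a hypothetical list of per-word syllable counts. Each sub-sample yields its own estimate of mean word-length, and the spread among the k estimates gives a standard error for their average without a separate variance formula. The function names `draw_subsamples` and `ipns_estimate` are hypothetical and do not come from the thesis.

```python
import math
import random
from statistics import mean, stdev


def draw_subsamples(word_lengths, k, n, seed=0):
    """Draw k independent sub-samples of n words each (simple random sampling
    with replacement, an illustrative assumption) and return the mean
    word-length, in syllables, of each sub-sample."""
    rng = random.Random(seed)
    return [mean(rng.choices(word_lengths, k=n)) for _ in range(k)]


def ipns_estimate(subsample_means):
    """Combine k independent sub-sample estimates.

    The pooled estimate is the average of the k estimates; its standard
    error is s / sqrt(k), where s is the standard deviation of the k
    sub-sample estimates around their average.
    """
    k = len(subsample_means)
    if k < 2:
        raise ValueError("need at least two sub-samples to estimate error")
    pooled = mean(subsample_means)
    se = stdev(subsample_means) / math.sqrt(k)
    return pooled, se


if __name__ == "__main__":
    # Hypothetical population of syllable counts per word (not thesis data).
    population = [1, 2, 2, 3, 1, 2, 4, 3, 2, 1, 2, 3]
    estimates = draw_subsamples(population, k=4, n=50, seed=1)
    pooled, se = ipns_estimate(estimates)
    print(f"mean word-length = {pooled:.3f} syllables, s.e. = {se:.3f}")
```

Because the sub-samples are independent replicates of one design, agreement or disagreement among them is a direct, design-based check on sampling error.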

