A Realization Strategy for a Statistical Data Model with Hierarchy of Statistical Objects.

Date of Submission

December 1996

Date of Award

Winter 12-12-1997

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science


Computer and Statistical Services Centre (CSSC)


Bagchi, Aditya (CSSC-Kolkata; ISI)

Abstract (Summary of the Work)

Commercially successful database packages are usually based on Codd's relational data model [1]. The conceptual model of Codd's system uses two-dimensional tables. It provides operators that can logically relate two tables or take a subset of an existing table. Facilities are also available to create new tables and to alter existing ones. The relational model, however, provides limited facilities for numerical and statistical computation, and the data produced by such operations are not stored in the database.

However, there are application areas, such as the census of a country or large socio-economic surveys, where extensive statistical analysis is done on the data. Users in these areas are not interested in the raw data; queries are made on the summary data only. It is not very advantageous to create such an environment under the relational framework. A new data model is needed.

1.2 Statistical Database

A statistical database (SDB) handles applications where users prefer to do statistical computation over the stored raw data to generate summary data. Storage, update, retrieval, etc. are done on the summary data instead of the raw data. For example, in a country-wide census, users are mainly interested in summary data such as average population per state or per capita income, and not in data related to individual citizens. For this purpose, statistical functions such as Total, Count, Mean and Standard Deviation are applied to the stored raw data (also called micro data) to generate the required summary data (also called macro data or statistical objects).

1.3 Salient Features of SDB

1.3.1 Multi-dimensionality

While statistical computations are done over one or more variates (called attributes in the relational system), other attributes associated with the variates (called category attributes) define the universe of such computation. The result of the computation creates a new attribute called the summary attribute.
For example, a country-wide census collects data about each citizen of the country. If we want to know the total number of citizens against each unique combination of values of religion, state and profession recorded during the census, then the new attribute population generated by the computation is the summary attribute, and religion, state and profession are the category attributes. The category attributes associated with a summary attribute thus define a multi-dimensional space, which cannot be represented effectively in the two-dimensional structure of relational tables.

1.3.2 Classification Hierarchy

Similar to the generalization/specialization hierarchy present among attributes and relations, an is-a relationship may also be present among category attributes. The extensions of the relational model proposed so far for modelling SDB could not take care of this property. However, the Graph Oriented Model of SDB can effectively handle both multi-dimensionality and classification hierarchy.

1.3.3 Intermediate or Meta-Data

In the process of computing a summary attribute, one may come across certain intermediate functions which cause additional summary attributes to be created. These intermediate functions may be stored for ease of future computations.

For example, SUM(X), SUM-OF-SQUARE(X) and VAR(X) are created and stored when VAR(X) (the variance of X) is computed on variate X. Later they may be used for future computations of other summary attributes. The set of category attributes for such computations should, however, remain unaltered.
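
The relationship between micro data, category attributes and stored intermediates can be illustrated with a minimal Python sketch. It is not the dissertation's realization strategy. The sample records, the choice of two category attributes (state and profession), the function names, and the decision to store a count alongside SUM(X) and SUM-OF-SQUARE(X) are all assumptions made for illustration. The variance computed is the population variance, obtained from the stored intermediates without rereading the micro data.

```python
from collections import defaultdict

# Hypothetical micro data: one record per citizen as
# (state, profession, income). State and profession act as category
# attributes; income is the variate X.
MICRO_DATA = [
    ("Bihar", "farmer", 10.0),
    ("Bihar", "farmer", 14.0),
    ("Bihar", "teacher", 20.0),
    ("Kerala", "farmer", 12.0),
    ("Kerala", "farmer", 12.0),
    ("Kerala", "farmer", 18.0),
]


def build_intermediates(rows):
    """Group micro data by category attributes and keep, per cell,
    the intermediates COUNT, SUM(X) and SUM-OF-SQUARE(X)."""
    cells = defaultdict(lambda: {"count": 0, "sum": 0.0, "sum_sq": 0.0})
    for state, profession, x in rows:
        cell = cells[(state, profession)]
        cell["count"] += 1
        cell["sum"] += x
        cell["sum_sq"] += x * x
    return dict(cells)


def mean(cell):
    """MEAN(X) derived from the stored intermediates."""
    return cell["sum"] / cell["count"]


def variance(cell):
    """Population VAR(X) = SUM-OF-SQUARE(X)/n - MEAN(X)^2,
    computed only from the stored intermediates."""
    m = mean(cell)
    return cell["sum_sq"] / cell["count"] - m * m


if __name__ == "__main__":
    cells = build_intermediates(MICRO_DATA)
    for key in sorted(cells):
        cell = cells[key]
        print(key, cell["count"], mean(cell), variance(cell))
```

Each key of the resulting dictionary is one point in the space spanned by the category attributes, and the count per key plays the role of the population summary attribute in the census example. Because the intermediates are kept per cell, they can be reused only while the set of category attributes stays the same, which matches the restriction stated above.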


ProQuest Collection ID: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:28843354

Control Number


Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.



This document is currently not available here.