Approximate Schema Design by Conceptual Clustering.

Date of Submission

December 1996

Date of Award

Winter 12-12-1997

Institute Name (Publisher)

Indian Statistical Institute

Document Type

Master's Dissertation

Degree Name

Master of Technology

Subject Name

Computer Science


Computer and Statistical Services Centre (CSSC)


Bagchi, Aditya (CSSC-Kolkata; ISI)

Abstract (Summary of the Work)

Class and class hierarchy play a very important role in database schema design. As a modeling tool, a class provides a generic concept, and instances of that concept are the members of the class. Inter-class relationships are represented by the class hierarchy. Usually a data model does not provide the flexibility of inter-node relationships present in a knowledge representation tool like a semantic network. It only takes care of the IS-A relationship by inclusion dependencies and the PART-OF relationship by complex objects. In most application areas, the intention of the internal description of a class and the class hierarchy for the domain are fixed beforehand. The class of an instance is specified during its insertion into the system, and the instance also strictly conforms to the internal description. However, in a real life situation such a rigid class description and hierarchy may not always be able to accept all instances. For example, the class description of Person may have Address as an attribute. Now, a Vagabond does not have an Address; nevertheless he is a Person. There are several ways to tackle this situation: 1. Mark all the instances of Vagabond as exceptions to Person. 2. Remove Address from the description of Person so that it is no longer an essential feature of the concept of Person. 3. Break the class Person into two subclasses: one with the attribute Address, where the existing instances will be placed, and the other without the attribute Address, where new instances of Vagabond will be placed. The standard data models do not provide any of the above features. However, in order to handle a real life application properly, a data model should have the facilities to: 1. Flag exceptions to the class. 2. Change the internal description of the class. 3. Create new classes and reorganize the class hierarchy. So it is evident that the internal class description would contain less and less information as the number of instances increases.
A class description should contain only those essential attributes which are common among all the instances. Such an identifying feature set may again vary with the appearance of exceptions, and the class hierarchy may have to be reorganized. In a real life situation, therefore, a class description will always be approximate and the class hierarchy may have to be dynamic. In some application areas like Anthropology, Archaeology, etc., the classification of instances and the class hierarchy are not very well-defined. New instances are often found for which the class descriptions and hierarchy need to be changed. New classes may have to be created. The system may start with an incomplete class hierarchy and initial class descriptions and may keep on modifying them as new instances are inserted. Thus the schema design process would behave like a learning system. One such conceptual clustering algorithm for approximate schema design has already been proposed by Beck et al. (7). Their proposal is based on Explanation Based Learning, and efforts have been made to update the class hierarchy after every instance received as an exception. In an application domain where the number of attributes as well as the number of instances are sufficiently large, such an approach may not be very helpful. This dissertation starts with an incomplete class description. A set of unambiguous learning examples is taken where the targeted classes are known. These examples are used to learn the identifying attributes for each class.
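
The idea of a class description as the set of attributes common to all known instances can be illustrated with a minimal Python sketch. This is not the algorithm of the dissertation or of Beck et al.; it only assumes that an instance is a mapping from attribute names to values and that each learning example carries a known class label. The function names and the sample data are hypothetical.

```python
def learn_identifying_attributes(examples):
    """Return, per class, the attribute names shared by all its labeled examples.

    `examples` is an iterable of (class_name, instance) pairs, where an
    instance is a dict of attribute name -> value (an assumed representation).
    """
    common = {}
    for class_name, instance in examples:
        attrs = set(instance)
        if class_name in common:
            common[class_name] &= attrs   # keep only attributes seen in every example
        else:
            common[class_name] = attrs
    return common


def missing_attributes(description, instance):
    """Attributes required by a class description but absent from the instance."""
    return description - set(instance)


# Hypothetical learning examples whose target class is known.
examples = [
    ("Person", {"Name": "Asha", "Address": "Kolkata"}),
    ("Person", {"Name": "Ravi", "Address": "Delhi"}),
]
descriptions = learn_identifying_attributes(examples)

# A Vagabond has no Address, so it does not conform to the learned description.
vagabond = {"Name": "Mohan"}
gap = missing_attributes(descriptions["Person"], vagabond)

# Accepting the Vagabond as a Person shrinks the description (the second
# strategy in the abstract): Address stops being an identifying attribute.
relaxed = learn_identifying_attributes(examples + [("Person", vagabond)])
```

In this sketch a non-empty `gap` marks the instance as an exception to the class, and re-learning with the exception included shows how the description carries less information as more varied instances are admitted.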



Creative Commons License

Creative Commons Attribution 4.0 International License

