Database 8
Specialization and Generalization Hierarchies and Lattices
A subclass itself may have further subclasses specified on it, forming a hierarchy or a lattice of specializations. For example, in Figure 8.6 ENGINEER is a subclass of EMPLOYEE and is also a superclass of ENGINEERING_MANAGER; this represents the real-world constraint that every engineering manager is required to be an engineer. A specialization hierarchy has the constraint that every subclass participates as a subclass in only one class/subclass relationship; that is, each subclass has only one parent, which results in a tree structure or strict hierarchy. In contrast, for a specialization lattice, a subclass can be a subclass in more than one class/subclass relationship. Hence, Figure 8.6 is a lattice.
Knowledge representation models allow multiple classification schemes in which one class is an instance of another class (called a meta-class). Notice that this cannot be represented directly in the EER model, because we have only two levels: classes and instances. The only relationship among classes in the EER model is a superclass/subclass relationship, whereas in some KR schemes an additional class/instance relationship can be represented directly in a class hierarchy. An instance may itself be another class, allowing multiple-level classification schemes.
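The distinction between a hierarchy and a lattice can be checked mechanically from the list of class/subclass relationships. The following is a minimal sketch, not part of the EER notation itself: the function names are hypothetical, and the EMPLOYEE/MANAGER pair is an assumption added to complete the example, since the text only states that MANAGER is a superclass of ENGINEERING_MANAGER.

```python
from collections import defaultdict

# (superclass, subclass) pairs. Most come from the text; the
# EMPLOYEE/MANAGER pair is an assumption made to complete the example.
PAIRS = [
    ("EMPLOYEE", "SECRETARY"),
    ("EMPLOYEE", "TECHNICIAN"),
    ("EMPLOYEE", "ENGINEER"),
    ("EMPLOYEE", "MANAGER"),  # assumed
    ("EMPLOYEE", "HOURLY_EMPLOYEE"),
    ("EMPLOYEE", "SALARIED_EMPLOYEE"),
    ("ENGINEER", "ENGINEERING_MANAGER"),
    ("MANAGER", "ENGINEERING_MANAGER"),
    ("SALARIED_EMPLOYEE", "ENGINEERING_MANAGER"),
]


def parents_of(pairs):
    """Map each subclass to the set of its direct superclasses."""
    parents = defaultdict(set)
    for superclass, subclass in pairs:
        parents[subclass].add(superclass)
    return dict(parents)


def shared_subclasses(pairs):
    """Subclasses that take part in more than one class/subclass relationship."""
    return sorted(s for s, ps in parents_of(pairs).items() if len(ps) > 1)


def is_hierarchy(pairs):
    """True when every subclass has only one parent (a tree structure)."""
    return not shared_subclasses(pairs)


print(shared_subclasses(PAIRS))  # ['ENGINEERING_MANAGER']
print(is_hierarchy(PAIRS))       # False, so the structure is a lattice
```

Dropping the three ENGINEERING_MANAGER pairs leaves every subclass with a single parent, and `is_hierarchy` then returns `True`.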
2 Identification
Identification is the abstraction process whereby classes and objects are made uniquely identifiable by means of some identifier. For example, a class name uniquely identifies a whole class within a schema. An additional mechanism is necessary for telling distinct object instances apart by means of object identifiers. Moreover, it is necessary to identify multiple manifestations in the database of the same real-world object. For example, we may have a tuple <'Matthew Clarke', '610618', '376-9821'> in a PERSON relation and another tuple <'301-54-0836', 'CS', 3.8> in a STUDENT relation that happen to represent the same real-world entity. There is no way to identify the fact that these two database objects (tuples) represent the same real-world entity unless we make a provision at design time for appropriate cross-referencing to supply this identification. Hence, identification is needed at two levels:
■ To distinguish among database objects and classes
■ To identify database objects and to relate them to their real-world counterparts
In the EER model, identification of schema constructs is based on a system of unique names for the constructs in a schema. For example, every class in an EER schema (whether it is an entity type, a subclass, a category, or a relationship type) must have a distinct name. The names of attributes of a particular class must also be distinct. Rules for unambiguously identifying attribute name references in a specialization or generalization lattice or hierarchy are needed as well. At the object level, the values of key attributes are used to distinguish among entities of a particular entity type. For weak entity types, entities are identified by a combination of their own partial key values and the entities they are related to in the owner entity type(s). Relationship instances are identified by some combination of the entities that they relate to, depending on the cardinality ratio specified.
3 Specialization and Generalization
Specialization is the process of classifying a class of objects into more specialized subclasses. Generalization is the inverse process of generalizing several classes into a higher-level abstract class that includes the objects in all these classes. Specialization is conceptual refinement, whereas generalization is conceptual synthesis. Subclasses are used in the EER model to represent specialization and generalization. We call the relationship between a subclass and its superclass an IS-A-SUBCLASS-OF relationship, or simply an IS-A relationship. This is the same as the IS-A relationship discussed earlier in Section 8.5.3.
In this section we discuss in general terms some of the modeling concepts that we described quite specifically in our presentation of the ER and EER models in Chapter 7 and earlier in this chapter. This terminology is used not only in conceptual data modeling but also in artificial intelligence literature when discussing knowledge representation (KR). This section discusses the similarities and differences between conceptual modeling and knowledge representation, and introduces some of the alternative terminology and a few additional concepts. The goal of KR techniques is to develop concepts for accurately modeling some domain of knowledge by creating an ontology that describes the concepts of the domain and how these concepts are interrelated. Such an ontology is used to store and manipulate knowledge for drawing inferences, making decisions, or answering questions. The goals of KR are similar to those of semantic data models, but there are some important similarities and differences between the two disciplines:
■ Both disciplines use an abstraction process to identify common properties and important aspects of objects in the miniworld (also known as domain of discourse in KR) while suppressing insignificant differences and unimportant details.
■ Both disciplines provide concepts, relationships, constraints, operations, and languages for defining data and representing knowledge.
■ KR is generally broader in scope than semantic data models. Different forms of knowledge, such as rules (used in inference, deduction, and search), incomplete and default knowledge, and temporal and spatial knowledge, are represented in KR schemes. Database models are being expanded to include some of these concepts (see Chapter 26).
■ KR schemes include reasoning mechanisms that deduce additional facts from the facts stored in a database. Hence, whereas most current database systems are limited to answering direct queries, knowledge-based systems using KR schemes can answer queries that involve inferences over the stored data. Database technology is being extended with inference mechanisms (see Section 26.5).
■ Whereas most data models concentrate on the representation of database schemas, or meta-knowledge, KR schemes often mix up the schemas with the instances themselves in order to provide flexibility in representing exceptions. This often results in inefficiencies when these KR schemes are implemented, especially when compared with databases and when a large amount of data (facts) needs to be stored.
Specialization is the process of defining a set of subclasses of an entity type; this entity type is called the superclass of the specialization. The set of subclasses that forms a specialization is defined on the basis of some distinguishing characteristic of the entities in the superclass. For example, the set of subclasses {SECRETARY, ENGINEER, TECHNICIAN} is a specialization of the superclass EMPLOYEE that distinguishes among employee entities based on the job type of each employee entity. We may have several specializations of the same entity type based on different distinguishing characteristics. For example, another specialization of the EMPLOYEE entity type may yield the set of subclasses {SALARIED_EMPLOYEE, HOURLY_EMPLOYEE}; this specialization distinguishes among employees based on the method of pay.
Figure 8.1 shows how we represent a specialization diagrammatically in an EER diagram. The subclasses that define a specialization are attached by lines to a circle that represents the specialization, which is connected in turn to the superclass. The subset symbol on each line connecting a subclass to the circle indicates the direction of the superclass/subclass relationship. Attributes that apply only to entities of a particular subclass, such as Typing_speed of SECRETARY, are attached to the rectangle representing that subclass. These are called specific attributes (or local attributes) of the subclass. Similarly, a subclass can participate in specific relationship types, such as the HOURLY_EMPLOYEE subclass participating in the BELONGS_TO relationship in Figure 8.1. We will explain the d symbol in the circles in Figure 8.1 and additional EER diagram notation shortly.
In some specializations we can determine exactly the entities that will become members of each subclass by placing a condition on the value of some attribute of the superclass. Such subclasses are called predicate-defined (or condition-defined) subclasses. For example, if the EMPLOYEE entity type has an attribute Job_type, as shown in Figure 8.4, we can specify the condition of membership in the SECRETARY subclass by the condition (Job_type = 'Secretary'), which we call the defining predicate of the subclass. This condition is a constraint specifying that exactly those entities of the EMPLOYEE entity type whose attribute value for Job_type is 'Secretary' belong to the subclass. We display a predicate-defined subclass by writing the predicate condition next to the line that connects the subclass to the specialization circle.
If all subclasses in a specialization have their membership condition on the same attribute of the superclass, the specialization itself is called an attribute-defined specialization, and the attribute is called the defining attribute of the specialization. In this case, all the entities with the same value for the attribute belong to the same subclass. We display an attribute-defined specialization by placing the defining attribute name next to the arc from the circle to the superclass, as shown in Figure 8.4.
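An attribute-defined specialization can be read as a filter on the superclass: membership in each subclass follows from the value of the defining attribute. The sketch below illustrates this for Job_type under some assumptions. The `Employee` class, the `members` function, and the sample data are hypothetical, and the constants 'Technician' and 'Engineer' are assumed by analogy with the 'Secretary' predicate given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    name: str
    job_type: str  # the defining attribute of the specialization


# Defining predicate of each subclass, all of the form (Job_type = constant).
# Only the 'Secretary' constant appears in the text; the others are assumed.
DEFINING_VALUE = {
    "SECRETARY": "Secretary",
    "TECHNICIAN": "Technician",
    "ENGINEER": "Engineer",
}


def members(employees, subclass):
    """Return exactly those employees that satisfy the subclass predicate."""
    constant = DEFINING_VALUE[subclass]
    return {e for e in employees if e.job_type == constant}


# Hypothetical sample data, invented for illustration.
EMPLOYEES = {
    Employee("Joan Logano", "Secretary"),
    Employee("Sample Engineer", "Engineer"),
    Employee("Sample Manager", "Manager"),  # satisfies no defining predicate
}

secretaries = members(EMPLOYEES, "SECRETARY")
engineers = members(EMPLOYEES, "ENGINEER")
print(sorted(e.name for e in secretaries))  # ['Joan Logano']
print(secretaries.isdisjoint(engineers))    # True
```

Because Job_type holds a single value per employee and the constants differ, no employee can satisfy two predicates at once, so the resulting subclasses never overlap.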
The ER modeling concepts discussed in Chapter 7 are sufficient for representing many database schemas for traditional database applications, which include many data-processing applications in business and industry. Since the late 1970s, however, designers of database applications have tried to design more accurate database schemas that reflect the data properties and constraints more precisely. This was particularly important for newer applications of database technology, such as databases for engineering design and manufacturing (CAD/CAM), telecommunications, complex software systems, and Geographic Information Systems (GIS), among many other applications. These types of databases have more complex requirements than do the more traditional applications. This led to the development of additional semantic data modeling concepts that were incorporated into conceptual data models such as the ER model. Various semantic data models have been proposed in the literature. Many of these concepts were also developed independently in related areas of computer science, such as the knowledge representation area of artificial intelligence and the object modeling area in software engineering.
In this chapter, we describe features that have been proposed for semantic data models, and show how the ER model can be enhanced to include these concepts, leading to the Enhanced ER (EER) model. We start in Section 8.1 by incorporating the concepts of class/subclass relationships and type inheritance into the ER model. Then, in Section 8.2, we add the concepts of specialization and generalization. Section 8.
The second constraint on specialization is called the completeness (or totalness) constraint, which may be total or partial. A total specialization constraint specifies that every entity in the superclass must be a member of at least one subclass in the specialization. For example, if every EMPLOYEE must be either an HOURLY_EMPLOYEE or a SALARIED_EMPLOYEE, then the specialization {HOURLY_EMPLOYEE, SALARIED_EMPLOYEE} in Figure 8.1 is a total specialization of EMPLOYEE. This is shown in EER diagrams by using a double line to connect the superclass to the circle. A single line is used to display a partial specialization, which allows an entity not to belong to any of the subclasses. For example, if some EMPLOYEE entities do not belong to any of the subclasses {SECRETARY, ENGINEER, TECHNICIAN} in Figures 8.1 and 8.4, then that specialization is partial.
Notice that the disjointness and completeness constraints are independent. Hence, we have the following four possible constraints on specialization:
■ Disjoint, total
■ Disjoint, partial
■ Overlapping, total
■ Overlapping, partial
Of course, the correct constraint is determined from the real-world meaning that applies to each specialization. In general, a superclass that was identified through the generalization process usually is total, because the superclass is derived from the subclasses and hence contains only the entities that are in the subclasses. Certain insertion and deletion rules apply to specialization (and generalization) as a consequence of the constraints specified earlier. Some of these rules are as follows:
■ Deleting an entity from a superclass implies that it is automatically deleted from all the subclasses to which it belongs.
■ Inserting an entity in a superclass implies that the entity is mandatorily inserted in all predicate-defined (or attribute-defined) subclasses for which the entity satisfies the defining predicate.
■ Inserting an entity in a superclass of a total specialization implies that the entity is mandatorily inserted in at least one of the subclasses of the specialization.
The reader is encouraged to make a complete list of rules for insertions and deletions for the various types of specializations.
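The two constraints and the deletion rule can be illustrated with plain sets of entity identifiers. This is a minimal sketch under stated assumptions: the function names and the identifiers e1 to e4 are hypothetical, and the checks inspect only one database state, whereas the constraints themselves must hold in every state.

```python
from itertools import combinations


def is_disjoint(subclasses):
    """No entity belongs to more than one subclass."""
    return all(a.isdisjoint(b) for a, b in combinations(subclasses, 2))


def is_total(superclass, subclasses):
    """Every superclass entity belongs to at least one subclass."""
    return set().union(*subclasses) == superclass


def describe(superclass, subclasses):
    """Name which of the four combinations the current state satisfies."""
    d = "Disjoint" if is_disjoint(subclasses) else "Overlapping"
    t = "total" if is_total(superclass, subclasses) else "partial"
    return f"{d}, {t}"


def delete_entity(entity, superclass, specializations):
    """Deleting from the superclass also deletes from every subclass."""
    superclass.discard(entity)
    for subclasses in specializations:
        for subclass in subclasses:
            subclass.discard(entity)


# Hypothetical state: four employees and two specializations of EMPLOYEE.
employee = {"e1", "e2", "e3", "e4"}
by_pay = [{"e1", "e2"}, {"e3", "e4"}]   # HOURLY_EMPLOYEE, SALARIED_EMPLOYEE
by_job = [{"e1"}, {"e2"}, {"e3"}]       # SECRETARY, ENGINEER, TECHNICIAN

print(describe(employee, by_pay))  # Disjoint, total
print(describe(employee, by_job))  # Disjoint, partial (e4 is in no subclass)

delete_entity("e1", employee, [by_pay, by_job])
print(sorted(employee))  # ['e2', 'e3', 'e4']
print(by_job[0])         # set()
```

The insertion rules could be sketched the same way by evaluating each defining predicate when an entity is added to the superclass.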
8.2.2 Generalization
We can think of a reverse process of abstraction in which we suppress the differences among several entity types, identify their common features, and generalize them into a single superclass of which the original entity types are special subclasses. For example, consider the entity types CAR and TRUCK shown in Figure 8.3(a). Because they have several common attributes, they can be generalized into the entity type VEHICLE, as shown in Figure 8.3(b). Both CAR and TRUCK are now subclasses of the generalized superclass VEHICLE. We use the term generalization to r
Notice that the generalization process can be viewed as being functionally the inverse of the specialization process. Hence, in Figure 8.3 we can view {CAR, TRUCK} as a specialization of VEHICLE, rather than viewing VEHICLE as a generalization of CAR and TRUCK. Similarly, in Figure 8.1 we can view EMPLOYEE as a generalization of SECRETARY, TECHNICIAN, and ENGINEER. A diagrammatic notation to distinguish between generalization and specialization is used in some design methodologies. An arrow pointing to the generalized superclass represents a generalization, whereas arrows pointing to the specialized subclasses represent a specialization. We will not use this notation because the decision as to which process is followed in a particular situation is often subjective. Appendix A gives some of the suggested alternative diagrammatic notations for schema diagrams and class diagrams.
Utilizing Specialization and Generalization in Refining Conceptual Schemas
Now we elaborate on the differences between the specialization and generalization processes, and how they are used to refine conceptual schemas during conceptual database design. In the specialization process, we typically start with an entity type and then define subclasses of the entity type by successive specialization; that is, we repeatedly define more specific groupings of the entity type. For example, when designing the specialization lattice in Figure 8.7, we may first specify an entity type PERSON for a university database. Then we discover that three types of persons will be represented in the database: university employees, alumni, and students. We create the specialization {EMPLOYEE, ALUMNUS, STUDENT} for this purpose and choose the overlapping constraint, because a person may belong to more than one of the subclasses. We specialize EMPLOYEE further into {STAFF, FACULTY, STUDENT_ASSISTANT}, and specialize STUDENT into {GRADUATE_STUDENT, UNDERGRADUATE_STUDENT}. Finally, we specialize STUDENT_ASSISTANT into {RESEARCH_ASSISTANT, TEACHING_ASSISTANT}. This successive specialization corresponds to a top-down conceptual refinement process during conceptual schema design. So far, we have a hierarchy; then we realize that STUDENT_ASSISTANT is a shared subclass, since it is also a subclass of STUDENT, leading to the lattice.
In order to understand the different uses of aggregation better, consider the ER schema shown in Figure 8.11(a), which stores information about interviews by job applicants to various companies. The class COMPANY is an aggregation of the attributes (or component objects) Cname (company name) and Caddress (company address), whereas JOB_APPLICANT is an aggregate of Ssn, Name, Address, and Phone. The relationship attributes Contact_name and Contact_phone represent the name and phone number of the person in the company who is responsible for the interview. Suppose that some interviews result in job offers, whereas others do not. We would like to treat INTERVIEW as a class to associate it with JOB_OFFER. The schema shown in Figure 8.11(b) is incorrect because it requires each interview relationship instance to have a job offer. The schema shown in Figure 8.11(c) is not allowed because the ER model does not allow relationships among relationships.
One way to represent this situation is to create a higher-level aggregate class composed of COMPANY, JOB_APPLICANT, and INTERVIEW and to relate this class to JOB_OFFER, as shown in Figure 8.11(d). Although the EER model as described in this book does not have this facility, some semantic data models do allow it and call the resulting object a composite or molecular object. Other models treat entity types and relationship types uniformly and hence permit relationships among relationships, as illustrated in Figure 8.11(c). To represent this situation correctly in the ER model as described here, we need to create a new weak entity type INTERVIEW, as shown in Figure 8.11(e), and relate it to JOB_OFFER. Hence, we can always represent these situations correctly in the ER model by creating additional entity types, although it may be conceptually more desirable to allow direct representation of aggregation, as in Figure 8.11(d), or to allow relationships among relationships, as in Figure 8.11(c).
Formal Definitions for the EER Model Concepts
We now summarize the EER model concepts and give formal definitions. A class is a set or collection of entities; this includes any of the EER schema constructs that group entities, such as entity types, subclasses, superclasses, and categories. A subclass S is a class whose entities must always be a subset of the entities in another class, called the superclass C of the superclass/subclass (or IS-A) relationship. We denote such a relationship by C/S. For such a superclass/subclass relationship, we must always have
S ⊆ C
A specialization Z = {S1, S2, ..., Sn} is a set of subclasses that have the same superclass G; that is, G/Si is a superclass/subclass relationship for i = 1, 2, ..., n. G is called a generalized entity type (or the superclass of the specialization, or a generalization of the subclasses {S1, S2, ..., Sn}). Z is said to be total if we always (at any point in time) have
S1 ∪ S2 ∪ ... ∪ Sn = G
Otherwise, Z is said to be partial. Z is said to be disjoint if we always have
Si ∩ Sj = ∅ (empty set) for i ≠ j
Otherwise, Z is said to be overlapping. A subclass S of C is said to be predicate-defined if a predicate p on the attributes of C is used to specify which entities in C are members of S; that is, S = C[p], where C[p] is the set of entities in C that satisfy p. A subclass that is not defined by a predicate is called user-defined. A specialization Z (or generalization G) is said to be attribute-defined if a predicate (A = ci), where A is an attribute of G and ci is a constant value from the domain of A, is used to specify membership in each subclass Si in Z. Notice that if ci ≠ cj for i ≠ j, and A is a single-valued attribute, then the specialization will be disjoint. A category T is a class that is a subset of the union of n defining superclasses D1, D2, ..., Dn, n > 1, and is formally specified as follows:
T ⊆ (D1 ∪ D2 ∪ ... ∪ Dn)
A predicate pi on the attributes of Di can be used to specify the members of each Di that are members of T. If a predicate is specified on every Di, we get
T = (D1[p1] ∪ D2[p2] ∪ ... ∪ Dn[pn])
We should now extend the definition of relationship type given in Chapter 7 by allowing any class, not only any entity type, to participate in a relationship. Hence, we should replace the words entity type with class in that definition. The graphical notation of EER is consistent with ER because all classes are represented by rectangles,
The EER model includes all the modeling concepts of the ER model that were presented in Chapter 7. In addition, it includes the concepts of subclass and superclass and the related concepts of specialization and generalization (see Sections 8.2 and 8.3). Another concept included in the EER model is that of a category or union type (see Section 8.4), which is used to represent a collection of objects (entities) that is the union of objects of different entity types. Associated with these concepts is the important mechanism of attribute and relationship inheritance. Unfortunately, no standard terminology exists for these concepts, so we use the most common terminology. Alternative terminology is given in footnotes. We also describe a diagrammatic technique for displaying these concepts when they arise in an EER schema. We call the resulting schema diagrams enhanced ER or EER diagrams.
In recent years, the amount of computerized data and information available on the Web has spiraled out of control. Many different models and formats are used. In addition to the database models that we present in this book, much information is stored in the form of documents, which have considerably less structure than database information does. One ongoing project that is attempting to allow information exchange among computers on the Web is called the Semantic Web, which attempts to create knowledge representation models that are quite general in order to allow meaningful information exchange and search among machines. The concept of ontology is considered to be the most promising basis for achieving the goals of the Semantic Web and is closely related to knowledge representation. In this section, we give a brief introduction to what ontology is and how it can be used as a basis to automate information understanding, search, and exchange.
The study of ontologies attempts to describe the structures and relationships that are possible in reality through some common vocabulary; therefore, it can be considered as a way to describe the knowledge of a certain community about reality. Ontology originated in the fields of philosophy and metaphysics. One commonly used definition of ontology is a specification of a conceptualization.
Figure 8.2 shows a few entity instances that belong to subclasses of the {SECRETARY, ENGINEER, TECHNICIAN} specialization. Again, notice that an entity that belongs to a subclass represents the same real-world entity as the entity connected to it in the EMPLOYEE superclass, even though the same entity is shown twice; for example, e1 is shown in both EMPLOYEE and SECRETARY in Figure 8.2. As the figure suggests, a superclass/subclass relationship such as EMPLOYEE/SECRETARY somewhat resembles a 1:1 relationship at the instance level (see Figure 7.12). The main difference is that in a 1:1 relationship two distinct entities are related, whereas in a superclass/subclass relationship the entity in the subclass is the same real-world entity as the entity in the superclass but is playing a specialized role; for example, an EMPLOYEE specialized in the role of SECRETARY, or an EMPLOYEE specialized in the role of TECHNICIAN.
There are two main reasons for including class/subclass relationships and specializations in a data model. The first is that certain attributes may apply to some but not all entities of the superclass. A subclass is defined in order to group the entities to which these attributes apply. The members of the subclass may still share the majority of their attributes with the other members of the superclass. For example, in Figure 8.1 the SECRETARY subclass has the specific attribute Typing_speed, whereas the ENGINEER subclass has the specific attribute Eng_type, but SECRETARY and ENGINEER share their other inherited attributes from the EMPLOYEE entity type. The second reason for using subclasses is that some relationship types may be participated in only by entities that are members of the subclass. For example, if only HOURLY_EMPLOYEES can belong to a trade union, we can represent that fact by creating the subclass HOURLY_EMPLOYEE of EMPLOYEE and relating the subclass to an entity type TRADE_UNION via the BELONGS_TO relationship type, as illustrated in Figure 8.1. In summary, the specialization process allows us to do the following:
■ Define a set of subclasses of an entity type
■ Establish additional specific attributes with each subclass
■ Establish additional specific relationsh
When we do not have a condition for determining membership in a subclass, the subclass is called user-defined. Membership in such a subclass is determined by the database users when they apply the operation to add an entity to the subclass; hence, membership is specified individually for each entity by the user, not by any condition that may be evaluated automatically.
Two other constraints may apply to a specialization. The first is the disjointness (or disjointedness) constraint, which specifies that the subclasses of the specialization must be disjoint. This means that an entity can be a member of at most one of the subclasses of the specialization. A specialization that is attribute-defined implies the disjointness constraint (if the attribute used to define the membership predicate is single-valued). Figure 8.4 illustrates this case, where the d in the circle stands for disjoint. The d notation also applies to user-defined subclasses of a specialization that must be disjoint, as illustrated by the specialization {HOURLY_EMPLOYEE, SALARIED_EMPLOYEE} in Figure 8.1. If the subclasses are not constrained to be disjoint, their sets of entities may be overlapping; that is, the same (real-world) entity may be a member of more than one subclass of the specialization. This case, which is the default, is displayed by placing an o in the circle, as shown in Figure 8.5.
It is possible to arrive at the same hierarchy or lattice from the other direction. In such a case, the process involves generalization rather than specialization and corresponds to a bottom-up conceptual synthesis. For example, the database designers may first discover entity types such as STAFF, FACULTY, ALUMNUS, GRADUATE_STUDENT, UNDERGRADUATE_STUDENT, RESEARCH_ASSISTANT, TEACHING_ASSISTANT, and so on; then they generalize {GRADUATE_STUDENT, UNDERGRADUATE_STUDENT} into STUDENT; then they generalize {RESEARCH_ASSISTANT, TEACHING_ASSISTANT} into STUDENT_ASSISTANT; then they generalize {STAFF, FACULTY, STUDENT_ASSISTANT} into EMPLOYEE; and finally they generalize {EMPLOYEE, ALUMNUS, STUDENT} into PERSON.
In structural terms, hierarchies or lattices resulting from either process may be identical; the only difference relates to the manner or order in which the schema superclasses and subclasses were created during the design process. In practice, it is likely that neither the generalization process nor the specialization process is followed strictly, but that a combination of the two processes is employed. New classes are continually incorporated into a hierarchy or lattice as they become apparent to users and designers. Notice that the notion of representing data and knowledge by using superclass/subclass hierarchies and lattices is quite common in knowledge-based systems and expert systems, which combine database technology with artificial intelligence techniques. For example, frame-based knowledge representation schemes closely resemble class hierarchies. Specialization is also common in software engineering design methodologies that are based on the object-oriented paradigm.
A category has two or more superclasses that may represent distinct entity types, whereas other superclass/subclass relationships always have a single superclass. To better understand the difference, we can compare a category, such as OWNER in Figure 8.8, with the ENGINEERING_MANAGER shared subclass in Figure 8.6. The latter is a subclass of each of the three superclasses ENGINEER, MANAGER, and SALARIED_EMPLOYEE, so an entity that is a member of ENGINEERING_MANAGER must exist in all three. This represents the constraint that an engineering manager must be an ENGINEER, a MANAGER, and a SALARIED_EMPLOYEE; that is, ENGINEERING_MANAGER is a subset of the intersection of the three classes (sets of entities). On the other hand, a category is a subset of the union of its superclasses. Hence, an entity that is a member of OWNER must exist in only one of the superclasses. This represents the constraint that an OWNER may be a COMPANY, a BANK, or a PERSON in Figure 8.8.
Attribute inheritance works more selectively in the case of categories. For example, in Figure 8.8 each OWNER entity inherits the attributes of a COMPANY, a PERSON, or a BANK, depending on the superclass to which the entity belongs. On the other hand, a shared subclass such as ENGINEERING_MANAGER (Figure 8.6) inherits all the attributes of its superclasses SALARIED_EMPLOYEE, ENGINEER, and MANAGER.
It is interesting to note the difference between the category REGISTERED_VEHICLE (Figure 8.8) and the generalized superclass VEHICLE (Figure 8.3(b)). In Figure 8.3(b), every car and every truck is a VEHICLE; but in Figure 8.8, the REGISTERED_VEHICLE category includes some cars and some trucks but not necessarily all of them (for example, some cars or trucks may not be registered). In general, a specialization or generalization such as that in Figure 8.3(b), if it were partial, would not preclude VEHICLE from containing other types of entities, such as motorcycles. However, a category such as REGISTERED_VEHICLE in Figure 8.8 implies that only cars and trucks, but not other types of entities, can be members of REGISTERED_VEHICLE.
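The contrast between a shared subclass and a category comes down to intersection versus union of the superclass entity sets. The following sketch shows this with hypothetical entity identifiers and function names. It models only set membership in one database state, not attribute inheritance.

```python
def valid_shared_subclass(subclass, superclasses):
    """A shared subclass must be a subset of the intersection of its superclasses."""
    return subclass <= set.intersection(*superclasses)


def valid_category(category, superclasses):
    """A category must be a subset of the union of its superclasses."""
    return category <= set.union(*superclasses)


# Hypothetical entity sets for the shared-subclass case (Figure 8.6).
engineer = {"e1", "e2"}
manager = {"e1", "e3"}
salaried_employee = {"e1", "e2", "e3"}
supers = [engineer, manager, salaried_employee]

print(valid_shared_subclass({"e1"}, supers))  # True: e1 is in all three
print(valid_shared_subclass({"e2"}, supers))  # False: e2 is not a MANAGER

# Hypothetical entity sets for the category case (Figure 8.8).
company = {"c1"}
bank = {"b1"}
person = {"p1", "p2"}
owner = {"c1", "p1"}  # some companies and persons; not every one is an owner

print(valid_category(owner, [company, bank, person]))         # True
print(valid_shared_subclass(owner, [company, bank, person]))  # False
```

The last line shows why OWNER cannot be modeled as a shared subclass: no entity is at once a COMPANY, a BANK, and a PERSON, so the intersection is empty.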
The requirements for the part of the UNIVERSITY database shown in Figure 8.7 are the following:
1. The database keeps track of three types of persons: employees, alumni, and students. A person can belong to one, two, or all three of these types. Each person has a name, SSN, sex, address, and birth date.
2. Every employee has a salary, and there are three types of employees: faculty, staff, and student assistants. Each employee belongs to exactly one of these types. For each alumnus, a record of the degree or degrees that he or she earned at the university is kept, including the name of the degree, the year granted, and the major department. Each student has a major department.
3. Each faculty has a rank, whereas each staff member has a staff position. Student assistants are classified further as either research assistants or teaching assistants, and the percent of time that they work is recorded in the database. Research assistants have their research project stored, whereas teaching assistants have the current course they work on.
4. Students are further classified as either graduate or undergraduate, with the specific attributes degree program (M.S., Ph.D., M.B.A., and so on) for graduate students and class (freshman, sophomore, and so on) for undergraduates.
An entity cannot exist in the database merely by being a member of a subclass; it must also be a member of the superclass. Such an entity can be included optionally as a member of any number of subclasses. For example, a salaried employee who is also an engineer belongs to the two subclasses ENGINEER and SALARIED_EMPLOYEE of the EMPLOYEE entity type. However, it is not necessary that every entity in a superclass is a member of some subclass.
All of the superclass/subclass relationships we have seen thus far have a single superclass. A shared subclass such as ENGINEERING_MANAGER in the lattice in Figure 8.6 is the subclass in three distinct superclass/subclass relationships, where each of the three relationships has a single superclass. However, it is sometimes necessary to represent a single superclass/subclass relationship with more than one superclass, where the superclasses represent different entity types. In this case, the subclass will represent a collection of objects that is a subset of the UNION of distinct entity types; we call such a subclass a union type or a category.
For example, suppose that we have three entity types: PERSON, BANK, and COMPANY. In a database for motor vehicle registration, an owner of a vehicle can be a person, a bank (holding a lien on a vehicle), or a company. We need to create a class (collection of entities) that includes entities of all three types to play the role of vehicle owner. A category (union type) OWNER that is a subclass of the UNION of the three entity sets of COMPANY, BANK, and PERSON can be created for this purpose. We display categories in an EER diagram as shown in Figure 8.8. The superclasses COMPANY, BANK, and PERSON are connected to the circle with the ∪ symbol, which stands for the set union operation. An arc with the subset symbol connects the circle to the (subclass) OWNER category. If a defining predicate is needed, it is displayed next to the line from the superclass to which the predicate applies. In Figure 8.8 we have two categories: OWNER, which is a subclass of the union of PERSON, BANK, and COMPANY; and REGISTERED_VEHICLE, which is a subclass of the union of CAR and TRUCK.
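As a rough sketch of the OWNER category, the following Python fragment treats a category as a collection whose members must each come from exactly one of the superclasses, and whose inherited attributes depend on that superclass. The attribute names (ssn, bank_name, and so on) and the OwnerCategory class are assumptions made for illustration; they are not taken from Figure 8.8.

```python
from dataclasses import dataclass, fields
from typing import Union


# Hypothetical attributes; the real ones are those drawn in Figure 8.8.
@dataclass(frozen=True)
class Person:
    ssn: str
    name: str


@dataclass(frozen=True)
class Bank:
    bank_name: str
    bank_address: str


@dataclass(frozen=True)
class Company:
    company_name: str
    company_address: str


# An OWNER entity is an entity of exactly one of the three superclasses.
OwnerEntity = Union[Person, Bank, Company]


class OwnerCategory:
    """OWNER modeled as a subset of the union PERSON ∪ BANK ∪ COMPANY."""

    def __init__(self) -> None:
        self._members = set()

    def add(self, entity: OwnerEntity) -> None:
        # Only entities of the three superclasses may become owners.
        if not isinstance(entity, (Person, Bank, Company)):
            raise TypeError("an OWNER must be a PERSON, a BANK, or a COMPANY")
        self._members.add(entity)

    def inherited_attributes(self, entity: OwnerEntity) -> tuple:
        # Selective inheritance: the attributes come from the one
        # superclass to which this particular owner belongs.
        if entity not in self._members:
            raise KeyError("entity is not a member of OWNER")
        return tuple(f.name for f in fields(entity))


owners = OwnerCategory()
matthew = Person(ssn="301-54-0836", name="Matthew Clarke")  # sample values
lien_holder = Bank(bank_name="First Bank", bank_address="Main St")
owners.add(matthew)
owners.add(lien_holder)
```

Not every PERSON has to be added to owners, which reflects the fact that a category is a subset of the union rather than the whole union.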
Classification and Instantiation
The process of classification involves systematically assigning similar objects/entities to object classes/entity types. We can now describe (in DB) or reason about (in KR) the classes rather than the individual objects. Collections of objects that share the same types of attributes, relationships, and constraints are classified into classes in order to simplify the process of discovering their properties. Instantiation is the inverse of classification and refers to the generation and specific examination of distinct objects of a class. An object instance is related to its object class by the IS-AN-INSTANCE-OF or IS-A-MEMBER-OF relationship. Although EER diagrams do not display instances, the UML diagrams allow a form of instantiation by permitting the display of individual objects. We did not describe this feature in our introduction to UML class diagrams.
In general, the objects of a class should have a similar type structure. However, some objects may display properties that differ in some respects from the other objects of the class; these exception objects also need to be modeled, and KR schemes allow more varied exceptions than do database models. In addition, certain properties apply to the class as a whole and not to the individual objects; KR schemes allow such class properties. UML diagrams also allow specification of class properties.
In the EER model, entities are classified into entity types according to their basic attributes and relationships. Entities are further classified into subclasses and categories based on additional similarities and differences (exceptions) among them. Relationship instances are classified into relationship types. Hence, entity types, subclasses, categories, and relationship types are the different concepts that are used for classification in the EER model. The EER model does not provide explicitly for class properties, but it may be extended to do so. In UML, objects are classified into classes, and it is possible to display both class properties and individual objects.
Constraints on Specialization and Generalization
In general, we may have several specializations defined on the same entity type (or superclass), as shown in Figure 8.1. In such a case, entities may belong to subclasses in each of the specializations. However, a specialization may also consist of a single subclass only, such as the {MANAGER} specialization in Figure 8.1; in such a case, we do not use the circle notation.
8.7.4 Aggregation and Association
Aggregation is an abstraction concept for building composite objects from their component objects. There are three cases where this concept can be related to the EER model. The first case is the situation in which we aggregate attribute values of an object to form the whole object. The second case is when we represent an aggregation relationship as an ordinary relationship. The third case, which the EER model does not provide for explicitly, involves the possibility of combining objects that are related by a particular relationship instance into a higher-level aggregate object. This is sometimes useful when the higher-level aggregate object is itself to be related to another object. We call the relationship between the primitive objects and their aggregate object IS-A-PART-OF; the inverse is called IS-A-COMPONENT-OF. UML provides for all three types of aggregation.
The abstraction of association is used to associate objects from several independent classes. Hence, it is somewhat similar to the second use of aggregation. It is represented in the EER model by relationship types, and in UML by associations. This abstract relationship is called IS-ASSOCIATED-WITH.
discusses the various types of constraints on specialization/generalization, and Section 8.4 shows how the UNION construct can be modeled by including the concept of category in the EER model. Section 8.5 gives a sample UNIVERSITY database schema in the EER model and summarizes the EER model concepts by giving formal definitions. We will use the terms object and entity interchangeably in this chapter, because many of these concepts are commonly used in object-oriented models.
We present the UML class diagram notation for representing specialization and generalization in Section 8.6, and briefly compare these with EER notation and concepts. This serves as an example of alternative notation, and is a continuation of Section 7.8, which presented basic UML class diagram notation that corresponds to the basic ER model. In Section 8.7, we discuss the fundamental abstractions that are used as the basis of many semantic data models. Section 8.8 summarizes the chapter.
For a detailed introduction to conceptual modeling, Chapter 8 should be considered a continuation of Chapter 7. However, if only a basic introduction to ER modeling is desired, this chapter may be omitted. Alternatively, the reader may choose to skip some or all of the later sections of this chapter (Sections 8.4 through 8.8).
In Figure 8.7, all person entities represented in the database are members of the PERSON entity type, which is specialized into the subclasses {EMPLOYEE, ALUMNUS, STUDENT}. This specialization is overlapping; for example, an alumnus may also be an employee and may also be a student pursuing an advanced degree. The subclass STUDENT is the superclass for the specialization {GRADUATE_STUDENT, UNDERGRADUATE_STUDENT}, while EMPLOYEE is the superclass for the specialization {STUDENT_ASSISTANT, FACULTY, STAFF}. Notice that STUDENT_ASSISTANT is also a subclass of STUDENT. Finally, STUDENT_ASSISTANT is the superclass for the specialization into {RESEARCH_ASSISTANT, TEACHING_ASSISTANT}.
In such a specialization lattice or hierarchy, a subclass inherits the attributes not only of its direct superclass, but also of all its predecessor superclasses all the way to the root of the hierarchy or lattice if necessary. For example, an entity in GRADUATE_STUDENT inherits all the attributes of that entity as a STUDENT and as a PERSON. Notice that an entity may exist in several leaf nodes of the hierarchy, where a leaf node is a class that has no subclasses of its own. For example, a member of GRADUATE_STUDENT may also be a member of RESEARCH_ASSISTANT.
A subclass with more than one superclass is called a shared subclass, such as ENGINEERING_MANAGER in Figure 8.6. This leads to the concept known as multiple inheritance, where the shared subclass ENGINEERING_MANAGER directly inherits attributes and relationships from multiple classes. Notice that the existence of at least one shared subclass leads to a lattice (and hence to multiple inheritance); if no shared subclasses existed, we would have a hierarchy rather than a lattice and only single inheritance would exist. An important rule related to multiple inheritance can be illustrated by the example of the shared subclass STUDENT_ASSISTANT in Figure 8.7, which inherits attributes from both EMPLOYEE and STUDENT. Here, both EMPLOYEE and STUDENT inherit the same attributes from PERSON. The rule states that if an attribute (or relationship) originating in the same superclass (PERSON) is inherited more than once via different paths (EMPLOYEE and STUDENT) in the lattice, then it should be included only once in the shared subclass (STUDENT_ASSISTANT). Hence, the attributes of PERSON are inherited only once in the STUDENT_ASSISTANT subclass in Figure 8.7.
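The inherit-only-once rule can be sketched as a traversal of the lattice that visits each superclass a single time. The fragment below is a minimal illustration under stated assumptions: it covers only four classes of the UNIVERSITY lattice, the attribute spellings (Major_dept, Percent_time, and so on) are guesses based on the requirements list, and inherited_attributes is a name introduced here.

```python
from typing import Dict, List

# Fragment of the lattice: each class maps to its direct superclasses.
SUPERCLASSES: Dict[str, List[str]] = {
    "PERSON": [],
    "EMPLOYEE": ["PERSON"],
    "STUDENT": ["PERSON"],
    "STUDENT_ASSISTANT": ["EMPLOYEE", "STUDENT"],
}

# Specific (local) attributes of each class; spellings are assumed.
LOCAL_ATTRIBUTES: Dict[str, List[str]] = {
    "PERSON": ["Name", "Ssn", "Sex", "Address", "Birth_date"],
    "EMPLOYEE": ["Salary"],
    "STUDENT": ["Major_dept"],
    "STUDENT_ASSISTANT": ["Percent_time"],
}


def inherited_attributes(cls: str) -> List[str]:
    """Collect local and inherited attributes, visiting each class once."""
    visited = set()
    result: List[str] = []

    def visit(name: str) -> None:
        if name in visited:
            # Already reached via another path (for example PERSON via
            # EMPLOYEE), so its attributes are not added a second time.
            return
        visited.add(name)
        for parent in SUPERCLASSES[name]:
            visit(parent)
        result.extend(LOCAL_ATTRIBUTES[name])

    visit(cls)
    return result


student_assistant_attrs = inherited_attributes("STUDENT_ASSISTANT")
```

Although PERSON is reachable through both EMPLOYEE and STUDENT, each of its attributes appears exactly once in student_assistant_attrs.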
The first Enhanced ER (EER) model concept we take up is that of a subtype or subclass of an entity type. As we discussed in Chapter 7, an entity type is used to represent both a type of entity and the entity set or collection of entities of that type that exist in the database. For example, the entity type EMPLOYEE describes the type (that is, the attributes and relationships) of each employee entity, and also refers to the current set of EMPLOYEE entities in the COMPANY database. In many cases an entity type has numerous subgroupings or subtypes of its entities that are meaningful and need to be represented explicitly because of their significance to the database application. For example, the entities that are members of the EMPLOYEE entity type may be distinguished further into SECRETARY, ENGINEER, MANAGER, TECHNICIAN, SALARIED_EMPLOYEE, HOURLY_EMPLOYEE, and so on. The set of entities in each of the latter groupings is a subset of the entities that belong to the EMPLOYEE entity set, meaning that every entity that is a member of one of these subgroupings is also an employee. We call each of these subgroupings a subclass or subtype of the EMPLOYEE entity type, and the EMPLOYEE entity type is called the superclass or supertype for each of these subclasses. Figure 8.1 shows how to represent these concepts diagrammatically in EER diagrams. (The circle notation in Figure 8.1 will be explained in Section 8.2.)
We call the relationship between a superclass and any one of its subclasses a superclass/subclass or supertype/subtype or simply class/subclass relationship. In our previous example, EMPLOYEE/SECRETARY and EMPLOYEE/TECHNICIAN are two class/subclass relationships. Notice that a member entity of the subclass represents the same real-world entity as some member of the superclass; for example, a SECRETARY entity 'Joan Logano' is also the EMPLOYEE 'Joan Logano.' Hence, the subclass member is the same as the entity in the superclass, but in a distinct specific role. When we implement a superclass/subclass relationship in the database system, however, we may represent a member of the subclass as a distinct database object (say, a distinct record that is related via the key attribute to its superclass entity). In Section 9.2, we discuss various options for representing superclass/subclass relationships in relational databases.
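One way to picture a subclass member stored as a distinct record related via the key attribute is the small sketch below, which uses Python's standard sqlite3 module with an in-memory database. It is only one of several possible representations; the table layout, the Typing_speed attribute, and the Ssn values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Superclass: one row per EMPLOYEE entity.
conn.execute("CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Name TEXT NOT NULL)")

# Subclass: a distinct record that shares the key of its superclass entity.
conn.execute(
    """CREATE TABLE SECRETARY (
           Ssn TEXT PRIMARY KEY REFERENCES EMPLOYEE(Ssn),
           Typing_speed INTEGER
       )"""
)


def insert_secretary(ssn: str, typing_speed: int) -> bool:
    """Return True if the row was accepted, False if the key rule rejected it."""
    try:
        conn.execute("INSERT INTO SECRETARY VALUES (?, ?)", (ssn, typing_speed))
        return True
    except sqlite3.IntegrityError:
        return False


conn.execute("INSERT INTO EMPLOYEE VALUES ('123456789', 'Joan Logano')")
accepted = insert_secretary("123456789", 70)         # also an EMPLOYEE
orphan_accepted = insert_secretary("999999999", 55)  # no matching EMPLOYEE

# The SECRETARY and EMPLOYEE rows describe the same real-world entity.
row = conn.execute(
    "SELECT E.Name, S.Typing_speed "
    "FROM SECRETARY AS S JOIN EMPLOYEE AS E ON E.Ssn = S.Ssn"
).fetchone()
```

The foreign key also captures the requirement that a subclass member must be a member of the superclass: the second insert is rejected because no EMPLOYEE row has that key.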
In this definition, a conceptualization is the set of concepts that are used to represent the part of reality or knowledge that is of interest to a community of users. Specification refers to the language and vocabulary terms that are used to specify the conceptualization. The ontology includes both specification and conceptualization. For example, the same conceptualization may be specified in two different languages, giving two separate ontologies. Based on this quite general definition, there is no consensus on what an ontology is exactly. Some possible ways to describe ontologies are as follows:
■ A thesaurus (or even a dictionary or a glossary of terms) describes the relationships between words (vocabulary) that represent various concepts.
■ A taxonomy describes how concepts of a particular area of knowledge are related using structures similar to those used in a specialization or generalization.
■ A detailed database schema is considered by some to be an ontology that describes the concepts (entities and attributes) and relationships of a miniworld from reality.
■ A logical theory uses concepts from mathematical logic to try to define concepts and their interrelationships.
Design Choices for Specialization/Generalization
It is not always easy to choose the most appropriate conceptual design for a database application. In Section 7.7.3, we presented some of the typical issues that confront a database designer when choosing among the concepts of entity types, relationship types, and attributes to represent a particular miniworld situation as an ER schema. In this section, we discuss design guidelines and choices for the EER concepts of specialization/generalization and categories (union types).
As we mentioned in Section 7.7.3, conceptual database design should be considered as an iterative refinement process until the most suitable design is reached. The following guidelines can help to guide the design process for EER concepts:
■ In general, many specializations and subclasses can be defined to make the conceptual model accurate. However, the drawback is that the design becomes quite cluttered. It is important to represent only those subclasses that are deemed necessary to avoid extreme cluttering of the conceptual schema.
■ If a subclass has few specific (local) attributes and no specific relationships, it can be merged into the superclass. The specific attributes would hold NULL values for entities that are not members of the subclass. A type attribute could specify whether an entity is a member of the subclass.
■ Similarly, if all the subclasses of a specialization/generalization have few specific attributes and no specific relationships, they can be merged into the superclass and replaced with one or more type attributes that specify the subclass or subclasses that each entity belongs to (see Section 9.2 for how this criterion applies to relational databases).
■ Union types and categories should generally be avoided unless the situation definitely warrants this type of construct, which does occur in some practical situations. If possible, we try to model using specialization/generalization as discussed at the end of Section 8.4.
■ The choice of disjoint/overlapping and total/partial constraints on specialization/generalization is driven by the rules in the miniworld being modeled. If the requirements do not indicate any particular constraints, the default would generally be overlapping and partial, since this does not specify any restrictions on subclass membership.
As an example of applying these guidelines, consider Figure 8.6, where no specific (local) attributes are shown. We could merge all the subclasses into the EMPLOYEE entity type, and add the following attributes to EMPLOYEE:
■ An attribute Job_type whose value set {'Secretary', 'Engineer', 'Technician'} would indicate which subclass in the first specialization each employee belongs to.
■ An attribute Pay_method whose value set {'Salaried', 'Hourly'} would indicate which subclass in the second specialization each employee belongs to.
■ An attribute Is_a_manager whose value set {'Yes', 'No'} would indicate whether an individual employee entity is a manager or not.
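The merged design described above can be sketched as a single record type carrying the three type attributes. This is a minimal illustration, not a prescribed implementation: the Employee class, the subclasses method, and the choice to allow None for Job_type and Pay_method (that is, to treat those specializations as partial) are assumptions made here.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Value sets taken from the three type attributes listed above.
JOB_TYPES = {"Secretary": "SECRETARY", "Engineer": "ENGINEER",
             "Technician": "TECHNICIAN"}
PAY_METHODS = {"Salaried": "SALARIED_EMPLOYEE", "Hourly": "HOURLY_EMPLOYEE"}
MANAGER_FLAGS = {"Yes", "No"}


@dataclass
class Employee:
    name: str
    # None stands for NULL: the employee is in no subclass of that
    # specialization (an assumption; the text does not say whether the
    # specializations are total or partial).
    job_type: Optional[str] = None
    pay_method: Optional[str] = None
    is_a_manager: str = "No"

    def __post_init__(self) -> None:
        # Reject values outside the declared value sets.
        if self.job_type is not None and self.job_type not in JOB_TYPES:
            raise ValueError(f"invalid Job_type: {self.job_type!r}")
        if self.pay_method is not None and self.pay_method not in PAY_METHODS:
            raise ValueError(f"invalid Pay_method: {self.pay_method!r}")
        if self.is_a_manager not in MANAGER_FLAGS:
            raise ValueError(f"invalid Is_a_manager: {self.is_a_manager!r}")

    def subclasses(self) -> Set[str]:
        """Recover the subclass memberships encoded by the type attributes."""
        result = set()
        if self.job_type is not None:
            result.add(JOB_TYPES[self.job_type])
        if self.pay_method is not None:
            result.add(PAY_METHODS[self.pay_method])
        if self.is_a_manager == "Yes":
            result.add("MANAGER")
        return result


# Sample entity with a hypothetical name.
emp = Employee("Sample Employee", job_type="Engineer",
               pay_method="Salaried", is_a_manager="Yes")
memberships = emp.subclasses()
```

Because each specialization gets its own attribute, one entity can still belong to a subclass in every specialization at once, which is what the overlapping memberships across the specializations of Figure 8.6 require.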