Exam 1 Review - INSY 3304
Primary Key
- A candidate key selected to uniquely identify an entity. - Cannot have null values (A null value is no value, it is NOT equal to a zero or a blank space). - Enforces Entity Integrity (Guarantees that each entity is uniquely identified by a non-null primary key value) - A ______ is a superkey as well as a candidate key. ex: social security number, driver's license number, student ID
Relationship
- A relationship is an association between entities. - Relationships are described as verbs. - Relationships are represented by diamond-shaped symbols.
characteristics of a Relational Table
- A table is composed of rows and columns. - Each row (tuple) represents a single entity within the entity set. - Each column represents an attribute and is identified by a distinct name.
Foreign Key
- An attribute (or a combination of attributes) in one table whose values must either match the primary key values in a designated table or be null. - Used to logically link one table with another (compare with the physical pointers in Hierarchical and Network models). - Enforces Referential Integrity (Guarantees valid references to another table, i.e., cannot delete a tuple from a table that is referenced by in another table through a foreign key). ex: a table called TEAM may have an attribute, MEMBER_NAME, which is a foreign key referencing a candidate key, PERSON_NAME, in the PERSON table
Benefits of the external model
- Application program development is simplified because the programmer does not have to be concerned about data not relevant to his/her application. - Communication with the end-user is simplified. - Identification of data required to support each business unit's operation is simplified. - Access control and security can be easily implemented.
The Entity Relationship (E-R) Model: Attributes
- Attributes are represented by ovals and are connected to the entity with a line. - Each oval contains the name of the attribute it represents. - Attributes have a domain -- the attribute's set of possible values. - Attributes may share a domain. - Primary keys are underlined.
limitations of Object-Oriented Data Model
- Data access method resembles the navigational style - Lack of standardization
benefits of relational data model
- Data independence and structural independence - Easy to design the database and to manage its contents - Less programming effort required - Powerful and flexible query capability: structured query language (SQL)
benefits of hierarchical data model
- Data sharing and security provision - Data independence -> Reduced programming and maintenance effort - Database integrity-> a child must have a parent - Efficiency dealing with a large database containing data with predominantly 1:M relationships -> many business applications fall in this category
The database system consists of four major components:
- Database • Collection of integrated and shared data - Software • Database Management System (DBMS) • Application programs - Hardware • Processor: CPU • Memory: primary, secondary and tertiary - People • Administrators, Designers, Developers, End Users
Server-side software
- Database management system (DBMS) - DBMS does the back-end processing
benefits of network data model
- Easier implementation of M:N relationships - More flexible data access compared to Hierarchical model -> An application can access an owner and all the members within the set, one can also access owner from a member. - Enforced data integrity -> owner record must be defined before defining member records - Data independence through the DBMS
Client-side software (User App)
- Example: SAP - Connects to the DB via a data access API (application programming interface) - The client software does the front-end processing - Typically coding language, API, and database must be compatible (Java, JDBC, Oracle)
Goals of Big Data
- Find new and better ways to manage large amounts of web and sensor-generated data - Provide high performance at a reasonable cost
Limitations of hierarchical data model
- Inability to represent relationships that do not conform to the hierarchical 1:M standard - Applications programming is complex -> programmer must know the database structure and how to traverse it - Lack of standard concepts and implementation (No standard was developed)-- limited portability - Requires extensive programming activities to use the database
Benefits of Object-Oriented Data Model
- Provides a richer modeling language -> captures attributes, methods, and class hierarchies - Useful in niche application areas such as CAD/CAM
limitations of relational data model
- RDBMS requires substantial hardware and operating system overhead. - It tends to be slower than other database systems.
Multivalued Attribute in Relational DBMS
- The relational DBMS cannot implement multivalued attributes. - Possible courses of action for the designer • Within the original entity, create several new attributes, one for each of the original multivalued attribute's components. Example: CAR_COLOR -> CAR_TOPCOLOR, CAR_BODYCOLOR, CAR_TRIMCOLOR • Create a new entity composed of the original multivalued attribute's components.
E-R Model Components
- Three components: Entities, Attributes, and Relationships. - Entities • In E-R model an entity refers to the entity set, not an entity occurrence. • A row in a table corresponds to an entity instance. • An entity is represented by a rectangle containing the entity's name. ex: student
Secondary Key
- Used for data retrieval purpose. - May consist of a single attribute or a combination of attributes. - The DBMS maintains indexes on secondary keys for faster search and retrieval of data. - May have duplicate values. an additional key, or alternate key, which can be use in addition to the primary key to locate specific data. ex: Enrollment_NO is chosen as the primary key then, Class_ ID + Student_ID is said to be the secondary key
limitations of network data model
- Very complex structure from the application programmer's point of view - Difficult to design and use properly -> because of the navigational nature of the data structure - Difficult to make changes in a database -> no structural independence
Characteristics of Big Data
- Volume • Large amounts of data being stored - Velocity • Data accumulates at a very high speed and needs to be processed at a very fast pace - Variety • Diversity in data format - numbers, letters, text, audio, video, click-streams, sensor data
Strong (Identifying) Relationship
- exists when the PK of the related entity contains a PK component of the parent entity. - The child entity is a weak entity. - Crow's Foot model uses a solid line to connect the two entities. Example: An account table person table and personAccount. The person account table is identified by the existence of account and person table only
Weak (Non-Identifying) Relationship
- exists when the PK of the related entity does not contain a PK component of the parent entity. - Crow's Foot model uses a dashed line to connect the two entities.
weak entity
- is existence-dependent on another entity and - has a primary key that is partially or totally derived from its strong counterpart. - indicated by a double rectangle can only exist when owned by another one. For example: a ROOM can only exist in a BUILDING. On the other hand, a TIRE might be considered as a strong entity because it also can exist without being attached to a CAR.
basic structure of relational data model
- perceived by the user as a collection of tables in which data are stored. - Each table consists of series of row/column intersections. - Tables (or relations) are related to each other by sharing a common entity characteristic. - The relationship type is often shown in a relational schema. - A table yields complete data and structural independence because it is a purely logical structure.
The Conceptual Model
- represents a global view of the organization's data as viewed by all end-users of an information system - It describes all entities and their attributes, the relationships among these entities and the constraints on these relationships. - forms the basis for the conceptual schema - a description of the database structure. - The conceptual model is independent of both software (DBMS and OS) and hardware.
Candidate Key
A minimal superkey, i.e., it does not contain a subset of attributes that is itself a superkey a subset of the super key that can uniquely identify the other attributes of the table helps in determining the prime and non-prime attributes of a table and ensures the integrity of the data by preventing duplicate data ex: ID is a candidate key as it can find all the rest attributes, i.e., Roll_No, and Name of the table
Implications of Cardinality
A professor listed in the professor table may appear in the Class table as few as zero times or as many as 3 times.
Set
A relationship Each ___ is composed of at least two record types: an owner (parent) record and a member (child) record.
Superkey
An attribute or a combination of attributes that uniquely identifies each row in a table. a single key or a group of multiple keys that can uniquely identify tuples in a table can contain redundant attributes that might not be important for identifying tuples ex: In the above EMPLOYEE table, for(EMPLOYEE_ID, EMPLOYEE_NAME), the name of two employees can be the same, but their EMPLYEE_ID can't be the same. Hence, this combination can also be a key. The super key would be EMPLOYEE-ID (EMPLOYEE_ID, EMPLOYEE-NAME), etc.
Development IDE
Another Client-side software - Integrated development environment - Used only by Developers - e.g. SQL Developer - Used to Write SQL that can • Create database • Read database • Update database • Delete data or parts of DB
Functional Dependence
Attribute B is functionally dependent on attribute A (attribute A determines attribute B: A -> B) if all the rows in a table that agree in value for attribute A must also agree in value for attribute B. a constraint that specifies the relationship between two sets of attributes where one set can accurately determine the value of other sets. Example: The details of the name of the employee, salary and city are obtained by the value of the number of Employee (or id of an employee). So, it can be said that the city, salary and the name attributes are functionally dependent on the attribute Employee Number
Metadata
Data that describes the properties and context of user data
atomic
Each row/column intersection represents a single data value
Natural join
EquiJoin with the duplicate column removed. Performed by a Project on the result of equijoin. (aka Inner Join) - When the term Join is mentioned without any prefix, it is implied to be Natural Join.
Theta Join
EquiJoin with the equality operator replaced by any other comparison operator, such as greater than, less than, etc
Existence Dependence
If an entity's existence depends on the existence of one or more other entities, it is said to be existence-dependent. - Example: • CLASS is existence dependent on COURSE - Course code appears as a foreign key in the Class table The implication of existence dependence is that a Course instance must be created in the Course table before the corresponding Class instances may be added to the Class table.
schema
defines the conceptual organization of the entire database as viewed by the database administrator.
Data Management Language
defines the data characteristics and the data structure and to manipulate the data.
subschema
defines the portion of the database as seen by the applications programs.
Hierarchical Data Model
developed in the 1960s to manage large amounts of data for complex manufacturing projects, such as, the Apollo rocket used in lunar missions Collection of records (segments) organized to conform to the upside-down tree structure. One-to-Many (1:M) Relationship • Each parent can have many children. • Each child has only one parent.
An ______ is anything about which data are to be collected and stored
entity Examples: a single person, single product, or single organization.
unary relationship
exists when an association is maintained within a single entity. (Recursive relationship)
binary/ternary relationship
exists when two entities are associated. (Most common) Three entities participate in a ternary relationship.
Cardinality
expresses the specific number of entity occurrences associated with one occurrence of the related entity. • This is determined by an organization's business policy.
If attribute B is functionally dependent on a composite key A but not any subset of A then B is _________ on A
fully functionally dependent
System catalog
is a very detailed system data dictionary. It describes all objects within the database.
data model
is an abstract representation of the data requirement for business operations and decision making. It represents data entities and their characteristics, relationships among entities, and constraints on these relationships. also serves as a communications tool to facilitate the interaction among the designer, the applications programmer, and the end user.
A ____ consists of one or more attributes that determines other attributes within an entity.
key
derived attribute
not physically stored within the database; its value is computed from other attributes. - It is indicated using a dotted line connecting the attribute with the entity. • Example: AGE can be derived from DOB and current date. • Number of years of employment
composite entity
often created in the process of replacing a M:N relationship by a pair of 1:M relationships. serves as a bridge between the related entities. Its primary key is composed of the primary keys of the entities it connects. It may contain additional attributes besides the primary key. For example, In student entity, the student address is a composite attribute as an address is composed of other attributes such as pin code, state, country
recursive entity
one in which a relationship can exist between occurrences of the same entity set. found within a unary relationship
The Physical Model
operates at the lowest level of abstraction, describing the way data is stored on storage media such as disks or tapes. requires the definition of both the physical storage devices and the access methods required to reach the data within those storage devices. both software and hardware-dependent
Project
produces a list of all values for selected attributes. It yields a vertical subset of a table.
A _____ describes an association among two or more entities.
relationship there are 3 types: - one to many - many to many - one to one
DataBase Task Group (DBTG) specified three crucial database components:
schema subschema data management language
Relationship participation depends on
the business rule of the organization.
The External Model
the end user's/ applications programmer's view (local view) of the database environment. concerned about a specific business operation
Integrity Constraints
the protocols that a table's data columns must follow These are used to restrict the types of information that can be entered into a table data in the database is accurate and reliable
connectivity
used to describe the relationship classification (e.g., one-to-one, one-to-many, and many-to-many).
many to many relationship
when one or more items in one table can have a relationship to one or more items in another table. example: Your Order table contains orders placed by multiple customers (who are listed in the Customers table), and a customer may place more than one order. example: a many-to-many relationship exists between customers and products: customers can purchase various products, and products can be purchased by many customers.
Select
yields all attributes of selected tuples that satisfy a specified condition. It produces a horizontal subset of a table.
Cardinality Constraints Summarization
• Constraints: the number of instances of one entity that can or must be associated with each instance of another entity • Minimum cardinality - If zero, then optional - If one or more, then mandatory • Maximum cardinality - The maximum number
Outer Join
Unmatched rows from the participating tables are retained in the result table with unmatched attributes left blank or null. - Left Outer join keeps all tuples from the left relation - Right Outer Join keeps all tuples from the right relation
one to many relationship
When each entry in one table may be linked to one or more records in the other table. This is the most common type of relationship found in DBMS. example: each customer can have many sales orders. In this example the primary key field in the Customers table, Customer ID, is designed to contain unique values. example: A teacher can teach multiple classes, but the course would not have the same relationship with the teacher. Therefore, for each record in a Teachers table, there could be many records in the Courses table
Network Data Model
a hierarchical model that is used to represent the many-to-many relationship among the database constraints different from the hierarchal database model as there can be numerous parents of a member
one to one relationship
a link between the information in two tables, where each record in each table only appears once. For example, in a school database, each student has only one student ID, and each student ID is assigned to only one person.
The Internal Model
adapts the conceptual model to a specific DBMS (e.g., hierarchical, network, and relational) software-dependent but hardware-independent Development of the internal model is especially important to hierarchical and network database models because the data access efficiencies in these models can be affected by improper design.
Associative Entity
an entity type that associates the instances of one or more entity types and contains attributes that are peculiar to the relationship between those entity instances. a relationship about which you want to store information An example of the practical use of an associative table would be to assign permissions to users. There can be multiple users, and each user can be assigned zero or more permissions. Individual permissions may be granted to one or more users.
An _____ is a characteristic of an entity
attribute For example, in a customer database, the attributes might be name, address, and phone number. In a product database, the attributes might be name, price, and date of manufacture
All values in a column must conform to the same data format (type) and must be within a specified range, known as the
attribute domain
composite attribute
can be further subdivided to yield additional attributes. - Examples: • ADDRESS -> Street, City, State, Zip • PHONE NUMBER -> Area code, Exchange number - Don't confuse with composite key, where two attributes together form a primary key. • Example of composite key:- CLASS(Course_ID, Section_Nbr, Class_Time, Prof_Num)
A multivalued attribute
can have many values. are shown by a double line connecting to the entity,or single line connecting to an ellipse with a double line. - Examples: • A person may have several college degrees. • A household may have several phones with different numbers
single-valued attribute
can have only a single value - Examples: • A person can have only one social security number. • A manufactured part can have only one serial number.
simple attribute
cannot be subdivided - Examples: Age, Sex, and Marital status
A key that is composed of more than one attributes is known as a _____
composite key
Table
contains a group of related entity occurrences - i.e. an entity set. also called a relation
Data dictionary
contains metadata that describes the data stored in the database. It stores: - the names of the data items in the database - the types and sizes of the data items - the constraints on each data item - the names of authorized users, the data items that each user can access, and the types of access allowed.
Union Compatibility
Means that they must have the same number of attributes with matching domains Union compatible means that the relations yield attributes with identical names and. compatible data types. That is, the relation A(c1,c2,c3) and the relation B(c1,c2,c3) have union compatibility if the columns have the same names, are in the same order, and the. columns have "compatible" data types.
cardinality continued
Number of relationship instances that a given entity instance can participate in
Database Management Systems (DBMS)
Software used to define, create, maintain, and provide controlled access to the database Define: Specify data types, structures and constraints for the data to be stored Populate: Process of entering/ storing the data Manipulate: includes such functions as querying to retrieve specific data, updating the database to reflect changes, and generating reports from the data Control: grant / restrict permissions, security
degree
The number of columns
cardinality
The number of tuples in a table
Step 1: Map Regular Entity Types
• Name given to relation is generally the same as the name of the entity type • Each simple attribute in the entity type is the attribute of the relation • PK- Identifier is the primary key
Object-Oriented Data Model
• Objects are abstractions of real-world entities -> Example: Student, Invoice, Employee • Objects capture both attribute and behavior (methods) • Objects with similar characteristics are grouped together as a class • Classes are organized in a class hierarchy • A subclass inherits the attributes and methods from its superclass
ERD Diagrams to Relations Steps
• Step 1: Map regular entities • Step 2: Map weak entities • Step 3: Map binary relationships • Step 4: Map associative entities • Step 5: Map unary relationships • Step 6: Map Ternary relationships • Step 7: Map supertype/subtype relationships
Relationship Participation
• The participation is optional if one entity occurrence does not require a corresponding entity occurrence in a particular relationship. • An optional entity is shown by a small circle on the side of the optional entity. - Example: Class is optional to Professor Professor is mandatory to Class, so every Class must have a Professor assigned to it.
Intersect
• The tables must be union compatible • A Intersect B results in C that contains tuples that are common to both A and B.
Difference
• The tables must be union compatible • A MINUS B results in C that contains the tuples that appear in A but not in B.
Union
• The tables must be union compatible • A Union B results in C that contains all tuples from both A and B with no duplicates.
Relational Database Operators
• These operators are based on relational algebra theory. • They define functions to manipulate data in one or more tables (relations). • Application of a relational operator to one or more tables results in another table. • The eight relational operators are: UNION, INTERSECT, DIFFERENCE, PRODUCT, SELECT, PROJECT, JOIN, and DIVIDE. • To be considered minimally relational, a DBMS must support SELECT, PROJECT and JOIN
Why and when to use associative entity
• Why? - Preserve information in the relationship (quasi-entity type) • When? - Usually "many-to-many" relationship (i.e., max cardinality is "many" on both sides) - Associative entity has a unique identifier - Associative entity usually has several attributes - Associative entity has a meaning of its own - Associative entity may participate in other relationships
Product
● produces a list of all possible pairs of rows from two tables. ● If table A has 5 rows and B has 10, A product B will yield a table with 50 rows.
