Chapter 2 - Data Models
A implementation-ready data model should contain:
- A description of the data structure that will store the end-user data - A set of enforceable rules to guarantee the integrity of the data - A data manipulation methodology to support the real-world data transformations
Any SQL-based relational database application involves:
- A user interface - A set of tables stored in the database - The SQL "engine"
Entity Set
A collection of like entities.
Class
A collection of similar objects with shared structure (attributes) and behavior (methods). A class encapsulates an object's data representation and a method's implementation. Classes are organized in a class hierarchy.
Physical independence
A condition in which the physical model can be changed without affecting the internal model.
Class Diagram
A diagram used to represent data and their relationships in UML object notation.
Physical model
A model in which physical characteristics such as location, path, and format are described for the data. The physical model is both hardware- and software-dependent.
software independence
A property of any model or application that does not depend on the software used to implement it.
entity instance (entity occurrence)
A row in a relational table
Conceptual Schema
Basis for the identification and high-level description of the main data objects
Key-Value
Data model is based on a structure composed of two data elements: a key and a value, in which every key has a corresponding value of set of values. Aka attribute-value or associative data model
Data Definition Language (DDL)
Enables the database administrator to define the schema components. The language that allows a database admin to define the database structure, schema, and subschema.
End-user interface
Interface allows end user to interact with the data (by automatically generating SQL code). Each interface is a product of the software vendor's idea of meaningful interaction with the data.
Constraint
Restriction placed on the data - Important to help ensure data integrity EX: Student's GPA must be between 0.00 to 4.00 - Each class must have one and only one teacher.
SubSchema
The portion of the database that interacts with application programs
External Schema
The specific representation of an external view; the end user's view of the data environment.
hardware independence
A condition in which a model does not depend on the hardware used in the model's implementation. Therefore, changes in the hardware will have no effect on the database design at the conceptual level.
NoSQL
A new generation of database management systems that is not based on the traditional relational database model. Large-scale distributed database system that stores structured and unstructured data in efficient ways. Provides distributed, fault-tolerate databases for processing non-structured data.
Entity
A person, place, thing, concept, or event for which data can be stored. Represents a particular type of object in the real world, which means an entity is "distinguishable" Unique & distinct
internal schema
A representation of an internal model using the database constructs supported by the chosen database.
One-to-one (1:1 or 1..1) relationship
Associations among two or more entities that are used by data models. In a 1:1 relationship, one entity instance is associated with only one instance of the related entity. EX: A retail company's mgmt structure may require that each of its stores be managed by a single employee. [EMPLOYEE manges STORE 1:1]
Network Model
Created to represent complex data relationships more effectively than the hierarchical model, to improve database performance, and to impose a database standard. User perceives network database as a collection of records in 1:M relationships. Unlike the hierarchical model, network allows a record to have more than one parent.
Object-Oriented database management system (OODBMS)
Data management software used to manage data in an object-oriented database model.
Relational databases and Big Data
Does this mean that relational databases don't have a place in organizations with Big Data challenges? No, relational databases remain the preferred and dominant databases to sup- port most day-to-day transactions and structured data analytics needs. Each DBMS tech- nology has its areas of application, and the best approach is to use the best tool for the job. In perspective, object/relational databases serve 98 percent of operational market needs. For Big Data needs, Hadoop, MapReduce, and NoSQL databases are the options.
Basic building blocks of all data models
Entities, attributes, relationships, and contraints
Tuple
In the relational model, a table row. Each column represents an attribute.
SQL Engine
Largely hidden from the end user, the SQL engine executes all que- ries, or data requests. Keep in mind that the SQL engine is part of the DBMS software. The end user uses SQL to create table structures and to perform data access and table maintenance. The SQL engine processes all user requests—largely behind the scenes and without the end user's knowledge.
client node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The client node acts as the interface between the user application and the HDFS.
Big Data
Refers to a movement to find new and better ways to manage large amounts of web and sensor-generated data and derive business insight from it, while simultaneously providing high performance and scalability at a reasonable cost.
conceputal model
Represents a global view of the entire database by the entire organization. Integrates all external views into a single global view of the data in the enterprise
Chen notation
See entity relationship (ER) model. A data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with the help of ER diagrams.
Logical design
Task of creating a conceptual data model A stage in the design phase that matches the conceptual design to the requirements of the selected DBMS and is therefore software dependent. Used to translate the conceptual design into the internal model for the selected database management system
Data Modeling
The process of creating a specific data model for a determined problem domain.
Connectivity
The type of relationship between entities. Classifications include 1:1, 1:M, and M:N. EX: PAINTER paints many PAINTINGS, an EMPLOYEE learns many SKILLS,
3 V's of Big Data:
Volume - amount of data being stored. Velocity - speed at which data grows and the need to process data quickly in order to generate information and insight Variety - fact that data being collected comes in multiple different data formats.
Business Rule
a brief, precise, and unambiguous description of a policy, procedure, or principle within a specific organization A description of a policy, percedure, or principle within the organization. - Must be rendered in writing and updated to reflect any change in the organization's operational environment Properly written bus rules are used to define entities, attributes, relationships, and constraights EX: A customer may generate many invoices An invoice is generated by only one customer. - A training session cannot be schedules for fewer than 10 employees or for more than 30 employees. (Two entities, employee and training, implies relationship between the two and shows a contraight (no <10 & no > 30)
Data Model
a relatively simple representation, usually graphical, of more complex real-world data structures
Map Reduce
an open-source application programming interface (API) that provides fast data analytics services; one of the main big data technologies that allows organizations to process massive data stores. 2 Main function: Map - takes a job and divides it into smaller units of work Reduce - collects all the output results generated from the nodes and integrates them into a single result set
Internal Model
representation of the database as "seen" by the DBMS In database modeling, a levle of data abstraction that adapts the conceptual model to a specific DBMS model for implementation. Internal model requires a requires a designer to match the conceptual model's characteristics and constraints to those of the selected implementation model.
Sparse Data
the number of attributes is very large but the number of actual data instances is low
Evolution of data models... Common characteristics
A data model must show some degree of conceptual simplicity without compro- mising the semantic completeness of the database. It does not make sense to have a data model that is more difficult to conceptualize than the real world. At the same time, the model should show clarity and relevance; that is, the data model should be unambiguous and applicable to the problem domain. A data model must repre- sent the real world as closely as possible.
Entity Relationship (ER) Model [ERM]
A data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with the help of ER diagrams. The model was developed by P. Chen in 1975. BC it complemented the relational data model concepts
Hadoop (Big Data technologies)
A java based, open source, high speed, fault tolerant distributed storage and computational framework. Uses low-cost hardware to create clusters of 1000s of computer nodes to store and process data. Provide a framework for Big Data Analytics in which data (structured or unstructured) is distributed, replicated, and processed in parallel using a network of low-cost commodity hardware.
Schema
A logical grouping of database objects such as tables, indexes, views, and queries, that are related to each other.
Method
In the object-oriented data model, a named set of instructions to perform an action. Methods represent real-world actions, and are invoked through messages.
Attribute
a characteristic of an entity. EX: CUSTOMER Entity would be described by attributes such as customers last name, first name, phone number, address, and credit limit Equivalent of fields in a file system
Relationship
describes an association among entities. EX: Relationship exists between customers and agents that ca be descibed as: agent can serve many customers and each customer may be served by one agent.
Process of identifying and documenting business rules is essential to database design for several reasons:
- Helps to standardize the company's view of data - Can be a communication tool between users and designers - It allows the designer to understand the nature, role, and scope of data - Allows the designer to understand the business processes - Allows the designer to develop appropriate relationships participation rules and constraints and to create an accurate data model.
Relational Database Management System (RDBMS)
A collection of programs that manages a relational database. The RDBMS software translates a user's logical requests (queries) into commands that physically locate and retrieve the requested data. ADV: Ability to hide the complexities of a the relational model from the user. Manages all the physical details, while the user sees the relational database as a collection of tables in which data is stored. The user can manipulate and query the data in a way that seems intuitive and logical.
logical independence
A condition in which the internal model can be changed without affecting the conceptual model. (The internal model is hardware-independent because it is unaffected by the computer on which the software is installed. Therefore, a change in storage devices or operating systems will not affect the internal model.)
Summary
A data model is an abstraction of a complex real-world data environment. Database designers use data models to communicate with programmers and end users. The basic data-modeling components are entities, attributes, relationships, and con- straints. Business rules are used to identify and define the basic modeling components within a specific real-world environment. • The hierarchical and network data models were early models that are no longer used, but some of the concepts are found in current data models. • The relational model is the current database implementation standard. In the rela- tional model, the end user perceives the data as being stored in tables. Tables are related to each other by means of common values in common attributes. The entity relationship (ER) model is a popular graphical tool for data modeling that comple- ments the relational model. The ER model allows database designers to visually pres- ent different views of the data—as seen by database designers, programmers, and end users—and to integrate the data into a common framework. • The object-oriented data model (OODM) uses objects as the basic modeling struc- ture. Like the relational model's entity, an object is described by its factual content. Unlike an entity, however, the object also includes information about relationships between the facts, as well as relationships with other objects, thus giving its data more meaning. • The relational model has adopted many object-oriented (OO) extensions to become the extended relational data model (ERDM). Object/relational database management systems (O/R DBMS) were developed to implement the ERDM. At this point, the OODM is largely used in specialized engineering and scientific applications, while the ERDM is primarily geared to business applications. • Emerging Big Data technologies such as Hadoop, MapReduce, and NoSQL provide distributed, fault-tolerant, and cost-efficient support for Big Data analytics. NoSQL databases are a new generation of databases that do not use the relational model and are geared to support the very specific needs of Big Data organizations. NoSQL data- bases offer distributed data stores that provide high scalability, availability, and fault tolerance by sacrificing data consistency and shifting the burden of maintaining rela- tionships and data integrity to the program code. • Data-modeling requirements are a function of different data views (global versus local) and the level of data abstraction. The American National Standards Institute Standards Planning and Requirements Committee (ANSI/SPARC) describes three levels of data abstraction: external, conceptual, and internal. The fourth and lowest level of data abstraction, called the physical level, is concerned exclusively with phys- ical storage methods.
Object-Oriented Data Model (OODM)
A data model whose basic modeling structure is an object.
Entity Relationship Diagram (ERD)
A diagram that depicts an entity relationship model's entities, attributes, and relations.
Hadoop Distributed File System (HDFS)
A highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds. Write-once, read many model. - Once data is written, it cannot be modified.
Unified Modeling Language (UML)
A language based on object-oriented concepts that provides tools such as diagrams and symbols to graphically model a system.
Table (Relation)
A logical construct perceived to be a two dimensional structure composed of intersecting rows (entities) and columns (attributes) that represents an entity set in the relational model.
Crow's Foot Notation
A representation of the entity relationship diagram that uses a three-pronged symbol to represent the "many" sides of the relationship
Relational Diagram
A representation of the relational database's entities, the attributes within those entities, and the relationships between those entities.
External Views ADV:
ADV: - Easy to identify specific data required to support each business unit's operations. - Makes the designer's job easy by providing feedback about the model's adequacy. - Helps to ensure security contraints in the database design. - Makes application simplier
Object
An abstract representation of a real world entity that has a unique identity, embedded properties, and the ability to interact with other objects and itself.
Hierarchical Model
An early database model whose basic concepts and characteristics formed the basis for subsequent database development. This model is based on an upside-down tree structure in which each record is called a segment. The top record is the root segment. Each segment has a 1:M relationship to the segment directly below it.
Many-to-many (M:N or *..*) relationship
Associations among two or more entities in which one occurrence of an entity is associated with many occurrences of a related entity and one occurrence of the related entity is associated with many occurrences of the first entity. EX: Employee may learn many job skills and each job skill may be learned by many employees [EMPLOYEE learns SKILL] EX2: Students can take many classes and each class can be taken by many students [STUDENTS takes CLASS]
One-To-Many (1:M or 1..*) relationship
Associations among two or more entities that are used by data models. In a 1:M relationship, one entity instance is associated with many instances of the related entity. EX: A painter creates many different paintings, but each is painted by only ONE painter. Painter (one) to Paintings (Many). [Painter paints Painting 1:M] EX2: CUSTOMER generates INVOICE [Customer (one) to invoices (many) 1:M]
Relational Model
Developed by E. F. Codd of IBM in 1970, it represented a major breakthrough for users and designers because of its conceptual simplicity. The relational model is based on mathematical set theory and represents data as independent relations. Each relation (table) is conceptually represented as a matrix of intersecting rows and columns. The relations are related to each other through the sharing of common entity characteristics (values in columns).
Segement
In the hierarchical data model, the equivalent of a file system's record type.
Inheritance
In the object-oriented data model, the ability of an object to inherit the data structure and methods of the classes above it in the class hierarchy. See also class hierarchy.
NoSQL databases
Not based on the relational model - Support distributed database architectures, - Provide high scalability, high availability, - fault tolerance, - Support large amounts of sparse data, - Geared toward performance rather than transaction consistency, - Store data in key-value stores NOT BASED ON THE RELATIONAL MODEL
Data node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The data node stores fixed-size data blocks (that could be replicated to other data nodes). .
name node
One of three types of nodes used in the Hadoop Distributed File System (HDFS). The name node stores all the metadata about the file system.
Entity in ERD
Represented by a rectangle (aka entity box) Mapped to a relational table Each entity consists of a set of attributes that describes particular characteristics of the entity. [EX: EMPLOYEE will have ssn, last name, first name]
External Model
The application programmer's view of the data environment. Given its business focus, an external model works with a data subset of the global database schema.
Semantic Data Model
The first of a series of data models that more closely represented the real world, modeling both data and their relationships in a single structure known as an object. The SDM, published in 1981, was developed by M. Hammer and D. McLeod.
Class Hierarchy
The organization of classes in a hierarchical tree in which each parent class is a superclass and each child class is a subclass. See also inheritance.
Data Manipulation Language (DML)
The set of commands that allows an end user to manipulate the data in the database. The commands include SELECT, INSERT, UPDATE, DELETE, COMMIT, and ROLLBACK. Defines environment in whcih data can be managed and is used to work with teh data in the database.
Class Diagram Notation
The set of symbols used in the creation of class diagrams.
Eventual Consistency
updates to the database will propagate through the system and eventually all data copies will be consistent