CSCE310 Chapter 2
Model
Abstraction of a real-world object or event
Class hierarchy
Resembles an upside-down tree in which each class has only one parent
Logical design
Task of creating a conceptual data model
In this class, all our diagrams are in... (diagram notation type)
UML
Advantages of Object-Oriented Model
• Semantic content is added • Visual representation includes semantic content • Inheritance promotes data integrity
Advantages of the Entity Relationship Model
• Visual modeling yields conceptual simplicity • Visual representation makes it an effective communication tool • Is integrated with the dominant relational model
What are the challenges of Big Data?
• Volume does not allow the use of conventional structures • Expensive • OLAP tools proved inconsistent when dealing with unstructured data
The Internal Model
• Represents the database as seen by the DBMS, mapping the conceptual model to the DBMS • Is software dependent and hardware independent • Internal schema • Logical independence
Advantages of the Hierarchical Model
• Promotes data sharing • Parent/child relationship promotes conceptual simplicity and data integrity • Database security is provided and enforced by DBMS • Efficient with 1:M relationships
The Conceptual Model
• Represents a global view of the entire database by the entire organization • Has a macro-level view of data environment • Is software and hardware independent • Conceptual schema • Logical design
Disadvantages of the Hierarchical Model
• Requires knowledge of physical data storage characteristics • Navigational system requires knowledge of hierarchical path • Changes in structure require changes in all application programs • Implementation limitations • No data definition • Lack of standards
Disadvantages of the Relational Model
• Requires substantial hardware and system software overhead • Conceptual simplicity gives untrained people the tools to use a good system poorly • May promote information problems
Disadvantages of Object-Oriented Model
• Slow development of standards caused vendors to supply their own enhancements, compromising the widely accepted standard • Complex navigational system • Learning curve is steep • High system overhead slows transactions
Advantages of the Relational Model
• Structural independence is promoted using independent tables • Tabular view improves conceptual simplicity • Ad hoc query capability is based on SQL • Isolates the end user from physical-level details • Improves implementation and management simplicity
Disadvantages of the Network Model
• System complexity limits efficiency • Navigational system yields complex implementation, application development, and management • Structural changes require changes in all application programs
Conceptual schema
Basis for the identification and high-level description of the main data objects
Physical independence
Changes in physical model do not affect internal model
Logical independence
Changing internal model without affecting the conceptual model
Attribute
Characteristic of an entity
Class
Collection of similar objects with shared structure and behavior organized in a class hierarchy
Attribute
Columns
Sources of Business Rules
Company managers, policy makers, department managers, written documentation, direct interviews with end users
Object
Contains data and their relationships, together with the operations that are performed on them o Basic building block for autonomous structures o Abstraction of a real-world entity
The Relational Model
Describes a precise set of data manipulation constructs
Relationship
Describes an association among entities • One-to-many (1:M) • Many-to-many (M:N or M:M) • One-to-one (1:1)
The Entity Relationship Model
Graphical representation of entities and their relationships in a database structure
Data modeling
Iterative and progressive process of creating a specific data model for a determined problem domain
Relation or table
Matrix composed of intersecting tuples (rows) and attributes (columns)
Inheritance
Object inherits methods and attributes of parent class
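A minimal Python sketch of the inheritance card, assuming hypothetical Person and Student classes that are not part of the course material. The child class receives the parent's attribute and method without redefining them.

```python
class Person:
    """Parent class: defines an attribute and a method."""

    def __init__(self, name):
        self.name = name

    def label(self):
        return f"Person: {self.name}"


class Student(Person):
    """Child class: inherits name and label() from Person, adds major."""

    def __init__(self, name, major):
        super().__init__(name)  # reuse the parent's attribute setup
        self.major = major


student = Student("Ana", "CS")
inherited_label = student.label()  # method defined only on Person
```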
Attribute name
Required to be descriptive of the data represented by the attribute
Tuple
Row
Constraint
Set of rules to ensure data integrity
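One way to see a constraint protecting data integrity, sketched with Python's built-in sqlite3 module. The student table, its columns, and the 0.0 to 4.0 GPA range are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause is the constraint: gpa must stay within 0.0 and 4.0.
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " gpa REAL NOT NULL CHECK (gpa BETWEEN 0.0 AND 4.0))"
)
conn.execute("INSERT INTO student VALUES (1, 3.5)")  # satisfies the rule

try:
    conn.execute("INSERT INTO student VALUES (2, 4.7)")  # violates the rule
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the DBMS refused the invalid row

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Only the valid row is stored; the DBMS enforces the rule rather than the application.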
Data models
Simple representations of complex real-world data structures o Useful for supporting a specific problem domain
Internal schema
Specific representation of an internal model o Uses the database constructs supported by the chosen database
Connectivity
Term used to label the relationship types
Entity
Unique and distinct object used to collect and store data; corresponds to a table in the relational model
Segments
Equivalent of a file system's record type
MapReduce
is an open-source application programming interface (API) that provides fast data analytics services.
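The name refers to a two-phase pattern: a map step that emits key/value pairs and a reduce step that combines the values for each key. The sketch below imitates that pattern in plain Python with a word count. It is not the Hadoop MapReduce API, and the function names map_phase and reduce_phase are hypothetical.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reduce_phase(pairs):
    """Sum the emitted values for each distinct key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


counts = reduce_phase(map_phase(["big data big", "data model"]))
```

In a real cluster the map and reduce steps would run in parallel on many nodes; here both run in one process.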
Hadoop Distributed File System (HDFS)
is a highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds.
Hadoop
is a Java-based, open-source, high-speed, fault-tolerant distributed storage and computational framework
NoSQL
is a large-scale distributed database system that stores structured and unstructured data • Not based on the relational model • Support distributed database architectures • Provide high scalability, high availability, and fault tolerance • Support large amounts of sparse data • Geared toward performance rather than transaction consistency • Store data in key-value stores
End-user interface
o Allows the end user to interact with the data; acts as the user's connection to SQL
Entity names should...
o Be descriptive of the objects in the business environment o Use terminology that is familiar to the users
Schema
o Conceptual organization of the entire database as viewed by the database administrator
Unified Modeling Language (UML)
o Describes sets of diagrams and symbols to graphically model a system
Describe the collection of tables stored in the database
o Each table is independent from another o Rows in different tables are related based on common values in common attributes
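A small sketch of rows in different tables being related through a common attribute, using Python's built-in sqlite3 module. The agent and customer tables, their columns, and the sample rows are assumptions invented for illustration; agent_code is the attribute the two tables share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE agent (agent_code INTEGER PRIMARY KEY, agent_name TEXT);
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        customer_name TEXT,
        agent_code INTEGER  -- common attribute shared with agent
    );
    INSERT INTO agent VALUES (501, 'Alby'), (502, 'Hahn');
    INSERT INTO customer VALUES (1, 'Ramas', 502), (2, 'Dunne', 501), (3, 'Smith', 502);
    """
)

# Rows are matched wherever the common attribute holds the same value.
rows = conn.execute(
    """
    SELECT customer.customer_name, agent.agent_name
    FROM customer
    JOIN agent ON customer.agent_code = agent.agent_code
    ORDER BY customer.customer_id
    """
).fetchall()
```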
Schema data definition language (DDL)
o Enables the database administrator to define the schema components
Data manipulation language (DML)
o Environment in which data can be managed and is used to work with the data in the database
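The difference between the DDL and DML cards can be sketched with Python's built-in sqlite3 module. The product table and its values are hypothetical. CREATE TABLE defines a schema component (DDL); INSERT, UPDATE, SELECT, and DELETE work with the data (DML).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a schema component.
conn.execute("CREATE TABLE product (code TEXT PRIMARY KEY, price REAL NOT NULL)")

# DML: add, change, read, and remove data inside that structure.
conn.execute("INSERT INTO product VALUES (?, ?)", ("P100", 9.99))
conn.execute("UPDATE product SET price = ? WHERE code = ?", (12.5, "P100"))
price = conn.execute(
    "SELECT price FROM product WHERE code = ?", ("P100",)
).fetchone()[0]
conn.execute("DELETE FROM product WHERE code = ?", ("P100",))
remaining = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
```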
SQL engine
o Executes all queries
Proper naming should...
o Facilitate communication between parties o Promote self-documentation
Big Data aims to:
o Find new and better ways to manage large amounts of web and sensor-generated data o Provide high performance and scalability at a reasonable cost
Questions to identify the relationship type
o How many instances of B are related to one instance of A? o How many instances of A are related to one instance of B?
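The two questions above can be turned into a small decision rule. This is a sketch only; the function name connectivity and its boolean parameters are assumptions. "M:1" is simply 1:M read from B's side.

```python
def connectivity(one_a_has_many_b: bool, one_b_has_many_a: bool) -> str:
    """Classify the relationship between A and B from the two answers.

    one_a_has_many_b: can one instance of A relate to many instances of B?
    one_b_has_many_a: can one instance of B relate to many instances of A?
    """
    if one_a_has_many_b and one_b_has_many_a:
        return "M:N"
    if one_a_has_many_b:
        return "1:M"
    if one_b_has_many_a:
        return "M:1"
    return "1:1"


# Hypothetical rule: a customer generates many invoices,
# and each invoice is generated by only one customer.
customer_invoice = connectivity(one_a_has_many_b=True, one_b_has_many_a=False)
```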
Extensible Markup Language (XML)
o Manages unstructured data for efficient and effective exchange of all data types
Subschema
o Portion of the database seen by the application programs that produce the desired information from the data within the database
Entity instance or entity occurrence
o Rows in the relational table
Extended relational data model (ERDM)
o Supports Object Oriented features and complex data representation
What do business rules allow the designer to do?
o Understand the nature, role, scope of data, and business processes o Develop appropriate relationship participation rules and constraints o Create an accurate data model
Entity relationship diagram (ERD)
o Uses graphic representations to model database components
Big Data characteristics
o Volume o Velocity o Variety
In the key-value model:
o Each row represents one attribute/value of one entity instance. o The "key" column could represent any entity's attribute. o The values in the "value" column could be of any data type and therefore it is generally assigned a long string data type
In the relational model:
o Each row represents one entity instance. o Each column represents one attribute of the entity. o The values in a column are of the same data type.
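To compare the key-value and relational cards, the sketch below stores the same entity instance both ways. The customer data, the column names, and the helper to_key_value_rows are hypothetical. In the relational form one row holds every attribute; in the key-value form each attribute becomes its own row, with the value kept as a string.

```python
# Relational form: one row per entity instance, one column per attribute.
relational_row = {"customer_id": 10010, "first_name": "Alfred", "age": 41}


def to_key_value_rows(row, id_column):
    """Return one (entity_id, key, value) row per non-id attribute."""
    entity_id = row[id_column]
    return [
        (entity_id, key, str(value))  # values are stored as strings
        for key, value in row.items()
        if key != id_column
    ]


key_value_rows = to_key_value_rows(relational_row, "customer_id")
```

Note that the integer age becomes the string "41", which is why the value column is generally given a long string type.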
Importance of Data Models
• Are a communication tool • Give an overall view of the database • Organize data for various users • Are an abstraction for the creation of a good database
Object/Relational Database Management System (O/R DBMS)
• Based on ERDM, focuses on better data management
Business Rules
• Brief, precise, and unambiguous description of a policy, procedure, or principle • Enable defining the basic building blocks • Describe main and distinguishing characteristics of the data
Disadvantages of NoSQL
• Complex programming is required • There is no relationship support • There is no transaction integrity support • In terms of data consistency, it provides an eventually consistent model
Advantages of the Network Model
• Conceptual simplicity • Handles more relationship types • Data access is flexible • Data owner/member relationship promotes data integrity • Conformance to standards • Includes data definition language (DDL) and data manipulation language (DML)
Network Models
• Created to represent complex data relationships effectively • Improved database performance and imposed a database standard • Allows a record to have more than one parent • Depicts both one-to-many (1:M) and many-to-many (M:N) relationships
Hierarchical Models
• Developed to manage large amounts of data for complex manufacturing projects • Represented by an upside-down tree which contains segments (equivalent of a file system's record type) • Depicts a set of one-to-many (1:M) relationships
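A rough sketch of the structural difference between the two models, using plain Python dictionaries. The segment and record names are made up. In the hierarchical structure each child maps to exactly one parent, so reaching the root means walking a single path; in the network structure a record may list more than one parent.

```python
# Hierarchical: every segment has exactly one parent (upside-down tree).
hierarchical_parent = {"Assembly": "Car", "Part": "Assembly"}

# Network: a record can have more than one parent (owner).
network_parents = {"Invoice": ["Customer", "SalesRep"]}


def path_to_root(segment, parent_of):
    """Follow the single parent link upward until the root is reached."""
    path = [segment]
    while segment in parent_of:
        segment = parent_of[segment]
        path.append(segment)
    return path


part_path = path_to_root("Part", hierarchical_parent)
```

The fixed path from Part up to Car illustrates why navigating a hierarchical database requires knowing the hierarchical path.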
The External Model
• End users' view of the data environment • ER diagrams are used to represent the external views • External schema: Specific representation of an external view
What are the reasons for identifying and documenting business rules?
• Help standardize the company's view of data • Serve as a communication tool between users and designers
Advantages of NoSQL
• High scalability, availability, and fault tolerance are provided • Uses low-cost commodity hardware • Supports Big Data • Key-value model improves storage efficiency
Disadvantages of the Entity Relationship Model
• Limited constraint representation • Limited relationship representation • No data manipulation language • Loss of information content occurs when attributes are removed from entities to avoid crowded displays
Translating Business Rules into Data Model Components involves...
• Nouns translate into entities • Verbs translate into relationships among entities • Relationships are bidirectional
The Physical Model
• Operates at the lowest level of abstraction • Describes the way data are saved on storage media such as disks or tapes • Requires the definition of physical storage and data access methods • Relational model is aimed at the logical level and does not require physical-level details • Physical independence
Relational Database Management System (RDBMS)
• Performs basic functions provided by the hierarchical and network DBMS systems • Makes the relational data model easier to understand and implement • Hides the complexities of the relational model from the user