CH 2
semantic data model
The OODM is said to be a semantic data model because semantic indicates meaning.
external schema
Because data are being modeled, ER diagrams will be used to represent the external views. A specific representation of an external view is known as an
object/relational database management system (O/R DBMS).
That's why a DBMS based on the ERDM is often described as
logical independence.
When you can change the internal model without affecting the conceptual model, you have
physical independence
When you can change the physical model without affecting the internal model, you have
Hadoop Distributed File System (HDFS)
is a highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds. To achieve high throughput, HDFS uses the write-once, read-many model: once data is written, it cannot be modified. HDFS uses three types of nodes: a name node that stores all the metadata about the file system, a data node that stores fixed-size data blocks (which can be replicated to other data nodes), and a client node that acts as the interface between the user application and the HDFS.
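The split between metadata (name node) and fixed-size, replicated blocks (data nodes), plus the write-once, read-many rule, can be illustrated with a toy, single-process Python model. This is not the HDFS API: the class name `ToyNameNode`, the tiny `BLOCK_SIZE`, the replication factor, and the round-robin placement are assumptions chosen for readability. For brevity the toy name node also copies the blocks itself, and the calling code plays the role of the client node.

```python
from itertools import cycle

# Illustrative values only, far smaller than anything a real cluster would use.
BLOCK_SIZE = 4   # bytes per block (hypothetical)
REPLICATION = 2  # copies kept of each block (hypothetical)


class ToyNameNode:
    """Holds metadata only: which data nodes store which blocks of each file."""

    def __init__(self, data_nodes):
        self.data_nodes = data_nodes  # node name -> {block_id: bytes}
        self.metadata = {}            # file name -> [(block_id, [node names])]
        self._next_node = cycle(sorted(data_nodes))  # simple round-robin placement

    def write(self, name, payload: bytes):
        # Write-once: an existing file can never be overwritten or modified.
        if name in self.metadata:
            raise PermissionError(f"{name} already written; files are immutable")
        blocks = []
        for offset in range(0, len(payload), BLOCK_SIZE):
            block_id = f"{name}#{offset // BLOCK_SIZE}"
            chunk = payload[offset:offset + BLOCK_SIZE]
            targets = [next(self._next_node) for _ in range(REPLICATION)]
            for node in targets:
                self.data_nodes[node][block_id] = chunk  # replicate the block
            blocks.append((block_id, targets))
        self.metadata[name] = blocks

    def read(self, name) -> bytes:
        # Read-many: reassemble the file from the first replica of each block.
        return b"".join(self.data_nodes[nodes[0]][block_id]
                        for block_id, nodes in self.metadata[name])


# The code below acts as the client node.
data_nodes = {"dn1": {}, "dn2": {}, "dn3": {}}
nn = ToyNameNode(data_nodes)
nn.write("log.txt", b"hello hdfs!")
print(nn.read("log.txt"))  # b'hello hdfs!'
```

Here the 11-byte payload becomes three blocks, each stored on two different data nodes, while the name node keeps only the block map.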
data model
is a relatively simple representation, usually graphical, of more complex real-world data structures. In general terms, a model is an abstraction of a more complex real-world object or event.
relational diagram
is a representation of the relational database's entities, the attributes within those entities, and the relationships between those entities.
object-oriented database management system (OODBMS).
the OODM is the basis for the
American National Standards Institute (ANSI)
defined a framework for data modeling based on degrees of data abstraction.
conceptual schema
is the basis for the identification and high-level description of the main data objects (avoiding any database model-specific details).
3 Vs: volume, velocity, and variety
1. Volume refers to the amounts of data being stored. With the adoption and growth of the Internet and social media, companies have multiplied the ways to reach customers. Over the years, and with the benefit of technological advances, data for millions of e-transactions were being stored daily on company databases. Furthermore, organizations are using multiple technologies to interact with end users, and those technologies are generating mountains of data. This ever-growing volume of data quickly reached petabytes in size, and it is still growing.
2. Velocity refers not only to the speed with which data grows but also to the need to process these data quickly in order to generate information and insight. With the advent of the Internet and social media, business response times have shrunk considerably. Organizations need not only to store large volumes of quickly accumulating data, but also to process such data quickly. The velocity of data growth is also due to the increase in the number of different data streams from which data is being piped to the organization (via the web, e-commerce, Tweets, Facebook posts, emails, sensors, GPS, and so on).
3. Variety refers to the fact that the data being collected comes in multiple data formats. A great portion of these data comes in formats not suitable to be handled by the typical operational databases based on the relational model.
One-to-many (1:M or 1..*) relationship
A painter creates many different paintings, but each is painted by only one painter. Thus, the painter (the "one") is related to the paintings (the "many"). Therefore, database designers label the relationship "PAINTER paints PAINTING" as 1:M. Note that entity names are often capitalized as a convention, so they are easily identified. Similarly, a customer (the "one") may generate many invoices, but each invoice (the "many") is generated by only a single customer. The "CUSTOMER generates INVOICE" relationship would also be labeled 1:M.
One-to-one (1:1 or 1..1) relationship
A retail company's management structure may require that each of its stores be managed by a single employee. In turn, each store manager, who is an employee, manages only a single store. Therefore, the relationship "EMPLOYEE manages STORE" is labeled 1:1.
Many-to-many (M:N or *..*) relationship
An employee may learn many job skills, and each job skill may be learned by many employees. Database designers label the relationship "EMPLOYEE learns SKILL" as M:N. Similarly, a student can take many classes and each class can be taken by many students, thus yielding the M:N label for the relationship expressed by "STUDENT takes CLASS".
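One common way to implement the three relationship types in a relational DBMS is sketched below with Python's built-in `sqlite3` module: a foreign key for 1:M, a foreign key plus `UNIQUE` for 1:1, and a bridge table for M:N. The table and column names (`PTR_ID`, `EMPLOYEE_SKILL`, and so on) and the sample rows are hypothetical, chosen to mirror the examples above.

```python
import sqlite3

# Hypothetical tables and sample rows that mirror the relationship examples.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- 1:M  PAINTER paints PAINTING: each painting references exactly one painter.
CREATE TABLE PAINTER  (PTR_ID INTEGER PRIMARY KEY, PTR_NAME TEXT);
CREATE TABLE PAINTING (PTNG_ID INTEGER PRIMARY KEY, PTNG_TITLE TEXT,
                       PTR_ID INTEGER NOT NULL REFERENCES PAINTER(PTR_ID));

-- 1:1  EMPLOYEE manages STORE: UNIQUE stops one employee managing two stores.
CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, EMP_NAME TEXT);
CREATE TABLE STORE    (STORE_ID INTEGER PRIMARY KEY,
                       EMP_ID INTEGER NOT NULL UNIQUE REFERENCES EMPLOYEE(EMP_ID));

-- M:N  EMPLOYEE learns SKILL: a bridge table stores one row per pair.
CREATE TABLE SKILL (SKILL_ID INTEGER PRIMARY KEY, SKILL_NAME TEXT);
CREATE TABLE EMPLOYEE_SKILL (
    EMP_ID   INTEGER NOT NULL REFERENCES EMPLOYEE(EMP_ID),
    SKILL_ID INTEGER NOT NULL REFERENCES SKILL(SKILL_ID),
    PRIMARY KEY (EMP_ID, SKILL_ID));

INSERT INTO PAINTER  VALUES (1, 'Painter One');
INSERT INTO PAINTING VALUES (10, 'First Canvas', 1), (11, 'Second Canvas', 1);
INSERT INTO EMPLOYEE VALUES (1, 'Ann'), (2, 'Bo');
INSERT INTO STORE    VALUES (100, 1);
INSERT INTO SKILL    VALUES (1, 'SQL'), (2, 'Python');
INSERT INTO EMPLOYEE_SKILL VALUES (1, 1), (1, 2), (2, 1);
""")

# Skill 1 is learned by many employees, and employee 1 learns many skills: M:N.
learners = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEE_SKILL WHERE SKILL_ID = 1").fetchone()[0]
print(learners)  # 2
```

The bridge table is what turns one M:N relationship into two 1:M relationships that a relational table can store directly.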
entity relationship (ER) model, or ERM
Because it is easier to examine structures graphically than to describe them in text, database designers prefer to use a graphical tool in which entities and their relationships are pictured.
tuple.
Each row in a relation is called a tuple. Each column represents an attribute. The relational model also describes a precise set of data manipulation constructs based on advanced mathematical concepts.
Unified Modeling Language (UML)
Object-oriented data models are typically depicted using ... class diagrams. UML is a language based on OO concepts that describes a set of diagrams and symbols you can use to graphically model a system.
connectivity
The ER model uses the term connectivity to label the relationship types.
relational database management system (RDBMS)
The RDBMS performs the same basic functions provided by the hierarchical and network DBMS systems, in addition to a host of other functions that make the relational data model easier to understand and implement.
entity instance or entity occurrence
Usually, when applying the ERD to the relational model, an entity is mapped to a relational table. Each row in that table is known as an entity instance or entity occurrence in the ER model.
class diagrams
are used to represent data and their relationships within the larger UML object-oriented system's modeling language.
relation (table)
The relational model's foundation is a mathematical concept known as a relation. You can think of a relation (table) as a matrix composed of intersecting rows and columns.
object-oriented data model (OODM)
In the OODM, both data and their relationships are contained in a single structure known as an object.
key-value
data model is based on a structure composed of two data elements: a key and a value, in which every key has a corresponding value or set of values. The key-value data model is also referred to as the attribute-value or associative data model.
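A Python dictionary is a convenient way to picture the key-value idea: every key maps to one value or to a set of values, and lookups go through the key. This is a minimal in-memory sketch, not any particular key-value database; the `put`/`get` helpers and the `entity:id:attribute` key format are hypothetical conventions, and the sample values are invented.

```python
# Minimal in-memory sketch of a key-value store; the "entity:id:attribute"
# key format is a hypothetical naming convention, not a standard.
store: dict[str, object] = {}


def put(key: str, value: object) -> None:
    """Store (or replace) the value for a key; the value is opaque to the store."""
    store[key] = value


def get(key: str, default: object = None) -> object:
    """Look data up by key only; nothing here queries inside the value."""
    return store.get(key, default)


put("customer:10010:lname", "Ramas")                     # key -> single value
put("customer:10010:phones", {"844-2573", "894-1238"})  # key -> set of values
print(get("customer:10010:lname"))  # Ramas
```

Because the structure is only key plus value, any meaning of the value (which attribute it is, how it relates to other keys) lives in the application, here encoded in the key names.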
data manipulation language (DML)
defines the environment in which data can be managed and is used to work with the data in the database.
subschema
defines the portion of the database "seen" by the application programs that actually produce the desired information from the data within the database.
internal schema
depicts a specific representation of an internal model, using the database constructs supported by the chosen database.
relationship
describes an association among entities. For example, a relationship exists between customers and agents that can be described as follows: an agent can serve many customers, and each customer may be served by one agent. Data models use three types of relationships: one-to-many, many-to-many, and one-to-one. Database designers usually use the shorthand notations 1:M or 1..*, M:N or *..*, and 1:1 or 1..1, respectively.
Extensible Markup Language (XML)
emerged as the de facto standard for the efficient and effective exchange of structured, semistructured, and unstructured data. Organizations that used XML data soon realized that they needed to manage large amounts of unstructured data such as word-processing documents, webpages, emails, and diagrams.
data definition language (DDL)
enables the database administrator to define the schema components.
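The same DDL/DML split is visible in SQL. The sketch below uses Python's built-in `sqlite3` module: the `CREATE TABLE` statement is DDL (it defines a schema component), while `INSERT`, `UPDATE`, and `SELECT` are DML (they work with the stored data). The `CUSTOMER` table, its columns, and the sample values are hypothetical; each row inserted is one entity instance of the CUSTOMER entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a schema component (a hypothetical CUSTOMER table).
conn.execute("""CREATE TABLE CUSTOMER (
    CUS_CODE  INTEGER PRIMARY KEY,
    CUS_LNAME TEXT NOT NULL,
    CUS_LIMIT REAL)""")

# DML: work with the data stored in that schema.
conn.execute("INSERT INTO CUSTOMER VALUES (?, ?, ?)", (10010, "Ramas", 1000.0))
conn.execute("UPDATE CUSTOMER SET CUS_LIMIT = ? WHERE CUS_CODE = ?", (1500.0, 10010))
rows = conn.execute("SELECT CUS_LNAME, CUS_LIMIT FROM CUSTOMER").fetchall()
print(rows)  # [('Ramas', 1500.0)]
```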
Hadoop
is a Java-based, open source, high-speed, fault-tolerant distributed storage and computational framework. Hadoop uses low-cost hardware to create clusters of thousands of computer nodes to store and process data. Hadoop originated from Google's work on distributed file systems and parallel processing and is currently supported by the Apache Software Foundation. Hadoop has several modules, but the two main components are Hadoop Distributed File System (HDFS) and MapReduce.
business rule
is a brief, precise, and unambiguous description of a policy, procedure, or principle within a specific organization. In a sense, business rules are misnamed: they apply to any organization, large or small (a business, a government unit, a religious group, or a research laboratory), that stores and uses data to generate information.
attribute
is a characteristic of an entity. For example, a CUSTOMER entity would be described by attributes such as customer last name, customer first name, customer phone number, customer address, and customer credit limit. Attributes are the equivalent of fields in file systems.
class
is a collection of similar objects with shared structure (attributes) and behavior (methods). In a general sense, a class resembles the ER model's entity set. However, a class is different from an entity set in that it contains a set of procedures known as methods.
NoSQL
is a large-scale distributed database system that stores structured and unstructured data in efficient ways. NoSQL databases are discussed in more detail later in this section.
entity
is a person, place, thing, or event about which data will be collected and stored. An entity represents a particular type of object in the real world, which means an entity is "distinguishable"—that is, each entity occurrence is unique and distinct. For example, a CUSTOMER entity would have many distinguishable customer occurrences, such as John Smith, Pedro Dinamita, and Tom Strickland. Entities may be physical objects, such as customers or products, but entities may also be abstractions, such as flight routes or musical concerts.
constraint
is a restriction placed on the data. Constraints are important because they help to ensure data integrity. Constraints are normally expressed in the form of rules:
• An employee's salary must have values that are between 6,000 and 350,000.
• A student's GPA must be between 0.00 and 4.00.
• Each class must have one and only one teacher.
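In a relational DBMS, rules like these are typically declared on the tables so the DBMS itself rejects bad data. The sketch below uses Python's built-in `sqlite3` module; the table and column names and the `try_insert` helper are hypothetical. Each `CHECK` clause encodes a range rule, and a single `NOT NULL` foreign-key column encodes "one and only one teacher" per class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for the REFERENCES clause below

# Hypothetical tables; each CHECK / NOT NULL clause encodes one rule from the list.
conn.execute("""CREATE TABLE EMPLOYEE (
    EMP_ID     INTEGER PRIMARY KEY,
    EMP_SALARY REAL CHECK (EMP_SALARY BETWEEN 6000 AND 350000))""")
conn.execute("""CREATE TABLE STUDENT (
    STU_ID  INTEGER PRIMARY KEY,
    STU_GPA REAL CHECK (STU_GPA BETWEEN 0.00 AND 4.00))""")
conn.execute("CREATE TABLE TEACHER (TCH_ID INTEGER PRIMARY KEY)")
# One NOT NULL foreign-key column: every class has exactly one teacher.
conn.execute("""CREATE TABLE CLASS (
    CLASS_ID INTEGER PRIMARY KEY,
    TCH_ID   INTEGER NOT NULL REFERENCES TEACHER(TCH_ID))""")
conn.execute("INSERT INTO TEACHER VALUES (1)")


def try_insert(sql: str, params: tuple) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("INSERT INTO EMPLOYEE VALUES (?, ?)", (1, 52000)))  # True
print(try_insert("INSERT INTO EMPLOYEE VALUES (?, ?)", (2, 1000)))   # False
print(try_insert("INSERT INTO STUDENT VALUES (?, ?)", (1, 4.5)))     # False
print(try_insert("INSERT INTO CLASS VALUES (?, ?)", (1, None)))      # False
```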
MapReduce
is an open source application programming interface (API) that provides fast data analytics services. MapReduce distributes the processing of the data among thousands of nodes in parallel. MapReduce works with structured and nonstructured data. The MapReduce framework provides two main functions, Map and Reduce. In general terms, the Map function takes a job and divides it into smaller units of work; the Reduce function collects all the output results generated from the nodes and integrates them into a single result set.
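The Map/Reduce division of labor can be illustrated with the classic word-count example, written here as a single-process Python sketch. It is not Hadoop's Java API: the function names are hypothetical, each string in the input list stands in for a split that a separate node would map, and the `shuffle` step models the grouping by key that the framework performs between the two phases.

```python
from collections import defaultdict


def map_phase(split: str):
    """Map: emit an intermediate (word, 1) pair for every word in one split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Group intermediate values by key (the framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits):
    # Sequential stand-in: a real cluster would run map_phase on many nodes at once.
    intermediate = [pair for s in splits for pair in map_phase(s)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


print(word_count(["big data big", "Data velocity"]))
# {'big': 2, 'data': 2, 'velocity': 1}
```

Because each map call looks at only its own split and each reduce call looks at only one key, both phases can be spread across many nodes without coordination beyond the shuffle.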
Inheritance
is the ability of an object within the class hierarchy to inherit the attributes and methods of the classes above it. For example, two classes, CUSTOMER and EMPLOYEE, can be created as subclasses from the class PERSON. In this case, CUSTOMER and EMPLOYEE will inherit all attributes and methods from PERSON.
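The PERSON/CUSTOMER/EMPLOYEE example maps directly onto classes in an object-oriented language. The Python sketch below is illustrative only; the attribute names (`first_name`, `credit_limit`, `salary`) and the methods `full_name` and `change_name` are hypothetical. Attributes hold each object's state, methods define its behavior, and both subclasses inherit everything defined in `Person`.

```python
class Person:
    """Superclass: attributes and methods shared by every subclass."""

    def __init__(self, first_name: str, last_name: str):
        self.first_name = first_name  # attributes (state)
        self.last_name = last_name

    def full_name(self) -> str:       # method (behavior)
        return f"{self.first_name} {self.last_name}"

    def change_name(self, first_name: str, last_name: str) -> None:
        self.first_name, self.last_name = first_name, last_name


class Customer(Person):
    def __init__(self, first_name: str, last_name: str, credit_limit: float):
        super().__init__(first_name, last_name)  # inherited attributes
        self.credit_limit = credit_limit         # customer-specific attribute


class Employee(Person):
    def __init__(self, first_name: str, last_name: str, salary: float):
        super().__init__(first_name, last_name)
        self.salary = salary                     # employee-specific attribute


c = Customer("John", "Smith", 1500.0)
print(c.full_name())  # John Smith (method inherited from Person)
```

Neither subclass defines `full_name` or `change_name`; both get them through the class hierarchy, with `Person` as their single parent.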
schema
is the conceptual organization of the entire database as viewed by the database administrator.
external model
is the end users' view of the data environment. The term end users refers to people who use the application programs to manipulate the data and generate information.
segment
is the equivalent of a file system's record type. Within the hierarchy, a higher layer is perceived as the parent of the segment directly beneath it, which is called the child. The hierarchical model depicts a set of one-to-many (1:M) relationships between a parent and its children segments. (Each parent can have many children, but each child has only one parent.)
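The parent/child rule for segments can be sketched as a small tree structure in Python. The `Segment` class and the segment names are hypothetical; the point is that each child stores exactly one parent reference, each parent keeps a list of many children, and access to a segment follows the path down from the root.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through the tree
class Segment:
    """A record type in a hierarchical database (names below are hypothetical)."""
    name: str
    parent: Optional["Segment"] = field(default=None, repr=False)
    children: List["Segment"] = field(default_factory=list)

    def add_child(self, name: str) -> "Segment":
        child = Segment(name, parent=self)  # each child gets exactly one parent
        self.children.append(child)         # each parent may collect many children
        return child

    def path(self) -> List[str]:
        """Names from the root segment down to this one."""
        names, node = [], self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


root = Segment("Final assembly")  # root segment
comp_a = root.add_child("Component A")
comp_b = root.add_child("Component B")
part = comp_a.add_child("Assembly A1")
print(part.path())  # ['Final assembly', 'Component A', 'Assembly A1']
```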
internal model
is the representation of the database as "seen" by the DBMS. In other words, the internal model requires the designer to match the conceptual model's characteristics and constraints to those of the selected implementation model.
Software independence
means that the model does not depend on the DBMS software used to implement the model.
Hardware independence
means that the model does not depend on the hardware used in the implementation of the model. Therefore, changes in either the hardware or the DBMS software will have no effect on the database design at the conceptual level.
physical model
operates at the lowest level of abstraction, describing the way data are saved on storage media such as magnetic, solid state, or optical media.
conceptual model
represents a global view of the entire database by the entire organization.
method
represents a real-world action such as finding a selected PERSON's name, changing a PERSON's name, or printing a PERSON's address. In other words, methods are the equivalent of procedures in traditional programming languages. In OO terms, methods define an object's behavior.
class hierarchy
resembles an upside-down tree in which each class has only one parent. For example, the CUSTOMER class and the EMPLOYEE class share a parent PERSON class. (Note the similarity to the hierarchical data model in this respect.)
sparse data
refers to cases in which the number of attributes is very large but the number of actual data instances is low.
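The cost of sparse data in a fixed-column layout can be shown with a little arithmetic. In this hypothetical Python sketch, a catalog allows 500 possible attributes (`attr_0` to `attr_499`, invented names), but each product actually has only one or two of them. Storing only the attributes each instance has, in the style of a key-value layout, keeps 3 values, while one slot per attribute per row would need 1,000 slots, almost all empty.

```python
# Hypothetical catalog: 500 possible attributes, but each product uses very few.
ALL_ATTRIBUTES = [f"attr_{i}" for i in range(500)]

# Sparse layout: each product stores only the attributes it actually has.
products = {
    "P1": {"attr_3": "red", "attr_42": 12.5},
    "P2": {"attr_7": "steel"},
}


def dense_row(product_id: str) -> list:
    """What a fixed-column table row would hold: one slot per attribute."""
    attrs = products[product_id]
    return [attrs.get(a) for a in ALL_ATTRIBUTES]  # None marks an empty slot


stored_values = sum(len(attrs) for attrs in products.values())
dense_slots = len(products) * len(ALL_ATTRIBUTES)
print(stored_values, dense_slots)  # 3 1000
```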
Data modeling
is the first step in designing a database and refers to the process of creating a specific data model for a determined problem domain. (A problem domain is a clearly defined area within the real-world environment, with a well-defined scope and boundaries that will be systematically addressed.)
extended relational data model (ERDM)
The relational model's main vendors evolved the model further and created the extended relational data model (ERDM). The ERDM adds many of the OO model's features within the inherently simpler relational database structure. The ERDM gave birth to a new generation of relational databases that support OO features such as objects (encapsulated data and methods), extensible data types based on classes, and inheritance.
network model
was created to represent complex data relationships more effectively than the hierarchical model, to improve database performance, and to impose a database standard. In the network model, the user perceives the network database as a collection of records in 1:M relationships.
hierarchical model
was developed in the 1960s to manage large amounts of data for complex manufacturing projects, such as the Apollo rocket that landed on the moon in 1969. The model's basic logical structure is represented by an upside-down tree. The hierarchical structure contains levels, or segments.
relational model
was introduced in 1970 by E. F. Codd of IBM in his landmark paper "A Relational Model of Data for Large Shared Data Banks." The relational model represented a major breakthrough for both users and designers. To use an analogy, the relational model produced an "automatic transmission" database to replace the "standard transmission" databases that preceded it. Its conceptual simplicity set the stage for a genuine database revolution.
entity relationship diagram (ERD)
uses graphical representations to model database components.