CH 2

semantic data model

The OODM is said to be a semantic data model because "semantic" indicates meaning.

external schema

Because data are being modeled, ER diagrams will be used to represent the external views. A specific representation of an external view is known as an

object/relational database management system (O/R DBMS).

That's why a DBMS based on the ERDM is often described as

logical independence.

When you can change the internal model without affecting the conceptual model, you have

physical independence

When you can change the physical model without affecting the internal model, you have

Hadoop Distributed File System (HDFS)

is a highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds. To achieve high throughput, HDFS uses the write-once, read-many model. This means that once the data is written, it cannot be modified. HDFS uses three types of nodes: a name node that stores all the metadata about the file system, a data node that stores fixed-size data blocks (which can be replicated to other data nodes), and a client node that acts as the interface between the user application and the HDFS.
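
The roles above can be illustrated with a toy, in-memory Python sketch. It is not the real HDFS API: the `NameNode`/`DataNode` classes, the 4-byte block size, and the round-robin replica placement are all assumptions chosen to keep the example small. In real HDFS the client asks the name node for block locations and then reads from the data nodes directly; this sketch folds that step into `read()`.

```python
# Toy in-memory sketch of HDFS-style storage. Class names, the tiny
# BLOCK_SIZE, and round-robin placement are illustrative assumptions,
# not the real HDFS API or placement policy.
BLOCK_SIZE = 4      # bytes per block (real HDFS blocks are far larger)
REPLICATION = 2     # copies kept of each block


class DataNode:
    """Stores fixed-size data blocks."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}                 # block id -> bytes


class NameNode:
    """Stores only metadata: which blocks make up a file and where they live."""

    def __init__(self, data_nodes):
        self.data_nodes = {n.name: n for n in data_nodes}
        self.metadata = {}               # file name -> [(block id, [node names])]
        self.next_block_id = 0

    def write(self, filename, data):
        # Write-once: an existing file can never be rewritten.
        if filename in self.metadata:
            raise PermissionError(f"{filename} exists; files are write-once")
        names = list(self.data_nodes)
        placements = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block_id = self.next_block_id
            self.next_block_id += 1
            # Replicate each block on REPLICATION consecutive data nodes.
            targets = [names[(block_id + i) % len(names)] for i in range(REPLICATION)]
            for name in targets:
                self.data_nodes[name].blocks[block_id] = data[offset:offset + BLOCK_SIZE]
            placements.append((block_id, targets))
        self.metadata[filename] = placements

    def read(self, filename):
        # Read-many: look up the metadata, then fetch each block from its first replica.
        return b"".join(self.data_nodes[nodes[0]].blocks[block_id]
                        for block_id, nodes in self.metadata[filename])


# The calling code plays the client-node role here.
cluster = NameNode([DataNode("dn1"), DataNode("dn2"), DataNode("dn3")])
cluster.write("log.txt", b"hello hadoop")
print(cluster.read("log.txt"))   # b'hello hadoop'
```

Calling `cluster.write("log.txt", ...)` a second time raises `PermissionError`, which mirrors the write-once rule described in the definition.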

data model

is a relatively simple representation, usually graphical, of more complex real-world data structures. In general terms, a model is an abstraction of a more complex real-world object or event.

relational diagram

is a representation of the relational database's entities, the attributes within those entities, and the relationships between those entities.

object-oriented database management system (OODBMS).

the OODM is the basis for the

American National Standards Institute (ANSI)

defined a framework for data modeling based on degrees of data abstraction.

conceptual schema

is the basis for the identification and high-level description of the main data objects (avoiding any database model-specific details).

3 V's: volume, velocity, and variety

1. Volume refers to the amounts of data being stored. With the adoption and growth of the Internet and social media, companies have multiplied the ways to reach customers. Over the years, and with the benefit of technological advances, data for millions of e-transactions were being stored daily on company databases. Furthermore, organizations are using multiple technologies to interact with end users, and those technologies are generating mountains of data. This ever-growing volume of data quickly reached petabytes in size, and it is still growing.
2. Velocity refers not only to the speed with which data grows but also to the need to process these data quickly in order to generate information and insight. With the advent of the Internet and social media, business response times have shrunk considerably. Organizations need not only to store large volumes of quickly accumulating data, but also to process such data quickly. The velocity of data growth is also due to the increase in the number of different data streams from which data is being piped to the organization (via the web, e-commerce, Tweets, Facebook posts, emails, sensors, GPS, and so on).
3. Variety refers to the fact that the data being collected comes in multiple different data formats. A great portion of these data comes in formats not suitable to be handled by the typical operational databases based on the relational model.

One-to-many (1:M or 1..*) relationship

A painter creates many different paintings, but each is painted by only one painter. Thus, the painter (the "one") is related to the paintings (the "many"). Therefore, database designers label the relationship "PAINTER paints PAINTING" as 1:M. Note that entity names are often capitalized as a convention, so they are easily identified. Similarly, a customer (the "one") may generate many invoices, but each invoice (the "many") is generated by only a single customer. The "CUSTOMER generates INVOICE" relationship would also be labeled 1:M.
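
A minimal sketch of one common way to implement the 1:M "PAINTER paints PAINTING" relationship in a relational database, using SQLite through Python's built-in `sqlite3` module: the foreign key sits on the "many" side. The table and column names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE PAINTER (
    PAINTER_ID   INTEGER PRIMARY KEY,
    PAINTER_NAME TEXT NOT NULL
);
-- The "many" side carries the foreign key that points to the "one" side.
CREATE TABLE PAINTING (
    PAINTING_ID    INTEGER PRIMARY KEY,
    PAINTING_TITLE TEXT NOT NULL,
    PAINTER_ID     INTEGER NOT NULL REFERENCES PAINTER(PAINTER_ID)
);
""")
conn.execute("INSERT INTO PAINTER VALUES (1, 'Monet')")
conn.executemany("INSERT INTO PAINTING VALUES (?, ?, ?)",
                 [(10, 'Water Lilies', 1), (11, 'Impression, Sunrise', 1)])

# One painter row, many painting rows that reference it.
rows = conn.execute("""
    SELECT PAINTER_NAME, COUNT(*) FROM PAINTER
    JOIN PAINTING USING (PAINTER_ID) GROUP BY PAINTER_ID
""").fetchall()
print(rows)   # [('Monet', 2)]
```

Because each PAINTING row holds exactly one PAINTER_ID, a painting cannot belong to two painters, while nothing limits how many paintings share the same painter.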

One-to-one (1:1 or 1..1) relationship

A retail company's management structure may require that each of its stores be managed by a single employee. In turn, each store manager, who is an employee, manages only a single store. Therefore, the relationship "EMPLOYEE manages STORE" is labeled 1:1.
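
One way to enforce the 1:1 "EMPLOYEE manages STORE" rule, sketched with SQLite via Python's `sqlite3`, is to put a UNIQUE foreign key on STORE. The table and column names and sample rows are hypothetical, and other designs (for example, the key on EMPLOYEE instead) are equally valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, EMP_NAME TEXT NOT NULL);
-- A single MANAGER_ID column gives each store exactly one manager;
-- UNIQUE stops the same employee from managing a second store.
CREATE TABLE STORE (
    STORE_ID   INTEGER PRIMARY KEY,
    MANAGER_ID INTEGER NOT NULL UNIQUE REFERENCES EMPLOYEE(EMP_ID)
);
""")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [(1, 'Ana'), (2, 'Ben')])
conn.execute("INSERT INTO STORE VALUES (100, 1)")
try:
    conn.execute("INSERT INTO STORE VALUES (101, 1)")   # Ana cannot manage a second store
except sqlite3.IntegrityError as err:
    print("rejected:", err)

store_count = conn.execute("SELECT COUNT(*) FROM STORE").fetchone()[0]
```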

Many-to-many (M:N or *..*) relationship

An employee may learn many job skills, and each job skill may be learned by many employees. Database designers label the relationship "EMPLOYEE learns SKILL" as M:N. Similarly, a student can take many classes and each class can be taken by many students, thus yielding the M:N label for the relationship expressed by "STUDENT takes CLASS."
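
Relational tables cannot store an M:N relationship directly, so a common approach is a bridge (associative) table that turns it into two 1:M relationships. The sketch below assumes SQLite via Python's `sqlite3`; the ENROLL table, column names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (STU_ID INTEGER PRIMARY KEY, STU_NAME TEXT NOT NULL);
CREATE TABLE CLASS   (CLASS_ID INTEGER PRIMARY KEY, CLASS_NAME TEXT NOT NULL);
-- Bridge table: one row per (student, class) pair.
CREATE TABLE ENROLL (
    STU_ID   INTEGER REFERENCES STUDENT(STU_ID),
    CLASS_ID INTEGER REFERENCES CLASS(CLASS_ID),
    PRIMARY KEY (STU_ID, CLASS_ID)
);
""")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)", [(1, 'Ali'), (2, 'Bea')])
conn.executemany("INSERT INTO CLASS VALUES (?, ?)", [(10, 'Databases'), (11, 'Statistics')])
conn.executemany("INSERT INTO ENROLL VALUES (?, ?)", [(1, 10), (1, 11), (2, 10)])

# Ali takes two classes, and Databases has two students: M:N.
per_student = conn.execute(
    "SELECT STU_ID, COUNT(*) FROM ENROLL GROUP BY STU_ID ORDER BY STU_ID").fetchall()
per_class = conn.execute(
    "SELECT CLASS_ID, COUNT(*) FROM ENROLL GROUP BY CLASS_ID ORDER BY CLASS_ID").fetchall()
print(per_student)   # [(1, 2), (2, 1)]
print(per_class)     # [(10, 2), (11, 1)]
```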

entity relationship (ER) model, or ERM

Because it is easier to examine structures graphically than to describe them in text, database designers prefer to use a graphical tool in which entities and their relationships are pictured.

tuple.

is a single row in a relation. Each column represents an attribute. The relational model also describes a precise set of data manipulation constructs based on advanced mathematical concepts.
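
In relational theory, a relation is a set of tuples that share the same attributes. A small Python sketch can mirror this, assuming a `typing.NamedTuple` stands in for the row structure. The `Customer` fields and values are hypothetical.

```python
from typing import NamedTuple


# Each attribute (column) becomes a named field; each tuple (row) is one instance.
class Customer(NamedTuple):
    cus_code: int
    cus_lname: str
    cus_fname: str


# A relation is a *set* of tuples: no duplicate rows and no inherent row order.
customer = {
    Customer(1, "Smith", "John"),
    Customer(2, "Dinamita", "Pedro"),
    Customer(1, "Smith", "John"),   # duplicate row collapses into the first
}
print(len(customer))        # 2
print(Customer._fields)     # ('cus_code', 'cus_lname', 'cus_fname')
```

SQL tables can hold duplicate rows unless a key prevents it, so the set-based view here reflects the mathematical relation rather than every SQL table.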

Unified Modeling Language (UML)

Object-oriented data models are typically depicted using UML class diagrams. UML is a language based on OO concepts that describes a set of diagrams and symbols you can use to graphically model a system.

connectivity

The ER model uses the term connectivity to label the relationship types (1:1, 1:M, and M:N).

relational database management system (RDBMS)

The RDBMS performs the same basic functions provided by the hierarchical and network DBMS systems, in addition to a host of other functions that make the relational data model easier to understand and implement.

entity instance or entity occurrence

Usually, when applying the ERD to the relational model, an entity is mapped to a relational table, and each row in that table is an entity instance, or entity occurrence.

class diagrams

are used to represent data and their relationships within the larger UML object-oriented system's modeling language.

relation (table)

can be thought of as a matrix composed of intersecting rows and columns. The relational model's foundation is a mathematical concept known as a relation.

object-oriented data model (OODM)

both data and their relationships are contained in a single structure known as an object.

key-value

data model is based on a structure composed of two data elements: a key and a value, in which every key has a corresponding value or set of values. The key-value data model is also referred to as the attribute-value or associative data model.
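
A Python dictionary is a convenient way to sketch the key-value idea: each key maps to a value or to a set of values, and lookups go through the key alone. The key naming scheme and the values below are hypothetical examples, not the format of any particular key-value database.

```python
# Key-value model: every key maps to a value or a set of values.
# The "entity:id:attribute" key format is a hypothetical convention.
store: dict[str, object] = {}

store["customer:1001:name"] = "John Smith"
store["customer:1001:phones"] = {"555-0100", "555-0199"}   # key with a set of values

print(store["customer:1001:name"])              # John Smith
print(sorted(store["customer:1001:phones"]))    # ['555-0100', '555-0199']

# Lookups go through the key only; the store does not interpret the value.
print(store.get("customer:9999:name", "missing"))   # missing
```

Encoding the attribute name inside the key is what gives the model its alternative name, the attribute-value data model.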

data manipulation language (DML)

defines the environment in which data can be managed and is used to work with the data in the database.

subschema

defines the portion of the database "seen" by the application programs that actually produce the desired information from the data within the database.

internal schema

depicts a specific representation of an internal model, using the database constructs supported by the chosen database.

relationship

describes an association among entities. For example, a relationship exists between customers and agents that can be described as follows: an agent can serve many customers, and each customer may be served by one agent. Data models use three types of relationships: one-to-many, many-to-many, and one-to-one. Database designers usually use the shorthand notations 1:M or 1..*, M:N or *..*, and 1:1 or 1..1, respectively.

Extensible Markup Language (XML)

emerged as the de facto standard for the efficient and effective exchange of structured, semistructured, and unstructured data. Organizations that used XML data soon realized that they needed to manage large amounts of unstructured data such as word-processing documents, webpages, emails, and diagrams.

data definition language (DDL)

enables the database administrator to define the schema components.
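
The split between DDL and DML can be shown with SQL run through Python's built-in `sqlite3` module: the CREATE statement defines a schema component, and the INSERT, UPDATE, and SELECT statements work with the data stored under it. This is a minimal sketch; the CUSTOMER table, its columns, and the values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define schema components (table, columns, primary key).
conn.execute("""
    CREATE TABLE CUSTOMER (
        CUS_ID    INTEGER PRIMARY KEY,
        CUS_LNAME TEXT NOT NULL,
        CUS_LIMIT REAL
    )
""")

# DML: work with the data stored under that schema.
conn.execute("INSERT INTO CUSTOMER VALUES (1, 'Smith', 500.0)")
conn.execute("UPDATE CUSTOMER SET CUS_LIMIT = 750.0 WHERE CUS_ID = 1")
row = conn.execute("SELECT CUS_LNAME, CUS_LIMIT FROM CUSTOMER WHERE CUS_ID = 1").fetchone()
print(row)   # ('Smith', 750.0)
```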

Hadoop

is a Java-based, open-source, high-speed, fault-tolerant distributed storage and computational framework. Hadoop uses low-cost hardware to create clusters of thousands of computer nodes to store and process data. Hadoop originated from Google's work on distributed file systems and parallel processing and is currently supported by the Apache Software Foundation. Hadoop has several modules, but the two main components are Hadoop Distributed File System (HDFS) and MapReduce.

business rule

is a brief, precise, and unambiguous description of a policy, procedure, or principle within a specific organization. In a sense, business rules are misnamed: they apply to any organization, large or small (a business, a government unit, a religious group, or a research laboratory) that stores and uses data to generate information.

attribute

is a characteristic of an entity. For example, a CUSTOMER entity would be described by attributes such as customer last name, customer first name, customer phone number, customer address, and customer credit limit. Attributes are the equivalent of fields in file systems.

class

is a collection of similar objects with shared structure (attributes) and behavior (methods). In a general sense, a class resembles the ER model's entity set. However, a class is different from an entity set in that it contains a set of procedures known as methods.
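
A short Python sketch of a class bundling structure (attributes) with behavior (methods). The attributes mirror what an ER entity set would record; the methods are the part an entity set has no counterpart for. The `Customer` class, its attributes, and its methods are hypothetical.

```python
class Customer:
    """A class bundles structure (attributes) with behavior (methods)."""

    def __init__(self, last_name: str, first_name: str, credit_limit: float):
        # Attributes: what an ER entity set would also record.
        self.last_name = last_name
        self.first_name = first_name
        self.credit_limit = credit_limit

    # Methods: procedures that define the object's behavior.
    def full_name(self) -> str:
        return f"{self.first_name} {self.last_name}"

    def raise_limit(self, amount: float) -> None:
        self.credit_limit += amount


# Every object created from the class shares the same structure and behavior.
c = Customer("Smith", "John", 1000.0)
c.raise_limit(250.0)
print(c.full_name(), c.credit_limit)   # John Smith 1250.0
```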

NoSQL

is a large-scale distributed database system that stores structured and unstructured data in efficient ways. NoSQL databases are discussed in more detail later in this section.

entity

is a person, place, thing, or event about which data will be collected and stored. An entity represents a particular type of object in the real world, which means an entity is "distinguishable"—that is, each entity occurrence is unique and distinct. For example, a CUSTOMER entity would have many distinguishable customer occurrences, such as John Smith, Pedro Dinamita, and Tom Strickland. Entities may be physical objects, such as customers or products, but entities may also be abstractions, such as flight routes or musical concerts.

constraint

is a restriction placed on the data. Constraints are important because they help to ensure data integrity. Constraints are normally expressed in the form of rules:
• An employee's salary must have values that are between 6,000 and 350,000.
• A student's GPA must be between 0.00 and 4.00.
• Each class must have one and only one teacher.
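
The first two rules above can be expressed as CHECK constraints so that the DBMS itself rejects invalid rows. This sketch assumes SQLite through Python's `sqlite3`, where a violated CHECK raises `sqlite3.IntegrityError`. The table and column names and the helper `try_insert` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (
    EMP_ID     INTEGER PRIMARY KEY,
    EMP_SALARY REAL NOT NULL CHECK (EMP_SALARY BETWEEN 6000 AND 350000)
);
CREATE TABLE STUDENT (
    STU_ID  INTEGER PRIMARY KEY,
    STU_GPA REAL CHECK (STU_GPA BETWEEN 0.00 AND 4.00)
);
""")


def try_insert(sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("INSERT INTO EMPLOYEE VALUES (?, ?)", (1, 52000)))   # True
print(try_insert("INSERT INTO EMPLOYEE VALUES (?, ?)", (2, 500)))     # False
print(try_insert("INSERT INTO STUDENT VALUES (?, ?)", (1, 4.5)))      # False
```

The third rule (one and only one teacher per class) is usually enforced differently, for example with a NOT NULL foreign key from the class table to the teacher table.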

MapReduce

is an open source application programming interface (API) that provides fast data analytics services. MapReduce distributes the processing of the data among thousands of nodes in parallel. MapReduce works with structured and nonstructured data. The MapReduce framework provides two main functions, Map and Reduce. In general terms, the Map function takes a job and divides it into smaller units of work; the Reduce function collects all the output results generated from the nodes and integrates them into a single result set.
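
The classic word-count illustration of the Map and Reduce roles, sketched as a single Python process. It is not the Hadoop MapReduce API: the input chunks, `map_fn`, and `reduce_fn` are hypothetical, and the "nodes" are just loop iterations. An explicit shuffle step groups the intermediate pairs by key before reducing.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input splits: on a cluster, each chunk would go to a different node.
chunks = ["big data big", "data velocity", "big variety"]


def map_fn(chunk):
    """Map: turn one unit of work into (key, value) pairs."""
    return [(word, 1) for word in chunk.split()]


def reduce_fn(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


# Map phase (sequential here; a real cluster runs these in parallel).
mapped = chain.from_iterable(map_fn(c) for c in chunks)

# Shuffle: group intermediate values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: integrate the partial results into a single result set.
result = dict(reduce_fn(k, v) for k, v in groups.items())
print(result)   # {'big': 3, 'data': 2, 'velocity': 1, 'variety': 1}
```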

Inheritance

is the ability of an object within the class hierarchy to inherit the attributes and methods of the classes above it. For example, two classes, CUSTOMER and EMPLOYEE, can be created as subclasses from the class PERSON. In this case, CUSTOMER and EMPLOYEE will inherit all attributes and methods from PERSON.
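
The PERSON, CUSTOMER, and EMPLOYEE example maps directly onto Python subclassing. In this minimal sketch the attribute and method names are hypothetical; each subclass adds its own attribute and inherits the rest from `Person`.

```python
class Person:
    def __init__(self, first_name: str, last_name: str):
        self.first_name = first_name
        self.last_name = last_name

    def full_name(self) -> str:
        return f"{self.first_name} {self.last_name}"


class Customer(Person):          # inherits first_name, last_name, full_name()
    def __init__(self, first_name: str, last_name: str, credit_limit: float):
        super().__init__(first_name, last_name)
        self.credit_limit = credit_limit


class Employee(Person):          # inherits the same members from Person
    def __init__(self, first_name: str, last_name: str, salary: float):
        super().__init__(first_name, last_name)
        self.salary = salary


c = Customer("Pedro", "Dinamita", 500.0)
e = Employee("Tom", "Strickland", 52000.0)
print(c.full_name(), "|", e.full_name())              # Pedro Dinamita | Tom Strickland
print(isinstance(c, Person), isinstance(e, Person))   # True True
```

Each subclass has exactly one parent here, matching the single-parent class hierarchy described for the OO model.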

schema

is the conceptual organization of the entire database as viewed by the database administrator.

external model

is the end users' view of the data environment. The term end users refers to people who use the application programs to manipulate the data and generate information.

segment

is the equivalent of a file system's record type. Within the hierarchy, a higher layer is perceived as the parent of the segment directly beneath it, which is called the child. The hierarchical model depicts a set of one-to-many (1:M) relationships between a parent and its children segments. (Each parent can have many children, but each child has only one parent.)

internal model

is the representation of the database as "seen" by the DBMS. In other words, the internal model requires the designer to match the conceptual model's characteristics and constraints to those of the selected implementation model.

Software independence

means that the model does not depend on the DBMS software used to implement the model.

Hardware independence

means that the model does not depend on the hardware used in the implementation of the model. Therefore, changes in either the hardware or the DBMS software will have no effect on the database design at the conceptual level.

physical model

operates at the lowest level of abstraction, describing the way data are saved on storage media such as magnetic, solid state, or optical media.

conceptual model

represents a global view of the entire database by the entire organization.

method

represents a real-world action such as finding a selected PERSON's name, changing a PERSON's name, or printing a PERSON's address. In other words, methods are the equivalent of procedures in traditional programming languages. In OO terms, methods define an object's behavior.

class hierarchy

resembles an upside-down tree in which each class has only one parent. For example, the CUSTOMER class and the EMPLOYEE class share a parent PERSON class. (Note the similarity to the hierarchical data model in this respect.)

sparse data

describes cases in which the number of attributes is very large but the number of actual data instances is low.

Data modeling

is the first step in designing a database and refers to the process of creating a specific data model for a determined problem domain. (A problem domain is a clearly defined area within the real-world environment, with a well-defined scope and boundaries that will be systematically addressed.)

extended relational data model (ERDM)

The relational model's main vendors evolved the model further and created the ERDM. The ERDM adds many of the OO model's features within the inherently simpler relational database structure. The ERDM gave birth to a new generation of relational databases that support OO features such as objects (encapsulated data and methods), extensible data types based on classes, and inheritance.

network model

was created to represent complex data relationships more effectively than the hierarchical model, to improve database performance, and to impose a database standard. In the network model, the user perceives the network database as a collection of records in 1:M relationships.

hierarchical model

was developed in the 1960s to manage large amounts of data for complex manufacturing projects, such as the Apollo rocket that landed on the moon in 1969. The model's basic logical structure is represented by an upside-down tree. The hierarchical structure contains levels, or segments.

relational model

was introduced in 1970 by E. F. Codd of IBM in his landmark paper "A Relational Model of Data for Large Shared Data Banks." The relational model represented a major breakthrough for both users and designers. To use an analogy, the relational model produced an "automatic transmission" database to replace the "standard transmission" databases that preceded it. Its conceptual simplicity set the stage for a genuine database revolution.

entity relationship diagram (ERD)

uses graphical representations to model database components.
