Ch 2 Data Models

some of the most frequently used Big Data technologies are

- Hadoop
- MapReduce
- NoSQL

The use of external views that represent subsets of the database has some important advantages:

- It is easy to identify specific data required to support each business unit's operations.
- It makes the designer's job easy by providing feedback about the model's adequacy. Specifically, the model can be checked to ensure that it supports all processes as defined by their external models, as well as all operational requirements and constraints.
- It helps to ensure security constraints in the database design. Damaging an entire database is more difficult when each business unit works with only a subset of data.
- It makes application program development much simpler.
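
An external view can be approximated with a SQL view that exposes only the columns one business unit needs. The following is a minimal sketch using Python's built-in `sqlite3` module; the `employee` table, the `staff_directory` view, and the sample rows are hypothetical and only illustrate the idea of working with a subset of the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT NOT NULL,
        salary REAL NOT NULL
    );
    INSERT INTO employee VALUES (1, 'Ana', 'Sales', 52000), (2, 'Ben', 'IT', 61000);
    -- Hypothetical external view for a directory application: salary is hidden.
    CREATE VIEW staff_directory AS
        SELECT emp_id, name, dept FROM employee;
""")

cursor = conn.execute("SELECT * FROM staff_directory ORDER BY emp_id")
directory_columns = [col[0] for col in cursor.description]
directory_rows = cursor.fetchall()
```

An application that reads only `staff_directory` never sees the `salary` column, which is the security benefit described in the list above.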

an implementation-ready data model should contain at least the following components:

- a description of the data structure that will store the end-user data
- a set of enforceable rules to guarantee the integrity of the data
- a data manipulation methodology to support the real-world data transformations

from the end user's perspective, any SQL-based relational database application involves three parts:

- an end-user interface
- a collection of tables stored in the database
- the SQL engine

the basic building blocks of data models include

- entities
- attributes
- relationships
- constraints

the ANSI/SPARC architecture defines three levels of data abstraction

- external
- conceptual
- internal
- physical (added in the extended version)

HDFS uses three types of nodes

- name nodes
- data nodes
- client nodes

database models use three types of relationships

- one to many
- many to many
- one to one
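
One common way to implement the three relationship types in a relational database is sketched below with Python's built-in `sqlite3` module. All table names and sample rows are hypothetical: a 1:M relationship uses a foreign key on the "many" side, a 1:1 relationship adds a `UNIQUE` constraint to that foreign key, and an M:N relationship is broken into two 1:M relationships through a bridge table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- 1:M: one painter paints many paintings.
    CREATE TABLE painter  (painter_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE painting (
        painting_id INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        painter_id  INTEGER NOT NULL REFERENCES painter(painter_id)
    );
    -- 1:1: UNIQUE on the foreign key lets an employee manage at most one store.
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE store (
        store_id   INTEGER PRIMARY KEY,
        manager_id INTEGER NOT NULL UNIQUE REFERENCES employee(emp_id)
    );
    -- M:N: a bridge table pairs students with classes.
    CREATE TABLE student (student_id INTEGER PRIMARY KEY);
    CREATE TABLE class   (class_id   INTEGER PRIMARY KEY);
    CREATE TABLE enroll (
        student_id INTEGER REFERENCES student(student_id),
        class_id   INTEGER REFERENCES class(class_id),
        PRIMARY KEY (student_id, class_id)
    );
""")

conn.execute("INSERT INTO painter VALUES (1, 'Painter A')")
conn.executemany(
    "INSERT INTO painting VALUES (?, ?, ?)",
    [(10, "Study 1", 1), (11, "Study 2", 1)],
)
conn.execute("INSERT INTO employee VALUES (7, 'Dana')")
conn.execute("INSERT INTO store VALUES (100, 7)")

# A second store with the same manager breaks the 1:1 rule and is rejected.
try:
    conn.execute("INSERT INTO store VALUES (101, 7)")
    second_store_allowed = True
except sqlite3.IntegrityError:
    second_store_allowed = False

paintings_by_painter = conn.execute(
    "SELECT COUNT(*) FROM painting WHERE painter_id = 1"
).fetchone()[0]
```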

NoSQL (in Ch2) refers to the new generation of databases that address the specific challenges of the Big Data era and have the following general characteristics:

- they are not based on the relational model and SQL, hence the name NoSQL
- they support highly distributed database architectures
- they provide high scalability, high availability, and fault tolerance
- they support very large amounts of sparse data (data with a large number of attributes but a low number of actual data instances)
- they are geared toward performance rather than transaction consistency

the basic characteristics of Big Data databases

- volume
- velocity
- variety

The conceptual model yields some important advantages:

First, it provides a bird's-eye (macro-level) view of the data environment that is relatively easy to understand. Second, the conceptual model is independent of both software and hardware.

object/relational database management system (O/R DBMS)

a DBMS based on the extended relational data model (ERDM). The ERDM, championed by many relational database researchers, constitutes the relational model's response to the OODM. This model includes many of the object-oriented model's best features within an inherently simpler relational database structure

attribute

a characteristic of an entity or object. An attribute has a name and a data type. Equivalent to a field in a file system

a problem domain is

a clearly defined area within the real-world environment, with a well-defined scope and boundaries, that will be systematically addressed

entity set

a collection of like entities

relational database management system (RDBMS)

a collection of programs that manages a relational database. The RDBMS software translates a user's logical requests (queries) into commands that physically locate and retrieve the requested data

class

a collection of similar objects with shared structure (attributes) and behavior (methods). A class encapsulates an object's data representation and a method's implementation. Objects that share similar characteristics are grouped in classes

hardware independence

a condition in which a model does not depend on the hardware used in the model's implementation. Therefore, changes in the hardware will have no effect on the database design at the conceptual level

logical independence

a condition in which the internal model can be changed without affecting the conceptual model. (The internal model is hardware independent because it is unaffected by the computer on which the software is installed. Therefore, a change in storage devices or operating systems will not affect the internal model.)

physical independence

a condition in which the physical model can be changed without affecting the internal model

entity relationship (ER) model (ERM)

a data model that describes relationships (1:1, 1:M, and M:N) among entities at the conceptual level with the help of ER diagrams

object-oriented data model (OODM)

a data model whose basic modeling structure is an object. Both data and its relationships are contained in a single structure known as an object

degrees of abstraction

a database designer starts with an abstract view of the overall data environment and adds details as the design comes closer to implementation

business rule

a description of a policy, procedure, or principle within an organization. Ex: a pilot cannot be on duty for more than 10 hours during a 24-hour period, or a professor may teach up to four classes during a semester
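
A business rule like the professor example can be turned into a check that an application or database enforces. The following is a minimal sketch in Python; the function name, the tuple layout, and the sample assignments are assumptions made for illustration only.

```python
from collections import Counter

# From the rule: "a professor may teach up to four classes during a semester".
MAX_CLASSES_PER_SEMESTER = 4


def overloaded_professors(assignments):
    """Return (professor, semester) pairs that exceed the teaching limit.

    assignments: iterable of (professor, semester, class_code) tuples.
    """
    load = Counter((prof, sem) for prof, sem, _ in assignments)
    return sorted(key for key, count in load.items() if count > MAX_CLASSES_PER_SEMESTER)


# Hypothetical data: Lee is assigned five classes, Kim two.
assignments = [("Lee", "2024F", f"CIS{n}") for n in range(101, 106)]
assignments += [("Kim", "2024F", "CIS201"), ("Kim", "2024F", "CIS202")]
violations = overloaded_professors(assignments)
```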

entity relationship diagram (ERD)

a diagram that depicts an entity relationship model's entities, attributes, and relationships

class diagram

a diagram used to represent data and their relationships in UML object notation

relational diagram

a graphical representation of a relational database's entities, the attributes within those entities, and the relationships among the entities

Hadoop Distributed File System (HDFS)

a highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds. This is achieved through the use of a write-once, read-many model

unified modeling language (UML)

a language based on object-oriented concepts that provides tools such as diagrams and symbols to graphically model a system

table (relation)

a logical construct perceived to be a two-dimensional structure composed of intersecting rows (entities) and columns (attributes) that represents an entity set in the relational model

schema

a logical grouping of database objects, such as tables, indexes, views, and queries, that are related to each other. It is the conceptual organization of the entire database as viewed by the database administrator

extensible markup language (XML)

a metalanguage used to represent and manipulate data elements. Unlike other markup languages, XML permits the manipulation of a document's data elements. XML facilitates the exchange of structured documents such as orders and invoices over the internet. It emerged as the de facto standard for the efficient and effective exchange of structured, semistructured, and unstructured data
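
To make "manipulation of a document's data elements" concrete, the sketch below parses a small invoice with Python's standard `xml.etree.ElementTree` module, reads its elements, and writes a computed value back into the document. The invoice layout, element names, and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice document; the tag and attribute names are made up.
invoice_xml = """<invoice number="1001">
  <customer>Acme Corp</customer>
  <line sku="A1" qty="2" price="9.50"/>
  <line sku="B7" qty="1" price="20.00"/>
</invoice>"""

root = ET.fromstring(invoice_xml)
customer = root.findtext("customer")

# Read the data elements and compute the invoice total.
total = sum(int(line.get("qty")) * float(line.get("price")) for line in root.findall("line"))

# Manipulate the document: store the total as a new attribute on the root.
root.set("total", f"{total:.2f}")
updated_xml = ET.tostring(root, encoding="unicode")
```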

physical model

a model in which physical characteristics such as location, path, and format are described for the data. The physical model is both hardware and software (the DBMS and operating system) dependent

extended relational data model (ERDM)

a model that includes the object-oriented model's best features in an inherently simpler relational database structural environment. See extended entity relationship model

big data

a movement to find new and better ways to manage large amounts of web-generated data and derive business insights from it, while simultaneously providing high performance and scalability at a reasonable cost

NoSQL

a new generation of database management systems that is not based on the traditional relational database model. It represents a different way of approaching the storage and processing of data in a nonrelational way, and provides distributed, fault-tolerant databases for processing nonstructured data

entity

a person, place, thing, concept, or event for which data can be collected and stored. See also attribute

software independence

a property of any model or application that does not depend on the software used to implement it

internal schema

a representation of an internal model using the database constructs supported by the chosen database

conceptual schema

a representation of the conceptual model, usually expressed graphically. The most widely used conceptual model is the ER model

crow's foot notation

a representation of the entity relationship diagram that uses a three-pronged symbol to represent the "many" side of the relationship

data model

a representation, usually graphic, of a complex "real world" data structure. Data models are used in the database design phase of the Database Life Cycle

constraint

a restriction placed on data, usually expressed in the form of rules. Ex: a student's GPA must be between 0.00 and 4.00
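
The GPA example maps directly onto a SQL `CHECK` constraint. The following is a minimal sketch using Python's built-in `sqlite3` module; the `student` table and its sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        gpa        REAL NOT NULL CHECK (gpa BETWEEN 0.00 AND 4.00)
    )
""")

conn.execute("INSERT INTO student VALUES (1, 3.25)")  # within range, accepted

# An out-of-range GPA violates the constraint and is rejected by the DBMS.
try:
    conn.execute("INSERT INTO student VALUES (2, 4.70)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored_gpas = [row[0] for row in conn.execute("SELECT gpa FROM student")]
```

Declaring the rule in the table definition means the DBMS enforces it for every application that writes to the table.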

entity instance (entity occurrence)

a row in a relational table

logical design

a stage in the design phase that matches the conceptual design to the requirements of the selected DBMS and is, therefore, software dependent. Logical design is used to translate the conceptual design into the internal model for a selected database management system, such as DB2, SQL Server, Oracle, IMS, Informix, Access, or Ingres. The term is also used more generally for the task of creating a conceptual data model that could be implemented in any DBMS

object

an abstract representation of a real-world entity that has a unique identity, embedded properties, and the ability to interact with other objects and itself. It can be considered the equivalent of an ER model's entity

relationship

an association between entities

network model

an early data model that represented data as a collection of record types in 1:M relationships. It allows a record to have more than one parent

hierarchical model

an early database model whose basic concepts and characteristics formed the basis for subsequent database development. This model is based on an upside-down tree structure in which each record is called a segment. The top record is the root segment. Each segment has a 1:M relationship to the segment directly below it

MapReduce

an open-source application programming interface (API) that provides fast data analytics services; one of the main Big Data technologies that allows organizations to process massive data stores by processing data in parallel across thousands of nodes
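
The map and reduce steps can be illustrated with a word count. The sketch below is plain Python that runs in a single process and does not use Hadoop's actual API; the function names and sample chunks are hypothetical. In a real deployment each chunk would be mapped on a different node, and the grouped results would be reduced in parallel.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into one summary value."""
    return {key: sum(values) for key, values in groups.items()}


# Each string stands in for a block of data held on a separate node.
chunks = ["big data big", "data nodes store data"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
```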

many-to-many (M:N or *..*) relationship

association among two or more entities in which one occurrence of an entity is associated with many occurrences of the related entity and one occurrence of the related entity is associated with many occurrences of the entity

one-to-one (1:1 or 1..1) relationship

associations among two or more entities that are used by data models. In a 1:1 relationship, one entity instance is associated with only one instance of the related entity

one-to-many (1:M or 1..*) relationship

associations among two or more entities that are used by data models. In a 1:M relationship, one entity instance is associated with many instances of the related entity

the success of the O/R DBMS can be attributed to the model's

conceptual simplicity, data integrity, easy-to-use query language, high transaction performance, high availability, security, scalability, and expandability. Most relational DB products can be classified as object/relational. Ex: OLTP and OLAP DB applications

object-oriented database management system (OODBMS)

data management software used to manage data in an object-oriented database model

the physical model is

dependent on the DBMS, methods of accessing files, and types of hardware storage devices supported by the operating system

attributes in OODM models

describe the properties of an object

relational model

developed by E. F. Codd of IBM in 1970, the relational model is based on mathematical set theory and represents data as independent relations. Each relation (table) is conceptually represented as a two-dimensional structure of intersecting rows and columns. The relations are related to each other through the sharing of common entity characteristics (values in columns)

internal model (software dependent, hardware independent)

in database modeling, a level of data abstraction that adapts the conceptual model to a specific DBMS model for implementation. The internal model is the representation of a database as "seen" by the DBMS. In other words, the internal model requires a designer to match the conceptual model's characteristics and constraints to those of the selected implementation model

segment

in the hierarchical data model, the equivalent of a file system's record type

method

in the object-oriented data model, a named set of instructions to perform an action. Methods represent real-world actions and are the equivalent of procedures in traditional programming languages

tuple

in the relational model, a table row

Hadoop

is a Java-based, open-source, high-speed, fault-tolerant distributed storage and computational framework. It uses low-cost hardware to create clusters of thousands of computer nodes to store and process data. It is neither a database nor a data model; it is a distributed file storage and processing model

inheritance

is, in the object-oriented data model, the ability of an object within the class hierarchy to inherit the attributes (data structure) and methods of the classes above it
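
Inheritance through a class hierarchy can be seen in a few lines of Python. The `Person`, `Employee`, and `Pilot` classes below are hypothetical; the point is that a `Pilot` object picks up attributes and methods defined in the classes above it without redefining them.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def describe(self):
        # Defined once here; every subclass inherits it.
        return f"{self.name} ({type(self).__name__})"


class Employee(Person):
    def __init__(self, name, salary):
        super().__init__(name)
        self.salary = salary


class Pilot(Employee):
    def __init__(self, name, salary, license_no):
        super().__init__(name, salary)
        self.license_no = license_no


pilot = Pilot("Ana", 90000, "ATP-001")
description = pilot.describe()  # method inherited from Person
inherited_salary = pilot.salary  # attribute inherited from Employee
```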

OO DBMS is popular in

niche markets such as computer-aided drawing/computer-aided manufacturing (CAD/CAM), geographic information systems (GIS), telecommunications, and multimedia, which require support for more complex objects

velocity

refers not only to the speed with which data grows but also to the need to process this data quickly in order to generate information and insight

volume

refers to the amounts of data stored

variety

refers to the fact that the data being collected comes in multiple different data formats

Chen notation

see entity relationship (ER) model

external model

the application programmer's view of the data environment. Given its business focus, an external model works with a data subset of the global database schema

semantic data model

the first of a series of data models that models both data and their relationships in a single structure known as an object

data modeling

the first step in the database design, the process of creating a specific data model for a determined problem domain

American National Standards Institute (ANSI)

the group that accepted the DBTG recommendations and augmented database standards in 1975 through its SPARC committee. The Standards Planning and Requirements Committee (SPARC) defined a framework for data modeling based on degrees of data abstraction

data definition language (DDL)

the language that allows a database administrator to define the database structure, schema, and subschema

class hierarchy

the organization of classes in a hierarchical tree in which each parent is a superclass and each child class is a subclass

conceptual model

the output of the conceptual design process. The conceptual model provides a global view of an entire database and describes the main data objects, avoiding details

subschema

the portion of the database that interacts with application programs

connectivity

the relationship between entities. Classifications include 1:1, 1:M, and M:N

data manipulation language (DML)

the set of commands that allows an end user to manipulate the data in the database, such as SELECT, INSERT, UPDATE, DELETE, COMMIT, and ROLLBACK. The DML defines the environment in which data can be managed and is used to work with the data in the database
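
The commands named in the definition can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch; the `product` table and its rows are hypothetical, and `conn.commit()` and `conn.rollback()` are the module's way of issuing COMMIT and ROLLBACK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (code TEXT PRIMARY KEY, price REAL NOT NULL)")

# INSERT two rows, then COMMIT to make them permanent.
conn.execute("INSERT INTO product VALUES ('P1', 10.0)")
conn.execute("INSERT INTO product VALUES ('P2', 25.0)")
conn.commit()

# UPDATE and DELETE, then ROLLBACK to discard both uncommitted changes.
conn.execute("UPDATE product SET price = 12.0 WHERE code = 'P1'")
conn.execute("DELETE FROM product WHERE code = 'P2'")
conn.rollback()

# SELECT shows the data as of the last COMMIT.
rows = conn.execute("SELECT code, price FROM product ORDER BY code").fetchall()
```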

class diagram notation which is part of the Unified Modeling Language (UML)

the set of symbols used in the creation of class diagrams

external schema

the specific representation of an external view; the end user's view of the data environment

client node

used in HDFS. The client node acts as the interface between the user application and the HDFS.

name node

used in the HDFS. The name node stores all the metadata about the file system

data node

used in the HDFS. The data nodes store fixed-size data blocks (that could be replicated to other data nodes)
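
The division of labor between name node and data nodes can be sketched in plain Python. This is a toy illustration, not HDFS code: the block size, the node names, and the round-robin placement are all assumptions, and the real placement policy is more involved. The `placement` dictionary plays the role of the metadata a name node would keep.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a file into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, data_nodes, replicas):
    """Assign each block to `replicas` distinct data nodes, round-robin.

    Assumes replicas <= len(data_nodes). Returns {block_index: [node, ...]}.
    """
    return {
        block: [data_nodes[(block + r) % len(data_nodes)] for r in range(replicas)]
        for block in range(num_blocks)
    }


# Hypothetical 10-byte file, 4-byte blocks, three data nodes, two copies each.
blocks = split_into_blocks(b"x" * 10, block_size=4)
placement = place_blocks(len(blocks), ["dn1", "dn2", "dn3"], replicas=2)
```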

The process of identifying and documenting business rules is essential to database design for several reasons:

• It helps to standardize the company's view of data.
• It can be a communication tool between users and designers.
• It allows the designer to understand the nature, role, and scope of the data.
• It allows the designer to understand business processes.
• It allows the designer to develop appropriate relationship participation rules and constraints and to create an accurate data model.

