Chapter 12: Distributed Database Management Systems


transaction transparency

- allows a transaction to update data at more than one network site - ensures that the transaction is either entirely completed or entirely aborted

distributed global schema

- database description - common database schema used by local TPs to translate user requests into subqueries (remote requests) that will be processed by different DPs

remote request

lets a single SQL statement access the data that are to be processed by a single remote database processor

database fragments

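parts into which a single database object (such as a table) is divided in a distributed database system; each fragment can be stored at a different site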

BASE

- a data consistency model in which data changes are not immediate but propagate slowly through the system until all replicas are eventually consistent - trade-off between consistency and availability - basically available, soft state, eventually consistent
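A minimal Python sketch of eventual consistency, assuming a hypothetical two-replica setup in which each value carries a timestamp and replicas reconcile with a last-writer-wins rule (a common choice, but not the only one). The write is accepted locally right away, the other replica briefly returns stale data, and a later anti-entropy exchange makes both copies agree.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica storing key -> (timestamp, value)."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Accept the write locally at once (basically available).
        self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def anti_entropy(a: Replica, b: Replica) -> None:
    """Exchange state so both replicas keep the newest version of each key."""
    for key in set(a.data) | set(b.data):
        versions = [v for v in (a.data.get(key), b.data.get(key)) if v is not None]
        newest = max(versions, key=lambda v: v[0])  # last writer wins
        a.data[key] = newest
        b.data[key] = newest


r1, r2 = Replica("site1"), Replica("site2")
r1.write("balance", 100, ts=1)
print(r2.read("balance"))  # None: the update has not propagated yet (soft state)
anti_entropy(r1, r2)
print(r2.read("balance"))  # 100: both replicas now agree (eventually consistent)
```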

distributed processing

- a database's logical processing is shared among two or more physically independent sites that are connected through a network - does not require a distributed database (can be a single-site DB)

multiple-site processing, multiple-site data

- a fully distributed DBMS with support for multiple data processors and transaction processors at multiple sites - MPMD - classified as either homogeneous or heterogeneous

types of distributed query costs

- access time (I/O) costs from multiple remote sites - communication costs with data transmission among nodes - CPU time costs from processing overhead of managing distributed transactions
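One simple way to compare query plans is to add the three cost types together. This sketch assumes hypothetical unit costs (IO_COST_PER_BLOCK_MS, NET_COST_PER_KB_MS) and a hypothetical SitePlan record; real optimizers estimate and weight these costs differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SitePlan:
    site: str
    blocks_read: int    # I/O: disk blocks read at the remote site
    bytes_shipped: int  # communication: bytes sent over the network
    cpu_ms: float       # CPU: overhead of managing the distributed transaction


# Hypothetical unit costs; a real optimizer would calibrate these.
IO_COST_PER_BLOCK_MS = 10.0
NET_COST_PER_KB_MS = 0.5


def plan_cost_ms(plan: list[SitePlan]) -> float:
    """Sum I/O, communication, and CPU costs over every site in the plan."""
    total = 0.0
    for step in plan:
        total += step.blocks_read * IO_COST_PER_BLOCK_MS
        total += (step.bytes_shipped / 1024) * NET_COST_PER_KB_MS
        total += step.cpu_ms
    return total


# Same remote scan, but the second plan filters rows before shipping them.
ship_all = [SitePlan("NY", blocks_read=100, bytes_shipped=1_024_000, cpu_ms=5.0)]
filter_first = [SitePlan("NY", blocks_read=100, bytes_shipped=10_240, cpu_ms=8.0)]
print(plan_cost_ms(ship_all))      # 1505.0
print(plan_cost_ms(filter_first))  # 1013.0
```

Filtering at the remote site before shipping rows cuts the communication term from 500 ms to 5 ms in this made-up example, at the price of a little extra CPU time.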

single-site processing, single-site data

- all processing is done on a single host computer and all data are stored on the host computer's local disk system - can have multiple end user dumb terminals - SPSD

distribution transparency

- allows a distributed database to be managed as a single logical database - 3 levels: fragmentation, location, & local mapping transparency

performance transparency

- allows the system to perform as if it were a centralized DBMS - no system degradation due to use of network - ensures most cost-effective path to remote data - able to "scale out" transparently

data fragmentation

- allows you to break a single object into two or more segments, or fragments - each fragment can be stored at any site over the network

functions of a DDBMS

- application interfaces that interact with users, application programs, and other DBMSs within the distributed database - validation of syntax for data requests - transformation of complex requests into atomic data request components - query optimization over distributed database fragments - mapping to find data locations - I/O interface to read and write data from/to permanent local storage - formatting data for presentation to the user or application program - security at both local and remote databases - backup and recovery - DB administration features - concurrency control to manage simultaneous data access - transaction management to move data from one consistent state to another

distributed transaction

- can reference several different local or remote DP sites - each single request can reference only one local or remote DP site - the transaction as a whole can reference multiple DP sites

Hadoop node types

- client node: makes requests to the file system (reads/writes) - name node: contains the metadata for the file system - data node: stores the actual data files

disadvantages of DDBMS

- complexity of management and control - technological difficulty - security - lack of standards - increased storage & infrastructure requirements - increased training costs - costs

remote transaction

- composed of several requests - accesses data at a single remote site

components of DDBMS

- computer workstations or remote devices (sites or nodes) - network hardware & software components in each workstation or device - communication media that carries data from one node to another - transaction processor - data processor

distributed database dictionary

- contains the description of the entire database as seen by the database administrator - DDD - aka distributed data catalog (DDC)

factors for DDBMS to resolve data requests

- data distribution (which fragment to access) - data replication (all copies kept consistent, providing replica transparency) - network & node availability (despite network latency or partitioning)

advantages of DDBMS

- data located near site of greatest demand - faster data access (using subset of data) - faster data processing (spread out work) - growth facilitation (add new sites easily) - improved communications - reduced operating costs - user-friendly interface - less danger of single-point failure - processor independence

Hadoop Distributed File System

- distributes data based on these key assumptions: high volume; write-once, read-many; streaming access; moving computations to the data; fault tolerance - de facto standard for Big Data storage and processing - HDFS

DDBMS transparency features

- distribution transparency - transaction transparency - failure transparency - performance transparency - heterogeneity transparency

failure transparency

- ensures that the system will continue to operate in the event of a node or network failure - lost functions picked up by other nodes in network

write-ahead protocol

- forces the log entry to be written to permanent storage before the actual operation takes place - ensures that DO, UNDO, and REDO operations can survive a system crash that occurs while they are being executed
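A minimal sketch of the write-ahead rule, assuming a hypothetical WALStore class backed by a JSON-lines log file: each update appends a before/after log record and forces it to disk with os.fsync before the in-memory data is changed.

```python
from __future__ import annotations

import json
import os
import tempfile


class WALStore:
    """Hypothetical key-value store that follows the write-ahead rule."""

    def __init__(self, log_path: str):
        self.log_path = log_path
        self.data: dict[str, int] = {}

    def update(self, txn_id: str, key: str, new_value: int) -> None:
        record = {"txn": txn_id, "key": key,
                  "before": self.data.get(key), "after": new_value}
        # 1. Force the log record to permanent storage first...
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. ...and only then apply the change itself.
        self.data[key] = new_value

    def read_log(self) -> list[dict]:
        with open(self.log_path, encoding="utf-8") as log:
            return [json.loads(line) for line in log]


log_file = os.path.join(tempfile.mkdtemp(), "txn.log")
store = WALStore(log_file)
store.update("T1", "qty", 50)
store.update("T1", "qty", 45)
print(store.read_log()[-1])  # {'txn': 'T1', 'key': 'qty', 'before': 50, 'after': 45}
```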

distributed database management system

- governs the storage and processing of logically related data over interconnected computer systems in which both data and processing are distributed among several sites - DDBMS

two-phase commit protocol

- guarantees that if a portion of a transaction operation cannot be committed, all changes made at the other sites participating in the transaction will be undone to maintain a consistent database state - each DP maintains its own transaction log - 2PC
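A simplified, single-process sketch of the two phases, assuming hypothetical Participant objects in place of real DP nodes. It ignores timeouts, message loss, and coordinator failure, and it only records the global decision in each local log rather than actually undoing changes.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical subordinate DP with its own transaction log."""
    name: str
    can_commit: bool = True
    log: list = field(default_factory=list)

    def prepare(self, txn: str) -> bool:
        # Phase 1: write to the local log, then vote READY or ABORT.
        vote = "READY" if self.can_commit else "ABORT"
        self.log.append((txn, vote))
        return self.can_commit

    def finish(self, txn: str, decision: str) -> None:
        # Phase 2: record the coordinator's global decision.
        self.log.append((txn, decision))


def two_phase_commit(txn: str, subordinates: list) -> str:
    """Coordinator logic: commit only if every subordinate voted READY."""
    votes = [p.prepare(txn) for p in subordinates]  # phase 1: preparation
    decision = "COMMIT" if all(votes) else "ABORT"
    for p in subordinates:                          # phase 2: final decision
        p.finish(txn, decision)
    return decision


sites = [Participant("DP1"), Participant("DP2", can_commit=False)]
print(two_phase_commit("T7", sites))  # ABORT
print(sites[0].log)                   # [('T7', 'READY'), ('T7', 'ABORT')]
```

Because one subordinate cannot commit, every site (including DP1, which voted READY) receives the same ABORT decision.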

fragmentation transparency

- highest level of transparency - end user or programmer does not need to know that a database is partitioned - doesn't specify fragment names or locations

distributed request

- lets a single SQL statement reference data located at several different local or remote DP sites - therefore the transaction can access several sites - provides for fully distributed database processing

multiple-site processing, single-site data

- multiple processes run on different computers that share a single data repository - MPSD

problems of centralized DBMS

- performance degradation over growing number of remote sites, greater distances - high costs of maintaining, operating central mainframe - reliability problems created by dependence on single site - scalability problems imposed by single location - organizational rigidity imposed by database

data allocation

- process of deciding where to locate data - 3 strategies: centralized, partitioned or replicated

styles of replication

- push replication (after a data update, the originating DP node sends the changes to the replica nodes so that they are updated immediately) - pull replication (after a data update, the originating DP node sends messages notifying the replica nodes of the update; the replica nodes decide when to apply it)
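A toy comparison of the two styles, assuming hypothetical Node and Origin classes and a sync() function that the replica calls whenever it chooses to refresh. With push, the replica changes immediately; with pull, it only queues a notification until it syncs.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.pending = []  # keys announced by the origin but not yet fetched


class Origin(Node):
    def __init__(self, name, replicas):
        super().__init__(name)
        self.replicas = replicas

    def update_push(self, key, value):
        self.data[key] = value
        for r in self.replicas:  # push: send the change itself, applied at once
            r.data[key] = value

    def update_pull(self, key, value):
        self.data[key] = value
        for r in self.replicas:  # pull: only notify; the replica decides when
            r.pending.append(key)


def sync(replica, origin):
    """Replica-initiated refresh used by pull replication."""
    for key in replica.pending:
        replica.data[key] = origin.data[key]
    replica.pending.clear()


rep = Node("branch")
hq = Origin("hq", [rep])
hq.update_push("price", 10)
print(rep.data["price"])  # 10, updated immediately
hq.update_pull("price", 12)
print(rep.data["price"])  # still 10 until the replica syncs
sync(rep, hq)
print(rep.data["price"])  # 12
```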

heartbeat report

- sent by data node every 3 seconds - lets the name node know that the data node is still available
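A sketch of how a name node might turn heartbeats into liveness decisions, assuming a hypothetical NameNode class and a 30-second timeout chosen only for illustration (it is not HDFS's actual setting). Timestamps are passed in explicitly so the example is deterministic.

```python
class NameNode:
    """Hypothetical liveness tracker based on heartbeat timestamps."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen: dict = {}

    def receive_heartbeat(self, node_id: str, now: float) -> None:
        self.last_seen[node_id] = now

    def live_nodes(self, now: float) -> list:
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.timeout_s)

    def dead_nodes(self, now: float) -> list:
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout_s)


nn = NameNode(timeout_s=30)          # illustrative timeout
for t in range(0, 60, 3):            # dn1 reports every 3 seconds
    nn.receive_heartbeat("dn1", now=t)
nn.receive_heartbeat("dn2", now=0)   # dn2 goes silent after t = 0
print(nn.live_nodes(now=60))  # ['dn1']
print(nn.dead_nodes(now=60))  # ['dn2']
```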

block report

- sent by data node every 6 hours - informs the name node of which blocks are on that data node
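A sketch of what the name node can derive from block reports, assuming the reports arrive as a hypothetical {data_node: [block_ids]} dictionary. Inverting them gives a block-to-node map, which reveals blocks that have fewer copies than a target replication factor (3 here, for illustration).

```python
from collections import defaultdict


def build_block_map(reports: dict) -> dict:
    """Invert {data_node: [block_ids]} into {block_id: sorted data nodes}."""
    block_map = defaultdict(set)
    for node, blocks in reports.items():
        for block in blocks:
            block_map[block].add(node)
    return {b: sorted(nodes) for b, nodes in block_map.items()}


def under_replicated(block_map: dict, target: int) -> list:
    """Blocks stored on fewer data nodes than the target replication factor."""
    return sorted(b for b, nodes in block_map.items() if len(nodes) < target)


reports = {"dn1": ["blk_1", "blk_2"], "dn2": ["blk_1"], "dn3": ["blk_1", "blk_2"]}
bm = build_block_map(reports)
print(bm["blk_2"])                     # ['dn1', 'dn3']
print(under_replicated(bm, target=3))  # ['blk_2']
```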

client/server architecture

- similar to that of the network file server except that all database processing is done at the server site, thus reducing network traffic - variation of MPSD

distributed database

- stores a logically related database over two or more physically independent sites - requires distributed processing

unreplicated database

- stores each database fragment at a single site - no duplicate data fragments

fully replicated database

- stores multiple copies of each database fragment at multiple sites - all database fragments are replicated

transaction processor

- the software component found in each computer or device that receives & processes the application's remote & local data requests - TP - aka application processor (AP) - aka transaction manager (TM)

data processor

- the software component residing on each computer or device that stores and retrieves data located at the site - DP - aka data manager (DM)

DO-UNDO-REDO protocol

- used by the DP to roll transactions back and forward with the help of the system's transaction log entries
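A minimal in-memory sketch of the three operations, assuming log entries of the hypothetical form (txn_id, key, before_image, after_image). DO performs a change and logs both images, UNDO restores before-images of uncommitted transactions (newest first), and REDO reapplies after-images of committed ones (oldest first). The crash state is made up for illustration.

```python
def do(db, log, txn, key, new_value):
    """DO: perform the change and record before/after images in the log."""
    log.append((txn, key, db.get(key), new_value))
    db[key] = new_value


def undo(db, log, committed):
    """UNDO: restore before-images of uncommitted transactions, newest first."""
    for txn, key, before, _after in reversed(log):
        if txn not in committed:
            db[key] = before


def redo(db, log, committed):
    """REDO: reapply after-images of committed transactions, oldest first."""
    for txn, key, _before, after in log:
        if txn in committed:
            db[key] = after


log = []
memory = {"A": 100, "B": 50, "C": 10}
do(memory, log, "T1", "A", 90)
do(memory, log, "T2", "B", 75)
do(memory, log, "T1", "C", 20)
committed = {"T1"}  # T1 committed; T2 was still active when the crash hit

# Made-up disk state at the crash: T2's write reached disk, T1's write to C did not.
disk = {"A": 90, "B": 75, "C": 10}
undo(disk, log, committed)
redo(disk, log, committed)
print(disk)  # {'A': 90, 'B': 50, 'C': 20}
```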

CAP Theorem

3 desired properties of a DDBMS - Consistency - Availability - Partition tolerance - impossible for a system to provide all 3 at the same time (Dr. Eric Brewer, 2000)

heterogeneity transparency

allows the integration of several different local DBMSs (relational, network and hierarchical) under a common, or global, schema

unique fragment

condition that indicates each row is unique, regardless of the fragment in which it is located

replicated data allocation

copies of one or more database fragments are stored at several sites

vertical fragmentation

data fragmentation strategy that refers to the division of a relation into attribute (column) subsets - equivalent to a PROJECT statement
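A small sketch of vertical fragmentation, using a list of dictionaries as a stand-in for a relation (the CUSTOMER columns are hypothetical). Each fragment is a projection that keeps the primary key, so joining the fragments on that key rebuilds the original table.

```python
customer = [
    {"cus_id": 1, "name": "Ana", "state": "TN", "credit_limit": 5000},
    {"cus_id": 2, "name": "Bo", "state": "FL", "credit_limit": 3500},
]


def project(rows, columns):
    """PROJECT: keep only the listed attributes of every row."""
    return [{c: r[c] for c in columns} for r in rows]


def rejoin(left, right, key):
    """Join two vertical fragments on their shared primary key."""
    right_by_key = {r[key]: r for r in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Both fragments keep the primary key cus_id so the table can be rebuilt.
service_frag = project(customer, ["cus_id", "name", "state"])
billing_frag = project(customer, ["cus_id", "credit_limit"])
print(billing_frag)  # [{'cus_id': 1, 'credit_limit': 5000}, {'cus_id': 2, 'credit_limit': 3500}]
print(rejoin(service_frag, billing_frag, "cus_id") == customer)  # True
```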

horizontal fragmentation

data fragmentation strategy that refers to the division of a relation into subsets (fragments) of tuples (rows) - equivalent to a SELECT statement with a WHERE clause on a single attribute
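A matching sketch of horizontal fragmentation on a single attribute (state), again with hypothetical CUSTOMER rows. Each fragment is the result of a selection, the fragments do not overlap, and their union gives back the original rows.

```python
customer = [
    {"cus_id": 1, "name": "Ana", "state": "TN"},
    {"cus_id": 2, "name": "Bo", "state": "FL"},
    {"cus_id": 3, "name": "Cy", "state": "TN"},
]


def select(rows, predicate):
    """SELECT ... WHERE predicate: keep whole rows that satisfy it."""
    return [r for r in rows if predicate(r)]


# One fragment per value of the state attribute.
fragments = {s: select(customer, lambda r: r["state"] == s) for s in ("TN", "FL")}

# The union of the disjoint fragments rebuilds the original relation.
rebuilt = sorted((r for f in fragments.values() for r in f), key=lambda r: r["cus_id"])
print(rebuilt == customer)                      # True
print([r["cus_id"] for r in fragments["TN"]])  # [1, 3]
```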

mixed fragmentation

data fragmentation strategy that refers to the division of a relation using a combination of horizontal and vertical strategies

partitioned data allocation

database is divided into two or more disjoint parts (fragments) and stored at two or more sites

centralized data allocation

entire database is stored at one site

local mapping transparency

exists when the end user or programmer must specify both the fragment names and their locations

location transparency

exists when the end user or programmer must specify the database fragment names but does not need to specify where those fragments are located
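The three transparency levels (fragmentation, location, local mapping) differ in how much the request has to name. This sketch assumes a hypothetical distributed data dictionary (CATALOG) for an EMPLOYEE table split into fragments E1 to E3; every function returns the same (fragment, site) plan, but the user supplies less information at higher levels of transparency.

```python
# Hypothetical distributed data dictionary: table -> {fragment: site}.
CATALOG = {"EMPLOYEE": {"E1": "NEW_YORK", "E2": "ATLANTA", "E3": "MIAMI"}}


def plan_fragmentation_transparent(table):
    """User names only the table; the DDBMS finds fragments and sites."""
    return sorted(CATALOG[table].items())


def plan_location_transparent(table, fragments):
    """User names the fragments; the DDBMS still looks up their sites."""
    return [(f, CATALOG[table][f]) for f in fragments]


def plan_local_mapping(pairs):
    """User must spell out every (fragment, site) pair."""
    return list(pairs)


a = plan_fragmentation_transparent("EMPLOYEE")
b = plan_location_transparent("EMPLOYEE", ["E1", "E2", "E3"])
c = plan_local_mapping([("E1", "NEW_YORK"), ("E2", "ATLANTA"), ("E3", "MIAMI")])
print(a == b == c)  # True
print(a)  # [('E1', 'NEW_YORK'), ('E2', 'ATLANTA'), ('E3', 'MIAMI')]
```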

subordinates

in a two-phase commit protocol, the cohort nodes

coordinator

in a two-phase commit protocol, the role assigned to the node that initiates the transaction

heterogeneous DDBMS

integrates different types of DBMSs over a network, but all support the same data model

homogeneous DDBMS

integrates multiple instances of the same DBMS over a network, which can be on different platforms

partition key

one or more attributes in a table that determine the fragment in which a row will be stored - used in range partitioning
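A sketch of range partitioning, assuming a hypothetical partition key cus_id, two boundary values, and three sites. bisect.bisect_right returns how many boundaries are less than or equal to the key, which is exactly the index of the range the row belongs in.

```python
import bisect

# Hypothetical range partitions on the partition key cus_id:
#   P0: cus_id < 1000, P1: 1000 <= cus_id < 5000, P2: cus_id >= 5000
BOUNDARIES = [1000, 5000]
PARTITION_SITES = ["SITE_A", "SITE_B", "SITE_C"]


def partition_for(cus_id: int) -> int:
    # bisect_right counts the boundaries <= cus_id, which is the partition index.
    return bisect.bisect_right(BOUNDARIES, cus_id)


for key in (42, 1000, 4999, 5000, 12345):
    print(key, "->", PARTITION_SITES[partition_for(key)])
# 42 -> SITE_A, 1000 -> SITE_B, 4999 -> SITE_B, 5000 -> SITE_C, 12345 -> SITE_C
```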

replica transparency

refers to the DDBMS's ability to hide multiple copies of data from the user

mutual consistency rule

requires all copies of data fragments be identical

distributed database design

same design principles as centralized DB, plus issues of: - data fragmentation (how to partition) - data replication (which fragments to replicate) - data allocation (where to locate fragments)

partially replicated database

stores multiple copies of some database fragments at multiple sites

fully heterogeneous DDBMS

supports different types of DBMSs, each one with a different data model, running under different computer systems

network latency

the delay imposed by the amount of time required for a data packet to make a round trip from point A to point B

network partitioning

the delay imposed when nodes become suddenly unavailable due to a network failure

data replication

the storage of data copies at multiple sites served by a computer network - subject to mutual consistency rule

