Chapter 2 - Data Management - Foundations.


Describe the basic features of the relational data model and discuss their importance to the end user and the designer.

Explanation: A relational database is a single data repository that provides both structural and data independence while maintaining conceptual simplicity.

What is a relationship, and what three types of relationships exist?

Explanation: A relationship is an association among two or more entities. Three types of relationships exist: one-to-one (1:1), one-to-many (1:M), and many-to-many (M:N or M:M).

Why is an object said to have greater semantic content than an entity?

Explanation: An object has greater semantic content because it embodies both data and behavior. That is, in addition to data, the object contains a description of the operations that it may perform.

What is the difference between an object and a class in the object oriented data model (OODM)?

Explanation: An object is an instance of a specific class. It is useful to point out that the object is a run-time concept, while the class is a more static description.
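
A minimal Python sketch of the distinction, assuming a hypothetical Invoice class that does not come from the text. The class is the static description; each object is a run-time instance that carries its own data and shares the behavior the class defines.

```python
class Invoice:
    """Static description: which data an invoice holds and what it can do."""

    def __init__(self, number, amount):
        self.number = number   # data (attributes)
        self.amount = amount

    def apply_discount(self, rate):
        """Behavior: an operation defined once by the class."""
        self.amount = round(self.amount * (1 - rate), 2)


# Objects exist only at run time; each one is an instance of the class.
first = Invoice(1001, 200.0)
second = Invoice(1002, 50.0)

first.apply_discount(0.10)   # changes only this object's data
```

Both objects share the same operations, but applying the discount to one leaves the other's amount untouched.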

How do you translate business rules into data model components?

Explanation: As a general rule, a noun in a business rule will translate into an entity in the model, and a verb (active or passive) associating nouns will translate into a relationship among the entities. For example, the business rule "a customer may generate many invoices" contains two nouns (customer and invoice) and a verb ("generate") that associates them.

What is an ERDM, and what role does it play in the modern (production) database environment?

Explanation: The Extended Relational Data Model (ERDM) is the relational data model's response to the Object Oriented Data Model (OODM). Most current RDBMSs support at least a few of the ERDM's extensions. For example, support for binary large objects (BLOBs) is now common.

Give an example of each of the three types of relationships: 1:1, 1:M, and M:N.

Explanation: 1:1 - An academic department is chaired by one professor; a professor may chair only one academic department. 1:M - A customer may generate many invoices; each invoice is generated by one customer. M:N - An employee may have earned many degrees; a degree may have been earned by many employees.
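
One common way to implement these three relationship types in a relational database can be sketched with Python's built-in sqlite3 module. The table and column names below are assumptions made for illustration; they are not taken from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- 1:1  a department is chaired by one professor, and a professor
--      chairs at most one department (UNIQUE enforces the "one").
CREATE TABLE professor (prof_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE department (
    dept_id  INTEGER PRIMARY KEY,
    name     TEXT,
    chair_id INTEGER UNIQUE REFERENCES professor(prof_id)
);

-- 1:M  a customer generates many invoices; each invoice has one customer.
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoice (
    inv_id  INTEGER PRIMARY KEY,
    cust_id INTEGER NOT NULL REFERENCES customer(cust_id)
);

-- M:N  employees earn many degrees; a degree is earned by many employees.
--      A bridge table holds one row per (employee, degree) pair.
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE degree (deg_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE employee_degree (
    emp_id INTEGER REFERENCES employee(emp_id),
    deg_id INTEGER REFERENCES degree(deg_id),
    PRIMARY KEY (emp_id, deg_id)
);
""")

# 1:M in action: one customer, two invoices.
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.executemany("INSERT INTO invoice VALUES (?, 1)", [(10,), (11,)])
invoice_count = conn.execute(
    "SELECT COUNT(*) FROM invoice WHERE cust_id = 1").fetchone()[0]

# 1:1 in action: the same professor cannot chair a second department.
conn.execute("INSERT INTO professor VALUES (1, 'Lin')")
conn.execute("INSERT INTO department VALUES (1, 'Math', 1)")
try:
    conn.execute("INSERT INTO department VALUES (2, 'Physics', 1)")
    second_chair_allowed = True
except sqlite3.IntegrityError:
    second_chair_allowed = False
```

The 1:M side is a plain foreign key, the 1:1 side adds a uniqueness constraint to that key, and the M:N relationship is broken into two 1:M relationships through the bridge table.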

What is a business rule, and what is its purpose in data modeling?

Explanation: A business rule is a brief, precise, and unambiguous description of a policy, procedure, or principle within a specific organization's environment. In a sense, business rules are misnamed: they apply to any organization (a business, a government unit, a religious group, or a research laboratory; large or small) that stores and uses data to generate information.

Discuss the importance of data modeling.

Explanation: A data model is a relatively simple representation, usually graphical, of a more complex real-world object or event. The data model's main function is to help us understand the complexities of the real-world environment. The database designer uses data models to facilitate the interaction among designers, application programmers, and end users. In short, a good data model is a communications device that helps eliminate (or at least substantially reduce) discrepancies between the database design's components and the real-world data environment. The development of data models, bolstered by powerful database design tools, has made it possible to substantially diminish the potential for database design errors. (Review Section 2.1 in detail.)

What is a relational diagram? Give an example.

Explanation: A relational diagram is a visual representation of the relational database's entities, the attributes within those entities, and the relationships between those entities. Therefore, it is easy to see what the entities represent and to see what types of relationships (1:1, 1:M, M:N) exist among the entities and how those relationships are implemented. An example of a relational diagram is found in the text's Figure 2.2.

Explain how the entity relationship (ER) model helped produce a more structured relational database design environment.

Explanation: An entity relationship model, also known as an ERM, helps identify the database's main entities and their relationships. Because the ERM components are graphically represented, their role is more easily understood. Using the ER diagram, it's easy to map the ERM to the relational database model's tables and attributes. This mapping process uses a series of well-defined steps to generate all the required database structures.

What is Hadoop and what are its basic components?

Explanation: To create value from their previously unused Big Data stores, companies are using new Big Data technologies. These emerging technologies allow organizations to process massive data stores of multiple formats in cost-effective ways. Some of the most frequently used Big Data technologies are Hadoop and MapReduce. Hadoop is a Java-based, open source, high-speed, fault-tolerant distributed storage and computational framework. Hadoop uses low-cost hardware to create clusters of thousands of computer nodes to store and process data. Hadoop originated from Google's work on distributed file systems and parallel processing and is currently supported by the Apache Software Foundation. Hadoop has several modules, but the two main components are the Hadoop Distributed File System (HDFS) and MapReduce.
HDFS is a highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds. To achieve high throughput, HDFS uses the write-once, read-many model: once the data is written, it cannot be modified. HDFS uses three types of nodes: a name node that stores all the metadata about the file system; a data node that stores fixed-size data blocks (which can be replicated to other data nodes); and a client node that acts as the interface between the user application and HDFS.
MapReduce is an open source application programming interface (API) that provides fast data analytics services. MapReduce distributes the processing of the data among thousands of nodes in parallel and works with both structured and unstructured data. The MapReduce framework provides two main functions, Map and Reduce. In general terms, the Map function takes a job and divides it into smaller units of work; the Reduce function collects all the output results generated from the nodes and integrates them into a single result set.
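
The Map and Reduce roles described above can be sketched with a word count that runs in a single Python process. This is a toy illustration only: it does not use Hadoop's API, the function names are assumptions, and the two input strings merely stand in for data blocks held on different nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the intermediate values by key before reducing."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


# Each chunk stands in for a data block handled by a different node.
chunks = ["big data big clusters", "data nodes store data"]

intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```

In a real cluster the map calls would run in parallel on the nodes that hold each block, and only the grouped intermediate pairs would travel across the network to the reducers.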

Describe the Big Data phenomenon.

Explanation: Over the last few years, a new wave of data has emerged into the limelight. Such data have always existed but did not receive the attention that they receive today. These data are characterized by high volume (petabyte size and beyond), high frequency (data are generated almost constantly), and a mostly semi-structured format. They come from multiple and varied sources such as web site logs, posts on social sites, and machine-generated information (GPS, sensors, etc.). Such data have accumulated over the years, and companies are now awakening to the fact that they contain a lot of hidden information that could help the day-to-day business (such as browsing patterns, purchasing preferences, and behavior patterns). The need to manage and leverage these data has triggered a phenomenon labeled "Big Data". Big Data refers to a movement to find new and better ways to manage large amounts of web-generated data and derive business insight from them while, at the same time, providing high performance and scalability at a reasonable cost.

What does the term "3 Vs" refer to?

Explanation: The term "3 Vs" refers to the three basic characteristics of Big Data databases: volume, velocity, and variety. Volume refers to the amount of data being stored. With the adoption and growth of the Internet and social media, companies have multiplied the ways to reach customers. Over the years, and with the benefit of technological advances, data for millions of e-transactions were being stored daily in company databases. Furthermore, organizations are using multiple technologies to interact with end users, and those technologies are generating mountains of data. This ever-growing volume of data quickly reached petabytes in size and is still growing. Velocity refers not only to the speed with which data grows but also to the need to process these data quickly in order to generate information and insight. With the advent of the Internet and social media, business response times have shrunk considerably. Organizations need not only to store large volumes of quickly accumulating data but also to process such data quickly. The velocity of data growth is also due to the increase in the number of different data streams from which data is piped to the organization (via the web, e-commerce, Tweets, Facebook posts, emails, sensors, GPS, and so on). Variety refers to the fact that the data being collected come in multiple different formats. A great portion of these data come in formats that the typical operational databases based on the relational model are not suited to handle.

Define and describe the basic characteristics of a NoSQL database.

Explanation: NoSQL refers to a new generation of databases that address the very specific challenges of the "big data" era and have the following general characteristics:
Not based on the relational model: These databases are generally based on a variation of the key-value data model rather than on the relational model, hence the NoSQL name. The key-value data model is based on a structure composed of two data elements, a key and a value, in which every key has a corresponding value (or set of values). The key-value data model is also referred to as the attribute-value or associative data model. In the key-value data model, each row represents one attribute of one entity instance. The "key" column points to an attribute, and the "value" column contains the actual value for the attribute. The data type of the "value" column is generally a long string to accommodate the variety of actual data types of the values that are placed in the column.
Support distributed database architectures: One of the big advantages of NoSQL databases is that they generally use a distributed architecture. In fact, several of them (Cassandra, Bigtable) are designed to use low-cost commodity servers to form a complex network of distributed database nodes.
Provide high scalability, high availability, and fault tolerance: NoSQL databases are designed to support the ability to add capacity (add database nodes to the distributed database) when demand is high and to do it transparently and without downtime. Fault tolerance means that if one of the nodes in the distributed database fails, the database keeps operating as normal.
Support very large amounts of sparse data: Because NoSQL databases use the key-value data model, they are suited to handle very high volumes of sparse data, that is, cases where the number of attributes is very large but the number of actual data instances is low.
Geared toward performance rather than transaction consistency: One of the biggest problems of very large distributed databases is enforcing data consistency. Distributed databases automatically make copies of data elements at multiple nodes to ensure high availability and fault tolerance. If the node with the requested data goes down, the request can be served from any other node with a copy of the data. However, what happens if the network goes down during a data update? In a relational database, transaction updates are guaranteed to be consistent or the transaction is rolled back. NoSQL databases sacrifice consistency in order to attain high levels of performance, providing eventual consistency instead. Eventual consistency means that data are not guaranteed to be consistent across all copies immediately after an update; rather, updates propagate through the system and eventually all data copies become consistent.
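
Eventual consistency can be illustrated with a toy in-memory model. The sketch below is an assumption-laden simplification: the class and method names are hypothetical, replication is triggered by hand, and real NoSQL systems use far more elaborate replication protocols.

```python
class Replica:
    """One node's copy of the data (a plain dict standing in for storage)."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    def __init__(self, replica_count):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []   # updates not yet copied to the other replicas

    def write(self, key, value):
        # Acknowledge after updating one replica only; the rest lag behind.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Background replication: apply queued updates to every other replica.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(replica_count=3)
store.write("cart:42", "3 items")

stale = store.read("cart:42", replica_index=2)   # read before propagation
store.propagate()
fresh = store.read("cart:42", replica_index=2)   # read after propagation
```

The first read returns nothing because the update has reached only one copy; once propagation runs, every replica returns the same value.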

What is a table, and what role does it play in the relational model?

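Explanation: Strictly speaking, the relational data model bases data storage on relations, which are grounded in algebraic set theory. The user, however, perceives each relation as a table: a two-dimensional structure made up of rows and columns.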

What is sparse data? Give an example.

Explanation: Sparse data refers to cases in which the number of attributes is very large but the number of actual data instances is relatively small. For example, if you are modeling census data, you will have an entity called person. This entity can have hundreds of attributes, such as first name, last name, degree, employer, income, veteran status, and foreign born. Although there would be many millions of rows, one for each person, many attributes will be left blank: not every person will have a degree, an income, or an employer, and even fewer will be veterans or foreign born. Whenever a data entity has many columns but few of those columns hold a value in any given row (many empty attribute occurrences), you have sparse data.
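
A small Python sketch of the census example, using made-up people and only seven attributes, shows why a key-value layout suits sparse data. All names and values here are illustrative assumptions.

```python
# Wide-table view: every person carries every attribute, mostly empty.
ATTRIBUTES = ["first_name", "last_name", "degree", "employer",
              "income", "veteran_status", "foreign_born"]

people = {
    1: {"first_name": "Ana", "last_name": "Ruiz", "degree": "BSc"},
    2: {"first_name": "Tom", "last_name": "Lee"},
    3: {"first_name": "Kim", "last_name": "Park", "veteran_status": "yes"},
}

wide_rows = [
    [person.get(attr) for attr in ATTRIBUTES]   # None marks an empty cell
    for person in people.values()
]
empty_cells = sum(cell is None for row in wide_rows for cell in row)

# Key-value view: one row per attribute that actually has a value,
# with the value stored as a string.
key_value_rows = [
    (person_id, attr, str(value))
    for person_id, attrs in people.items()
    for attr, value in attrs.items()
]
```

The wide layout reserves a cell for every attribute of every person, while the key-value layout stores a row only where a value exists, so the gap widens as the number of rarely used attributes grows.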

