Chapter 3 - Database System and Big Data - Principles of Information Technology
How many bits make a byte? What does a byte represent?
8 bits compose a byte. Each byte represents a character.
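The bit and byte relationship can be checked with a short Python sketch. It assumes a single ASCII character (in encodings such as UTF-8, characters outside ASCII can take more than one byte), and the variable names are only illustrative.

```python
# One ASCII character occupies one byte, which is eight bits.
letter = "A"
encoded = letter.encode("ascii")   # a bytes object holding one byte
bits = format(encoded[0], "08b")   # that byte written as eight binary digits

print(len(encoded), bits)
```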
Example of each part of the hierarchy of data:
8 bits which compose a character - can simply be a letter in a program or database. Field(s) - a name of some sort (like a last name field). Record(s) - a collection of related data fields - the employee's last name, first name, number, address, hire date. File(s) - a collection of related records - Personnel file (the names, numbers, and such for different people). Database - different files (personnel file, department file, etc.).
Back-end application
A back-end application interacts with people only indirectly. It interacts directly with other programs or applications, and those in turn interact directly with people.
What are the five things that a DBA does?
A DBA performs five extremely important tasks: communicates with users to determine or define their data needs; uses programming languages to craft a set of databases that meet those needs; tests and evaluates databases; makes adjustments and implements changes to the databases to improve their performance; and ensures the security and protection of the data from unauthorized access.
What is required for the database approach to data management? - very important.
A DBMS - Database Management System is required for the database approach to data management.
What is Data Lifecycle Management?
DLM is a policy-based approach to managing the flow of an enterprise's data.
What is data manipulation language?
A DML is a specific language that comes/is provided with the database management system.
NoSQL database provides...
A NoSQL database provides a means to store and retrieve data that is modeled in some way other than the two dimensional tabular designs (relations) associated with relational databases.
What is a bit - what does it represent?
A bit, or a binary digit, represents a circuit that is either on or off.
Database definition (not the organized collection of data one).
A collection of integrated and related files.
Data administrator definition. What does the work include?
A data administrator is a nontechnical position that is responsible for defining and implementing consistent principles for a variety of data issues. The work includes data standards and data definitions that apply across all databases in organization.
Data dictionary definition
A data dictionary is a description of all of the data used in the database. It can include information on how the data flows, how records are organized, and other requirements (like data processing).
Data lake definition.
A data lake takes a "store everything" approach to big data, saving all data in its raw and unaltered form.
Data mart definition
A data mart is a subset of a data warehouse that can be typically found in small to medium-sized businesses and departments in large companies. (There are multiple data marts normally associated with each department).
Data model definition
A data model is a diagram of data entities and their relationships.
What is a data steward?
A data steward is an individual who is responsible for the management of critical data elements.
What is a data warehouse?
A data warehouse is a large database that collects business information from many sources in the enterprise to support management decision making.
A database administrator is a skilled and...
A database administrator is a skilled and trained IS professional.
What is a database?
A database is an organized collection of data.
A database management system is a group of blanks.
A database management system is a group of programs.
Domain
A domain is a range of allowable values for a data attribute.
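One way a domain can be enforced is with a CHECK constraint. The sketch below uses Python's built-in sqlite3 module; the Payroll table, its columns, and the 0 to 168 hours range are hypothetical choices made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# HoursWorked is limited to the domain 0..168 (hours in one week).
conn.execute(
    "CREATE TABLE Payroll ("
    " EmpNum INTEGER PRIMARY KEY,"
    " HoursWorked REAL CHECK (HoursWorked BETWEEN 0 AND 168))"
)
conn.execute("INSERT INTO Payroll VALUES (1, 40)")   # inside the domain

try:
    conn.execute("INSERT INTO Payroll VALUES (2, 200)")  # outside the domain
    out_of_domain_rejected = False
except sqlite3.IntegrityError:
    out_of_domain_rejected = True

print(out_of_domain_rejected)
```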
What is a field?
A field is a name, number, or combination of characters that describes an aspect of a business object or activity.
Primary key
A field or set of fields that uniquely identifies the record.
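A minimal sketch of how a primary key keeps records unique, using Python's built-in sqlite3 module. The Employee table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNum INTEGER PRIMARY KEY, LastName TEXT)")
conn.execute("INSERT INTO Employee VALUES (1, 'Garcia')")

try:
    # A second record with the same key value is refused by the DBMS.
    conn.execute("INSERT INTO Employee VALUES (1, 'Lee')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
print(duplicate_rejected, row_count)
```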
File definition
A file is a collection of related records.
Front-end application
A front-end application is one that directly interacts with its users/people.
Record definition
A record is a collection of related data fields.
What is a relational (database) model?
A relational model is a simple but highly useful way to organize data into collections of two-dimensional tables called relations.
A schema can be either of two things...
A schema can be either of two things: it can be part of the database, or it can be within a separate schema file.
Schema definition
A schema is a description of the entire database: How much data will it be able to hold? What type of data will it hold? How will the database management system function? How quickly will a user be able to generate reports with the use of this database?
What is an attribute?
An attribute is a characteristic of an entity.
What is an entity?
An entity is a person, place, or thing for which data is collected, stored, and maintained.
What is an example of DaaS
An example of DaaS is Amazon Relational Database Service (Amazon RDS).
What is In-memory Database?
An in-memory database is a DBMS that stores the entire database in random access memory (RAM).
What could an organization NOT do without successful collecting and maintaining data - without data and the ability to process it?
An organization without data or without the ability to process the data, will not be able to carry out most business activities.
What is another term for Primary Key?
Another term for Primary Key is the key field.
What's another term for Data Lake? When is the data extracted, transformed, and loaded?
Another term for a data lake may be an enterprise hub. Data from a data lake goes through the ETL process when the users know exactly what data they're looking for, and have already identified what the data itself will be used for.
What is another term for bit?
Another word for bit is a binary digit.
Attributes are essentially blanks Entities are essentially blanks
Attributes are essentially fields. Entities are essentially records.
What does the Big Data Lifecycle diagram look like?
The Big Data Lifecycle is in the middle of the diagram. Around it are the eight steps associated with the lifecycle, moving clockwise: 1. Define data needs 2. Evaluate different or alternate sources 3. Acquire the data 4. Store the data 5. Publish the data descriptions 6. Access and use the data 7. Evaluate the data 8. Archive or discard the data.
Machine log data
Business process logs, application logs
Joining
Combining two or more tables
Linking
Combining two or more tables through common data attributes to form a new table with only unique data attributes.
Concurrency control deals with...
Concurrency control deals with the situation when two or more users need to access the same record at the same time.
Cost versus accuracy. There is a blank between these. What does this mean?
Cost versus accuracy (data cleansing). There is a tradeoff between these, with an increasing opportunity cost. To increase data accuracy, you have to pay more (and vice versa). The more accurate you want your data to be, the more you will have to pay (the cost increases, and not linearly).
Creating and implementing the right relational database management system will allow for?
Creating and implementing the right relational database management system will ensure that the database itself will support both business activities and goals.
What does DAMA stand for?
DAMA stands for the Data Management Association International.
DBA stands for what?
DBA stands for Database Administrator.
What does DBMS stand for?
DBMS stands for Database Management System.
What two types of applications can DBMSs act as?
DBMSs can act as front-end or back-end applications.
What is Data Definition Language?
DDL is a collection of instructions and commands used to define and describe data and relationships in a specific database.
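As a rough illustration of DDL, the sketch below runs two CREATE TABLE statements through Python's built-in sqlite3 module. The Department and Employee tables are hypothetical; the REFERENCES clause is how the relationship between them is described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL statements define the data (tables, columns, types) and a relationship.
conn.executescript("""
CREATE TABLE Department (
    DeptNum  INTEGER PRIMARY KEY,
    DeptName TEXT NOT NULL
);
CREATE TABLE Employee (
    EmpNum   INTEGER PRIMARY KEY,
    LastName TEXT NOT NULL,
    DeptNum  INTEGER REFERENCES Department(DeptNum)
);
""")

# The definitions are stored by the DBMS and can be listed afterwards.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```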
What does DDL stand for?
DDL stands for Data Definition Language.
What does DLM stand for?
DLM stands for Data Lifecycle Management.
What does DML stand for?
DML stands for Data manipulation language.
What does DaaS stand for?
DaaS stands for Database as a Service.
Data cleansing is also known as...?
Data cleansing is also known as data cleaning or data scrubbing.
What is data cleansing?
Data cleansing is the process of detecting and then correcting or deleting incomplete, incorrect, inaccurate, and/or irrelevant records that reside in a database.
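A small sketch of what cleansing can look like in practice, written in plain Python. The records and the rules (trim whitespace, lowercase emails, drop incomplete rows, drop duplicates) are assumptions chosen for illustration, not a standard procedure.

```python
raw_records = [
    {"name": " Ana Diaz ", "email": "ANA@EXAMPLE.COM"},
    {"name": "Ana Diaz", "email": "ana@example.com"},  # duplicate once normalized
    {"name": "", "email": "unknown"},                  # incomplete record
]

def cleanse(records):
    """Correct what can be corrected and delete what cannot."""
    seen_emails = set()
    cleaned = []
    for record in records:
        name = record["name"].strip()
        email = record["email"].strip().lower()
        if not name or "@" not in email:
            continue  # incomplete or incorrect: delete
        if email in seen_emails:
            continue  # duplicate: delete
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned

cleaned_records = cleanse(raw_records)
print(cleaned_records)
```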
Data consists of...
Data consists of raw facts: alphanumeric data (numbers, letters, and other characters); image data (graphic images or pictures); audio data (sounds, noises, or tones); and video data (moving images or pictures).
Data governance requires business blank and active blank
Data governance requires business leadership and active participation.
Data governance definition
Data governance simply defines the roles, responsibilities, and processes that are required to make sure that the data can be trusted by all in the entire organization.
Data management definition
Data management is an integrated set of functions that defines the processes by which data is collected, certified for use, stored, secured, and processed to ensure that the accessibility, reliability, and timeliness of the data meet the needs of data users within an organization.
DML allows for?
Data manipulation language allows users to access and modify data, generate reports, or create queries.
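A brief sketch of DML statements (INSERT, UPDATE, SELECT) run through Python's built-in sqlite3 module. The Client table layout and the values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Client (ClientNum INTEGER PRIMARY KEY, ClientName TEXT, Debt REAL)")

# DML: add a row, modify it, then query it.
conn.execute("INSERT INTO Client VALUES (1, 'Acme', 500.0)")
conn.execute("UPDATE Client SET Debt = Debt + 250.0 WHERE ClientNum = 1")
debt = conn.execute("SELECT Debt FROM Client WHERE ClientNum = 1").fetchone()[0]

print(debt)
```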
A DBMS can produce a wide variety of documents, reports, and other output that can help the organizations achieve their goals. True or False.
Definitely true; a DBMS can most certainly generate a wide variety of documents, reports, and other outputs that can help the organizations achieve their goals.
Different tables can be connected to one another, and can be used effectively, if they have at least one common what?
Different tables can be connected to one another, and can be used effectively to find particular information, if they share at least one common data attribute. For example, you may be trying to find the name of a particular individual when all you know is the department number; with this number, you can move between tables that hold other information until you find the name.
What are entity-relationship diagrams?
ER (entity-relationship) diagrams are data models that use basic graphical symbols to show the organization of and relationships that exist between data.
What does ER diagram stand for?
ER diagram stands for Entity-Relationship Diagram.
Data from business applications
ERP, CRM
Traditional approach to data management.
Each distinct operational system used data files dedicated to that system.
Projecting
Eliminating columns in a table
Enterprise data modeling
Enterprise data modeling is modeling that is done at the level of the entire enterprise.
Extremely large and complex data collections are known as...
Extremely large and complex data collections are known as big data.
Data cleansing and data validation are the same. True or False. What is data validation?
False; Data cleansing is not the same as data validation. Data validation involves the identification of "bad data" and its rejection at the time of data entry.
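Validation, in contrast to cleansing, happens at the point of entry. The plain Python sketch below rejects a bad record before it is stored; the field names and rules are hypothetical.

```python
def validate_record(record):
    """Return a list of problems; an empty list means the record is accepted."""
    problems = []
    if not record.get("last_name", "").strip():
        problems.append("last_name is required")
    if not (0 <= record.get("hours_worked", -1) <= 168):
        problems.append("hours_worked must be between 0 and 168")
    return problems

accepted = []
rejected = []
for candidate in [
    {"last_name": "Lee", "hours_worked": 40},
    {"last_name": "", "hours_worked": 200},
]:
    # Bad data is turned away at entry time instead of being fixed later.
    (rejected if validate_record(candidate) else accepted).append(candidate)

print(len(accepted), len(rejected))
```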
In a database approach to data management, multiple information systems do not share a pool of related data. True or False?
False; in a database approach to data management, multiple information systems DO INDEED share a pool of related data.
Traditional data management software, hardware, processors, and analysis processes are capable of dealing with big data (extremely large and complex collections of data). True or False.
False; traditional data management software, hardware, processors, and analysis processes are INCAPABLE of dealing with big data (extremely large and complex collections of data).
What does the Hadoop environment look like on figure 3.24?
Figure 3.24 demonstrates the Hadoop environment. In the middle, the Hadoop software framework is operating in a cluster of hundreds of servers. To the left, data from many different sources (Facebook, ERP, CRM, sensor data from production floor, historical data) are being stored into the servers. This data is being loaded onto a data warehouse and two data marts that are located on the right.
What is recommended for data governance?
For data governance, the use of a cross-functional work team is recommended.
What are four examples of the ten basic functions of data management? What is the main function of data management?
Four examples of the ten basic functions of data management are data development, data security management, data quality management, and data architecture management. According to the Data Management Association International (DAMA), the main function of data management is data governance.
How many primary components does Hadoop have?
Hadoop has two primary components.
Hadoop. What is it?
Hadoop is an open-source software framework that contains multiple software modules that allow for the storing and processing of extremely large data sets.
Hierarchy of data consists of...
Hierarchy of data consists of bits, characters, fields, records, files, and databases.
Archives
Historical records of communications and transactions.
In-memory database enables the...
IMDB enables the analysis of big data and complicated/challenging data-processing applications.
In-memory database provides much faster access to blank than...
IMDB provides a much faster access to data than what would be possible when trying to access data on some other form of secondary storage device.
What does IMDB stand for?
IMDB stands for In-memory database
Media data
Images, audio, video, podcasts
What happened in 1986?
In 1986, SQL was adopted by ANSI as the standard query language for relational databases.
In a relational model, each row represents what, and each column represents?
In a relational model: Each row in the table represents an entity. Each column represents an attribute of the entity.
In a tabular design, entities are blanks and attributes are blanks
In a tabular design, entities are rows, and attributes are columns.
In the database approach to data management, information systems share what of related data?
In the database approach to data management, information systems share a pool of related data.
What are two examples of IMDB providers (the manufacturer, product name, and major customers)?
In-memory database providers (manufacturer, product name, major customers): Altibase, HDB, E*Trade and China Telecom; Oracle, TimesTen, Lockheed Martin and Verizon Wireless.
What does LAP stand for?
LAP stands for Logical Access Path.
Public data
Local, state, and federal government websites.
Is a database administrator the same as a data administrator?
No, the database administrator and the data administrator are two separate individuals within the company/organization (medium to large-sized organizations/firms will also have a data administrator).
Must data be organized in a meaningful way? Why?
Of course, data should indeed be organized in a meaningful way in order to make sense of it and transform it into useful information.
Simplified entity-relationship diagram among the manager, department, and project tables (explain).
One manager governs/supervises many departments, but each department can be traced back to one manager. One department performs many projects, but each project can be traced back to one department.
What does PAP stand for?
PAP stands for Physical access path.
Sensor data
Process control devices, smart electric meters
What does Query-by-example mean?
QBE is a visual approach to developing database queries or requests.
What does QBE stand for?
QBE stands for Query-by-Example.
SQL databases conform to blank properties. What are they?
SQL databases conform to ACID properties. These properties are atomicity, consistency, isolation, and durability.
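Of the four properties, atomicity is the easiest to demonstrate. The sketch below uses Python's built-in sqlite3 module with a hypothetical Account table: a transfer whose second step fails is rolled back as a whole, so neither balance changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Id INTEGER PRIMARY KEY,"
    " Balance INTEGER CHECK (Balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE Account SET Balance = Balance + 150 WHERE Id = 2")
        # This step would make the balance negative, so the CHECK fails.
        conn.execute("UPDATE Account SET Balance = Balance - 150 WHERE Id = 1")
except sqlite3.IntegrityError:
    pass

balances = conn.execute("SELECT Balance FROM Account ORDER BY Id").fetchall()
print(balances)
```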
SQL has become an integral part of most blank databases. What's an example of an SQL?
SQL has become an integral part of most relational databases. An example of a database product that uses SQL is Microsoft Access (the figure shows Access 2013).
What is an SQL?
SQL is a special purpose programming language that is used for accessing and manipulating data that is stored within relational databases.
SQL stands for?
SQL stands for Structured Query Language.
What are the four different ways that you can manipulate data?
Selecting, projecting, joining, and linking.
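These operations can be sketched in plain Python, treating each table as a list of rows. The employee and department data are hypothetical, and the last step is only a rough stand-in for joining or linking through a common attribute.

```python
employees = [
    {"emp_num": 1, "last_name": "Lee", "dept_num": 10},
    {"emp_num": 2, "last_name": "Diaz", "dept_num": 20},
    {"emp_num": 3, "last_name": "Khan", "dept_num": 10},
]
departments = [
    {"dept_num": 10, "dept_name": "Sales"},
    {"dept_num": 20, "dept_name": "IT"},
]

# Selecting: eliminate rows that do not meet a criterion.
selected = [row for row in employees if row["dept_num"] == 10]

# Projecting: eliminate columns, keeping only last_name.
projected = [{"last_name": row["last_name"]} for row in employees]

# Joining/linking: combine the tables through the common attribute dept_num.
dept_names = {d["dept_num"]: d["dept_name"] for d in departments}
combined = [{**row, "dept_name": dept_names[row["dept_num"]]} for row in employees]

print(selected, projected, combined, sep="\n")
```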
What are some parts that could be seen in a data dictionary entry (example within figure 3.14)?
Some parts that could be seen in a data dictionary entry may be: Northwestern Manufacturing: Prepared by: Name Date: Approved by: Name Date: Data element name: Description: Version: Etc.
What is the description of figure 3.22? The Big Data Lifecycle is a policy...
The Big Data Lifecycle is a policy-based approach to managing the flow of an enterprise's data: this includes the time of when the data is gathered or created and stored, until the moment of when it should be either archived or deleted for good (because it becomes outdated).
What can the DBMS reference a schema for?
The DBMS can actually reference or use a schema to determine where to find the requested data in relation to another piece of data.
What does the DDL allow the database creator to do?
The DDL allows the database creator to describe the data and relationships that will be part of or contained in the schema.
What did DAMA do?
The Data Management Association International has identified ten basic functions associated with data management.
What is the claw-like structure in the customer order database's entity-relationship diagram? Give two or three examples.
The claw-like structure represents either a one-to-many or a many-to-one relationship (the claw side represents many, while the side with one line represents one). Examples: One salesperson serves many customers, but each customer can be traced back to only one salesperson (the one who served them). One customer places many orders, but each order can be traced back to only one person or account (two accounts cannot work together to place one order). One order generates one invoice, and vice versa (an example of a one-to-one relationship).
The cost of performing data cleansing to achieve 100% database accuracy can be blank expensive.
The cost of performing data cleansing to reach 100% data accuracy can be prohibitively expensive.
What type of position can the data administrator be?
The data administrator can be a high-level position reporting to top-level managers.
Explain Figure 3.4 - Database approach to data management. What does it look like?
The database (a cylinder) is on the left; within it are multiple types of data (payroll data, inventory data, invoicing data, and other data). The DBMS (Database Management System) is in the middle and serves as an interface between the database and the application programs. To the right of the DBMS are application programs (such as a payroll program, an inventory control program, an invoicing program, and management inquiries) that produce reports for the users.
Who handles the database administration?
The database administration is handled by the service provider.
By who and where is the database accessed?
The database is accessed by the Client over a network, typically the Internet.
What does the enterprise data model do? (Hint: provides what?)
The enterprise data model provides a roadmap for the building of information systems and databases.
What does the enterprise model look like? (What are the benefits listed in each part)?
The enterprise model starts at the bottom with the data model. The data model supports the systems and data, which are above it. The systems and data then support the entire enterprise. The data model enables a simpler interface when accessing the systems and data, and reduces data redundancy (ensures compatible data). The systems and data (for the enterprise) reduce costs, increase the effectiveness of the business, and provide business opportunities.
The example given in the PowerPoint, portrays an entity-relationship diagram for what kind of database?
The example given in the PowerPoint, portrays an entity-relationship diagram for a customer order database.
What are the five challenges of big data?
The five challenges of big data are: how to choose which subset of data to store; where and how to store the data; how to find the nuggets of data that are relevant to the decision making at hand; how to derive value from the relevant data; and how to identify which data needs to be protected from unauthorized access.
What are four database activities?
The four database activities are: provide a user view of the database; create and modify the database; store and retrieve data; and manipulate the data and generate reports.
What are the four main categories of NoSQL database?
The four main categories of NoSQL database are key-value, document, graph, and column.
What is the hierarchy of data (different levels)?
The hierarchy of data, from smallest to largest, is as follows: bits, bytes/characters, fields, records, files, databases.
When does the IMDB work at its best?
The in-memory database performs at its best on multiple multicore CPUs.
What process is associated with a data warehouse? What does it stand for?
The process associated with a data warehouse is known as ETL. ETL stands for Extract, Transform, and Load.
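The three ETL steps can be sketched with the Python standard library. Everything here is an assumption made for illustration: the source rows, the ClientDebt target table, and the rule that rows with an unreadable debt value are rejected.

```python
import sqlite3

# Extract: rows as they arrive from a (hypothetical) source system.
source_rows = [
    {"client": " Ames ", "debt": "1500.50"},
    {"client": "Brook", "debt": "n/a"},
]

def transform(row):
    """Tidy the name and convert the debt; return None for rows to reject."""
    try:
        debt = float(row["debt"])
    except ValueError:
        return None
    return (row["client"].strip(), debt)

# Transform: keep only the rows that survive conversion.
transformed = [t for t in (transform(r) for r in source_rows) if t is not None]

# Load: write the transformed rows into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE ClientDebt (ClientName TEXT, Debt REAL)")
warehouse.executemany("INSERT INTO ClientDebt VALUES (?, ?)", transformed)
loaded = warehouse.execute("SELECT ClientName, Debt FROM ClientDebt").fetchall()

print(loaded)
```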
What are the tables in a relational database model known as?
The simple two-dimensional tables in a relational database model are known as relations.
What are the six considerations when building a database? - the questions as well.
The six considerations when building a database are: 1. Content: what data should be collected? How much is it going to cost to collect the data? 2. Access: what data should be provided to which users and when? 3. Logical Structure: how should the data be arranged so that it makes sense? 4. Physical Organization: where should the data be physically located? 5. Archiving: how long should the data be stored? 6. Security: how can the data be protected?
Data Item
The specific value of an attribute.
What are the three characteristics of big data?
The three characteristics of big data are volume, velocity, and variety.
What are the three factors that drive data management?
The three factors that drive data management are: 1. The need to meet external regulations designed to manage risk associated with financial misstatement. 2. The need to avoid the release of sensitive data. 3. The need to ensure that high-quality data is available when making key decisions.
What are the three groups of DBMS? What are examples (three for each)?
The three groups of DBMSs are: Open-source relational database management systems: MySQL, SQLite, MariaDB, CouchDB. Relational database management systems for individuals and workgroups: Microsoft Access, Google Base, OpenOffice Base. Relational database management systems for workgroups and enterprises: Oracle, Teradata, Microsoft SQL Server.
What are the two primary components of Hadoop?
The two primary components of Hadoop are a data processing component (MapReduce) and a distributed file system (the Hadoop Distributed File System, HDFS).
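The MapReduce idea can be imitated on one machine in plain Python. This is not the Hadoop API, only a sketch of the map, group, and reduce steps applied to a word count; the function names and sample documents are made up for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def group_by_key(pairs):
    """Collect all values emitted for the same key (the shuffle step)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Combine the values for each key into a single total."""
    return {key: sum(values) for key, values in grouped.items()}

documents = ["big data big value", "data lake"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(group_by_key(pairs))

print(word_counts)
```

In a real cluster the map and reduce steps run in parallel on many servers, with the input files spread across HDFS.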
GRANT INSERT ON Client to Guthrie - what does it mean?
This SQL command allows Guthrie to add rows to the Client relation (table); the INSERT privilege covers new rows, not new columns. It is an example of a security command.
SELECT ClientName, Debt FROM Client WHERE Debt > 1000. What does this query demonstrate?
This SQL query returns the names and debts of clients whose debt to the company (the money they owe) is greater than $1,000. The debt data is found in the relational table named Client.
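The query can be tried against a small in-memory table with Python's built-in sqlite3 module. The three sample clients are hypothetical; one of them owes exactly 1000 to show that the comparison is strict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Client (ClientNum INTEGER PRIMARY KEY, ClientName TEXT, Debt REAL)")
conn.executemany(
    "INSERT INTO Client VALUES (?, ?, ?)",
    [(1, "Ames", 1500.0), (2, "Brook", 1000.0), (3, "Chen", 250.0)],
)

# Same query as on the card: only debts strictly greater than 1000 qualify.
rows = conn.execute(
    "SELECT ClientName, Debt FROM Client WHERE Debt > 1000").fetchall()

print(rows)
```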
What should this cross functional work team be composed of?
This cross-functional work team should be composed of executives, project managers, line-of-business managers, and data stewards.
What are the eight sources of data in figure 3.20? What does this figure look like?
This figure has an octagon in the middle labeled "an organization's collection of useful data." An arrow for each source of data that the organization can collect points at one side of the octagon. The eight sources of data are: document data, media data, social media data, public data, business applications data, archive data, sensor data, and machine log data.
This group of programs do what things?
This group of programs actually do two things: 1. The programs manipulate the database. 2. The programs serve as an interface between the database and its users, as well as other application programs.
SELECT ClientName, ClientNum, OrderNum FROM Client, Order WHERE Client.ClientNum=Order.ClientNum. What does this query or SQL command mean?
This is a prime example of joining data from two separate relational tables. The attributes ClientName, ClientNum, and OrderNum will be included in the new table, and these two tables will share a common attribute (ClientNum from both tables will equal one another).
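A runnable version of this join, using Python's built-in sqlite3 module with hypothetical sample rows. Two adjustments are needed for SQLite to accept it: Order is a reserved word in SQL, so the table name is quoted, and ClientNum in the SELECT list is qualified because both tables contain a column with that name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Client (ClientNum INTEGER PRIMARY KEY, ClientName TEXT);
CREATE TABLE "Order" (OrderNum INTEGER PRIMARY KEY, ClientNum INTEGER);
INSERT INTO Client VALUES (1, 'Ames'), (2, 'Brook');
INSERT INTO "Order" VALUES (100, 1), (101, 1), (102, 2);
""")

# Rows from the two tables are matched on the common attribute ClientNum.
rows = conn.execute("""
SELECT ClientName, Client.ClientNum, OrderNum
FROM Client, "Order"
WHERE Client.ClientNum = "Order".ClientNum
ORDER BY OrderNum
""").fetchall()

print(rows)
```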
What are three examples of big data use?
Three examples of big data use are as follows: Retail organizations monitor social networks to see brands that support their own, or that are adversaries. Advertising and marketing agencies track comments on social media. Hospitals analyze medical data and patient records.
Together, bits, characters, fields, records, files, and databases, formulate the...
Together, bits, characters, fields, records, files, and databases, formulate the hierarchy of data.
There are times when data in data marts are more detailed than in a data warehouse. True or False?
True, there are indeed times when data in data marts are more detailed than in data warehouses.
The development of ER (entity-relationship) diagrams helps to ensure that the logical structure of the application programs is consistent with the data relationships in the database. True or False?
True. The development of the entity-relationship diagram does indeed help to ensure that the logical structure of application programs is consistent with the data relationships in the database.
Social Media
Twitter, Snapchat, Facebook, Instagram, LinkedIn, Pinterest
What are two advantages of NoSQL databases?
Two advantages of NoSQL databases are: the ability to spread data over multiple servers so that each server contains a subset of the total data; and no need for a predefined schema when creating a NoSQL database.
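The "no predefined schema" point can be pictured with a plain Python dictionary standing in for a key-value or document store. This is only an analogy, not a real NoSQL product, and the keys and fields are hypothetical.

```python
# A dictionary stands in for the store: each key maps to one document.
store = {}

# Documents under different keys do not need to share the same fields.
store["client:1"] = {"name": "Ames", "debt": 1500.0}
store["client:2"] = {"name": "Brook", "orders": [{"order_num": 102}]}

# No table definition was needed before storing either document.
fields_used = sorted({field for doc in store.values() for field in doc})

print(fields_used)
```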
Using multiple tables that have at least one common data attribute to find a specific answer, requires you to blank the data. Figure 3.9 demonstrates this. What type of manipulation of data is this?
Using multiple tables that have at least one common data attribute to find a specific answer, requires you to manipulate the data. Figure 3.9 demonstrates this. This manipulation of data is known as linking.
Describe the physical and logical access paths - figure 3.15.
When an application program requests data through a database management system, it is doing so with the use of a logical access path. The DBMS obtains the data from the storage device with the use of a physical access path, and then provides it to the application program (again with the LAP).
When an application program needs data, where does it request it through?
When an application program needs data, it requests this data through the database management system.
With DaaS, where is the database stored?
With DaaS, the database itself is stored in the service provider's servers.
Is it true that Hadoop can be used as a staging area for data to be loaded onto data warehouses and data marts?
Yes, it is indeed true that Hadoop can be used as a staging area for data to be loaded onto data warehouses and/or data marts.
Is it true that most, if not all organizations have a collection of useful big data (a collection of data)?
Yes, it is indeed true that most, if not all organizations do have a collection of useful data at their disposal.
Is it true that the database approach to data management offers the ability to share data and information resources?
Yes, it is indeed true that the database approach to data management offers the ability to share data and information resources.
Do the capabilities and types of database systems vary considerably?
Yes, the capabilities and types of database systems vary considerably.
Can the cost of performing data cleansing be high?
Yes, the cost of performing data cleansing can be quite high.
Document data: 2 examples
Emails and Microsoft Excel files; also PowerPoint presentations.
Selecting
Eliminating rows according to certain criteria