CH.6 Analytics
A _____ is a subset of a data warehouse that is used by small- and medium-sized businesses and departments within large companies to support decision making. a.data dictionary b.data model c.data mart d.data mine
c.data mart
A database system that stores the entire database in random access memory is known as a _____. a.relational database b.HDFS database c.in-memory database d.NoSQL database
c.in-memory database
In what ways can a BI and analytics system be useful to a grocery store or pharmacy?
A business intelligence (BI) and analytics system can be useful to a pharmacy because it can help pharmacists track medication adherence, ratings, and med sync enrollment; view their own data and their patients' data; and anticipate possible outcomes for each patient. Overall, efficiency, accuracy in providing the right medicine, and patient health can all be improved with a BI and analytics system.
Outline the steps in the Extract Transform Load (ETL) process and explain the purpose of each step.
The steps of the ETL process include: 1. Extract: Before any data can be moved to a different location, it has to be extracted from its sources, such as transaction processing (OLTP) systems, files, and other databases; data can be pulled from a wide range of sources. During extraction, data that fails to meet expected patterns or values can be rejected. 2. Transform: In this step, the extracted data is converted before it reaches its destination; business rules are applied to ensure data quality and consistency, standardize formats, and remove any duplicates. That way, when the data arrives, it is fully compatible and ready to use. 3. Load: In this final step, the transformed data is loaded into the target, such as a data warehouse or data mart, either all at once or at scheduled intervals.
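The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the source rows, the `sku`/`qty` field names, the `sales` table, and the helper functions `extract`, `transform`, and `load` are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
# Minimal ETL sketch using only the standard library.
# Source rows, field names, and the "sales" table are hypothetical.
import sqlite3

def extract(source_rows):
    """Extract: read raw records from a source system, rejecting rows
    that fail a basic pattern check (here, a non-numeric quantity)."""
    accepted, rejected = [], []
    for row in source_rows:
        if str(row.get("qty", "")).strip().isdigit():
            accepted.append(row)
        else:
            rejected.append(row)
    return accepted, rejected

def transform(rows):
    """Transform: standardize values and drop duplicate records."""
    seen = set()
    clean = []
    for row in rows:
        record = (row["sku"].strip().upper(), int(row["qty"]))
        if record not in seen:
            seen.add(record)
            clean.append(record)
    return clean

def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (sku TEXT NOT NULL, qty INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO sales (sku, qty) VALUES (?, ?)", records)
    conn.commit()

source = [
    {"sku": "a100", "qty": "3"},
    {"sku": "A100 ", "qty": "3"},   # duplicate once standardized
    {"sku": "b200", "qty": "two"},  # rejected during extract
    {"sku": "c300", "qty": "5"},
]

conn = sqlite3.connect(":memory:")
rows, rejected = extract(source)
load(conn, transform(rows))
print(conn.execute("SELECT sku, qty FROM sales ORDER BY sku").fetchall())
# [('A100', 3), ('C300', 5)]
print(len(rejected))  # 1
```

Rejecting malformed rows during extraction and deduplicating during transformation are only examples of the kinds of rules a real ETL tool might apply.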
Why is an effective ETL process essential to data warehousing? a."Dirty data" can result in incorrect or misleading statistics used for decision making. b.Small- to medium-sized businesses need a suitable data warehousing option. c.The ETL process removes the necessity for a predefined database schema. d.Horizontal scaling of a relational database enables multiple servers to operate on the data.
a."Dirty data" can result in incorrect or misleading statistics used for decision making.
Which of the following is required to create a traditional data warehouse but NOT a data lake? a.Extract Transform Load process b.raw data c.data storage d.OLTP system and/or other data source(s)
a.Extract Transform Load process
Hadoop's two major components are _____. a.a data processing component and a distributed file system b.a cluster and a group of servers c.a JobTracker and a group of TaskTrackers d.a real-time data processor and a framework for data analytics
a.a data processing component and a distributed file system
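Hadoop's processing component is MapReduce, and its distributed file system is HDFS. The map, shuffle, and reduce stages can be illustrated with the classic word-count example. The sketch below runs in a single Python process and does not use any Hadoop API; the function names and input splits are hypothetical. On a real cluster, each split would be mapped on the node that stores it.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine all values for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}

splits = ["big data big value", "data lake data mart"]  # hypothetical input splits
intermediate = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle(intermediate))
print(counts)  # {'big': 2, 'data': 3, 'value': 1, 'lake': 1, 'mart': 1}
```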
Graph NoSQL databases _____. a.are well-suited for analyzing interconnections b.have only two columns c.store, retrieve, and manage document-related information d.focus on only keys and values
a.are well-suited for analyzing interconnections
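Graph databases store entities as nodes and relationships as edges, so a question such as "who is within two connections of this customer?" becomes a traversal. As a rough illustration only, the sketch below models a small hypothetical graph as a Python dictionary and uses breadth-first search; it does not use any graph database or graph query language.

```python
from collections import deque

# Hypothetical graph: an edge means two customers are connected.
graph = {
    "Ana": ["Ben", "Cai"],
    "Ben": ["Ana", "Dee"],
    "Cai": ["Ana"],
    "Dee": ["Ben", "Eli"],
    "Eli": ["Dee"],
}

def within_hops(graph, start, max_hops):
    """Breadth-first search returning each node reachable from start
    in at most max_hops edges, mapped to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return {n: d for n, d in seen.items() if n != start}

print(within_hops(graph, "Ana", 2))  # {'Ben': 1, 'Cai': 1, 'Dee': 2}
```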
To identify and make predictions about various alternative scenarios, a manager would use _______. a.simulation techniques b.descriptive analytics c.optimization techniques d.visual analytics
a.simulation techniques
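Monte Carlo simulation, one of the simulation techniques, runs a model many times with randomly drawn inputs to estimate how likely each outcome is under alternative scenarios. The sketch below is a hypothetical inventory example: the normal demand distribution (mean 100, standard deviation 15), the candidate stock levels, and the `simulate_stockouts` function are assumptions made for illustration.

```python
import random

def simulate_stockouts(stock_level, trials=10_000, seed=42):
    """Estimate the probability that daily demand exceeds stock.
    Demand is assumed (hypothetically) to be normally distributed
    with mean 100 units and standard deviation 15 units."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    stockouts = 0
    for _ in range(trials):
        demand = rng.gauss(100, 15)
        if demand > stock_level:
            stockouts += 1
    return stockouts / trials

# Compare alternative what-if scenarios.
for stock in (100, 115, 130):
    print(stock, simulate_stockouts(stock))
```

Because demand is assumed to be normal, the three estimates should land near 0.50, 0.16, and 0.02, letting a manager compare scenarios before committing to a stock level.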
_______ is used to explore large amounts of data for hidden patterns to predict future trends. a.Data governance b.Data mining c.Regression analysis d.A genetic algorithm
b.Data mining
The graphical representation that summarizes the steps a consumer takes in making the decision to buy your product and become a customer is called _____. a.a word cloud b.a conversion funnel c.a scatter diagram d.a pivot chart
b.a conversion funnel
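A conversion funnel is usually built from counts of how many people reach each stage. The sketch below computes step-to-step and overall conversion rates; the stage names, counts, and the `funnel_report` function are invented for illustration.

```python
# Hypothetical visitor counts at each stage of a conversion funnel.
funnel = [
    ("Visited site", 10_000),
    ("Viewed product", 4_000),
    ("Added to cart", 1_000),
    ("Purchased", 250),
]

def funnel_report(stages):
    """Return (stage, count, step conversion %, overall conversion %)."""
    top = stages[0][1]
    prev = top
    report = []
    for name, count in stages:
        step_pct = round(100 * count / prev, 1)    # versus the previous stage
        overall_pct = round(100 * count / top, 1)  # versus the first stage
        report.append((name, count, step_pct, overall_pct))
        prev = count
    return report

for row in funnel_report(funnel):
    print(row)
# last row: ('Purchased', 250, 25.0, 2.5)
```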
Melanie's company takes a "store everything" approach to big data, saving all of it in a raw, unaltered form. Only when she needs to analyze some of the data is it extracted from this _____. a.data mart b.data lake c.data warehouse d.in-memory database
b.data lake
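The "store everything" approach can be illustrated with schema-on-read: raw records are kept exactly as received, and structure is applied only when a question is asked. In the sketch below, a Python list of JSON strings stands in for the data lake; the field names, events, and `extract_purchases` function are hypothetical.

```python
import json

# Hypothetical "data lake": raw events kept exactly as received.
raw_lake = [
    '{"user": "u1", "event": "view", "item": "milk"}',
    '{"user": "u2", "event": "buy", "item": "bread", "price": 2.5}',
    '{"user": "u1", "event": "buy", "item": "milk", "price": 1.2}',
]

def extract_purchases(lake):
    """Apply structure only at analysis time (schema-on-read):
    parse each raw record and keep just the purchase fields."""
    for line in lake:
        record = json.loads(line)
        if record.get("event") == "buy":
            yield record["item"], record["price"]

purchases = list(extract_purchases(raw_lake))
print(purchases)  # [('bread', 2.5), ('milk', 1.2)]
```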
KDDI Corporation chose to consolidate their servers into a single Oracle SuperCluster running the Oracle TimesTen in-memory database in order to _____. a.change their model for data storage and retrieval b.increase data access rates and efficiency c.implement the Extract Transform Load process d.provide employees with more targeted data marts
b.increase data access rates and efficiency
Some people are alarmed that big data applications allow organizations to develop extensive profiles of individuals without their knowledge or consent. This represents which type of concern related to big data? a.security b.privacy c.validity d.relevance
b.privacy
Which statement about Hadoop is correct? a.Hadoop's major limitation is that it cannot perform batch processing. b.Each server in a Hadoop cluster houses the entire data set plus a processing system. c.Hadoop's HDFS divides data into subsets and distributes them onto different servers. d.Hadoop runs on top of an existing Apache Storm cluster and accesses its data store.
c.Hadoop's HDFS divides data into subsets and distributes them onto different servers.
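The idea of dividing data into blocks and spreading them across servers can be sketched as follows. The 128 MB block size and replication factor of 3 are HDFS defaults in Hadoop 2.x and later, but the round-robin placement, the `blk_` names, the node list, and the `place_blocks` function are simplifications chosen for illustration; real HDFS uses a rack-aware placement policy managed by the NameNode.

```python
import math

BLOCK_SIZE_MB = 128  # HDFS default block size in Hadoop 2.x and later
REPLICATION = 3      # HDFS default replication factor

def place_blocks(file_size_mb, nodes):
    """Split a file into fixed-size blocks and assign each block's replicas
    to distinct nodes (requires len(nodes) >= REPLICATION). Round-robin is
    a simplification; real HDFS uses rack-aware placement."""
    num_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    placement = {}
    for b in range(num_blocks):
        placement[f"blk_{b}"] = [nodes[(b + r) % len(nodes)] for r in range(REPLICATION)]
    return placement

nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
for block, replicas in place_blocks(300, nodes).items():
    print(block, replicas)
# blk_0 ['node1', 'node2', 'node3']
# blk_1 ['node2', 'node3', 'node4']
# blk_2 ['node3', 'node4', 'node1']
```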
One difference between NoSQL and relational databases is that _____. a.NoSQL databases require large, powerful, and expensive proprietary servers b.relational databases consistently provide faster response times for queries c.NoSQL databases have a greater horizontal scaling capability d.relational databases can easily spread data over multiple servers
c.NoSQL databases have a greater horizontal scaling capability
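Horizontal scaling usually means spreading records across more servers. One common way to decide where a record lives is hash partitioning, sketched below with Python's `hashlib`. The customer keys, server counts, and the `shard_for`/`distribute` functions are hypothetical, and the simple modulo scheme is only an illustration: it moves most keys when a server is added, which is why many NoSQL systems use consistent hashing instead.

```python
import hashlib

def shard_for(key, num_servers):
    """Pick a server by hashing the record key (hash partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers

def distribute(keys, num_servers):
    """Group keys by the server each one hashes to."""
    servers = {i: [] for i in range(num_servers)}
    for key in keys:
        servers[shard_for(key, num_servers)].append(key)
    return servers

customers = [f"cust{i}" for i in range(1000)]  # hypothetical record keys
for n in (2, 4):
    # Adding servers spreads the same records over more machines.
    print(n, {s: len(ks) for s, ks in distribute(customers, n).items()})
```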
Which of the following is NOT a recognized BI and analytics technique? a.genetic algorithm b.Monte Carlo simulation c.online transaction processing d.linear programming
c.online transaction processing
The key challenges associated with big data include the difficulty of locating and deriving value from _____. a.regulations designed to prevent fraud b.security service providers c.relevant data to make decisions d.structured data to protect privacy
c.relevant data to make decisions
Big data veracity is a measure of _____. a.the degree of organization or structure of the data b.the data's worth for decision making in a given scenario c.the accuracy, completeness, and currency of the data d.the rate at which data in an area is becoming available
c.the accuracy, completeness, and currency of the data
During the load phase of the ETL process, _____. a.quality control measures are not necessary because of the previous two phases b.the data is often aggregated to reduce anticipated report processing time c.the data is checked against the constraints defined in the database schema d.progress in processing the data is more rapid than in the previous two phases
c.the data is checked against the constraints defined in the database schema
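The idea that data is checked against schema constraints during loading can be shown with SQLite, whose `CHECK`, `NOT NULL`, and `PRIMARY KEY` constraints reject bad rows at insert time. The `product` table and incoming rows are hypothetical; this is a sketch of the principle, not of any particular ETL tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Target table whose schema defines the constraints to enforce on load.
conn.execute("""
    CREATE TABLE product (
        sku   TEXT PRIMARY KEY,
        price REAL NOT NULL CHECK (price >= 0)
    )
""")

# Hypothetical incoming rows: one valid, then a negative price,
# a duplicate key, and a missing price.
incoming = [("A100", 2.50), ("B200", -1.00), ("A100", 3.00), ("C300", None)]
loaded, rejected = 0, []
for row in incoming:
    try:
        conn.execute("INSERT INTO product (sku, price) VALUES (?, ?)", row)
        loaded += 1
    except sqlite3.IntegrityError as err:
        rejected.append((row, str(err)))  # keep the reason for review
conn.commit()

print(loaded)  # 1
for row, reason in rejected:
    print(row, reason)
```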
A newly discovered entity or attribute can be added to a NoSQL database dynamically because _____. a.the database provides horizontal scaling capability b.data storage is modeled using simple two-dimensional relations c.NoSQL databases do not conform to ACID properties d.NoSQL databases do not require a predefined schema
d.NoSQL databases do not require a predefined schema
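The contrast can be sketched in Python: a relational table rejects a record with an attribute it was not defined with, while a document-style collection simply stores the new field. Here, SQLite stands in for the relational database and a list of dictionaries stands in for a document store; the table, the `loyalty_tier` field, and the records are hypothetical.

```python
import sqlite3

# Relational table: the schema is fixed before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
relational_failed = False
try:
    conn.execute(
        "INSERT INTO customer (id, name, loyalty_tier) VALUES (1, 'Ana', 'gold')"
    )
except sqlite3.OperationalError as err:
    relational_failed = True  # unknown column: the schema must be altered first
    print("relational insert failed:", err)

# Document-style collection: each record carries its own fields.
collection = []
collection.append({"id": 1, "name": "Ana"})
collection.append({"id": 2, "name": "Ben", "loyalty_tier": "gold"})  # new attribute, no schema change
gold = [doc["name"] for doc in collection if doc.get("loyalty_tier") == "gold"]
print(gold)  # ['Ben']
```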
During the modeling phase of the CRISP-DM method, the team conducting the data mining project ______. a.clarifies business goals for the data mining project b.selects a subset of data to be used and prepares it c.assesses whether the selected model achieves business goals d.applies selected modeling techniques
d.applies selected modeling techniques
During which step of the ETL process can data that fails to meet expected patterns or values be rejected to help clean up "dirty data"? a.edit b.load c.transform d.extract
d.extract
One key characteristic of big data is that it is being generated at a rate of 2.5 quintillion bytes per day. This is known as big data's _____. a.variety b.volume c.veracity d.velocity
d.velocity
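To put the velocity figure in perspective, the daily rate can be converted to a per-second rate (using decimal units, where 1 TB = 10^12 bytes):

```latex
\frac{2.5\times10^{18}\ \text{bytes/day}}{86{,}400\ \text{s/day}}
\approx 2.89\times10^{13}\ \text{bytes/s}
\approx 28.9\ \text{TB/s}
```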