MIS ch 6
Why is an effective ETL process essential to data warehousing?
"Dirty data" can result in incorrect or misleading statistics used for decision making.
Donna is a member of a team trying to select the best type of database for a business problem. If the database must handle a variety of data, she would like to store the data on a group of servers, and the data structures must be very flexible, what would you suggest?
A NoSQL database would likely be a better fit than a relational database.
Because they must deal with large quantities of data from so many different sources, IS employees at financial institutions may be at increased risk of failing to comply with government regulations designed to prevent money laundering, such as the _____.
Bank Secrecy Act
_____ is the term used to describe enormous and complex data collections that traditional data management software, hardware, and analysis processes are incapable of handling.
Big data
Which of the following is IBM's BI product?
Cognos Business Intelligence
_____ is used to explore large amounts of data for hidden patterns to predict future trends.
Data mining
____ is the last phase of the six-phase CRISP-DM method.
Deployment
Suppose your employer has four manufacturing units, each with its own data operations. They are considering using BI and analytics tools. Which of the following is a valid recommendation regarding the BI and analytics initiative?
Ensure that an effective company-wide data management program, including data governance, is in place.
Which of the following is required to create a traditional data warehouse but NOT a data lake?
Extract Transform Load process
What open-source software framework includes several software modules that provide a means for storing and processing extremely large data sets, organized into two primary components?
Hadoop
Which statement about Hadoop is correct?
Hadoop's HDFS divides data into subsets and distributes them onto different servers.
Which of the following is a potential disadvantage of self-service analytics?
It can lead to over-spending on unapproved data sources and business analytics tools.
You are to advise XYZ Corporation so that their BI and analytics efforts are fruitful. Which among the following is the most crucial advice of all?
Management must have a strong commitment to data-driven decision making.
Hadoop processes data using a Java-based system called _____.
MapReduce
Which statement regarding processing task completion using Hadoop's MapReduce program is correct?
MapReduce employs a JobTracker residing on the master server and TaskTrackers residing on other servers.
A newly discovered entity or attribute can be added to a NoSQL database dynamically because _____.
NoSQL databases do not require a predefined schema
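To make the card above concrete, here is a minimal Python sketch in which plain dictionaries stand in for documents in a schema-less store. The `customers` collection and its field names are hypothetical, and no real NoSQL product's API is used.

```python
# Hypothetical in-memory "collection": each document is just a dict,
# so no schema has to be declared before data is stored.
customers = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Raj"},
]

# A new document carries a newly discovered attribute (loyalty_tier);
# the existing documents do not need to be altered or migrated.
customers.append({"id": 3, "name": "Lee", "loyalty_tier": "gold"})

# Readers must tolerate missing fields, since nothing enforces them.
tiers = [doc.get("loyalty_tier", "unknown") for doc in customers]
print(tiers)
```

In a relational table the same change would typically require altering the table definition first.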
One difference between NoSQL and relational databases is that _____.
NoSQL databases have a greater horizontal scaling capability
Sometimes Theodore's queries for data from his employer's NoSQL database do not return the most current data. Why is this?
NoSQL databases provide for "eventual consistency" when processing transactions.
Once customer orders are organized into queues based on product ID, the component of the Hadoop software that performs a summary operation, such as determining how frequently each product was ordered, is the _____.
Reduce method
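The map and reduce steps described in the card above can be sketched in a single Python process. This is only an illustration of the idea, not Hadoop's actual Java API; the order records and the function names `map_phase`, `shuffle`, and `reduce_phase` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical order records: (order_id, product_id)
orders = [(101, "P1"), (102, "P2"), (103, "P1"), (104, "P3"), (105, "P1")]

def map_phase(records):
    """Emit one (product_id, 1) pair per order."""
    return [(product_id, 1) for _, product_id in records]

def shuffle(pairs):
    """Sort the pairs and group values by key, like the per-product queues."""
    groups = defaultdict(list)
    for key, value in sorted(pairs):
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Summarize each group; here, count how often each product was ordered."""
    return {key: sum(values) for key, values in groups.items()}

order_counts = reduce_phase(shuffle(map_phase(orders)))
print(order_counts)
```

In a real cluster the map and reduce work would be spread across many servers; here everything runs in one list for readability.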
From which vendor is the BI product Business Objects available?
SAP
_________ encourages nontechnical end users to make decisions based on facts and analyses rather than intuition.
Self-service analytics
The graphical representation that summarizes the steps a consumer takes in making the decision to buy your product and become a customer is called _____.
a conversion funnel
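The numbers behind a conversion funnel are simple stage-to-stage ratios. The sketch below assumes hypothetical stage names and visitor counts; it only shows the arithmetic a funnel chart would summarize.

```python
# Hypothetical counts of people reaching each stage of the funnel.
funnel = [
    ("visited site", 1000),
    ("viewed product", 400),
    ("added to cart", 100),
    ("purchased", 25),
]

def step_rates(stages):
    """Return (stage name, share of the previous stage that reached it)."""
    return [(nxt[0], nxt[1] / prev[1]) for prev, nxt in zip(stages, stages[1:])]

rates = step_rates(funnel)
overall = funnel[-1][1] / funnel[0][1]  # share of visitors who became customers
print(rates, overall)
```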
Hadoop's two major components are _____.
a data processing component and a distributed file system
Which of the following events is an example of a transformation that might occur during the second stage of the ETL process?
a sales district is substituted for a customer's street address
Marina is a data scientist at a large financial corporation, and therefore she _____.
aims to uncover insights that will influence organizational decisions
During the modeling phase of the CRISP-DM method, the team conducting the data mining project ______.
applies selected modeling techniques
Graph NoSQL databases _____.
are well-suited for analyzing interconnections
Which of the following is the LEAST essential characteristic for success as a data scientist?
business leadership skills
One challenge presented by the volume of big data is that _____.
business users can have a hard time finding the information they need to make decisions
An important role of the data scientist is to _____.
communicate his or her findings to organizational leaders
Melanie's company takes a "store everything" approach to big data, saving all of it in a raw, unaltered form. Only when she needs to analyze some of the data is it extracted from this _____.
data lake
A _____ is a subset of a data warehouse that is used by small- and medium-sized businesses and departments within large companies to support decision making.
data mart
Which of the following is NOT considered a component of business intelligence?
data transaction processing
Suppose you have access to quantitative data on populations by ZIP code and crime rates. You wish to determine if there is a relationship between the two variables and display the results in a graph. Which BI tool will be most useful?
data visualization tool
Barry's job responsibilities include helping maintain a large database that holds business information from over a dozen source systems, covering all aspects of his company's processes, products, and customers. This database contains not only enterprise data but also data from other organizations. Barry works with a(n) _____.
data warehouse
The Amazon DynamoDB and Oracle NoSQL Database products both support which data storage and retrieval models?
document and key-value
Jan is using her firm's data warehouse. If she is starting from monthly sales data, and wishes to get weekly sales data, she should use the ___ feature.
drill down
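Drill-down can be pictured as re-grouping the same facts at a finer level. The following sketch uses a hypothetical list of `(month, week, amount)` sales facts and a helper named `summarize`; real data warehouse tools expose this through their own interfaces.

```python
from collections import defaultdict

# Hypothetical sales facts: (month, week number, amount)
sales = [
    ("2024-01", 1, 100), ("2024-01", 2, 150),
    ("2024-01", 3, 120), ("2024-01", 4, 130),
    ("2024-02", 5, 90), ("2024-02", 6, 110),
]

def summarize(facts, key_index):
    """Total the amounts grouped by the chosen column (0 = month, 1 = week)."""
    totals = defaultdict(int)
    for fact in facts:
        totals[fact[key_index]] += fact[2]
    return dict(totals)

monthly = summarize(sales, 0)  # high-level summary
# Drill down: restrict to one month, then group at the weekly level.
january = [fact for fact in sales if fact[0] == "2024-01"]
weekly_january = summarize(january, 1)
print(monthly, weekly_january)
```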
If not well managed, self-service BI and analytics may lead to poor decisions based on _____.
erroneous analysis and reporting
During which step of the ETL process can data that fails to meet expected patterns or values be rejected to help clean up "dirty data"?
extract
Which step of the ETL process has the goal of collecting source data from all the desired sources and converting it into a single format suitable for processing?
extract
A conversion funnel is a visual depiction of a set of words that have been grouped together because of the frequency of their occurrence.
false
A marketing manager who does not have deep knowledge of information systems or data science will NOT be able to use BI and analytics tools.
false
Business analytics can be used only for forecasting future business results.
false
Creative data scientists--a key component of BI efforts in an organization--are people who are primarily focused on coming up with novel ways of analyzing data.
false
Data scientists do not need much business domain knowledge.
false
In the regression equation y = ax1 + bx2, the coefficients a and b represent the dependent variables.
false
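In y = ax1 + bx2, y is the dependent variable, x1 and x2 are the independent variables, and a and b are coefficients estimated from data. As a hedged illustration, the sketch below estimates a and b by ordinary least squares (no intercept) from hypothetical, noise-free observations; the helper name `fit_two_predictors` is made up for this example.

```python
# Hypothetical observations generated from y = 2*x1 + 3*x2 with no noise.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]
y = [2 * u + 3 * v for u, v in zip(x1, x2)]

def fit_two_predictors(x1, x2, y):
    """Least-squares estimates of a and b in y = a*x1 + b*x2 (no intercept)."""
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * t for u, t in zip(x1, y))
    s2y = sum(v * t for v, t in zip(x2, y))
    det = s11 * s22 - s12 * s12  # zero when x1 and x2 are collinear
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return a, b

a, b = fit_two_predictors(x1, x2, y)
print(round(a, 6), round(b, 6))
```

Because the sample data contain no noise, the estimates recover the coefficients used to generate y.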
Organizations can collect many types of data from a wide variety of sources, but typically they only collect structured data that fits neatly into traditional relational database management systems.
false
Regression analysis focuses on plotting a sequence of well-defined data points measured at uniform time intervals.
false
Scenario analysis tools are the best choice for solving optimization-type problems.
false
Suppose a manager wishes to analyze historical trends in sales. He would use the online transaction processing (OLTP) system.
false
Users still need help from the IT function of the organization to create customer reports using modern reporting tools.
false
Jerome recommends that his company's IS team consider moving from a database on secondary storage to an in-memory database because the in-memory database would provide _____.
faster access to data
What does Hadoop's Map procedure from its MapReduce program do?
filters and sorts
What determines the size of words in a word cloud?
frequency of occurrence of the word in source documents
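The sizing rule in the card above comes down to counting words. This sketch counts word frequencies in a hypothetical sentence and maps each count to a font size; the base size of 10 and step of 4 are arbitrary choices, not taken from any particular word cloud tool.

```python
import re
from collections import Counter

# Hypothetical source text.
text = "Data drives decisions. Good data beats opinion; bad data misleads."

# Lowercase the text and split it into words.
words = re.findall(r"[a-z']+", text.lower())
frequencies = Counter(words)

# More frequent words get larger fonts.
font_sizes = {word: 10 + 4 * count for word, count in frequencies.items()}
print(frequencies.most_common(1), font_sizes["data"])
```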
Mollie has observed that her company's leadership lacks a strong commitment to data-driven decision making. As a result, when she learns that her company will be initiating a business intelligence and analytics program, she anticipates that _____.
her company will miss out on the real value of their BI and analytics
Which of these analysis methods describes neural computing?
historical data is examined for patterns that are then used to make predictions
A hospital system that wants to utilize big data can use HIPAA regulations to help them _____.
identify which data needs to be protected from unauthorized access
A database system that stores the entire database in random access memory is known as a(n) _____.
in-memory database
KDDI Corporation chose to consolidate their servers into a single Oracle SuperCluster running the Oracle TimesTen in-memory database in order to _____.
increase data access rates and efficiency
Which of the following is NOT a recognized BI and analytics technique?
online transaction processing
One of the goals of business intelligence is to _________.
present the results of analysis in an easy-to-understand manner
Some people are alarmed that big data applications allow organizations to develop extensive profiles of individuals without their knowledge or consent. This represents which type of concern related to big data?
privacy
Data scientists are a necessary component to ensure an organization's business intelligence and analytics efforts are effective because they _____.
pull together knowledge of the business and data analytics tools and techniques
The key challenges associated with big data include the difficulty of locating and deriving value from _____.
relevant data to make decisions
Self-service BI and analytics can exacerbate problems by _____.
removing checks and balances on data preparation and use
Jamie's corporation comes under scrutiny by the media when former employees allege that the IS department failed to correctly identify which data needed protection from unauthorized access. These accusers say that this organization is not ensuring its big data is _____.
secure
To identify and make predictions about various alternative scenarios, a manager would use _______.
simulation techniques
The purpose of business intelligence is to _____.
support improved decision making
Big data veracity is a measure of _____.
the accuracy, completeness, and currency of the data
During the load phase of the ETL process, _____.
the data is checked against the constraints defined in the database schema
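The three ETL stages described in these cards can be sketched with Python's standard library. Everything here is an assumption made for illustration: the sample rows, the `district_by_address` lookup, the `sales` table, and the validation pattern. It shows pattern-based rejection at extract, an address-to-district substitution at transform, and schema constraint checking at load.

```python
import re
import sqlite3

# Hypothetical source rows: (customer, street_address, amount)
source_rows = [
    ("Ann", "12 Oak St", "250.00"),
    ("Raj", "9 Elm Ave", "n/a"),      # amount fails the expected pattern
    ("Lee", "77 Pine Rd", "-40.00"),  # will violate the CHECK constraint
]
# Hypothetical lookup used by the transform rules.
district_by_address = {"12 Oak St": "North", "9 Elm Ave": "South", "77 Pine Rd": "East"}

def extract(rows):
    """Keep rows whose amount looks like a decimal number; reject the rest."""
    pattern = re.compile(r"-?\d+\.\d{2}")
    return [row for row in rows if pattern.fullmatch(row[2])]

def transform(rows):
    """Substitute a sales district for the street address; parse the amount."""
    return [(name, district_by_address[addr], float(amount))
            for name, addr, amount in rows]

def load(rows, conn):
    """Insert rows; the database rejects any that break schema constraints."""
    rejected = []
    for row in rows:
        try:
            conn.execute("INSERT INTO sales VALUES (?, ?, ?)", row)
        except sqlite3.IntegrityError:
            rejected.append(row)
    return rejected

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer TEXT NOT NULL, district TEXT NOT NULL, "
    "amount REAL CHECK (amount >= 0))"
)
rejected_at_load = load(transform(extract(source_rows)), conn)
loaded = conn.execute("SELECT customer, district, amount FROM sales").fetchall()
print(loaded, rejected_at_load)
```

Only Ann's row reaches the table: Raj's row is dropped during extract, and Lee's row is refused by the schema's CHECK constraint during load.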
According to the McKinsey Global Institute, _____.
the demand for data scientists could outpace supply by up to 250,000 jobs in 2024
The use of in-memory databases for processing big data has become feasible in recent years, thanks to _____.
the increase in RAM capacities
Between 2017 and 2025, _____.
the volume of data in the digital universe is expected to grow tenfold
One key difference between a relational database and a NoSQL database is _____.
the way data storage and retrieval are modeled
Roberta's IS team processes data using an in-memory database with a multiple-core CPU. This means that _____.
they can process large amounts of data rapidly
Why is it recommended that data managers determine key metrics, an agreed-upon vocabulary, and how to define and implement security and privacy policies when setting up a self-service analytics program?
to mitigate the associated risks
A well-designed series of rules or algorithms is a key component of which stage of the ETL process?
transform
Data governance involves identifying people who are responsible for fixing and preventing issues with data.
true
Data mining is used to explore large amounts of data, looking for hidden patterns that can be used to predict future trends and behaviors.
true
During drill-down, you go from high-level summary data to detailed levels of data.
true
For an organization to get real value from its BI and analytics efforts, it must have a solid data management program.
true
If you wish to study a visual depiction of the relative frequencies of words in a document, a word cloud would be an appropriate option.
true
Regression analysis is useful when you wish to predict the value of a quantitative variable based on another quantitative variable.
true
Some insurance companies can detect fraudulent claims using BI and analytics software.
true
Suppose you are good in math and statistics. Adding programming to your skill-set will likely be necessary if you want to be a data scientist.
true
The data for BI (business intelligence) comes from many sources.
true
To solve a linear programming problem in order to maximize profits for a certain product, you can use Excel's Solver add-in.
true
Unstructured data comes from sources such as word-processing documents and surveillance video.
true
Haley's employer has asked her to review a database containing thousands of social media posts about their company's products and extract the data the executive team needs to make decisions about these products and their marketing. In terms of the characteristics of big data, Haley is focusing on ________.
value
Guillarme, a data scientist, utilizes data from company documents, machine logs, Data.gov, and Facebook Graph in his work. What characteristic of big data does this best demonstrate?
variety
One key characteristic of big data is that it is being generated at a rate of 2.5 quintillion bytes per day. This is known as big data's _____.
velocity
Which of the following is NOT a component required for effective BI and analytics?
well-maintained NoSQL databases
Marshall's company currently maintains their data on in-house servers, but his supervisor has asked him to research their options for having some or all of it hosted by a cloud service provider. Which challenge of big data is Marshall helping to address?
where and how to store the data