MIS Exam Chapter 9 Short Answers
Describe the features of a data mart.
A data mart is a data collection, smaller than the data warehouse, that addresses the needs of a particular department or functional area of the business. If the data warehouse is the distributor in a supply chain, then a data mart is like a retail store in that chain. Users in the data mart obtain data that pertain to a particular business function from the data warehouse. Such users do not have the data management expertise that data warehouse employees have, but they are knowledgeable analysts for a given business function.
What is the objective of performing a market-basket analysis?
A market-basket analysis is an unsupervised data mining technique for determining sales patterns. Such an analysis shows the products that customers tend to buy together. In marketing transactions, the fact that customers who buy product X also buy product Y creates a cross-selling opportunity; that is, "If they're buying X, sell them Y" or "If they're buying Y, sell them X."
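The idea of "products that customers tend to buy together" can be illustrated with a minimal sketch. The transactions and product names below are made up for illustration, and the confidence measure (the share of baskets containing X that also contain Y) is one commonly used statistic that this answer does not itself name.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"mask", "fins", "snorkel"},
    {"mask", "fins"},
    {"fins", "weights"},
    {"mask", "snorkel"},
    {"mask", "fins", "weights"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def confidence(baskets, x, y):
    """Share of baskets containing x that also contain y."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0
    return sum(1 for b in with_x if y in b) / len(with_x)

counts = pair_counts(transactions)
print(counts.most_common(1))                     # the most frequent pair
print(confidence(transactions, "fins", "mask"))  # "if they're buying fins, sell them a mask"
```

A pair with a high count and high confidence in both directions is a candidate for cross-selling either way.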
What is a reporting application? Name five basic reporting operations.
A reporting application is a BI application that inputs data from one or more sources and applies reporting operations to that data to produce business intelligence. Reporting applications produce business intelligence using five basic operations:
• Sorting
• Filtering
• Grouping
• Calculating
• Formatting
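As a rough illustration, the five operations can be applied in sequence to a small data set. The sales records, the 100-dollar threshold, and the output layout are all hypothetical; a real reporting application would read its data from a source system.

```python
from collections import defaultdict

# Hypothetical sales records.
sales = [
    {"region": "West", "rep": "Ann", "amount": 1200.0},
    {"region": "East", "rep": "Bob", "amount": 450.0},
    {"region": "West", "rep": "Cy", "amount": 300.0},
    {"region": "East", "rep": "Dee", "amount": 2100.0},
    {"region": "West", "rep": "Eve", "amount": 80.0},
]

# Filtering: keep only sales of at least 100.
large = [s for s in sales if s["amount"] >= 100]

# Sorting: largest sale first.
large.sort(key=lambda s: s["amount"], reverse=True)

# Grouping: collect amounts by region.
by_region = defaultdict(list)
for s in large:
    by_region[s["region"]].append(s["amount"])

# Calculating: total per region.
totals = {region: sum(amounts) for region, amounts in by_region.items()}

# Formatting: one aligned line per region.
report_lines = [f"{region:<6}${total:>10,.2f}" for region, total in sorted(totals.items())]
print("\n".join(report_lines))
```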
Define business intelligence (BI) and BI systems.
Business intelligence (BI) systems are information systems that process operational, social, and other data to identify patterns, relationships, and trends for use by business professionals and other knowledge workers. These patterns, relationships, trends, and predictions are referred to as business intelligence. As information systems, BI systems have the five standard components: hardware, software, data, procedures, and people. The software component of a BI system is called a BI application.
Describe the management functions of business intelligence (BI) servers.
Business intelligence servers provide two major functions: management and delivery. The management function maintains metadata about the authorized allocation of BI results to users. The BI server tracks what results are available, what users are authorized to view those results, and the schedule upon which the results are provided to the authorized users. It adjusts allocations as available results change and users come and go. All management data needed by any of the BI servers is stored in metadata. The amount and complexity of such data depend, of course, on the functionality of the BI server.
Differentiate between unsupervised and supervised data mining.
Data mining techniques fall into two broad categories: unsupervised and supervised. With unsupervised data mining, analysts do not create a model or hypothesis before running the analysis. Instead, they apply a data mining application to the data and observe the results. With this method, analysts create hypotheses after the analysis, in order to explain the patterns found. With supervised data mining, data miners develop a model prior to the analysis and apply statistical techniques to data to estimate parameters of the model.
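The supervised case can be sketched as follows. The model (sales as a straight-line function of advertising spend), the data, and the choice of ordinary least squares as the statistical technique are all assumptions made for illustration; the point is only that the model form is fixed before the data is examined and the analysis estimates its parameters.

```python
def fit_line(xs, ys):
    """Estimate intercept and slope of y = intercept + slope * x by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical observations.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_line(ad_spend, sales)
print(intercept, slope)
```

An unsupervised technique would instead start from the data with no such model and leave the analyst to explain whatever patterns emerge.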
Describe expert systems and their primary disadvantages.
Expert systems are rule-based systems that encode human knowledge in the form of "If/Then" rules. To create the system of rules, the expert system development team interviews human experts in the domain of interest. Expert systems suffer from three major disadvantages. First, they are difficult and expensive to develop. Second, they are difficult to maintain. Finally, they were unable to live up to the high expectations set by their name.
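A minimal sketch of "If/Then" rules, assuming a made-up printer-troubleshooting domain. The rule names, conditions, and advice strings are hypothetical; a real expert system would hold far more rules, which is part of why such systems are hard to build and maintain.

```python
# Each rule is (name, IF-condition over the facts, THEN-advice). All hypothetical.
RULES = [
    ("no_power", lambda f: not f["power_light_on"], "Check the power cable"),
    ("paper_jam", lambda f: f["power_light_on"] and f["paper_jam_light_on"], "Clear the paper jam"),
    ("low_toner", lambda f: f["power_light_on"] and f["print_faded"], "Replace the toner cartridge"),
]

def advise(facts):
    """Return the advice of every rule whose IF-condition holds for the facts."""
    return [advice for _name, condition, advice in RULES if condition(facts)]

facts = {"power_light_on": True, "paper_jam_light_on": False, "print_faded": True}
print(advise(facts))
```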
Explain knowledge management and its primary benefits.
Knowledge management (KM) is the process of creating value from intellectual capital and sharing that knowledge with employees, managers, suppliers, customers, and others who need that capital. KM benefits organizations in two fundamental ways:
• Improve process quality
• Increase team strength
What are MapReduce and Hadoop?
MapReduce is a technique for harnessing the power of thousands of computers working in parallel. The basic idea is that the BigData collection is broken into pieces, and hundreds or thousands of independent processors search these pieces for something of interest. Hadoop is an open-source program supported by the Apache Software Foundation that implements MapReduce on potentially thousands of computers. Hadoop could drive a process such as finding and counting Google search terms, but Google uses its own proprietary version of MapReduce to do so instead. Hadoop includes a query language called Pig.
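The break-into-pieces-then-combine idea can be sketched in a single process. This is not the Hadoop API: the function names and the sample search-term chunks are hypothetical, and in a real deployment each chunk would be mapped on a separate machine.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (term, 1) pair for every term in one piece of the collection."""
    return [(term.lower(), 1) for term in chunk.split()]

def shuffle(pairs):
    """Group the emitted values by term."""
    grouped = defaultdict(list)
    for term, value in pairs:
        grouped[term].append(value)
    return grouped

def reduce_phase(grouped):
    """Combine the values for each term into a single count."""
    return {term: sum(values) for term, values in grouped.items()}

# Hypothetical pieces of a log of search terms.
chunks = ["hadoop pig hadoop", "pig mapreduce", "hadoop"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because each call to map_phase depends only on its own chunk, the map step is the part that can be spread across many independent processors.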
What is OLAP? Explain its features.
Online analytical processing (OLAP) is an important reporting application. It is more generic than RFM. OLAP provides the ability to sum, count, average, and perform other simple arithmetic operations on groups of data. The remarkable characteristic of OLAP reports is that they are dynamic. The viewer of the report can change the report's format, hence the term online. An OLAP report has measures and dimensions. A measure is the data item of interest. It is the item that is to be summed or averaged or otherwise processed in the OLAP report. Total sales, average sales, and average cost are examples of measures. A dimension is a characteristic of a measure. Purchase date, customer type, customer location, and sales region are all examples of dimensions. With an OLAP report, it is possible to drill down into the data. This term means to further divide the data into more detail.
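Measures, dimensions, and drill-down can be sketched with a small grouping function. The rows and the function name olap_sum are hypothetical; here "sales" is the measure, and "region" and "customer_type" are dimensions.

```python
from collections import defaultdict

# Hypothetical fact rows.
rows = [
    {"region": "West", "customer_type": "Retail", "sales": 100.0},
    {"region": "West", "customer_type": "Wholesale", "sales": 250.0},
    {"region": "East", "customer_type": "Retail", "sales": 75.0},
    {"region": "East", "customer_type": "Retail", "sales": 125.0},
]

def olap_sum(rows, dimensions, measure):
    """Sum the measure for each combination of the chosen dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

# Total sales by region.
by_region = olap_sum(rows, ["region"], "sales")

# Drill down: divide each region further by customer type.
drilled = olap_sum(rows, ["region", "customer_type"], "sales")

print(by_region)
print(drilled)
```

Changing the list of dimensions changes the shape of the report without touching the data, which is the sense in which an OLAP report is dynamic.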
Describe the functions of data warehouses and the need for them.
Operational data is structured for fast and reliable transaction processing. It is seldom structured in a way that readily supports BI analysis. In addition, BI analyses can require considerable processing; placing BI applications on operational servers can dramatically reduce system performance. Hence, most organizations extract operational data for BI processing. For small organizations, the extraction may be as simple as an Access database. Larger organizations typically create and staff a group of people who manage and run a data warehouse, which is a facility for managing an organization's BI data. The functions of a data warehouse are to:
• Obtain data
• Cleanse data
• Organize and relate data
• Catalog data
Explain the concept of RFM analysis.
RFM analysis, a technique readily implemented with basic reporting operations, is used to analyze and rank customers according to their purchasing patterns. RFM considers how recently (R) a customer has ordered, how frequently (F) a customer orders, and how much money (M) the customer has spent. To produce an RFM score, the RFM reporting tool first sorts customer purchase records by the date of their most recent (R) purchase. In a common form of this analysis, the tool then divides the customers into five groups and gives customers in each group a score from 5 to 1. The 20 percent of the customers having the most recent orders are given an R score of 5, the 20 percent of the customers having the next most recent orders are given an R score of 4, and so forth, down to the last 20 percent, who are given an R score of 1. The tool then re-sorts the customers on the basis of how frequently they order. The 20 percent of the customers who order most frequently are given an F score of 5, the next 20 percent of most frequently ordering customers are given an F score of 4, and so forth, down to the least frequently ordering customers, who are given an F score of 1. Finally, the tool sorts the customers again according to the amount spent on their orders. The 20 percent who have ordered the most expensive items are given an M score of 5, the next 20 percent are given an M score of 4, and so forth, down to the 20 percent who spend the least, who are given an M score of 1.
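The sort-and-split procedure described above can be sketched as follows. The customer data and field names are hypothetical, and ties are broken by input order, which is an assumption the description does not address.

```python
def quintile_scores(values, higher_is_better=True):
    """Rank customers, split them into five equal groups, and score the groups 5 down to 1."""
    ranked = sorted(values, key=values.get, reverse=higher_is_better)
    n = len(ranked)
    return {cust: 5 - (i * 5) // n for i, cust in enumerate(ranked)}

def rfm(customers):
    """Return {customer: (R, F, M)} using the 5-to-1 scoring on each measure."""
    # Recency: fewer days since the last order is better, so sort ascending.
    r = quintile_scores({c: v["days_since_order"] for c, v in customers.items()},
                        higher_is_better=False)
    f = quintile_scores({c: v["orders"] for c, v in customers.items()})
    m = quintile_scores({c: v["spend"] for c, v in customers.items()})
    return {c: (r[c], f[c], m[c]) for c in customers}

# Hypothetical customers.
customers = {
    "A": {"days_since_order": 3, "orders": 20, "spend": 5000.0},
    "B": {"days_since_order": 40, "orders": 2, "spend": 9000.0},
    "C": {"days_since_order": 10, "orders": 15, "spend": 300.0},
    "D": {"days_since_order": 200, "orders": 1, "spend": 50.0},
    "E": {"days_since_order": 90, "orders": 8, "spend": 1200.0},
}

scores = rfm(customers)
print(scores)
```

In this sample, customer B scores (3, 2, 5): an infrequent buyer who has not ordered very recently but has spent the most.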
Explain the difference between static and dynamic reports.
Static reports are business intelligence (BI) documents that are fixed at the time of creation and do not change. A printed sales analysis is an example of a static report. In the BI context, most static reports are published as PDF documents. Dynamic reports are BI documents that are updated at the time they are requested. A sales report that is current at the time the user accesses it on a Web server is a dynamic report. In almost all cases, publishing a dynamic report requires the BI application to access a database or other data source at the time the report is delivered to the user.
What are the common problems with using operational data?
The problems associated with operational data are:
• Problematic (dirty) data
• Missing values
• Inconsistent data
• Nonintegrated data
• Wrong granularity (too fine; not fine enough)
• Too much data (too many attributes; too many data points)
Name and describe the three primary activities in the business intelligence (BI) process.
The three primary activities in the BI process are: acquire data, perform analysis, and publish results. Data acquisition is the process of obtaining, cleaning, organizing, relating, and cataloging source data. BI analysis is the process of creating business intelligence. The four fundamental categories of BI analysis are reporting, data mining, BigData, and knowledge management. Publishing results is the process of delivering business intelligence to the knowledge workers who need it. Push publishing delivers business intelligence to users without any request from the users; the BI results are delivered according to a schedule or as a result of an event or particular data condition. Pull publishing requires the user to request BI results. Publishing media include print as well as online content delivered via Web servers, specialized Web servers known as report servers, and BI results that are sent via automation to other programs.