Chapter 9 SA


How is a data warehouse different from a data mart?

A data warehouse can be compared to a distributor in a supply chain. The data warehouse takes data from the data manufacturers (operational systems and other sources), cleans and processes the data, and locates the data on the shelves of the data warehouse. The data analysts who work with a data warehouse are experts at data management, data cleaning, data transformation, data relationships, and the like. The data warehouse then distributes the data to data marts. A data mart is a data collection, smaller than the data warehouse, that addresses the needs of a particular department or functional area of the business. If a data warehouse is the distributor in a supply chain, then a data mart is like a retail store in a supply chain. Users in a data mart obtain data that pertain to a particular business function from a data warehouse. Such users do not have the data management expertise that data warehouse employees have, but they are knowledgeable analysts for a given business function.

What are the functions of a data warehouse?

A data warehouse takes data from data manufacturers (operational systems and other sources), cleans and processes the data, and locates the data on the shelves of the data warehouse. Data analysts who work with a data warehouse are experts at data management, data cleaning, data transformation, data relationships, and the like. However, they are not usually experts in a given business function. The data warehouse then distributes the data to data marts.

What is BigData?

BigData is a term used to describe data collections that are characterized by huge volume, rapid velocity, and great variety. Considering volume, BigData refers to data sets that are at least a petabyte in size, and usually larger. Additionally, BigData has high velocity, meaning that it is generated rapidly. Finally, BigData is varied: it may have structured data, but it also may have free-form text, dozens of different formats of Web server and database log files, streams of data about user responses to page content, and possibly graphics, audio, and video files. BigData analysis can involve both reporting and data mining techniques. The chief difference, however, is that BigData has volume, velocity, and variety characteristics that far exceed those of traditional reporting and data mining.

What are business intelligence systems?

Business intelligence (BI) systems are information systems that process operational and other data to analyze past performance and to make predictions. The patterns, relationships, and trends identified by BI systems are called business intelligence. As information systems, BI systems have five standard components: hardware, software, data, procedures, and people. The software component of a BI system is called a BI application.

What is clickstream data?

Clickstream data is the data captured from customers' clicking behavior. Such data is very fine-grained and includes everything a customer does at a Web site. Because the data is so fine, data analysts must discard millions and millions of clicks when a study requires coarser data.

What is data granularity?

Data granularity refers to the level of detail represented by data. Granularity can be too fine or too coarse. In general, it is better to have too fine a granularity than too coarse. If the granularity is too fine, the data can be made coarser by summing and combining. If the granularity is too coarse, however, there is no way to separate the data into constituent parts.
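
As a concrete illustration, here is a minimal sketch (assuming pandas is available and using entirely hypothetical sales data) of why fine-grained data can always be made coarser, while coarse data cannot be decomposed back into its parts:

```python
import pandas as pd

# Hypothetical fine-grained data: one row per individual sale (line-item level).
sales = pd.DataFrame({
    "store":  ["A", "A", "B", "B", "B"],
    "date":   pd.to_datetime(["2024-01-03", "2024-01-17", "2024-01-05",
                              "2024-02-02", "2024-02-20"]),
    "amount": [19.99, 5.50, 12.00, 7.25, 30.00],
})

# Fine -> coarse is easy: sum line items up to store-by-month totals.
monthly = (sales
           .groupby(["store", sales["date"].dt.to_period("M")])["amount"]
           .sum())
print(monthly)

# Coarse -> fine is impossible: from a monthly total alone there is no way
# to recover the individual sales that produced it.
```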

What is data mining?

Data mining is the application of statistical techniques to find patterns and relationships among data for classification and prediction. Data mining techniques emerged from the combined disciplines of statistics, mathematics, artificial intelligence, and machine learning. Most data mining techniques are sophisticated, and many are difficult to use well. Such techniques are valuable to organizations, and some business professionals, especially those in finance and marketing, have become expert in their use. Data mining techniques fall into two broad categories: unsupervised and supervised.

What is predictive policing?

Many police departments are facing severe budget constraints that force them to reduce on-duty police personnel and services. Given these budget cuts, police departments need to find better ways of utilizing their personnel. Predictive policing uses business intelligence to analyze data on past crimes, including location, date, time, day of week, type of crime, and related data, to predict where crimes are likely to occur. Police personnel are then stationed in the best locations for preventing those crimes.
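
Real predictive policing relies on far more sophisticated statistical models, but a toy sketch in Python (with hypothetical crime records) conveys the basic idea of turning past incidents into a ranking of where to station personnel:

```python
from collections import Counter

# Hypothetical records of past crimes: (neighborhood, hour_of_day, type).
past_crimes = [
    ("Downtown", 23, "theft"), ("Downtown", 22, "assault"),
    ("Harbor",   2,  "burglary"), ("Downtown", 23, "theft"),
    ("Midtown",  18, "theft"),
]

# Count incidents per (neighborhood, hour) bucket and rank the hot spots.
hot_spots = Counter((place, hour) for place, hour, _ in past_crimes)
for (place, hour), count in hot_spots.most_common(3):
    print(f"{place} around {hour:02d}:00 -> {count} past incidents")
```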

Differentiate between push publishing and pull publishing.

Push publishing delivers business intelligence to users without any request from the users; the BI results are delivered according to a schedule or as a result of an event or particular data condition. Pull publishing requires the user to request BI results.
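
A minimal Python sketch of the difference, using the standard-library sched module as a stand-in for a real scheduling or delivery mechanism (the report content and delivery step are hypothetical):

```python
import sched
import time

def build_report() -> str:
    # Stand-in for real BI analysis.
    return "Sales by region: ..."

# Pull publishing: the report is produced only when a user asks for it.
def handle_user_request() -> str:
    return build_report()

# Push publishing: a scheduler delivers the report with no user request.
def push_to_subscribers() -> None:
    report = build_report()
    print("Delivering to subscribers:", report)  # stand-in for email/portal delivery

scheduler = sched.scheduler(time.time, time.sleep)
scheduler.enter(5, 1, push_to_subscribers)       # deliver 5 seconds from now
scheduler.run()
```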

How does business intelligence help marketers identify changes in the purchasing patterns of customers?

Retailers know that important life events cause customers to change what they buy and, for a short interval, to form new loyalties to new store brands. Before the advent of BI, stores would watch the local newspapers for graduation, marriage, and baby announcements and send ads in response, a slow, labor-intensive, and expensive process. By applying business intelligence techniques to their sales data, however, companies can identify the purchasing patterns that accompany such events and, when they observe one of those patterns, send ads for related products to the customers involved.
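
A toy sketch of the idea in Python; the customers, product categories, and purchase histories are hypothetical, and a real retailer would apply this kind of comparison to market-basket data at much larger scale:

```python
# Hypothetical data: categories each customer bought last quarter vs. this month.
past_categories = {
    "cust_1": {"coffee", "snacks"},
    "cust_2": {"coffee", "magazines"},
}
recent_categories = {
    "cust_1": {"coffee", "snacks"},
    "cust_2": {"diapers", "baby formula", "coffee"},
}

BABY_PRODUCTS = {"diapers", "baby formula", "baby wipes"}

# A customer who suddenly starts buying baby products has likely had a life
# event; queue ads for related products (cribs, strollers, and so on).
for cust, recent in recent_categories.items():
    new_categories = recent - past_categories.get(cust, set())
    if new_categories & BABY_PRODUCTS:
        print(f"Send baby-product ads to {cust} (new categories: {new_categories})")
```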

Differentiate between static reports and dynamic reports.

Static reports are business intelligence documents that are fixed at the time of creation and do not change. A printed sales analysis is an example of a static report. In the business intelligence context, most static reports are published as PDF documents. Dynamic reports are business intelligence documents that are updated at the time they are requested. A sales report that is current as of the time a user accesses it on a Web server is a dynamic report. In almost all cases, publishing a dynamic report requires the business intelligence application to access a database or other data source at the time the report is delivered to the user.
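
A minimal sketch of a dynamic report in Python, using an in-memory sqlite3 database as a stand-in for the real data source (the table and figures are hypothetical):

```python
import sqlite3
from datetime import datetime

# Hypothetical operational database (in-memory for the sketch).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)",
               [("East", 1200.0), ("West", 950.0)])

def dynamic_sales_report() -> str:
    """Built fresh from the data source each time a user requests it."""
    rows = db.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
    lines = [f"{region}: {total:,.2f}" for region, total in rows]
    return f"Sales as of {datetime.now():%Y-%m-%d %H:%M}\n" + "\n".join(lines)

# A static report would be the string returned by ONE call, saved (for example,
# to PDF) and never updated; calling the function again yields a newer report.
print(dynamic_sales_report())
```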

Explain the curse of dimensionality.

The curse of dimensionality is associated with the problem of data having too many attributes. For example, if internal customer data is combined with purchased customer data, there can easily be more than a hundred different attributes to consider, and it is hard to select only a few of them. The curse of dimensionality states that the more attributes there are, the easier it is to build a model that fits the sample data but that is worthless as a predictor.
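
The effect is easy to demonstrate numerically. In the sketch below (NumPy only, with purely random data), a least-squares model with far more attributes than observations fits the sample essentially perfectly yet predicts nothing useful on new data:

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_attributes = 30, 200          # far more attributes than rows
X_train = rng.normal(size=(n_samples, n_attributes))
y_train = rng.normal(size=n_samples)       # pure noise: nothing to learn

# With more attributes than samples, least squares can fit the sample exactly.
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
print("max training error:", np.abs(X_train @ coef - y_train).max())  # ~0

# ...but the "model" is worthless as a predictor on new data.
X_new = rng.normal(size=(n_samples, n_attributes))
y_new = rng.normal(size=n_samples)
print("mean error on new data:", np.abs(X_new @ coef - y_new).mean())  # large
```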

Explain the functions of a data warehouse.

The functions of a data warehouse are to:

1. Obtain data
2. Cleanse data
3. Organize and relate data
4. Catalog data

Programs read operational and other data and extract, clean, and prepare that data for business intelligence processing. The prepared data are stored in a data warehouse database using a data warehouse DBMS, which can be different from an organization's operational DBMS. Data warehouses also include data that are purchased from outside sources. Metadata concerning the data (its source, its format, its assumptions and constraints, and other facts about the data) are kept in a data warehouse metadata database. The data warehouse DBMS extracts and provides data to BI applications.
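
These four functions map naturally onto a small obtain-cleanse-organize-catalog sketch. The following Python/pandas illustration uses entirely hypothetical source data and metadata fields:

```python
import pandas as pd

# 1. Obtain data from an operational source (hypothetical CRM export).
raw = pd.DataFrame({
    "cust_id": [1, 1, 2, None],
    "email":   ["a@x.com", "a@x.com", "B@Y.COM ", None],
    "spend":   ["100", "100", "250", "75"],
})

# 2. Cleanse: drop rows missing keys, remove duplicates, normalize formats.
clean = (raw.dropna(subset=["cust_id"])
            .drop_duplicates()
            .assign(email=lambda d: d["email"].str.strip().str.lower(),
                    spend=lambda d: d["spend"].astype(float)))

# 3. Organize and relate: key the table so it can be joined to other tables.
customers = clean.set_index("cust_id")

# 4. Catalog: record metadata about source, format, and assumptions.
metadata = {
    "table": "customers",
    "source": "operational CRM export",
    "columns": dict(customers.dtypes.astype(str)),
    "assumptions": "spend is in USD; one row per customer",
}
print(metadata)
```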

What are the three primary activities in the business intelligence process?

The three primary activities in the business intelligence process are: acquire data, perform analysis, and publish results. Data acquisition is the process of obtaining, cleaning, organizing, relating, and cataloging source data. Business intelligence analysis is the process of creating business intelligence and includes three fundamental categories: reporting, data mining, and BigData. Publish results is the process of delivering business intelligence to the knowledge workers who need it.

Explain supervised data mining.

With supervised data mining, data miners develop a model prior to the analysis and apply statistical techniques to data to estimate parameters of the model. For example, suppose marketing experts in a communications company believe that cell phone usage on weekends is determined by the age of the customer and the number of months the customer has had the cell phone account. A data mining analyst would then run an analysis that estimates the effects of customer age and account age on weekend usage. One such analysis, which measures the impact of a set of variables on another variable, is called regression analysis.
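
A minimal regression sketch of that example in Python (NumPy only); the customer ages, account ages, and usage figures are invented for illustration:

```python
import numpy as np

# Hypothetical sample: (customer age, months with account) -> weekend minutes.
X = np.array([[22, 3], [35, 24], [41, 36], [19, 1], [58, 60], [27, 12]],
             dtype=float)
y = np.array([310, 180, 150, 350, 90, 260], dtype=float)

# The model is chosen BEFORE the analysis: usage = b0 + b1*age + b2*months.
# Regression estimates the parameters b0, b1, b2 from the data.
design = np.column_stack([np.ones(len(X)), X])
(b0, b1, b2), *_ = np.linalg.lstsq(design, y, rcond=None)
print(f"usage = {b0:.1f} + {b1:.2f}*age + {b2:.2f}*months_with_account")
```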

What is unsupervised data mining?

With unsupervised data mining, analysts do not create a model or hypothesis before running the analysis. Instead, they apply a data mining technique to the data and observe the results. With this method, analysts create hypotheses after the analysis to explain the patterns found. These findings are obtained solely by data analysis. One common unsupervised technique is cluster analysis. With it, statistical techniques identify groups of entities that have similar characteristics. A common use for cluster analysis is to find groups of similar customers from customer order and demographic data.
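
A minimal cluster-analysis sketch, assuming scikit-learn is available; the customer attributes and the two-segment interpretation are hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer data: [age, average order value].
customers = np.array([
    [23,  35], [25,  40], [27,  30],     # younger, smaller orders
    [54, 220], [58, 260], [61, 240],     # older, larger orders
], dtype=float)

# No model or hypothesis up front: just ask for k groups and inspect them.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(customers)
print(labels)   # e.g., [0 0 0 1 1 1], two clusters of similar customers

# The hypothesis ("there is a young/low-spend segment and an older/high-spend
# segment") is formed AFTER looking at the clusters, not before.
```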

