EXAM 1 - ISBC 510


63) According to Eckerson (2006), a well-known expert on BI dashboards, what are the three layers of information of a dashboard?

1. Monitoring: graphical, abstracted data to monitor key performance metrics.
2. Analysis: summarized dimensional data to analyze the root cause of problems.
3. Management: detailed operational data that identify what actions to take to resolve a problem.

34) When representing data in a data warehouse, using several dimension tables that are each connected only to a fact table means you are using which warehouse structure? A) star schema B) snowflake schema C) relational schema D) dimensional schema

A - Section 3.6

63) What is the definition of a data mart?

A data mart is a subset of a data warehouse, typically consisting of a single subject area (e.g., marketing, operations). Whereas a data warehouse combines databases across an entire enterprise, a data mart is usually smaller and focuses on a particular subject or department.

59) The ________ is the most commonly used algorithm to discover association rules. Given a set of itemsets, the algorithm attempts to find subsets that are common to at least a minimum number of the itemsets.

Apriori algorithm
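To make the Apriori idea concrete, here is a minimal Python sketch of frequent-itemset counting. The baskets and the min_support value are invented for illustration, and the full algorithm also prunes candidates whose subsets are infrequent, which this simplified version skips:

```python
from itertools import combinations  # stdlib; used conceptually for itemsets

def apriori(transactions, min_support):
    """Find all itemsets appearing in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while current:
        # Count support: how many transactions contain each candidate itemset
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Join surviving k-itemsets to form (k+1)-item candidates
        current = {a | b for a in survivors for b in survivors
                   if len(a | b) == k + 1}
        k += 1
    return frequent

baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"},
           {"bread", "eggs"}, {"milk", "eggs"}]
freq = apriori(baskets, min_support=2)
# {milk, bread} is frequent (2 baskets); {milk, bread, eggs} is not (1 basket)
```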

22) In the Opening Vignette on Sports Analytics, what type of modeling was used to predict offensive tactics? A) heuristics B) heat maps C) cascaded decision trees D) sentiment analysis

B

23) Kaplan and Norton developed a report that presents an integrated view of success in the organization called A) metric management reports. B) balanced scorecard-type reports. C) dashboard-type reports. D) visual reports.

B

40) Which of the following is a data mining myth? A) Data mining is a multistep process that requires deliberate, proactive design and use. B) Data mining requires a separate, dedicated database. C) The current state-of-the-art is ready to go for almost any business. D) Newer Web-based tools enable managers of all educational levels to do data mining.

B

25) In what decade did disjointed information systems begin to be integrated? A) 1970s B) 1980s C) 1990s D) 2000s

B - Section 1.3

27) Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes? A) associations B) visualization C) classification D) clustering

C

32) Benefits of the latest visual analytics tools, such as SAS Visual Analytics, include all of the following EXCEPT A) mobile platforms such as the iPhone are supported by these products. B) it is easier to spot useful patterns and trends in the data. C) they explore massive amounts of data in hours, not days. D) there is less demand on IT departments for reports.

C

26) Relational databases began to be used in the A) 1960s. B) 1970s. C) 1980s. D) 1990s.

C - Section 1.3

39) This plot is a graphical illustration of several descriptive statistics about a given data set. A) pie chart B) bar graph C) box-and-whiskers plot D) kurtosis

C - Section 2.5

Key factors that delineate the things that an organization must excel at to be successful in its market space (may also be called key performance indicators, or KPIs in short).

Critical Success Factors

21) In the Influence Health case study, what was the goal of the system? A) locating clinic patients B) understanding follow-up care C) decreasing operational costs D) increasing service use

D

21) In the Opening Vignette on Sports Analytics, what was adjusted to drive one-time ticket sales? A) player selections B) stadium location C) fan tweets D) ticket prices

D

21) Why is a performance management system superior to a performance measurement system? A) because performance measurement systems are only in their infancy B) because measurement automatically leads to problem solution C) because performance management systems cost more D) because measurement alone has little use without action

D

22) Key performance indicators (KPIs) are metrics typically used to measure A) database responsiveness. B) qualitative feedback. C) external results. D) internal results.

D

24) Which characteristic of data requires that the variables and data values be defined at the lowest (or as low as required) level of detail for the intended use of the data? A) data source reliability B) data accessibility C) data richness D) data granularity

D

24) Which of the following developments is NOT contributing to facilitating growth of decision support and analytics? A) collaboration technologies B) Big Data C) knowledge management systems D) locally concentrated workforces

D

25) Which of the following is LEAST related to data/information visualization? A) information graphics B) scientific visualization C) statistical graphics D) graphic artwork

D

26) A data mining study is specific to addressing a well-defined business task, and different business tasks require A) general organizational data. B) general industry data. C) general economic data. D) different sets of data.

D

28) Which of the following is an umbrella term that combines architectures, tools, databases, analytical tools, applications, and methodologies? A) MIS B) DSS C) ERP D) BI

D

31) In which stage of extraction, transformation, and load (ETL) into a data warehouse are anomalies detected and corrected? A) transformation B) extraction C) load D) cleanse

D

32) Data warehouses provide direct and indirect benefits to organizations. Which of the following is an indirect benefit of data warehouses? A) better and more timely information B) extensive new analyses performed by users C) simplified access to data D) improved customer service

D

32) Which data mining process/methodology is thought to be the most comprehensive, according to kdnuggets.com rankings? A) SEMMA B) proprietary organizational methodologies C) KDD Process D) CRISP-DM

D

34) BI applications must be integrated with A) databases. B) legacy systems. C) enterprise systems. D) all of these

D

34) What is the fundamental challenge of dashboard design? A) ensuring that users across the organization have access to it B) ensuring that the organization has the appropriate hardware onsite to support it C) ensuring that the organization has access to the latest Web browsers D) ensuring that the required information is shown clearly on a single screen

D

35) When querying a dimensional database, a user went from summarized data to its underlying details. The function that served this purpose is A) dice. B) slice. C) roll-up. D) drill down.

D
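To make the roll-up vs. drill-down distinction concrete, here is a small pure-Python sketch (the toy fact table and its values are invented): roll-up summarizes to the year level, and drill down moves from that summary to quarter-level detail.

```python
from collections import defaultdict

# Toy fact table: (year, quarter, sales amount)
sales = [(2023, "Q1", 100), (2023, "Q2", 150),
         (2024, "Q1", 120), (2024, "Q2", 180)]

# Roll-up: summarize to the year level
by_year = defaultdict(int)
for year, quarter, amount in sales:
    by_year[year] += amount

# Drill down: from the yearly summary to quarter-level detail
by_quarter = defaultdict(int)
for year, quarter, amount in sales:
    by_quarter[(year, quarter)] += amount
```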

37) This measure of central tendency is the sum of all the values/observations divided by the number of observations in the data set. A) dispersion B) mode C) median D) arithmetic mean

D
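As a quick worked example of the arithmetic mean alongside the other measures of central tendency (the data values are invented):

```python
import statistics

data = [2, 4, 4, 7, 9, 10]

# Arithmetic mean: sum of all observations divided by their count
mean = sum(data) / len(data)          # 36 / 6 = 6.0

# For comparison, the other common measures of central tendency
median = statistics.median(data)      # middle values: (4 + 7) / 2 = 5.5
mode = statistics.mode(data)          # most frequent value: 4
```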

27) Which kind of chart is described as an enhanced version of a scatter plot? A) heat map B) bullet C) pie chart D) bubble chart

D - Section 2.9

24) Oper marts are created when operational data needs to be analyzed A) linearly. B) in a dashboard. C) unidimensionally. D) multidimensionally.

D - Section 3.2 under Operational Data Stores

A visual presentation of critical data for executives to view. It allows executives to see hot spots in seconds and explore the situation.

Dashboard

49) ________ providers focus on providing technology and services aimed toward integrating data from multiple sources.

Data Warehouse

55) ________ analytics help managers understand current events in the organization including causes, trends, and patterns.

Descriptive

54) ________ modeling is a retrieval-based system that supports high-volume query access.

Dimensional

53) With ________, all the data from every corner of the enterprise is collected and integrated into a consistent schema so that every part of the organization has access to the single version of the truth when and where needed.

Data warehousing

49) ________ is a mechanism that integrates application functionality and shares functionality (rather than data) across systems, thereby enabling flexibility and reuse.

Enterprise application integration (EAI)

50) ________ is a mechanism for pulling data from source systems to satisfy a request for information. It is an evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases.

Enterprise information integration (EII)

1) T/F - Computerized support is only used for organizational decisions that are responses to external pressures, not for taking advantage of opportunities.

FALSE

1) T/F - In the opening case, police detectives used data mining to identify possible new areas of inquiry.

FALSE

10) T/F - Major commercial business intelligence (BI) products and services were well established in the early 1970s.

FALSE

11) T/F - Statistics and data mining both look for data sets that are as large as possible.

FALSE

11) T/F - When telling a story during a presentation, it is best to avoid describing hurdles that your character must overcome, to avoid souring the mood.

FALSE

15) Data source reliability means that data are correct and are a good match for the analytics problem.

FALSE - Data content accuracy means that data are correct and are a good match for the analytics problem. Data source reliability refers to the originality and appropriateness of the storage medium where the data are obtained.

11) T/F - Information systems that support such transactions as ATM withdrawals, bank deposits, and cash register scans at the grocery store represent transaction processing, a critical branch of BI.

FALSE - BI is not transaction processing but analytics of transactions (Section 1.4)

14) T/F - BI represents a bold new paradigm in which the company's business strategy must be aligned to its business intelligence analysis initiatives.

FALSE - BI must be aligned to business strategy (Section 1.4)

15) T/F - With the balanced scorecard approach, the entire focus is on measuring and managing specific financial goals based on the organization's strategy.

FALSE - Four perspectives: financial, customer, internal business process, and learning and growth (Section 3.11)

3) Data is the contextualization of information, that is, information set in context.

FALSE - Information is the contextualization of data (Section 2.7)

12) T/F - Bill Inmon advocates the data mart bus architecture whereas Ralph Kimball promotes the hub-and-spoke architecture, a data mart bus architecture with conformed dimensions.

FALSE - Inmon = hub-and-spoke; Kimball = DM bus (section 3.4)

16) T/F - OLTP systems are designed to handle ad hoc analysis and complex queries that deal with many data items.

FALSE - OLAP (section 3.6)

19) T/F - Due to industry consolidation, the analytics ecosystem consists of only a handful of players across several functional areas.

FALSE - Section 1.8

20) T/F - Data generation is a precursor, and is not included in the analytics ecosystem.

FALSE - Section 1.8 Figure 1.13

6) T/F - Organizations seldom devote a lot of effort to creating metadata because it is not important for the effective use of data warehouses.

FALSE - Section 3.2

8) T/F - Two-tier data warehouse/BI infrastructures offer organizations more flexibility but cost more than three-tier ones.

FALSE - Section 3.4

1) T/F - The BPM development cycle is essentially a one-shot process where the requirement is to get it right the first time.

FALSE - Section 3.9

19) T/F - Open-source data mining tools include applications such as IBM SPSS Modeler and Dell Statistica.

FALSE - Section 4.6

12) T/F - Visual analytics is aimed at answering, "What is happening?" and is usually associated with business analytics.

FALSE - Visual analytics answers "why is it happening?" (Section 2.10)

T/F - Nominal data represent the labels of multiple classes used to divide a variable into specific groups.

FALSE - categorical data (section 2.3)

4) T/F - Data warehouses are subsets of data marts.

FALSE - data marts are a subset of data warehouses (section 3.2)

2) T/F - To respond to its market challenges, SiriusXM decided to focus on manufacturing efficiency.

FALSE - data-driven marketing (Section 2.1)

18) T/F - User-initiated navigation of data through disaggregation is referred to as "drill up."

FALSE - drill down (Section 3.6)

17) T/F - In the Dell case study, the largest issue was how to properly spend the online marketing budget.

FALSE - getting online customers to contact sales representatives (Section 4.2)

15) T/F - K-fold cross-validation is also called sliding estimation.

FALSE - rotation estimation (Section 4.5)

14) T/F - In the Dallas Cowboys case study, the focus was on using data analytics to decide which players would play every week.

FALSE - run it more profitably (Case 2.7 in Section 2.11)

13) T/F - Dashboards provide visual displays of important information that is consolidated and arranged across several screens to maintain data order.

FALSE - single screen (Section 2.11)

60) ________ charts are a special case of horizontal bar charts that are used to portray project timelines, project tasks/activity durations, and overlap among the tasks/activities.

Gantt

68) There are several basic information system architectures that can be used for data warehousing. What are they?

Generally speaking, these architectures are commonly called client/server or n-tier architectures. Two-tier and three-tier architectures are the most common, but single-tier architectures also exist.

59) The filing system developed by Google to handle Big Data storage challenges is known as the ________ Distributed File System.

Hadoop

Describe the difference between simple and multiple regression.

If the regression equation is built between one response variable and one explanatory variable, then it is called simple regression. Multiple regression is the extension of simple regression where the explanatory variables are more than one.
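The distinction above can be shown in a few lines of NumPy: fit a line with one explanatory variable (simple regression), then with two (multiple regression). The data are invented so that the true coefficients are known exactly.

```python
import numpy as np

# Simple regression: one response y, one explanatory variable x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                        # exact line, so the fit is exact
X_simple = np.column_stack([np.ones_like(x), x])
b_simple, *_ = np.linalg.lstsq(X_simple, y, rcond=None)
# b_simple is [intercept, slope], i.e. approximately [1.0, 2.0]

# Multiple regression: more than one explanatory variable (x and x2)
x2 = np.array([0.0, 1.0, 0.0, 1.0])
y2 = 2.0 * x + 3.0 * x2 + 1.0
X_multi = np.column_stack([np.ones_like(x), x, x2])
b_multi, *_ = np.linalg.lstsq(X_multi, y2, rcond=None)
# b_multi is approximately [1.0, 2.0, 3.0]
```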

67) In lessons learned from the Target case, what legal warnings would you give another retailer using data mining for marketing?

If you look at this practice from a legal perspective, you would conclude that Target did not use any information that violates customer privacy; rather, it used transactional data that nearly every other retail chain is collecting and storing (and perhaps analyzing) about its customers. What was disturbing in this scenario was perhaps the targeted concept: pregnancy. There are certain events or concepts that should be off limits or treated extremely cautiously, such as terminal disease, divorce, and bankruptcy.

61) What is the definition of a data warehouse (DW) in simple terms?

In simple terms, a data warehouse (DW) is a pool of data produced to support decision making; it is also a repository of current and historical data of potential interest to managers throughout the organization.

60) ________ (also called in-database analytics) refers to the integration of the algorithmic extent of data analytics into data warehousing.

In-database processing

42) ________ statistics is about drawing conclusions about the characteristics of the population.

Inferential

52) The ________ Model, also known as the EDW approach, emphasizes top-down development, employing established database development methodologies and tools, such as entity-relationship diagrams (ERD), and an adjustment of the spiral development approach.

Inmon

53) The ________ Model, also known as the data mart approach, is a "plan big, build small" approach. A data mart is a subject-oriented or department-oriented data warehouse. It is a scaled-down version of a data warehouse that focuses on the requests of a specific department, such as marketing or sales.

Kimball

46) ________ regression is a very popular, statistically sound, probability-based classification algorithm that employs supervised learning.

Logistic
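As a minimal sketch of supervised, probability-based classification with logistic regression, here is a tiny NumPy implementation fit by gradient descent on the log-loss. The toy data, learning rate, and iteration count are all invented for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy supervised data: label is 1 when the feature is positive
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

# Fit intercept and weight by gradient descent on the log-loss
Xb = np.column_stack([np.ones(len(X)), X])   # add intercept column
w = np.zeros(2)
for _ in range(2000):
    p = sigmoid(Xb @ w)                      # predicted probabilities
    w -= 0.5 * Xb.T @ (p - y) / len(y)       # gradient step

# Classify by thresholding the predicted probability at 0.5
preds = (sigmoid(Xb @ w) >= 0.5).astype(int)
```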

Metric Management Reports

Many organizations manage business performance through outcome-oriented metrics. For external groups, these are service-level agreements (SLAs). For internal management, they are key performance indicators (KPIs).

60) The programming algorithm developed by Google to handle Big Data computational challenges is known as ________.

MapReduce
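The classic illustration of the MapReduce idea is a word count: a map phase emits (key, value) pairs, and a reduce phase aggregates values per key. This single-machine Python sketch (with invented documents) only mimics the programming model, not the distributed execution:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a (word, 1) pair for each word in the document
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    # Shuffle/reduce: group pairs by key and sum the counts
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big ideas", "big data tools"]
pairs = chain.from_iterable(map_phase(d) for d in docs)
counts = reduce_phase(pairs)
```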

49) ________ are typically used together with other charts and graphs, as opposed to by themselves, and show postal codes, country names, etc.

Maps

44) ________ describe the structure and meaning of the data, contributing to their effective use.

Metadata

Mehra (2005) indicated that few organizations really understand metadata, and fewer understand how to design and implement a metadata strategy. How would you describe metadata?

Metadata are data about data. Metadata describe the structure of and some meaning about data, thereby contributing to their effective or ineffective use.

44) ________ management reports are used to manage business performance through outcome-oriented metrics in many organizations.

Metric

50) ________ providers focus on bringing all the data stores into an enterprise-wide platform.

Middleware

48) ________ charts or network diagrams show precedence relationships among the project activities/tasks.

PERT

56) ________ analytics help managers understand probable future outcomes.

Predictive

57) ________ analytics help managers make decisions to achieve the best performance in the future.

Prescriptive

42) In ________ oriented data warehousing, operational databases are tuned to handle transactions that update the database.

Product

69) More data, coming in faster and requiring immediate conversion into decisions, means that organizations are confronting the need for real-time data warehousing (RDW). How would you define real-time data warehousing?

Real-time data warehousing, also known as active data warehousing (ADW), is the process of loading and providing data via the data warehouse as they become available.

59) ________, or "The Extended ASP Model," is a creative way of deploying information system applications where the provider licenses its applications to customers for use as a service on demand (usually over the Internet).

SaaS (software as a service)

59) ________ plots are often used to explore the relationship between two or three variables (in 2-D or 3-D visuals).

Scatter

14) T/F - During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality.

TRUE
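A small worked example of the confusion-matrix terms (with invented labels) shows what a false positive is: an occurrence the classifier marks as true that is false in reality.

```python
# Invented actual vs. predicted labels for a binary classifier
actual    = [1, 0, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0, 1]

pairs = list(zip(actual, predicted))
tp = sum(a == 1 and p == 1 for a, p in pairs)  # true positives
fp = sum(a == 0 and p == 1 for a, p in pairs)  # false positives: predicted
                                               # true, actually false
fn = sum(a == 1 and p == 0 for a, p in pairs)  # false negatives
tn = sum(a == 0 and p == 0 for a, p in pairs)  # true negatives
```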

14) T/F - With key performance indicators, driver KPIs have a significant effect on outcome KPIs, but the reverse is not necessarily true.

TRUE

15) T/F - Traditional BI systems use a large volume of static data that has been extracted, cleansed, and loaded into a data warehouse to produce reports and analyses.

TRUE

16) T/F - When a problem has many attributes that impact the classification of different patterns, decision trees may be a useful approach.

TRUE

2) T/F - During the early days of analytics, data was often obtained from the domain experts using manual processes to build mathematical or knowledge-based models.

TRUE

2) T/F - The cost of data storage has plummeted recently, making data mining feasible for more firms.

TRUE

4) Data is the main ingredient for any BI, data science, and business analytics initiative.

TRUE - Section 2.2

5) Predictive algorithms generally require a flat file with a target variable, so making data analytics ready for prediction means that data sets must be transformed into a flat-file format and made ready for ingestion into those predictive algorithms.

TRUE - Section 2.2

9) There are basic chart types and specialized chart types. A Gantt chart is a specialized chart type.

TRUE - Section 2.9

2) T/F - The "islands of data" problem in the 1980s describes the phenomenon of unconnected data being stored in numerous locations within an organization.

TRUE - Section 3.2

5) T/F - One way an operational data store differs from a data warehouse is the recency of their data.

TRUE - Section 3.2

11) T/F - Because of performance and data quality issues, most experts agree that the federated architecture should supplement data warehouses, not replace them.

TRUE - Section 3.4

17) T/F - The data warehousing maturity model consists of six stages: prenatal, infant, child, teenager, adult, and sage.

TRUE - Section 3.7

4) T/F - If using a mining analogy, "knowledge mining" would be a more appropriate term than "data mining."

TRUE - Section 4.2

10) T/F - The hub-and-spoke data warehouse model uses a centralized warehouse feeding dependent data marts.

TRUE - section 3.4

66) Describe the difference between descriptive and inferential statistics.

The main difference between descriptive and inferential statistics is the data used in these methods—descriptive statistics is describing the sample data on hand, and inferential statistics is drawing inferences or conclusions about the characteristics of the population.
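The contrast can be sketched in a few lines: descriptive statistics summarize the sample on hand, while inferential statistics use the sample to draw a conclusion about the population. The sample values are invented, and the interval uses the rough normal z = 1.96 approximation:

```python
import math
import statistics

sample = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]

# Descriptive: summarize the sample data on hand
mean = statistics.mean(sample)
stdev = statistics.stdev(sample)

# Inferential: an approximate 95% confidence interval for the
# population mean, inferred from the sample
half_width = 1.96 * stdev / math.sqrt(len(sample))
ci = (mean - half_width, mean + half_width)
```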

Balanced scorecard-type reports

This is a method developed by Kaplan and Norton that attempts to present an integrated view of success in an organization. In addition to financial performance, balanced scorecard-type reports also include customer, business process, and learning and growth perspectives.

58) The Google search engine is an example of Big Data in that it has to search and index billions of ________ in fractions of a second for each search.

Web pages

41) Fundamental reasons for investing in BI must be ________ with the company's business strategy.

aligned

52) Data warehouses are intended to work with informational data used for online ________ processing systems.

analytical

43) For organizations using BI systems, the need to ________ the gap between operational data and strategic objectives has become more pressing.

close

51) The user interface of a BI system is often referred to as a(n) ________.

dashboard

46) In the Dell case study, engineers working closely with marketing used lean software development strategies and numerous technologies to create a highly scalable, singular ________.

data mart

43) Knowledge extraction, pattern analysis, data archaeology, information harvesting, pattern searching, and data dredging are all alternative names for ________.

data mining

51) In the terrorist funding case study, an observed price ________ may be related to income tax avoidance/evasion, money laundering, or terrorist financing.

deviation

55) Information dashboards enable ________ operations that allow the users to view underlying data sources and obtain more detail.

drill-down/drill-through

48) Different types of players are identified and described in the analytics ________.

ecosystem

51) Performing extensive ________ to move data to the data warehouse may be a sign of poorly managed data and a fundamental lack of a coherent data management strategy.

extraction, transformation, and load (ETL)

46) A(n) ________ architecture is used to build a scalable and maintainable infrastructure that includes a centralized data warehouse and several dependent data marts.

hub-and-spoke

55) In ________, a classification method, the complete data set is randomly split into mutually exclusive subsets of approximately equal size and tested multiple times on each left-out subset, using the others as a training set.

k-fold cross-validation
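The splitting scheme described above can be sketched in pure Python: the data set is divided into k mutually exclusive subsets of roughly equal size, and each fold serves once as the left-out test set while the rest form the training set. The data and k value are invented for illustration:

```python
def k_fold_splits(data, k):
    """Yield (train, test) pairs; each fold is the test set exactly once."""
    folds = [data[i::k] for i in range(k)]   # k roughly equal subsets
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(10))
splits = list(k_fold_splits(data, k=5))
# 5 splits; every item appears in exactly one test fold
```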

54) Fayyad et al. (1996) defined ________ in databases as a process of using data mining methods to find useful information and patterns in the data.

knowledge discovery

58) Because of its successful application to retail business problems, association rule mining is commonly called ________.

market-basket analysis

56) With a dashboard, information on the sources, quality, and currency of the underlying data provides contextual ________ for users.

metadata

53) With dashboards, the layer of information that uses graphical, abstracted data to keep tabs on key performance metrics is the ________ layer.

monitoring

42) There has been an increase in data mining to deal with global competition and customers' more sophisticated ________ and wants.

needs

57) When validating the assumptions of a regression, ________ assumes that the errors of the response variable are normally distributed.

normality

51) Visual analytics is widely regarded as the combination of visualization and ________ analytics.

predictive

43) Due to the ________ expansion of information technology coupled with the need for improved competitiveness in business, there has been an increase in the use of computing power to produce unified reports that join different views of the enterprise in one place.

rapid

45) Most data warehouses are built using ________ database management systems to control and manage the data.

relational

50) Customer ________ management extends traditional marketing by creating one-on-one relationships with customers.

relationship

53) The data mining in cancer research case study explains that data mining methods are capable of extracting patterns and ________ hidden deep in large and complex medical databases.

relationships

41) A(n) ________ is a communication artifact, concerning business matters, prepared with the specific intention of relaying information in a presentable form.

report

57) Given that the size of data warehouses is expanding at an exponential rate, ________ is an important issue.

scalability

56) Online ________ is a term used for a transaction system that is primarily responsible for capturing and storing data related to day-to-day business functions such as ERP, CRM, SCM, and point of sale.

transaction processing

50) Typical charts, graphs, and other visual elements used in visualization-based applications usually involve ________ dimensions.

two

46) A(n) ________ is a major component of a Business Intelligence (BI) system that is often browser based and often presents a portal or dashboard.

user interface

68) List four myths associated with data mining.

• Data mining provides instant, crystal-ball-like predictions.
• Data mining is not yet viable for business applications.
• Data mining requires a separate, dedicated database.
• Only those with advanced degrees can do data mining.
• Data mining is only for large firms that have lots of customer data.

67) Briefly describe four major components of the data warehousing process.

• Data sources. Data are sourced from multiple independent operational "legacy" systems and possibly from external data providers (such as the U.S. Census). Data may also come from an OLTP or ERP system.
• Data extraction and transformation. Data are extracted and properly transformed using custom-written or commercial ETL software.
• Data loading. Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse and/or data marts.
• Comprehensive database. Essentially, this is the EDW to support all decision analysis by providing relevant summarized and detailed information originating from many different sources.
• Metadata. Metadata include software programs about data and rules for organizing data summaries that are easy to index and search, especially with Web tools.
• Middleware tools. Middleware tools enable access to the data warehouse. There are many front-end applications that business users can use to interact with data stored in the data repositories, including data mining, OLAP, reporting tools, and data visualization tools.
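The extraction, transformation, and load steps above can be mimicked in a toy Python pass: extract rows from two "legacy" sources, transform them (cleanse names, cast types), and load them into a warehouse-like store. All source data and field names here are invented:

```python
# Two invented "legacy" operational sources with messy data
source_a = [{"cust": " Alice ", "sales": "100"}]
source_b = [{"cust": "BOB", "sales": "250"}]

def extract():
    # Extraction: pull rows from the independent source systems
    return source_a + source_b

def transform(rows):
    # Transformation/cleansing: trim and normalize names, cast sales
    return [{"cust": r["cust"].strip().title(), "sales": int(r["sales"])}
            for r in rows]

warehouse = []

def load(rows):
    # Loading: append the cleansed rows to the warehouse store
    warehouse.extend(rows)

load(transform(extract()))
```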

61) List five reasons for the growing popularity of data mining in the business world.

• More intense competition at the global scale driven by customers' ever-changing needs and wants in an increasingly saturated marketplace
• General recognition of the untapped value hidden in large data sources
• Consolidation and integration of database records, which enables a single view of customers, vendors, transactions, etc.
• Consolidation of databases and other data repositories into a single location in the form of a data warehouse
• The exponential increase in data processing and storage technologies
• Significant reduction in the cost of hardware and software for data storage and processing
• Movement toward the demassification (conversion of information resources into nonphysical form) of business practices

13) Successful BI is a tool for the information systems department, but is not exposed to the larger organization.

FALSE

13) T/F - In the cancer research case study, data mining algorithms that predict cancer survivability with high predictive power are good replacements for medical professionals.

FALSE

13) T/F - Properly integrating data from various databases and other disparate sources is a trivial process.

FALSE

16) Demands for instant, on-demand access to dispersed information decrease as firms successfully integrate BI into their operations.

FALSE

17) T/F - The use of dashboards and data visualizations is seldom effective in identifying issues in organizations, as demonstrated by the Silvaris Corporation Case Study.

FALSE

18) T/F - Market basket analysis is a useful and entertaining way to explain data mining to a technologically less savvy audience, but it has little business significance.

FALSE

19) T/F - Data warehouse administrators (DWAs) do not need strong business insight since they only handle the technical aspect of the infrastructure.

FALSE

20) T/F - Because the recession has raised interest in low-cost open source software, it is now set to replace traditional enterprise software.

FALSE

20) T/F - Data that is collected, stored, and analyzed in data mining is often private and personal. There is no way to maintain individuals' privacy other than being very careful about physical data security.

FALSE

3) T/F - Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales.

FALSE

4) T/F - Business intelligence (BI) is a specific term that describes architectures and tools only.

FALSE

5) T/F - The entire focus of the predictive analytics system in the Infinity P&C case was on detecting and handling fraudulent claims for the company's benefit.

FALSE

5) T/F - The growth in hardware, software, and network capacities has had little impact on modern BI innovations.

FALSE

6) The data storage component of a business reporting system builds the various reports and hosts them for, or disseminates them to users. It also provides notification, annotation, collaboration, and other services.

FALSE

7) T/F - Ratio data is a type of categorical data.

FALSE

8) T/F - Decision support system (DSS) and management information system (MIS) have precise definitions agreed to by practitioners.

FALSE

9) T/F - Moving the data into a data warehouse is usually the easiest part of its creation.

FALSE

6) T/F - Data mining requires specialized data analysts to ask ad hoc questions and obtain answers quickly from the system.

FALSE - end user (Section 4.2)

9) In the Miami-Dade Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime.

FALSE - fill the experience gap (Section 4.1)

7) Managing information on operations, customers, internal procedures and employee interactions is the domain of cognitive science.

FALSE - knowledge management (Section 1.2)

3) T/F - Subject oriented databases for data warehousing are organized by detailed subjects such as disk drives, computers, and networks.

FALSE - such as sales, products, or customers (section 3.2)

47) ________ charts are useful in displaying nominal data or numerical data that splits nicely into different categories so you can quickly see comparative results and trends.

Bar

58) ________ charts are effective when you have nominal data or numerical data that splits nicely into different categories so you can quickly see comparative results and trends within your data.

Bar

47) ________ cycle times are now extremely compressed, faster, and more informed across industries.

Business

44) ________ is an umbrella term that combines architectures, tools, databases, analytical tools, applications, and methodologies.

Business intelligence (BI)

20) Which characteristic of data means that all the required data elements are included in the data set? A) data source reliability B) data accessibility C) data richness D) data granularity

C

22) Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from A) collecting data about customers and transactions. B) developing a philosophy that is data analytics-centric. C) analyzing the vast data amounts routinely collected. D) asking the customers what they want.

C

23) Which kind of data warehouse is created separately from the enterprise data warehouse by a department and not reliant on it for updates? A) sectional data mart B) public data mart C) independent data mart D) volatile data mart

C

24) What is the main reason parallel processing is sometimes used for data mining? A) because the hardware exists in most organizations, and it is available to use B) because most of the algorithms used for data mining require it C) because of the massive data amounts and search efforts involved D) because any strategic application requires parallel processing

C

25) A Web client that connects to a Web server, which is in turn connected to a BI application server, is reflective of a A) one-tier architecture. B) two-tier architecture. C) three-tier architecture. D) four-tier architecture.

C

26) The Internet emerged as a new medium for visualization and brought all the following EXCEPT A) worldwide digital distribution of visualization. B) immersive environments for consuming data. C) new forms of computation of business logic. D) new graphics displays through PC displays.

C

27) The need for more versatile reporting than what was available in 1980s era ERP systems led to the development of what type of system? A) management information systems B) relational databases C) executive information systems D) data warehouses

C

28) Which data warehouse architecture uses a normalized relational warehouse that feeds multiple data marts? A) independent data marts architecture B) centralized data warehouse architecture C) hub-and-spoke data warehouse architecture D) federated architecture

C

28) Which type of visualization tool can be very helpful when the intention is to show relative proportions of dollars per department allocated by a university administration? A) heat map B) bullet C) pie chart D) bubble chart

C

30) Which of the following is NOT an example of transaction processing? A) ATM withdrawal B) bank deposit C) sales report D) cash register scans

C

31) All of the following statements about data mining are true EXCEPT: A) The term is relatively new. B) Its techniques have their roots in traditional statistical analysis and artificial intelligence. C) The ideas behind it are relatively new. D) Intense, global competition make its application more important.

C

31) Online transaction processing (OLTP) systems handle a company's routine ongoing business. In contrast, a data warehouse is typically A) the end result of BI processes and operations. B) a repository of actionable intelligence obtained from a data mart. C) a distinct system that provides storage for data that will be made use of in analysis. D) an integral subsystem of an online analytical processing (OLAP) system.

C

33) All of the following are benefits of hosted data warehouses EXCEPT A) smaller upfront investment. B) better quality hardware. C) greater control of data. D) frees up in-house systems.

C

35) What does the scalability of a data mining method refer to? A) its ability to predict the outcome of a previously unknown data set accurately B) its speed of computation and computational costs in using the model C) its ability to construct a prediction model efficiently given a large amount of data D) its ability to overcome noisy data to make somewhat accurate predictions

C

35) What has caused the growth of the demand for instant, on-demand access to dispersed information? A) the increasing divide between users who focus on the strategic level and those who are more oriented to the tactical level B) the need to create a database infrastructure that is always online and contains all the information from the OLTP systems C) the more pressing need to close the gap between the operational data and strategic objectives D) the fact that BI cannot simply be a technical exercise for the information systems department

C

36) Dashboards can be presented at all the following levels EXCEPT A) the visual dashboard level. B) the static report level. C) the visual cube level. D) the self-service cube level.

C

37) Real-time data warehousing can be used to support the highest level of decision making sophistication and power. The major feature that enables this in relation to handling the data is A) country of (data) origin. B) nature of the data. C) speed of data transfer. D) source of the data.

C

38) A large storage location that can hold vast quantities of data (mostly unstructured) in its native/raw format for future/potential analytics consumption is referred to as a(n) A) extended ASP. B) data cloud. C) data lake. D) relational database.

C

38) Third party providers of publicly available data sets protect the anonymity of the individuals in the data set primarily by A) asking data users to use the data ethically. B) leaving in identifiers (e.g., name), but changing other variables. C) removing identifiers such as names and social security numbers. D) letting individuals in the data know their data is being accessed.

C

38) What type of analytics seeks to determine what is likely to happen in the future? A) descriptive B) prescriptive C) predictive D) domain

C

39) In the Target case study, why did Target send maternity ads to a teen? A) Target's analytic model confused her with an older woman with a similar name. B) Target was sending ads to all women in a particular neighborhood. C) Target's analytic model suggested she was pregnant based on her buying habits. D) Target was using a special promotion that targeted all teens in her geographical area.

C

40) Big Data often involves a form of distributed storage and processing using Hadoop and MapReduce. One reason for this is A) centralized storage creates too many vulnerabilities. B) the "Big" in Big Data necessitates over 10,000 processing nodes. C) the processing power needed for the centralized model would overload a single computer. D) Big Data systems have to match the geographical spread of social media.

C

23) Business applications have moved from transaction processing and monitoring to other activities. Which of the following is NOT one of those activities? A) problem analysis B) solution applications C) data monitoring D) mobile access

C - Section 1.2

45) ________ was proposed in the mid-1990s by a European consortium of companies to serve as a nonproprietary standard methodology for data mining.

CRISP-DM

67) Describe categorical and nominal data.

Categorical data represent the labels of multiple classes used to divide a variable into specific groups. Examples of categorical variables include race, sex, age group, and educational level. Nominal data contain simple codes assigned to objects as labels; the codes themselves are not measurements. For example, the variable marital status can be generally categorized as (1) single, (2) married, and (3) divorced.

62) List 3 common data mining myths and realities.

1) Myth: Data mining provides instant, crystal-ball-like predictions. Reality: Data mining is a multistep process that requires deliberate, proactive design and use. 2) Myth: Data mining is not yet viable for mainstream business applications. Reality: The current state of the art is ready to go for almost any business type and/or size. 3) Myth: Data mining requires a separate, dedicated database. Reality: Because of the advances in database technology, a dedicated database is not required. 4) Myth: Only those with advanced degrees can do data mining. Reality: Newer Web-based tools enable managers of all educational levels to do data mining. 5) Myth: Data mining is only for large firms that have lots of customer data. Reality: If the data accurately reflect the business or its customers, any company can use data mining.

62) What are the four major components of a Business Intelligence (BI) system?

1. A data warehouse, with its source data 2. Business analytics, a collection of tools for manipulating, mining, and analyzing the data in the data warehouse 3. Business performance management (BPM) for monitoring and analyzing performance 4. A user interface (e.g., a dashboard)

66) Six Sigma rests on a simple performance improvement model known as DMAIC. What are the steps involved?

1. Define. Define the goals, objectives, and boundaries of the improvement activity. At the top level, the goals are the strategic objectives of the company. At lower levels—department or project levels—the goals are focused on specific operational processes. 2. Measure. Measure the existing system. Establish quantitative measures that will yield statistically valid data. The data can be used to monitor progress toward the goals defined in the previous step. 3. Analyze. Analyze the system to identify ways to eliminate the gap between the current performance of the system or process and the desired goal. 4. Improve. Initiate actions to eliminate the gap by finding ways to do things better, cheaper, or faster. Use project management and other planning tools to implement the new approach. 5. Control. Institutionalize the improved system by modifying compensation and incentive systems, policies, procedures, manufacturing resource planning, budgets, operation instructions, or other management systems.

65) What are the most important assumptions in linear regression?

1. Linearity. This assumption states that the relationship between the response variable and the explanatory variables is linear. That is, the expected value of the response variable is a straight-line function of each explanatory variable, while holding all other explanatory variables fixed. Also, the slope of the line does not depend on the values of the other variables. It also implies that the effects of different explanatory variables on the expected value of the response variable are additive in nature. 2. Independence (of errors). This assumption states that the errors of the response variable are uncorrelated with each other. This independence of the errors is weaker than actual statistical independence, which is a stronger condition and is often not needed for linear regression analysis. 3. Normality (of errors). This assumption states that the errors of the response variable are normally distributed. That is, they are supposed to be totally random and should not represent any nonrandom patterns. 4. Constant variance (of errors). This assumption, also called homoscedasticity, states that the response variables have the same variance in their error, regardless of the values of the explanatory variables. In practice this assumption is invalid if the response variable varies over a wide enough range/scale. 5. Multicollinearity. This assumption states that the explanatory variables are not correlated (i.e., they do not replicate the same information but each provide a different perspective on the information needed for the model). Multicollinearity can be triggered by having two or more perfectly correlated explanatory variables presented to the model (e.g., if the same explanatory variable is mistakenly included in the model twice, the second being a slight transformation of the first). A correlation-based data assessment usually catches this error.

64) What are the four processes that define a closed-loop BPM cycle?

1. Strategize: This is the process of identifying and stating the organization's mission, vision, and objectives, and developing plans (at different levels of granularity—strategic, tactical and operational) to achieve these objectives. 2. Plan: When operational managers know and understand the what (i.e., the organizational objectives and goals), they will be able to come up with the how (i.e., detailed operational and financial plans). Operational and financial plans answer two questions: What tactics and initiatives will be pursued to meet the performance targets established by the strategic plan? What are the expected financial results of executing the tactics? 3. Monitor/Analyze: When the operational and financial plans are underway, it is imperative that the performance of the organization be monitored. A comprehensive framework for monitoring performance should address two key issues: what to monitor and how to monitor. 4. Act and Adjust: What do we need to do differently? Whether a company is interested in growing its business or simply improving its operations, virtually all strategies depend on new projects—creating new products, entering new markets, acquiring new customers or businesses, or streamlining some processes. The final part of this loop is taking action and adjusting current actions based on analysis of problems and opportunities.

41) In the Influence Health case, the company was able to evaluate over ________ million records in only two days.

195

22) Operational or transaction databases are product oriented, handling transactions that update the database. In contrast, data warehouses are A) subject-oriented and nonvolatile. B) product-oriented and nonvolatile. C) product-oriented and volatile. D) subject-oriented and volatile.

A

23) All of the following statements about data mining are true EXCEPT A) the process aspect means that data mining should be a one-step process to results. B) the novel aspect means that previously unknown patterns are discovered. C) the potentially useful aspect means that results should lead to some business benefit. D) the valid aspect means that the discovered patterns should hold true on new data.

A

25) The data field "ethnic group" can be best described as A) nominal data. B) interval data. C) ordinal data. D) ratio data.

A

29) Clustering partitions a collection of things into segments whose members share A) similar characteristics. B) dissimilar characteristics. C) similar collection methods. D) dissimilar collection methods.

A

30) Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications? A) insurance B) retailing and logistics C) customer relationship management D) computer hardware and software

A

30) Which type of question does visual analytics seek to answer? A) Why is it happening? B) What happened yesterday? C) What is happening today? D) When did it happen?

A

30) ________ is an evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases. A) Enterprise information integration (EII) B) Enterprise application integration (EAI) C) Extraction, transformation, and load (ETL) D) None of these

A

32) The very design that makes an OLTP system efficient for transaction processing makes it inefficient for A) end-user ad hoc reports, queries, and analysis. B) transaction processing systems that constantly update operational databases. C) the collection of reputable sources of intelligence. D) transactions such as ATM withdrawals, where we need to reduce a bank balance accordingly.

A

33) What is the management feature of a dashboard? A) operational data that identify what actions to take to resolve a problem B) summarized dimensional data to analyze the root cause of problems C) summarized dimensional data to monitor key performance metrics D) graphical, abstracted data to monitor key performance metrics

A

36) In estimating the accuracy of data mining (or other) classification models, the true positive rate is A) the ratio of correctly classified positives divided by the total positive count. B) the ratio of correctly classified negatives divided by the total negative count. C) the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified positives. D) the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified negatives.

A
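
As a minimal sketch (not from the text), the true positive rate can be computed directly from actual and predicted labels; the example data here is hypothetical:

```python
# True positive rate (also called sensitivity or recall):
# correctly classified positives divided by the total actual positives,
# i.e., TP / (TP + FN).

def true_positive_rate(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp / (tp + fn)

actual    = [1, 1, 1, 1, 0, 0, 0, 0]   # four actual positives
predicted = [1, 1, 1, 0, 0, 1, 0, 0]   # three of the four were caught
print(true_positive_rate(actual, predicted))  # 0.75
```

Note that the false positive at index 5 does not affect the TPR; it would only show up in the false positive rate.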

37) In data mining, finding an affinity of two products to be commonly together in a shopping cart is known as A) association rule mining. B) cluster analysis. C) decision trees. D) artificial neural networks.

A

38) This measure of dispersion is calculated by simply taking the square root of the variations. A) standard deviation B) range C) variance D) arithmetic mean

A
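
A quick illustration of that relationship, using a made-up data set:

```python
# Standard deviation is simply the square root of the variance.
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)                                # 5.0
variance = sum((x - mean) ** 2 for x in data) / len(data)   # population variance: 4.0
std_dev = math.sqrt(variance)
print(std_dev)  # 2.0
```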

Describe cluster analysis and some of its applications.

Cluster analysis is an exploratory data analysis tool for solving classification problems. The objective is to sort cases (e.g., people, things, events) into groups, or clusters, so that the degree of association is strong among members of the same cluster and weak among members of different clusters. Cluster analysis is an essential data mining method for classifying items, events, or concepts into common groupings called clusters. The method is commonly used in biology, medicine, genetics, social network analysis, anthropology, archaeology, astronomy, character recognition, and even in MIS development. As data mining has increased in popularity, the underlying techniques have been applied to business, especially to marketing. Cluster analysis has been used extensively for fraud detection (both credit card and e-commerce fraud) and market segmentation of customers in contemporary CRM systems.
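One common clustering technique is k-means. The following is an illustrative sketch (not from the text) on hypothetical 2-D points, alternating between assigning each point to its nearest centroid and recomputing each centroid as the mean of its cluster:

```python
# Tiny k-means sketch: members of a cluster end up strongly associated
# (close together), while different clusters stay far apart.
# Assumes no cluster ever empties during the iterations.

def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            for c in clusters if c
        ]
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)  # one centroid near (1.17, 1.5), the other near (8.5, 8.83)
```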

27) Which data warehouse architecture uses metadata from existing data warehouses to create a hybrid logical data warehouse comprised of data from the other warehouses? A) independent data marts architecture B) centralized data warehouse architecture C) hub-and-spoke data warehouse architecture D) federated architecture

D

28) Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features? A) associations B) visualization C) classification D) clustering

D

34) What does the robustness of a data mining method refer to? A) its ability to predict the outcome of a previously unknown data set accurately B) its speed of computation and computational costs in using the model C) its ability to construct a prediction model efficiently given a large amount of data D) its ability to overcome noisy data to make somewhat accurate predictions

D

39) Which of the following statements about Big Data is true? A) Data chunks are stored in different locations on one computer. B) Hadoop is a type of processor used to process Big Data applications. C) MapReduce is a storage filing system. D) Pure Big Data systems do not involve fault tolerance.

D

40) All of the following are true about in-database processing technology EXCEPT A) it pushes the algorithms to where the data is. B) it makes the response to queries much faster than conventional databases. C) it is often used for apps like credit card fraud detection and investment risk management. D) it is the same as in-memory storage technology.

D

63) Why is data alone worthless?

Alone, data is worthless because it does not provide business value. To provide business value, it has to be analyzed.

68) How does Amazon.com use predictive analytics to respond to product searches by the customer?

Amazon uses clustering algorithms to segment customers into different clusters to be able to target specific promotions to them. The company also uses association mining techniques to estimate relationships between different purchasing behaviors. That is, if a customer buys one product, what else is the customer likely to purchase? That helps Amazon recommend or promote related products. For example, any product search on Amazon.com results in the retailer also suggesting other similar products that may interest a customer.
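A hypothetical sketch (not Amazon's actual system) of the co-occurrence counting that underlies association mining, with made-up baskets and product names:

```python
# Count how often pairs of products appear together in the same basket;
# frequently co-occurring pairs are candidates for "you may also like" promotions.
from collections import Counter
from itertools import combinations

baskets = [
    {"book", "lamp"},
    {"book", "lamp", "desk"},
    {"book", "pen"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(1))  # [(('book', 'lamp'), 2)]
```

The Apriori algorithm builds on exactly this idea, pruning itemsets that fall below a minimum support count before counting larger combinations.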

26) Which of the following BEST enables a data warehouse to handle complex queries and scale up to handle many more requests? A) use of the Web by users as a front-end B) parallel processing C) Microsoft Windows D) a larger IT staff

B

29) The competitive imperatives for BI include all of the following EXCEPT A) right information B) right user C) right time D) right place

B

29) Which approach to data warehouse integration focuses more on sharing process functionality than data across systems? A) extraction, transformation, and load B) enterprise application integration C) enterprise information integration D) enterprise function integration

B

29) Which type of visualization tool can be very helpful when a data set contains location data? A) bar chart B) geographic map C) highlight table D) tree map

B

31) When you tell a story in a presentation, all of the following are true EXCEPT A) a story should make sense and order out of a lot of background noise. B) a well-told story should have no need for subsequent discussion. C) stories and their lessons should be easy to remember. D) the outcome and reasons for it should be clear at the end of your story.

B

33) How are enterprise resource planning (ERP) systems related to supply chain management (SCM) systems? A) different terms for the same system B) complementary systems C) mutually exclusive systems D) none of the above; these systems never interface

B

33) Prediction problems where the variables have numeric values are most accurately defined as A) classifications. B) regressions. C) associations. D) computations.

B

35) Contextual metadata for a dashboard includes all the following EXCEPT A) whether any high-value transactions that would skew the overall trends were rejected as a part of the loading process. B) which operating system is running the dashboard server software. C) whether the dashboard is presenting "fresh" or "stale" information. D) when the data warehouse was last refreshed.

B

36) Today, many vendors offer diversified tools, some of which are completely preprogrammed (called shells). How are these shells utilized? A) They are used for customization of BI solutions. B) All a user needs to do is insert the numbers. C) The shell provides a secure environment for the organization's BI data. D) They host an enterprise data warehouse that can assist in decision making.

B

37) What type of analytics seeks to recognize what is going on as well as the likely forecast and make decisions to achieve the best performance possible? A) descriptive B) prescriptive C) predictive D) domain

B

39) How does the use of cloud computing affect the scalability of a data warehouse? A) Cloud computing vendors bring as much hardware as needed to users' offices. B) Hardware resources are dynamically allocated as use increases. C) Cloud vendors are mostly based overseas where the cost of labor is low. D) Cloud computing has little effect on a data warehouse's scalability.

B

40) This technique makes no a priori assumption of whether one variable is dependent on the other(s) and is not concerned with the relationship between variables; instead it gives an estimate on the degree of association between the variables. A) regression B) correlation C) means test D) multiple regression

B
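
The Pearson correlation coefficient is the usual way to estimate that degree of association. A minimal sketch with made-up data:

```python
# Pearson correlation: symmetric measure of linear association between two
# variables; neither variable is treated as dependent on the other.
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
print(pearson(x, [2, 4, 6, 8, 10]))   # 1.0  (perfect positive association)
print(pearson(x, [10, 8, 6, 4, 2]))   # -1.0 (perfect negative association)
```

Contrast this with regression, which designates one variable as the response and models it as a function of the others.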

Informal, judgmental knowledge of an application area that constitutes the rules of good judgment in the field. ________________ also encompasses the knowledge of how to solve problems efficiently and effectively, how to plan steps in solving a complex problem, how to improve performance, and so forth. A. Online analytical processing (OLAP) B. Heuristics C. Decision support systems D. Predictive analytics

B - Section 1.3/Glossary

36) What is Six Sigma? A) a letter in the Greek alphabet that statisticians use to measure process variability B) a methodology aimed at reducing the number of defects in a business process C) a methodology aimed at reducing the amount of variability in a business process D) a methodology aimed at measuring the amount of variability in a business process

B - Section 3.12

63) List and briefly describe the six steps of the CRISP-DM data mining process.

Step 1: Business Understanding — The key element of any data mining study is to know what the study is for. Answering such a question begins with a thorough understanding of the managerial need for new knowledge and an explicit specification of the business objective regarding the study to be conducted. Step 2: Data Understanding — A data mining study is specific to addressing a well-defined business task, and different business tasks require different sets of data. Following the business understanding, the main activity of the data mining process is to identify the relevant data from many available databases. Step 3: Data Preparation — The purpose of data preparation (or more commonly called data preprocessing) is to take the data identified in the previous step and prepare it for analysis by data mining methods. Compared to the other steps in CRISP-DM, data preprocessing consumes the most time and effort; most believe that this step accounts for roughly 80 percent of the total time spent on a data mining project. Step 4: Model Building — Here, various modeling techniques are selected and applied to an already prepared data set in order to address the specific business need. The model-building step also encompasses the assessment and comparative analysis of the various models built. Step 5: Testing and Evaluation — In step 5, the developed models are assessed and evaluated for their accuracy and generality. This step assesses the degree to which the selected model (or models) meets the business objectives and, if so, to what extent (i.e., do more models need to be developed and assessed?). Step 6: Deployment — Depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process across the enterprise. In many cases, it is the customer, not the data analyst, who carries out the deployment steps.

1) T/F - One of SiriusXM's challenges was tracking potential customers when cars were sold.

TRUE

10) T/F - In data mining, classification models help in prediction.

TRUE

10) Visualization differs from traditional charts and graphs in complexity of data sets and use of multiple dimensions and measures.

TRUE

12) T/F - Many business users in the 1980s referred to their mainframes as "the black hole," because all the information went into it, but little ever came back and ad hoc real-time querying was virtually impossible.

TRUE

12) T/F - Using data mining on data about imports and exports can help to detect tax avoidance and money laundering.

TRUE

17) T/F - Structured data is what data mining algorithms use and can be classified as categorical or numeric.

TRUE

18) T/F - Interval data are variables that can be measured on interval scales.

TRUE

18) T/F - The use of statistics in baseball by the Oakland Athletics, as described in the Moneyball case study, is an example of the effectiveness of prescriptive analytics.

TRUE

19) Descriptive statistics is all about describing the sample data on hand.

TRUE

3) T/F - Computer applications have moved from transaction processing and monitoring activities to problem analysis and solution applications.

TRUE

6) T/F - Managing data warehouses requires special methods, including parallel computing and/or Hadoop/Spark.

TRUE

7) In the FEMA case study, the BureauNet software was the primary reason behind the increased speed and relevance of the reports FEMA employees received.

TRUE

7) T/F - Without middleware, different BI programs cannot easily connect to the data warehouse.

TRUE

8) Google Maps has set new standards for data visualization with its intuitive Web mapping software.

TRUE

8) T/F - Converting continuous valued numerical variables to ranges and categories is referred to as discretization.

TRUE
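
A minimal illustration of discretization, using hypothetical age bins chosen only for this example:

```python
# Discretization: converting a continuous numerical variable (age)
# into ranges/categories that data mining algorithms can treat as labels.
def discretize_age(age):
    if age < 18:          # hypothetical bin boundaries for illustration
        return "minor"
    elif age < 65:
        return "adult"
    return "senior"

ages = [12, 30, 70]
print([discretize_age(a) for a in ages])  # ['minor', 'adult', 'senior']
```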

9) T/F - In the 2000s, the DW-driven DSSs began to be called BI systems.

TRUE

16) T/F - Data accessibility means that the data are easily and readily obtainable.

TRUE - Section 2.2

64) What is the intent of the analysis of data that is stored in a data warehouse?

The intent of the analysis is to give management the ability to analyze data for insights into the business, and thus provide tactical or operational decision support whereby, for example, line personnel can make quicker and/or more informed decisions.

66) In the data mining in Hollywood case study, how successful were the models in predicting the success or failure of a Hollywood movie?

The researchers claim that these prediction results are better than any reported in the published literature for this problem domain. Fusion classification methods attained up to 56.07% accuracy in correctly classifying movies and 90.75% accuracy in classifying movies within one category of their actual category. The SVM classification method attained up to 55.49% accuracy in correctly classifying movies and 85.55% accuracy in classifying movies within one category of their actual category.

64) Describe the role of the simple split in estimating the accuracy of classification models.

The simple split (or holdout or test sample estimation) partitions the data into two mutually exclusive subsets called a training set and a test set (or holdout set). It is common to designate two-thirds of the data as the training set and the remaining one-third as the test set. The training set is used by the inducer (model builder), and the built classifier is then tested on the test set. An exception to this rule occurs when the classifier is an artificial neural network. In this case, the data is partitioned into three mutually exclusive subsets: training, validation, and testing.
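The two-thirds/one-third split described above can be sketched as follows (an illustrative implementation, with the shuffle seed chosen arbitrarily for reproducibility):

```python
# Simple split (holdout): partition records into two mutually exclusive
# subsets -- two-thirds for training, the remaining one-third for testing.
import random

def simple_split(records, seed=42):
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = records[:]          # copy so the original order is untouched
    rng.shuffle(shuffled)
    cut = len(shuffled) * 2 // 3   # two-thirds boundary
    return shuffled[:cut], shuffled[cut:]

data = list(range(30))
train, test = simple_split(data)
print(len(train), len(test))  # 20 10
```

The two subsets are mutually exclusive and together cover the whole data set, which is exactly what "partitions the data" requires.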

Dashboard-Type Reports

This report presents a range of different performance indicators on one page, like a dashboard in a car. Typically, there is a set of predefined reports with static elements and fixed structure, but customization of the dashboard is allowed through widgets, views, and targets set for various metrics.

54) ________ series forecasting is the use of mathematical modeling to predict future values of the variable of interest based on previously observed values.

Time

57) As described in the Influence Health case study, customers are more often ________ services from a variety of healthcare service providers before selecting one.

comparing

52) Data preparation, the third step in the CRISP-DM data mining process, is more commonly known as ________.

data preprocessing

43) The three main types of data warehouses are data marts, operational ________, and enterprise data warehouses.

data stores

45) A(n) ________ is a major component of a Business Intelligence (BI) system that holds source data.

data warehouse

58) The role responsible for successful administration and management of a data warehouse is the ________, who should be familiar with high-performance software, hardware, and networking technologies, and also possesses solid business insight.

data warehouse administrator (DWA)

44) Data are often buried deep within very large ________, which sometimes contain data from several years.

databases

60) One way to accomplish privacy and protection of individuals' rights when data mining is by ________ of the customer records prior to applying data mining applications, so that the records cannot be traced to an individual.

de-identification

56) The basic idea behind a(n) ________ is that it recursively divides a training set until each division consists entirely or primarily of examples from one class.

decision tree
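The recursive division idea can be sketched for a single numeric attribute: keep splitting at the threshold that most reduces Gini impurity until each division holds only one class. This is a minimal sketch under those assumptions; real decision tree inducers handle many attributes, stopping rules, and pruning.

```python
from collections import Counter

def gini(points):
    """Gini impurity of a set of (value, label) examples."""
    counts = Counter(label for _, label in points)
    return 1 - sum((c / len(points)) ** 2 for c in counts.values())

def grow_tree(points):
    """Recursively divide the training set until each division is pure."""
    if gini(points) == 0:                      # all one class: make a leaf
        return {"leaf": points[0][1]}
    xs = sorted({v for v, _ in points})
    best = None
    for lo, hi in zip(xs, xs[1:]):             # candidate thresholds
        t = (lo + hi) / 2
        left = [p for p in points if p[0] <= t]
        right = [p for p in points if p[0] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(points)
        if best is None or score < best[0]:
            best = (score, t, left, right)
    _, t, left, right = best
    return {"split": t, "left": grow_tree(left), "right": grow_tree(right)}

def classify(tree, value):
    if "leaf" in tree:
        return tree["leaf"]
    return classify(tree["left" if value <= tree["split"] else "right"], value)

tree = grow_tree([(1, "A"), (2, "A"), (8, "B"), (9, "B")])
print(classify(tree, 1.5), classify(tree, 8.5))  # A B
```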

55) A(n) ________ data mart is a subset that is created directly from the data warehouse.

dependent

47) Patterns have been manually ________ from data by humans for centuries, but the increasing volume of data in modern times has created a need for more automatic approaches.

extracted

47) The ________ data warehouse architecture involves integrating disparate systems and analytical resources from multiple sources to meet changing needs or business conditions.

federated

48) While prediction is largely experience and opinion based, ________ is data and model based.

forecasting

54) As the number of potential BI applications increases, the need to justify and prioritize them arises. This is not an easy task due to the large number of ________ benefits.

intangible

48) Data ________ comprises data access, data federation, and change capture.

integration

42) Software monitors referred to as ________ can be placed on a separate server in the network and use event- and process-based approaches to measure and monitor operational processes.

intelligent agents

45) When validating the assumptions of a regression, ________ assumes that the relationship between the response variable and the explanatory variables are linear.

linearity

41) A(n) ________ data store (ODS) provides a fairly recent form of customer information file.

operational

52) Dashboards present visual displays of important information that are consolidated and arranged on a single ________.

screen

49) Whereas ________ starts with a well-defined proposition and hypothesis, data mining starts with a loosely defined discovery statement.

statistics

69) Describe and define Big Data. Why is a search engine a Big Data application?

• Big Data is data that cannot be stored in a single storage unit. Big Data typically refers to data that is arriving in many different forms, be they structured, unstructured, or in a stream. Major sources of such data are clickstreams from Web sites, postings on social media sites such as Facebook, or data from traffic, sensors, or weather. • A Web search engine such as Google needs to search and index billions of Web pages in order to give you relevant search results in a fraction of a second. Although this is not done in real time, generating an index of all the Web pages on the Internet is not an easy task.

65) Briefly describe five techniques (or algorithms) that are used for classification modeling.

• Decision tree analysis. Decision tree analysis (a machine-learning technique) is arguably the most popular classification technique in the data mining arena. • Statistical analysis. Statistical techniques were the primary classification algorithms for many years until the emergence of machine-learning techniques. Statistical classification techniques include logistic regression and discriminant analysis. • Neural networks. These are among the most popular machine-learning techniques that can be used for classification-type problems. • Case-based reasoning. This approach uses historical cases to recognize commonalities in order to assign a new case into the most probable category. • Bayesian classifiers. This approach uses probability theory to build classification models based on past occurrences that are capable of placing a new instance into the most probable class (or category). • Genetic algorithms. This approach uses the analogy of natural evolution to build directed-search-based mechanisms to classify data samples. • Rough sets. This method takes into account the partial membership of class labels to predefined categories in building models (collections of rules) for classification problems.

67) List and describe three levels or categories of analytics that are most often viewed as sequential and independent, but also occasionally seen as overlapping.

• Descriptive or reporting analytics refers to knowing what is happening in the organization and understanding some underlying trends and causes of such occurrences. • Predictive analytics aims to determine what is likely to happen in the future. This analysis is based on statistical techniques as well as other more recently developed techniques that fall under the general category of data mining. • Prescriptive analytics recognizes what is going on as well as the likely forecast and makes decisions to achieve the best performance possible.

70) What storage system and processing algorithm were developed by Google for Big Data?

• Google developed the Google File System (GFS) for storing large amounts of data in a distributed way; the Apache Hadoop project's Hadoop Distributed File System (HDFS) is its open-source counterpart. • Google developed the MapReduce algorithm for pushing computation to the data, instead of pushing data to a computing node; Apache Hadoop provides an open-source implementation of MapReduce as well.
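The MapReduce idea can be sketched in plain Python with the canonical word-count example: a map phase emits (key, value) pairs and a reduce phase aggregates them per key. This single-process sketch only illustrates the programming model; real MapReduce distributes both phases across many nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for each word in a document."""
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

documents = ["big data big ideas", "big results"]
pairs = chain.from_iterable(map_phase(d) for d in documents)
print(reduce_phase(pairs))  # {'big': 3, 'data': 1, 'ideas': 1, 'results': 1}
```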

62) List five types of specialized charts and graphs.

• Histograms • Gantt charts • PERT charts • Geographic maps • Bullets • Heat maps • Highlight tables • Tree maps

61) List four possible analytics applications in the retail value chain.

• Inventory Optimization • Price Elasticity • Market Basket Analysis • Shopper Insight • Customer Churn Analysis • Channel Analysis • New Store Analysis • Store Layout • Video Analytics

61) List and describe the three major categories of business reports.

• Metric management reports. Many organizations manage business performance through outcome-oriented metrics. For external groups, these are service-level agreements (SLAs). For internal management, they are key performance indicators (KPIs). • Dashboard-type reports. This report presents a range of different performance indicators on one page, like a dashboard in a car. Typically, there is a set of predefined reports with static elements and fixed structure, but customization of the dashboard is allowed through widgets, views, and set targets for various metrics. • Balanced scorecard-type reports. This is a method developed by Kaplan and Norton that attempts to present an integrated view of success in an organization. In addition to financial performance, balanced scorecard-type reports also include customer, business process, and learning and growth perspectives.

66) Business applications can be programmed to act on what real-time BI systems discover. Describe two approaches to the implementation of real-time BI.

• One approach to real-time BI uses the DW model of traditional BI systems. In this case, products from innovative BI platform providers provide a service-oriented, near-real-time solution that populates the DW much faster than the typical nightly extract/transform/load (ETL) batch update does. • A second approach, commonly called business activity management (BAM), is adopted by pure-play BAM and/or hybrid BAM-middleware providers (such as Savvion, Iteration Software, Vitria, webMethods, Quantive, Tibco, or Vineyard Software). It bypasses the DW entirely and uses Web services or other monitoring means to discover key business events. These software monitors (or intelligent agents) can be placed on a separate server in the network or on the transactional application databases themselves, and they can use event- and process-based approaches to proactively and intelligently measure and monitor operational processes.

65) Describe the three major subsets of the Analytics Focused Software Developers portion of the Analytics Ecosystem.

• Reporting/Descriptive Analytics — This segment includes tools enabled by and available from the middleware industry players, as well as unique capabilities offered by focused providers. • Predictive Analytics — A rapidly growing area that includes a variety of statistical packages. • Prescriptive Analytics — Software providers in this category offer modeling tools and algorithms for optimization of operations, usually called management science/operations research software.

70) List six common data mining mistakes.

• Selecting the wrong problem for data mining • Ignoring what your sponsor thinks data mining is and what it really can and cannot do • Leaving insufficient time for data preparation • Looking only at aggregated results and not at individual records • Being sloppy about keeping track of the data mining procedure and results • Ignoring suspicious findings and quickly moving on • Running mining algorithms repeatedly and blindly • Believing everything you are told about the data • Believing everything you are told about your own data mining analysis • Measuring your results differently from the way your sponsor measures them

70) Mention briefly some of the recently popularized concepts and technologies that will play a significant role in defining the future of data warehousing.

• Sourcing (mechanisms for acquisition of data from diverse and dispersed sources): o Web, social media, and Big Data o Open source software o SaaS (software as a service) o Cloud computing • Infrastructure (architectural—hardware and software—enhancements): o Columnar (a new way to store and access data in the database) o Real-time data warehousing o Data warehouse appliances (all-in-one solutions to DW) o Data management technologies and practices o In-database processing technology (putting the algorithms where the data is) o In-memory storage technology (moving the data into memory for faster processing) o New database management systems o Advanced analytics

62) A common way of introducing data warehousing is to refer to its fundamental characteristics. Describe three characteristics of data warehousing.

• Subject oriented. Data are organized by detailed subject, such as sales, products, or customers, containing only information relevant for decision support. • Integrated. Integration is closely related to subject orientation. Data warehouses must place data from different sources into a consistent format. To do so, they must deal with naming conflicts and discrepancies among units of measure. A data warehouse is presumed to be totally integrated. • Time variant (time series). A warehouse maintains historical data. The data do not necessarily provide current status (except in real-time systems). They detect trends, deviations, and long-term relationships for forecasting and comparisons, leading to decision making. Every data warehouse has a temporal quality. Time is the one important dimension that all data warehouses must support. Data for analysis from multiple sources contains multiple time points (e.g., daily, weekly, monthly views). • Nonvolatile. After data are entered into a data warehouse, users cannot change or update the data. Obsolete data are discarded, and changes are recorded as new data. • Web based. Data warehouses are typically designed to provide an efficient computing environment for Web-based applications. • Relational/multidimensional. A data warehouse uses either a relational structure or a multidimensional structure. A recent survey on multidimensional structures can be found in Romero and Abelló (2009). • Client/server. A data warehouse uses the client/server architecture to provide easy access for end users. • Real time. Newer data warehouses provide real-time, or active, data-access and analysis capabilities (see Basu, 2003; and Bonde and Kuckuk, 2004). • Include metadata. A data warehouse contains metadata (data about data) about how the data are organized and how to effectively use them.

64) List the five most common functions of business reports.

• To ensure that all departments are functioning properly • To provide information • To provide the results of an analysis • To persuade others to act • To create an organizational memory (as part of a knowledge management system)

