Midterm 2LI

55) In ________, a classification method, the complete data set is randomly split into mutually exclusive subsets of approximately equal size and tested multiple times on each left-out subset, using the others as a training set.

-k-fold cross-validation
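
A minimal sketch of the splitting step described in this card, using only the standard library. The function name `k_fold_splits` and the sample sizes are illustrative assumptions, not taken from the course material.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random split of the complete data set
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # All other folds together form the training set for this round.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_splits(n_samples=10, k=5))
for train, test in splits:
    print(sorted(test), "held out;", len(train), "used for training")
```

Each sample lands in exactly one left-out subset. Setting `k` equal to the number of samples leaves one sample out per round, which is the leave-one-out method.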

54) Fayyad et al. (1996) defined ________ in databases as a process of using data mining methods to find useful information and patterns in the data.

-knowledge discovery

58) Because of its successful application to retail business problems, association rule mining is commonly called ________.

-market-basket analysis

1) When HP approaches problem-solving, the first step in solving business problems is building a model that enables decision makers to develop a good understanding of the problem.

Answer: FALSE

10) Qualitative elements of a problem cannot be incorporated into formal decision models, so one can only seek to minimize their impact.

Answer: FALSE

11) Since the business environment involves considerable uncertainty, a manager cannot use modeling to estimate the risks resulting from specific actions.

Answer: FALSE

13) The ETL process in data warehousing usually takes up a small portion of the time in a data-centric project.

Answer: FALSE

14) Generating alternatives manually is often necessary in the model-building process. The best option for the decision makers is to generate as many of these alternatives as is conceivable.

Answer: FALSE

15) Generally speaking, people intuitively estimate risk quite accurately.

Answer: FALSE

15) Large companies, especially those with revenue upwards of $500 million, consistently reap substantial cost savings through the use of hosted data warehouses.

Answer: FALSE

17) Business intelligence systems typically support solving a certain problem or evaluating an opportunity, while decision support systems monitor situations and identify problems and/or opportunities, using analytic methods.

Answer: FALSE

18) Artificial intelligence-based DSS fall into the category of document-driven DSS.

Answer: FALSE

2) In a decision making environment, continuous change always validates the assumptions of the decision makers.

Answer: FALSE

3) The most important feature of management support systems is the computational efficiency involved in making a decision.

Answer: FALSE

5) Single decision makers rarely face decisions with multiple objectives in organizations and so are not the focus of data analytics tools.

Answer: FALSE

6) The design phase of decision making is where the decision maker examines reality and identifies and defines the problem.

Answer: FALSE

7) Only after the failed implementation of a decision can the decision maker return to a prior stage of decision making.

Answer: FALSE

9) Moving the data into a data warehouse is usually the easiest part of its creation.

Answer: FALSE

12) A normative model examines all the possible alternatives in order to prove that the one selected is the best.

Answer: TRUE

13) Since a descriptive model checks the performance of the system for only a subset of all possible alternatives, there is no guarantee that a selected alternative will be optimal.

Answer: TRUE

14) In the Starwood Hotels case, up-to-date data and faster reporting helped hotel managers better manage their occupancy rates.

Answer: TRUE

16) A data warehouse can support the intelligence phase of decision making by continuously monitoring both internal and external information, looking for early signs of problems and opportunities through a Web-based enterprise information portal or dashboard.

Answer: TRUE

19) The DSS component that includes the financial, statistical, management science, or other quantitative models is called the model management subsystem.

Answer: TRUE

20) Knowledge-based management subsystems provide intelligence to augment the decision maker's own intelligence.

Answer: TRUE

4) Web-based decision support systems can provide support to both individuals and groups that act in a decision-making capacity.

Answer: TRUE

5) One way an operational data store differs from a data warehouse is the recency of their data.

Answer: TRUE

7) Without middleware, different BI programs cannot easily connect to the data warehouse.

Answer: TRUE

8) Web-based collaboration tools (e.g., GSS) can assist in multiple stages of decision making, not just the intelligence phase.

Answer: TRUE

9) Uncovering the existence of a problem can be achieved through monitoring and analyzing the organization's productivity level. The derived measurements of productivity are based on real data.

Answer: TRUE

Information systems that support such transactions as ATM withdrawals, bank deposits, and cash register scans at the grocery store represent transaction processing, a critical branch of BI. T/F

False

Leave-one-out is a special case of k-fold cross-validation. T/F

True

OLTP systems are designed to handle ad hoc analysis and complex queries that deal with many data items. T/F

False

One of the four components of BI systems, business performance management, is a collection of source data in the data warehouse. T/F

False

Organizations seldom devote a lot of effort to creating metadata because it is not important for the effective use of data warehouses. T/F

False

Pushing programming out to distributed data is achieved solely by using the Hadoop Distributed File System or HDFS. T/F

False

T/F Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations.

False

T/F Consistent high quality, higher publishing frequency, and longer time lag are all attributes of industrial publishing when compared to Web publishing.

False

T/F Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales.

False

T/F Data mining requires specialized data analysts to ask ad hoc questions and obtain answers quickly from the system.

False

T/F Data that is collected, stored, and analyzed in data mining is often private and personal. There is no way to maintain individuals' privacy other than being very careful about physical data security.

False

T/F Decentralization, the need for specialized skills, and immediacy of output are all attributes of Web publishing when compared to industrial publishing.

False

T/F Descriptive analytics for social media feature such items as your followers as well as the content in online conversations that help you to identify themes and sentiments.

False

T/F In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's feelings.

False

T/F In the Cabela's case study, the SAS/Teradata solution enabled the direct marketer to better identify likely customers and market to them based mostly on external data sources.

False

T/F In the Memphis Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime.

False

T/F In the cancer research case study, data mining algorithms that predict cancer survivability with high predictive power are good replacements for medical professionals.

False

T/F In the patent analysis case study, text mining of thousands of patents held by the firm and its competitors helped improve competitive intelligence, but was of little use in identifying complementary products.

False

T/F Market basket analysis is a useful and entertaining way to explain data mining to a technologically less savvy audience, but it has little business significance.

False

T/F Ratio data is a type of categorical data.

False

T/F Search engine optimization (SEO) techniques play a minor role in a Web site's search ranking because only well-written content matters.

False

T/F Since little can be done about visitor Web site abandonment rates, organizations have to focus their efforts on increasing the number of new visitors.

False

T/F Statistics and data mining both look for data sets that are as large as possible.

False

T/F Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining.

False

T/F The entire focus of the predictive analytics system in the Infinity P&C case was on detecting and handling fraudulent claims for the company's benefit.

False

T/F When training a data mining model, the testing dataset is always larger than the training dataset.

False

The complexity of today's business environment creates many new challenges for organizations, such as global competition, but creates few new opportunities in return. T/F

False

The success of BI is assured not because of which personnel would be the most likely to use it, but as a result of pervasive adoption across the organization. T/F

False

The term intelligence in a BI context is used to describe clandestine operations dedicated to stealing corporate secrets, in the manner of the government's CIA and other covert agencies. T/F

False

The two critical partnerships required for BI governance are (a) a partnership between functional area users and/or product/service area employees, and (b) a partnership between representatives of the marketing and vendor sides. T/F

False

The use of dashboards and data visualizations is seldom effective in finding efficiencies in organizations, as demonstrated by the Seattle Children's Hospital Case Study. T/F

False

Two-tier data warehouse/BI infrastructures offer organizations more flexibility but cost more than three-tier ones. T/F

False

Which component of a reporting system contains steps detailing how recorded transactions are converted into metrics, scorecards, and dashboards? A) data supply B) business logic C) extract, transform and load D) assurance

B

Which of the following BEST enables a data warehouse to handle complex queries and scale up to handle many more requests? A) use of the web by users as a front-end B) parallel processing C) Microsoft Windows D) a larger IT staff

B

Which type of visualization tool can be very helpful when the intention is to show relative proportions of dollars per department allocated by a university administration?

Pie chart

Many business users in the 1980s referred to their mainframes as "the black hole," because all the information went into it, but little ever came back and ad hoc real-time querying was virtually impossible. T/F

True

T/F Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out.

True

T/F Categorization and clustering of documents during text mining differ only in the preselection of categories.

True

T/F Clickstream analysis does not need users to enter their perceptions of the Web site or other feedback directly to be useful in determining their preferences.

True

T/F Current use of sentiment analysis in voice of the customer applications allows companies to change their products or services in real time in response to customer sentiment.

True

T/F During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality.

True

T/F Generally, making a search engine more efficient makes it less effective.

True

T/F If using a mining analogy, "knowledge mining" would be a more appropriate term than "data mining."

True

T/F In data mining, classification models help in prediction.

True

T/F In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but easier to classify others, e.g., movie reviews, in the same way.

True

T/F In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document.

True
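
As a rough illustration of the support measure in this card, the sketch below counts the share of documents in which two concepts appear together. The documents and concept names are made up for the example.

```python
# Each document is reduced to the set of concepts found in it (hypothetical data).
documents = [
    {"merger", "acquisition", "stock"},
    {"merger", "stock"},
    {"acquisition", "lawsuit"},
    {"merger", "acquisition"},
]

def support(docs, concepts):
    """Fraction of documents that contain every concept in `concepts`."""
    concepts = set(concepts)
    hits = sum(1 for doc in docs if concepts <= doc)
    return hits / len(docs)

s = support(documents, {"merger", "acquisition"})
print(f"support = {s:.0%}")
```

Here two of the four documents contain both concepts, so the support is 50%. A support of 7% would mean the same thing with 7 out of every 100 documents.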

T/F In the 2degrees case study, the main effectiveness of the new analytics system was in dissuading potential churners from leaving the company.

True

T/F In the Hong Kong government case study, reporting time was the main benefit of using SAS Business Analytics to generate reports.

True

T/F In the financial services firm case study, text analysis for associate-customer interactions was completely automated and could detect whether they met the company's standards.

True

T/F Interval data is a type of numerical data.

True

T/F Regional accents present challenges for natural language processing.

True

T/F The cost of data storage has plummeted recently, making data mining feasible for more firms.

True

T/F The number of users of free/open source data mining software now exceeds that of users of commercial software versions.

True

T/F Using data mining on data about imports and exports can help to detect tax avoidance and money laundering.

True

T/F Web site visitors who critique and create content are more engaged than those who join networks and spectate.

True

T/F When a problem has many attributes that impact the classification of different patterns, decision trees may be a useful approach.

True

The "islands of data" problem in the 1980s describes the phenomenon of unconnected data being stored in numerous locations within an organization. T/F

True

The access to data and ability to manipulate data (frequently including real-time data) are key elements of business intelligence (BI) systems. T/F

True

The data warehousing maturity model consists of six stages: prenatal, infant, child, teenager, adult, and sage. T/F

True

The overwhelming majority of competitive actions taken by businesses today feature computerized information system support. T/F

True

The use of statistics in baseball by the Oakland Athletics, as described in the Moneyball case study, is an example of the effectiveness of prescriptive analytics. T/F

True

Traditional BI systems use a large volume of static data that has been extracted, cleansed, and loaded into a data warehouse to produce reports and analyses. T/F

True

In addition to deploying business intelligence (BI) systems, companies may also perform other actions to counter business pressures, such as improving customer service and entering business alliances. T/F

True

In data mining, classification models help in getting prediction patterns

True

Performance dashboards enable ________ operations that allow the users to view underlying data sources and obtain more detail.

drill-down/drill-through

The miner is often a(n)

end user.

With dashboards, the layer of information that uses graphical, abstracted data to keep tabs on key performance metrics is the ________ layer.

monitoring

In a survey, people are asked to indicate their political party affiliation: Democrat, Republican, or independent. The data gathered can be best described as

nominal data

In the definition of data mining, the "potentially useful" aspect means that

results should lead to some business benefit.

A data warehouse contains one fact table and multiple dimension tables. Each dimension table is connected directly to the fact table. This means we represent data in the data warehouse using __:

star schema.
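
A small sketch of a star schema using Python's built-in `sqlite3` module: one fact table whose rows point directly at three dimension tables. All table names, column names, and figures are hypothetical and only serve to show the shape of the design and a typical aggregate query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);
-- The fact table holds the measure plus one foreign key per dimension.
CREATE TABLE fact_sales (
    store_id     INTEGER REFERENCES dim_store(store_id),
    product_id   INTEGER REFERENCES dim_product(product_id),
    date_id      INTEGER REFERENCES dim_date(date_id),
    dollars_sold REAL
);
INSERT INTO dim_store   VALUES (1, 'Store A'), (2, 'Store B');
INSERT INTO dim_product VALUES (1, 'shoes'), (2, 'hats');
INSERT INTO dim_date    VALUES (1, 'April', 2008), (2, 'May', 2008);
INSERT INTO fact_sales  VALUES
    (1, 1, 2, 120.0), (1, 1, 2, 80.0), (1, 2, 2, 40.0),
    (2, 1, 2, 300.0), (1, 1, 1, 50.0);
""")

# Join the fact table to each dimension and total the measure for one slice.
total = conn.execute("""
    SELECT SUM(f.dollars_sold)
    FROM fact_sales f
    JOIN dim_store   s ON s.store_id   = f.store_id
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date    d ON d.date_id    = f.date_id
    WHERE s.store_name = 'Store A' AND p.product_name = 'shoes'
      AND d.month = 'May' AND d.year = 2008
""").fetchone()[0]
print(total)
```

Every dimension joins to the fact table in a single hop, which is what distinguishes a star schema from designs where dimensions are further normalized into sub-tables.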

A well-designed data warehouse means that user requirements do not have to change as business needs change. T/F

False

Almost all BI applications are constructed with shells provided by an outsourcing provider who may themselves create a custom solution for a vendor or work with another client. T/F

False

BI represents a bold new paradigm in which the company's business strategy must be aligned to its business intelligence analysis initiatives. T/F

False

Because the recession has raised interest in low-cost open source software, it is now set to replace traditional enterprise software. T/F

False

Computerized support is only used for organizational decisions that are responses to external pressures, not for taking advantage of opportunities. T/F

False

Data warehouse administrators (DWAs) do not need strong business insight since they only handle the technical aspect of the infrastructure. T/F

False

33) What class of DSS incorporates simulation and optimization? A) model-driven DSS B) data-driven DSS C) communications-driven/Group DSS D) knowledge-driven DSS

A) model-driven DSS

________ charts or network diagrams show precedence relationships among the project activities/tasks.

PERT

23) All of the following are challenges associated with natural language processing EXCEPT A) dividing up a text into individual words in English. B) understanding the context in which something is said. C) distinguishing between words that have more than one meaning. D) recognizing typographical or grammatical errors in texts.

- A

25) The data field "ethnic group" can be best described as A) nominal data. B) interval data. C) ordinal data. D) ratio data.

- A

30) Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications? A) insurance B) retailing and logistics C) customer relationship management D) computer hardware and software

- A

36) In estimating the accuracy of data mining (or other) classification models, the true positive rate is A) the ratio of correctly classified positives divided by the total positive count. B) the ratio of correctly classified negatives divided by the total negative count. C) the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified positives. D) the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified negatives.

- A
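
The rate in this card can be checked against a small confusion matrix. The sketch below uses made-up labels; `confusion_counts` is an illustrative helper, not a library function.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, fn, tn) for two equal-length lists of booleans."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # positives classified as positive
    fp = sum(1 for a, p in pairs if not a and p)      # negatives classified as positive
    fn = sum(1 for a, p in pairs if a and not p)      # positives classified as negative
    tn = sum(1 for a, p in pairs if not a and not p)  # negatives classified as negative
    return tp, fp, fn, tn

# Hypothetical ground truth and classifier output.
actual    = [True, True,  True, False, False, False, False, True]
predicted = [True, False, True, True,  False, False, False, True]

tp, fp, fn, tn = confusion_counts(actual, predicted)
true_positive_rate = tp / (tp + fn)  # correctly classified positives / total positives
true_negative_rate = tn / (tn + fp)  # correctly classified negatives / total negatives
print(true_positive_rate, true_negative_rate)
```

The total positive count in the denominator is `tp + fn`: every actual positive is either caught or missed. The single `fp` in this data is a false positive, a case classified as true while being false in reality.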

37) In data mining, finding an affinity of two products to be commonly together in a shopping cart is known as A) association rule mining. B) cluster analysis. C) decision trees. D) artificial neural networks.

- A

All of the following statements about data mining are true EXCEPT A) the process aspect means that data mining should be a one-step process to results. B) the novel aspect means that previously unknown patterns are discovered. C) the potentially useful aspect means that results should lead to some business benefit. D) the valid aspect means that the discovered patterns should hold true on new data.

- A

33) Prediction problems where the variables have numeric values are most accurately defined as A) classifications. B) regressions. C) associations. D) computations.

- B

40) Which of the following is a data mining myth? A) Data mining is a multistep process that requires deliberate, proactive design and use. B) Data mining requires a separate, dedicated database. C) The current state-of-the-art is ready to go for almost any business. D) Newer Web-based tools enable managers of all educational levels to do data mining.

- B

21) In the opening vignette, the architectural system that supported Watson used all the following elements EXCEPT A) massive parallelism to enable simultaneous consideration of multiple hypotheses. B) an underlying confidence subsystem that ranks and integrates answers. C) a core engine that could operate seamlessly in another domain without changes. D) integration of shallow and deep knowledge.

- C

24) What is the main reason parallel processing is sometimes used for data mining? A) because the hardware exists in most organizations and it is available to use B) because most of the algorithms used for data mining require it C) because of the massive data amounts and search efforts involved D) because any strategic application requires parallel processing

- C

27) Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes? A) associations B) visualization C) classification D) clustering

- C

31) All of the following statements about data mining are true EXCEPT A) understanding the business goal is critical. B) understanding the data, e.g., the relevant variables, is critical to success. C) building the model takes the most time and effort. D) data is typically preprocessed and/or cleaned before use.

- C

35) What does the scalability of a data mining method refer to? A) its ability to predict the outcome of a previously unknown data set accurately B) its speed of computation and computational costs in using the model C) its ability to construct a prediction model efficiently given a large amount of data D) its ability to overcome noisy data to make somewhat accurate predictions

- C

38) Third party providers of publicly available datasets protect the anonymity of the individuals in the data set primarily by A) asking data users to use the data ethically. B) leaving in identifiers (e.g., name), but changing other variables. C) removing identifiers such as names and social security numbers. D) letting individuals in the data know their data is being accessed.

- C

45) ________ represent the labels of multiple classes used to divide a variable into specific groups, examples of which include race, sex, age group, and educational level.

- Categorical data

60) ________ is a segmentation metric for social networks that measures the strength of the bonds between actors in a social network.

- Cohesion

57) ________ statistics help you understand whether your specific marketing objective for a Web page is being achieved.

- Conversion

46) At a very high level, the text mining process can be broken down into three consecutive tasks, the first of which is to establish the ________.

- Corpus

26) The data field "salary" can be best described as A) nominal data. B) interval data. C) ordinal data. D) ratio data.

- D

28) Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features? A) associations B) visualization C) classification D) clustering

- D

29) The data mining algorithm type used for classification somewhat resembling the biological neural networks in the human brain is A) association rule mining. B) cluster analysis. C) decision trees. D) artificial neural networks.

- D

32) Which data mining process/methodology is thought to be the most comprehensive, according to kdnuggets.com rankings? A) SEMMA B) proprietary organizational methodologies C) KDD Process D) CRISP-DM

- D

34) What does the robustness of a data mining method refer to? A) its ability to predict the outcome of a previously unknown data set accurately B) its speed of computation and computational costs in using the model C) its ability to construct a prediction model efficiently given a large amount of data D) its ability to overcome noisy data to make somewhat accurate predictions

- D

41) IBM's Watson utilizes a massively parallel, text mining-focused, probabilistic evidence-based computational architecture called ________.

- DeepQA

55) ________ Web analytics refers to measurement and analysis of data relating to your company that takes place outside your Web site.

- Off-site

42) ________, also called homonyms, are syntactically identical words with different meanings.

- Polysemes

59) ________ is a connections metric for social networks that measures the ties that actors in a network have with others that are geographically close.

- Propinquity

44) ________ is a technique used to detect favorable and unfavorable opinions toward specific products and services using large numbers of textual data sources.

- Sentiment analysis

48) ________ is mostly driven by sentiment analysis and is a key element of customer experience management initiatives, where the goal is to create an intimate relationship with the customer.

- Voice of the customer (VOC)

43) Knowledge extraction, pattern analysis, data archaeology, information harvesting, pattern searching, and data dredging are all alternative names for ________.

- data mining

52) Data preparation, the third step in the CRISP-DM data mining process, is more commonly known as ________.

- data preprocessing

44) Data are often buried deep within very large ________, which sometimes contain data from several years.

- databases

51) In the terrorist funding case study, an observed price ________ may be related to income tax avoidance/evasion, money laundering, or terrorist financing.

- deviation

47) Because the term-document matrix is often very large and rather sparse, an important optimization step is to reduce the ________ of the matrix.

- dimensionality
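
To make the size and sparsity problem concrete, here is a minimal sketch that builds a term-document matrix and then shrinks it. The documents, the stop-word list, and the rule "keep terms that occur in at least two documents" are all illustrative assumptions; other reduction techniques exist.

```python
docs = {
    "d1": "the loan was approved by the bank",
    "d2": "the bank rejected the loan application",
    "d3": "a river bank can flood in spring",
}
STOP_WORDS = {"the", "a", "was", "by", "in", "can"}  # illustrative list only

def term_document_matrix(docs, stop_words):
    """Map each term to a row of per-document occurrence counts."""
    matrix = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            if token in stop_words:
                continue  # articles and auxiliary verbs carry little value
            row = matrix.setdefault(token, {d: 0 for d in docs})
            row[doc_id] += 1
    return matrix

tdm = term_document_matrix(docs, STOP_WORDS)

# One simple reduction: drop terms that appear in fewer than two documents.
reduced = {
    term: row for term, row in tdm.items()
    if sum(1 for count in row.values() if count > 0) >= 2
}
print(len(tdm), "terms before,", len(reduced), "terms after")
```

Most cells in the full matrix are zero, because each term appears in only a few documents. Filtering rare terms cuts the number of rows while keeping the terms that can link documents to each other.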

47) Patterns have been manually ________ from data by humans for centuries, but the increasing volume of data in modern times has created a need for more automatic approaches.

- extracted

48) While prediction is largely experience and opinion based, ________ is data and model based.

- forecasting

52) A(n) ________ is one or more Web pages that provide a collection of links to authoritative Web pages.

- hub

50) Web pages contain both unstructured information and ________, which are connections to other Web pages.

- hyperlinks

45) In the Mining for Lies case study, a text based deception-detection method used by Fuller and others in 2008 was based on a process known as ________, which relies on elements of data and text mining techniques.

- message feature mining

53) The data mining in cancer research case study explains that data mining methods are capable of extracting patterns and ________ hidden deep in large and complex medical databases.

- relationships

42) There has been an increase in data mining to deal with global competition and customers' more sophisticated ________ and wants.

- needs

49) When viewed as a binary feature, ________ classification is the binary classification task of labeling an opinionated document as expressing either an overall positive or an overall negative opinion.

- polarity
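
One very simple way to picture polarity classification is a word-count approach against a sentiment lexicon. The two word lists below are tiny, hypothetical stand-ins for a real lexicon, and real systems use far richer methods; this is only a sketch of the positive-versus-negative labeling task.

```python
# Hypothetical mini-lexicons; a real lexicon would be far larger.
POSITIVE = {"great", "excellent", "love", "good"}
NEGATIVE = {"terrible", "awful", "hate", "bad", "disaster"}

def polarity(document):
    """Label a document by counting positive and negative lexicon words."""
    tokens = [t.strip(".,!?").lower() for t in document.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # no opinion words found, or the counts cancel out

print(polarity("The hotel we stayed in was terrible."))
print(polarity("Our new mayor is great for the city."))
```

The `neutral` result is an addition for this sketch: the binary task assumes the document is already known to be opinionated, so purely factual text would normally be filtered out first.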

41) In the opening vignette, Cabela's uses SAS data mining tools to create ________ models to optimize customer selection for all customer contacts.

- predictive

56) A ________ Web site contains links that send traffic directly to your Web site.

- referral

50) Customer ________ management extends traditional marketing by creating one-on-one relationships with customers.

- relationship

53) A(n) ________ engine is a software program that searches for Web sites or files based on keywords.

- search

49) Whereas ________ starts with a well-defined proposition and hypothesis, data mining starts with a loosely defined discovery statement.

- statistics

43) When a word has more than one meaning, selecting the meaning that makes the most sense can only be accomplished by taking into account the context within which the word is used. This concept is known as ________.

- word sense disambiguation

22) In text mining, tokenizing is the process of A) categorizing a block of text in a sentence. B) reducing multiple words to their base or root. C) transforming the term-by-document matrix to a manageable size. D) creating new branches or stems of recorded paragraphs.

-A

25) In the research literature case study, the researchers analyzing academic papers extracted information from which source? A) the paper abstract B) the paper keywords C) the main body of the paper D) the paper references

-A

28) What do voice of the market (VOM) applications of sentiment analysis do? A) They examine customer sentiment at the aggregate level. B) They examine employee sentiment in the organization. C) They examine the stock market for trends. D) They examine the "market of ideas" in politics.

-A

30) In text analysis, what is a lexicon? A) a catalog of words, their synonyms, and their meanings B) a catalog of customers, their words, and phrases C) a catalog of letters, words, phrases, and sentences D) a catalog of customers, products, words, and phrases

-A

59) The ________ is the most commonly used algorithm to discover association rules. Given a set of itemsets, the algorithm attempts to find subsets that are common to at least a minimum number of the itemsets.

-Apriori algorithm
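
A compact sketch of the idea behind Apriori, written from the description in this card: grow itemsets one item at a time and keep only those found in at least a minimum number of transactions. The basket data and the `min_count` threshold are invented for illustration, and production implementations are considerably more optimized.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for itemsets found in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}  # candidate itemsets of size 1
    frequent = {}
    size = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(survivors)
        size += 1
        # Join surviving itemsets into candidates one item larger, then prune any
        # candidate that has an infrequent subset (the Apriori property).
        joined = {a | b for a in survivors for b in survivors if len(a | b) == size}
        current = {
            c for c in joined
            if all(frozenset(s) in survivors for s in combinations(c, size - 1))
        }
    return frequent

baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
frequent = apriori(baskets, min_count=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The pruning step is what keeps the search manageable: an itemset can only be frequent if all of its smaller subsets are frequent, so most larger combinations are never counted at all.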

26) In sentiment analysis, which of the following is an implicit opinion? A) The hotel we stayed in was terrible. B) The customer service I got for my TV was laughable. C) The cruise we went on last summer was a disaster. D) Our new mayor is great for the city.

-B

31) What types of documents are BEST suited to semantic labeling and aggregation to determine sentiment orientation? A) medium- to large-sized documents B) small- to medium-sized documents C) large-sized documents D) collections of documents

-B

32) What does Web content mining involve? A) analyzing the universal resource locator in Web pages B) analyzing the unstructured content of Web pages C) analyzing the pattern of visits to a Web site D) analyzing the PageRank and other metadata of a Web page

-B

34) Search engine optimization (SEO) is a means by which A) Web site developers can negotiate better deals for paid ads. B) Web site developers can increase Web site search rankings. C) Web site developers index their Web sites for search engines. D) Web site developers optimize the artistic features of their Web sites.

-B

38) Which of the following statements about Web site conversion statistics is FALSE? A) Web site visitors can be classed as either new or returning. B) Visitors who begin a purchase on most Web sites must complete it. C) The conversion rate is the number of people who take action divided by the number of visitors. D) Analyzing exit rates can tell you why visitors left your Web site.

-B
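
The conversion rate definition in option C is simple arithmetic. A tiny sketch with made-up visitor numbers:

```python
def conversion_rate(actions, visitors):
    """Number of people who take action divided by the number of visitors."""
    if visitors == 0:
        return 0.0  # avoid division by zero when there was no traffic
    return actions / visitors

# Hypothetical figures: 45 purchases out of 1,500 visitors.
rate = conversion_rate(actions=45, visitors=1500)
print(f"{rate:.1%}")
```

Visitors who start a purchase but abandon it count in the denominator only, which is why the statement in option B is the false one.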

24) What data discovery process, whereby objects are categorized into predetermined groups, is used in text mining? A) clustering B) association C) classification D) trend analysis

-C

35) What are the two main types of Web analytics? A) old-school and new-school Web analytics B) Bing and Google Web analytics C) off-site and on-site Web analytics D) data-based and subjective Web analytics

-C

36) Web site usability may be rated poor if A) the average number of page views on your Web site is large. B) the time spent on your Web site is long. C) Web site visitors download few of your offered PDFs and videos. D) users fail to click on all pages equally.

-C

39) What is one major way in which Web-based social media differs from traditional publishing media? A) Most Web-based media are operated by the government and large firms. B) They use different languages of publication. C) They have different costs to own and operate. D) Web-based media have a narrower range of quality.

-C

40) What does advanced analytics for social media do? A) It helps identify your followers. B) It identifies links between groups. C) It examines the content of online conversations. D) It identifies the biggest sources of influence online.

-C

Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from A) collecting data about customers and transactions. B) developing a philosophy that is data analytics-centric. C) analyzing the vast data amounts routinely collected. D) asking the customers what they want.

-C

29) How is objectivity handled in sentiment analysis? A) It is ignored because it does not appear in customer sentiment. B) It is incorporated as a type of sentiment. C) It is clarified with the customer who expressed it. D) It is identified and removed as facts are not sentiment.

-D

33) Breaking up a Web page into its components to identify worthy words/terms and indexing them using a set of rules is called A) preprocessing the documents. B) document analysis. C) creating the term-by-document matrix. D) parsing the documents.

-D

37) Understanding which keywords your users enter to reach your Web site through a search engine can help you understand A) the hardware your Web site is running on. B) the type of Web browser being used by your Web site visitors. C) most of your Web site visitors' wants and needs. D) how well visitors understand your products.

-D

In the Cabela's case study, what types of models helped the company understand the value of customers, using a five-point scale? A) reporting and association models B) simulation and geographical models C) simulation and regression models D) clustering and association models

-D

60) One way to accomplish privacy and protection of individuals' rights when data mining is by ________ of the customer records prior to applying data mining applications, so that the records cannot be traced to an individual.

-de-identification

56) The basic idea behind a ________ is that it recursively divides a training set until each division consists entirely or primarily of examples from one class.

-decision tree

The knowledge extracted from web usage mining determines how to:

1. Increase the customer value 2. Improve the Web site 3. Better the data collection 4. Determine the customer's buying preferences

What are the three layers of information of a BI dashboard?

1. Monitoring--KPIs 2. Analysis--root cause of problems 3. Management--detailed operational data that identify what action to take to resolve problem

5 most popular analyses/applications of text mining

1. Summarization 2. Classification (text categorization) 3. Clustering (natural groupings of text) 4. Association/Concept linking 5. Trend Analysis

3 main areas of web mining

1. web content mining 2. web structure mining 3. web usage mining

Typical charts, graphs, and other visual elements used in visualization-based applications usually involve ________ dimensions.

2

Suppose the data warehouse contains the following data. Based on the data, what is the total dollars sold of shoes at Ed's store in May 2008?

2000.

You have a survey question that asks: "What do you think the likelihood is that the FSU football team will win the ACC championship?" You have survey results from 100 people; the average response is 40% with a standard deviation of 5 percentage points. Which of the following can you approximate from the results? A) 95% of the respondents think that there is a 30% - 50% chance that the FSU football team will win the ACC championship. B) 70% of the respondents think that there is a 30% - 50% chance that the FSU football team will win the ACC championship. C) 100% of the respondents think that there is a 30% - 50% chance that the FSU football team will win the ACC championship. D) 0% of the respondents think that there is a 30% - 50% chance that the FSU football team will win the ACC championship.

95% of the respondents think that there is a 30% - 50% chance that the FSU football team will win the ACC championship.
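
The answer above follows from the empirical rule: for roughly bell-shaped data, about 95% of values fall within two standard deviations of the mean. A minimal sketch of that arithmetic, assuming the responses are approximately normally distributed (the question does not state this, so it is an assumption):

```python
from statistics import NormalDist

mean, sd = 40.0, 5.0  # values from the question, in percentage points

# Assumption: responses are roughly normally distributed.
dist = NormalDist(mu=mean, sigma=sd)

def share_within(k):
    """Fraction of responses expected within k standard deviations of the mean."""
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

low, high = mean - 2 * sd, mean + 2 * sd   # two standard deviations either side
share_2sd = share_within(2)
print(f"{low:.0f}% to {high:.0f}% covers roughly {share_2sd:.0%} of respondents")
```

The 30% - 50% range is exactly the mean plus or minus two standard deviations, which is why the 95% option is the best approximation.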

Business intelligence (BI) can be characterized as a transformation of A) data to information to decisions to actions. B) Big Data to data to information to decisions. C) actions to decisions to feedback to information. D) data to processing to information to actions.

A

In which stage of extraction, transformation, and load (ETL) into a data warehouse are data aggregated? A) transformation B) extraction C) load D) cleanse

A

The very design that makes an OLTP system efficient for transaction processing makes it inefficient for what? A) end-user ad hoc reports, queries, and analysis B) transaction processing systems that constantly update operational databases C) the collection of reputable sources of intelligence D) transactions such as ATM withdrawals, where we need to reduce a bank balance accordingly

A

Operational or transaction databases are product oriented, handling transactions that update the database. In contrast, data warehouses are A) subject-oriented and nonvolatile. B) product-oriented and nonvolatile. C) product-oriented and volatile. D) subject-oriented and volatile.

A

What is the management feature of a dashboard? A) operational data that identify what actions to take to resolve a problem B) summarized dimensional data to analyze the root cause of problems C) summarized dimensional data to monitor key performance metrics D) graphical, abstracted data to monitor key performance metrics

A

When representing data in a data warehouse, using several dimension tables that are each connected only to a fact table means you are using which warehouse structure? A) star schema B) snowflake schema C) relational schema D) dimensional schema

A

Which type of question does visual analytics seek to answer? A) Why did it happen? B) What happened yesterday? C) What is happening today? D) When did it happen?

A

Why is the customer perspective important in the balanced scorecard methodology? A) because dissatisfied customers will eventually hurt the bottom line B) because customers should always be included in any design methodology C) because customers understand best how the firm's internal processes should work D) because companies need customer input into the design of the balanced scorecard

A

What is a pattern?

A mathematical (numeric and/or symbolic) relationship among data items

Which of the following statements is true about the snowflake schema?

A snowflake schema is a form of dimensional model.

35) The fact that many organizations share many similar problems means that in sourcing a DSS, it is often wiser to acquire a(n) A) ready-made DSS. B) custom-made DSS. C) offshored DSS. D) consultant-developed DSS.

A) ready-made DSS.

34) When a DSS is built, used successfully and integrated into the company's business processes, it was most likely built for a(n) A) recurrent decision. B) one-off decision. C) unimportant decision. D) ambiguous decision.

A) recurrent decision.

24) A search for alternatives occurs in which phase of the decision making/action model? A) the design phase B) the intelligence phase C) the choice phase D) the implementation phase

A) the design phase

steps 1-3 of CRISP-DM process

Account for 85% of total project time

1) In the Isle of Capri case, the only capability added by the new software was increased speed of processing reports.

Answer: FALSE

Contextual metadata for a dashboard includes all the following EXCEPT A) whether any high-value transactions that would skew the overall trends were rejected as a part of the loading process. B) which operating system is running the dashboard server software. C) whether the dashboard is presenting "fresh" or "stale" information. D) when the data warehouse was last refreshed.

B

How are descriptive analytics methods different from the other two types? A) They answer "what-if?" queries, not "how many?" queries. B) They answer "what-is?" queries, not "what will be?" queries. C) They answer "what to do?" queries, not "what-if?" queries. D) They answer "what will be?" queries, not "what to do?" queries.

B

How does the use of cloud computing affect the scalability of a data warehouse? A) Cloud computing vendors bring as much hardware as needed to users' offices. B) Hardware resources are dynamically allocated as use increases. C) Cloud vendors are mostly based overseas where the cost of labor is low. D) Cloud computing has little effect on a data warehouse's scalability.

B

If a company's strategy is properly aligned with DW and BI initiatives, and if the company's IS organization can be made capable of playing its role in such a project, and if the requisite user community is in place and has the proper motivation, then A) it is no longer necessary to start BI within the company. B) it is wise to start BI and establish a BI Competency Center (BICC) within the company. C) the organization is ready for the introduction of new data-generating technologies, such as radio-frequency identification (RFID). D) business leaders are required to document their business processes and to sign off on the legitimacy of the information they rely on.

B

In answering the question "Which customers are most likely to click on my online ads and purchase my goods?" you are most likely to use which of the following analytic applications? A) customer profitability B) propensity to buy C) customer attrition D) channel optimization

B

In the Magpie Sensing case study, the automated collection of temperature and humidity data on shipped goods helped with various types of analytics. Which of the following is an example of predictive analytics? A) real time reports of the shipment's temperature B) warning of an open shipment seal C) location of the shipment D) optimal temperature setting

B

Kaplan and Norton developed a report that presents an integrated view of success in the organization called A) metric management reports. B) balanced scorecard-type reports. C) dashboard-type reports. D) visual reports.

B

Once a data warehouse is in place, the general process of intelligence creation begins with A) end-user examinations of decision-making impacts. B) identifying and prioritizing specific BI projects. C) estimating the cost-benefit ratio of the ROI. D) establishing the critical partnerships required for BI governance.

B

Today, many vendors offer diversified tools, some of which are completely preprogrammed (called shells). How are these shells utilized? A) They are used for customization of BI solutions. B) All a user needs to do is insert the numbers. C) The shell provides a secure environment for the organization's BI data. D) They host an enterprise data warehouse that can assist in decision making.

B

What can the BI users in an organization help guide and direct? A) how to implement and deploy a BI initiative that can be lengthy, expensive, and failure prone B) how the DW is structured and the types of BI tools and other supporting software that are needed C) how to decompose the planning and execution into business, organization, functionality, and infrastructure components D) how the DW is structured and the costs and the appreciation for different classes of potential users

B

What is Six Sigma? A) a letter in the Greek alphabet that statisticians use to measure process variability B) a methodology aimed at reducing the number of defects in a business process C) a methodology aimed at reducing the amount of variability in a business process D) a methodology aimed at measuring the amount of variability in a business process

B

When you tell a story in a presentation, all of the following are true EXCEPT A) a story should make sense and order out of a lot of background noise. B) a well-told story should have no need for subsequent discussion. C) stories and their lessons should be easy to remember. D) the outcome and reasons for it should be clear at the end of your story.

B

Which approach to data warehouse integration focuses more on sharing process functionality than data across systems? A) extraction, transformation, and load B) enterprise application integration C) enterprise information integration D) enterprise function integration

B

Which of the following online analytical processing (OLAP) technologies does NOT require the precomputation and storage of information? A) MOLAP B) ROLAP C) HOLAP D) SQL

B

Which type of visualization tool can be very helpful when a data set contains location data? A) bar chart B) geographic map C) highlight table D) tree map

B

21) The HP Case illustrates that after analytics are chosen to solve a problem, building a new decision model from scratch or purchasing one may not always be the best approach. Why is that? A) Decision models should never be purchased, only developed in house. B) A related tool requiring slight modification may already exist. C) CIOs are more likely to allocate funds to new development. D) Analytic models work better when they are built from scratch or purchased.

B) A related tool requiring slight modification may already exist.

39) What type of user interface has been recognized as an effective DSS GUI because it is familiar, user friendly, and a gateway to almost all sources of necessary information and data? A) ASP.net B) Web browsers C) visual basic interfaces D) mainframe interfaces

B) Web browsers

23) All of the following statements about decision style are true EXCEPT A) autocratic styles are authority-based. B) decision styles are consistent among top managers. C) heuristic styles can also be democratic. D) decision styles may vary among lower-level managers.

B) decision styles are consistent among top managers.

31) All of the following statements about the decision implementation phases are true EXCEPT A) implementation is every bit as important as the decision itself. B) employees need only the decisions from the CEO, not the rationale. C) ERP, CRM, and BPM tools can all help track decision implementation. D) ES and KMS can help in training and support for decision implementation.

B) employees need only the decisions from the CEO, not the rationale.

27) What form of decision theory assumes that decision makers are rational beings who always seek to strictly maximize economic goals? A) the theory of bounded rationality B) normative decision theory C) satisficing decision theory D) human optimal decision theory

B) normative decision theory

What is business intelligence (BI)? What are the four major components of a BI system?

BI is an umbrella term that combines architectures, databases, analytical tools, applications, and methodologies. BI's major objective is to enable easy access to data (and models) to provide business managers with the ability to conduct analysis. The four major components of BI are a data warehouse, business analytics, business performance management (BPM), and a user interface.

________ charts are useful in displaying nominal data or numerical data that splits nicely into different categories so you can quickly see comparative results and trends.

Bar

Statistical Data Variable Type: A variable that contains the values of either Yes or No would best be categorized as which of the following variable types? A) Nominal B) Binary C) Discrete D) Ratio

Binary

Which kind of chart is described as an enhanced variant of a scatter plot?

Bubble chart

A Web client that connects to a Web server, which is in turn connected to a BI application server, is reflective of a A) one tier architecture. B) two tier architecture. C) three tier architecture. D) four tier architecture.

C

Active data warehousing can be used to support the highest level of decision making sophistication and power. The major feature that enables this in relation to handling the data is A) country of (data) origin. B) nature of the data. C) speed of data transfer. D) source of the data.

C

All of the following are benefits of hosted data warehouses EXCEPT A) smaller upfront investment. B) better quality hardware. C) greater control of data. D) frees up in-house systems.

C

All of the following statements about balanced scorecards and dashboards are true EXCEPT A) scorecards are less preferred at operational and tactical levels. B) dashboards would be the preferred choice to monitor production quality. C) scorecards are best for real-time tracking of a marketing campaign. D) scorecards are preferred for tracking the achievement of strategic goals.

C

Benefits of the latest visual analytics tools, such as SAS Visual Analytics, include all of the following EXCEPT A) mobile platforms such as the iPhone are supported by these products. B) it is easier to spot useful patterns and trends in the data. C) they explore massive amounts of data in hours, not days. D) there is less demand on IT departments for reports.

C

Big Data often involves a form of distributed storage and processing using Hadoop and MapReduce. One reason for this is A) centralized storage creates too many vulnerabilities. B) the "Big" in Big Data necessitates over 10,000 processing nodes. C) the processing power needed for the centralized model would overload a single computer. D) Big Data systems have to match the geographical spread of social media.

C

Dashboards can be presented at all the following levels EXCEPT A) the visual dashboard level. B) the static report level. C) the visual cube level. D) the self-service cube level.

C

For those executives who do not have the time to go through lengthy reports, the best alternative is the A) last page of the report. B) raw data that informed the report. C) executive summary. D) charts in the report.

C

In answering the question "Which customers are likely to be using fake credit cards?" you are most likely to use which of the following analytic applications? A) channel optimization B) customer segmentation C) fraud detection D) customer profitability

C

In the Target case study, why did Target send a teen maternity ads? A) Target's analytic model confused her with an older woman with a similar name. B) Target was sending ads to all women in a particular neighborhood. C) Target's analytic model suggested she was pregnant based on her buying habits. D) Target was using a special promotion that targeted all teens in her geographical area.

C

In the Whirlpool case study, the company sought to better understand information coming from which source? A) customer transaction data B) delivery information C) customer e-mails D) goods moving through the internal supply chain

C

Online transaction processing (OLTP) systems handle a company's routine ongoing business. In contrast, a data warehouse is typically A) the end result of BI processes and operations. B) a repository of actionable intelligence obtained from a data mart. C) a distinct system that provides storage for data that will be made use of in analysis. D) an integral subsystem of an online analytical processing (OLAP) system.

C

Prescriptive BI capabilities are viewed as more powerful than predictive ones for all the following reasons EXCEPT A) prescriptive BI gives actual guidance as to actions. B) understanding the likelihood of certain events often leaves unclear remedies. C) only prescriptive BI capabilities have monetary value to top-level managers. D) prescriptive models generally build on (with some overlap) predictive ones.

C

The Internet emerged as a new medium for visualization and brought all the following EXCEPT A) worldwide digital distribution of visualization. B) immersive environments for consuming data. C) new forms of computation of business logic. D) new graphics displays through PC displays.

C

What has caused the growth of the demand for instant, on-demand access to dispersed information? A) the increasing divide between users who focus on the strategic level and those who are more oriented to the tactical level B) the need to create a database infrastructure that is always online and contains all the information from the OLTP systems C) the more pressing need to close the gap between the operational data and strategic objectives D) the fact that BI cannot simply be a technical exercise for the information systems department

C

When middles look across an organization to ensure that project priorities reflect the needs of the entire business, what is their main concern? A) that their proprietary BI methods are protected from industrial espionage B) that additional information available through an enterprise data warehouse should assist in decision making C) that a project does not just serve to sub-optimize one area over others D) that return on investment (ROI) and total cost of ownership justify the cost—benefit ratio

C

Which data warehouse architecture uses a normalized relational warehouse that feeds multiple data marts? A) independent data marts architecture B) centralized data warehouse architecture C) hub-and-spoke data warehouse architecture D) federated architecture

C

Which kind of data warehouse is created separately from the enterprise data warehouse by a department and not reliant on it for updates? A) sectional data mart B) public data mart C) independent data mart D) volatile data mart

C

Which of the following is NOT an example that falls within the four major categories of business environment factors for today's organizations? A) globalization B) increased pool of customers C) fewer government regulations D) increased competition

C

Which of the following statements is more descriptive of active data warehouses in contrast with traditional data warehouses? A) strategic decisions whose impacts are hard to measure B) detailed data available for strategic use only C) large numbers of users, including operational staffs D) restrictive reporting with daily and weekly data currency

C

Which type of visualization tool can be very helpful when the intention is to show relative proportions of dollars per department allocated by a university administration? A) heat map B) bullet C) pie chart D) bubble chart

C

37) The model management subsystem provides the system's analytical capabilities and appropriate software management. Which of the following is NOT an element of the model management subsystem? A) model base B) MBMS C) DBMS D) model execution, integration, and command processor

C) DBMS

36) The software that manages the DSS database and enables relevant data to be accessed by DSS application programs is called A) KWS. B) ERP. C) DBMS. D) CRM.

C) DBMS.

25) All of the following are benefits of using models for decision support EXCEPT A) it is easier to manipulate a model than a real system. B) you can find out probable outcomes of an action before actually taking it. C) using well-designed models always guarantees you success in implementation. D) the cost of a model is usually much lower than manipulating the system in implementation.

C) using well-designed models always guarantees you success in implementation.

22) Groupthink in a decision-making environment occurs when A) group members all use the same analytic tools without having a choice. B) group members accept the same timeframe for problem solving without complaining. C) group members all accept a course of action without thinking for themselves. D) group members are all working together for the firm's success.

C) group members all accept a course of action without thinking for themselves.

28) When an Accounts Payable department improves their information system resulting in faster payments to vendors, without the Accounts Receivable Department doing the same, leading to a cash flow crunch, what can we say happened in decision-theoretic terms? A) optimization B) profit minimization C) suboptimization D) cash flow problems

C) suboptimization

Which of the following is an example of unsupervised learning?

Clustering

cluster analysis

Creates groups that have *maximum* similarity among members within each group and *minimum* similarity among members across the groups.
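
One way to picture this definition is with k-means, a common clustering algorithm (chosen here only for illustration; the card does not name a specific method). The sketch below is a tiny one-dimensional version; the `spend` figures and the starting centroids are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest centroid, then re-average."""
    groups = [[] for _ in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in values:
            # Index of the centroid closest to this value.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Move each centroid to the average of its group (keep it if the group is empty).
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return groups, centroids

spend = [12, 15, 14, 90, 95, 88]  # hypothetical customer spend figures
groups, centroids = kmeans_1d(spend, centroids=[0.0, 100.0])
print(groups)
print(centroids)
```

Members of each resulting group sit close to one another, while the two group averages are far apart, which matches the maximum-within, minimum-across idea in the definition.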

All of the following are true about external reports between businesses and the government EXCEPT A) they can include tax and compliance reporting. B) they can be filed nationally or internationally. C) they are standardized for the most part to reduce the regulatory burden. D) their primary focus is government.

D

All of the following are true about in-database processing technology EXCEPT A) it pushes the algorithms to where the data is. B) it makes the response to queries much faster than conventional databases. C) it is often used for apps like credit card fraud detection and investment risk management. D) it is the same as in-memory storage technology.

D

All of the following statements about metadata are true EXCEPT A) metadata gives context to reported data. B) there may be ethical issues involved in the creation of metadata. C) metadata helps to describe the meaning and structure of data. D) for most organizations, data warehouse metadata are an unnecessary expense.

D

Data warehouses provide direct and indirect benefits to using organizations. Which of the following is an indirect benefit of data warehouses? A) better and more timely information B) extensive new analyses performed by users C) simplified access to data D) improved customer service

D

In the Magpie Sensing case study, the automated collection of temperature and humidity data on shipped goods helped with various types of analytics. Which of the following is an example of prescriptive analytics? A) real time reports of the shipment's temperature B) warning of an open shipment seal C) location of the shipment D) optimal temperature setting

D

In which stage of extraction, transformation, and load (ETL) into a data warehouse are anomalies detected and corrected? A) transformation B) extraction C) load D) cleanse

D

What is the fundamental challenge of dashboard design? A) ensuring that users across the organization have access to it B) ensuring that the organization has the appropriate hardware onsite to support it C) ensuring that the organization has access to the latest web browsers D) ensuring that the required information is shown clearly on a single screen

D

Organizations counter the pressures they experience in their business environments in multiple ways. Which of the following is NOT an effective way to counter these pressures? A) reactive actions B) anticipative actions C) adaptive actions D) retroactive actions

D

The "single version of the truth" embodied in a data warehouse such as Capri Casinos' means all of the following EXCEPT A) decision makers get to see the same results to queries. B) decision makers have the same data available to support their decisions. C) decision makers get to use more dependable data for their decisions. D) decision makers have unfettered access to all data in the warehouse.

D

When Sabre developed their Enterprise Data Warehouse, they chose to use near-real time updating of their database. The main reason they did so was A) to provide a 360 degree view of the organization. B) to aggregate performance metrics in an understandable way. C) to be able to assess internal operations. D) to provide up-to-date executive insights.

D

When querying a dimensional database, a user went from summarized data to its underlying details. The function that served this purpose is A) dice. B) slice. C) roll-up. D) drill down.

D

Which data warehouse architecture uses metadata from existing data warehouses to create a hybrid logical data warehouse comprised of data from the other warehouses? A) independent data marts architecture B) centralized data warehouse architecture C) hub-and-spoke data warehouse architecture D) federated architecture

D

Which kind of chart is described as an enhanced variant of a scatter plot? A) heat map B) bullet C) pie chart D) bubble chart

D

Which of the following is LEAST related to data/information visualization? A) information graphics B) scientific visualization C) statistical graphics D) graphic artwork

D

Which of the following statements about Big Data is true? A) Data chunks are stored in different locations on one computer. B) Hadoop is a type of processor used to process Big Data applications. C) MapReduce is a storage filing system. D) Pure Big Data systems do not involve fault tolerance.

D

Why is a performance management system superior to a performance measurement system? A) because performance measurement systems are only in their infancy B) because measurement automatically leads to problem solution C) because performance management systems cost more D) because measurement alone has little use without action

D

32) For DSS, why are semistructured or unstructured decisions the main focus of support? A) There are many more unstructured and semistructured decisions than structured in organizations. B) MIS staff prefer to work on solving unstructured and semistructured decisions. C) Unstructured and semistructured decisions are the easiest to solve. D) They include human judgment, which is incorporated into DSS.

D) They include human judgment, which is incorporated into DSS.

30) The Web can play a significant role in making large amounts of information available to decision makers. Decision makers must be careful that this glut of information does not A) increase their enthusiasm for data available on the web. B) take on the same credibility of internally-generated data. C) take on the same role as human intuition. D) detract from the quality and speed of decision making.

D) detract from the quality and speed of decision making.

38) While Microsoft Excel can be an efficient tool for developing a DSS, compared to using a programming language like C++, a shortcoming of Excel is A) it cannot be used effectively for small or medium sized problems. B) Excel is not widely understood compared to a language like C++. C) it is not widely available for purchase. D) errors can creep into formulas somewhat easily.

D) errors can creep into formulas somewhat easily.

29) All of the following statements about risk in decision making are correct EXCEPT A) all business decisions incorporate an element of risk. B) decision makers frequently measure risk and uncertainty incorrectly. C) methodologies are available for handling extreme uncertainty. D) most decision makers are pessimistic about decision outcomes.

D) most decision makers are pessimistic about decision outcomes.

26) In the design phase of decision making, selecting a principle of choice or criteria means that A) if an objective model is used with hard data, all decision makers will make the same choice. B) risk acceptability is a subjective concept and plays little part in modeling. C) using well-designed models guarantees you success in real life. D) optimality is not the only criterion for acceptable solutions.

D) optimality is not the only criterion for acceptable solutions.

40) The user communicates with and commands the DSS through the user interface subsystem. Researchers assert that some of the unique contributions of DSS are derived from A) the Web browser. B) the user being considered part of the system. C) some DSS user interfaces utilizing natural-language input (i.e., text in a human language). D) the intensive interaction between the computer and the decision maker.

D) the intensive interaction between the computer and the decision maker.

What are the most common myths about data mining?

Data mining provides instant, crystal-ball predictions

Why is it difficult to detect deception?

Deception detection is difficult and if deception detection is limited to only text, then the problem is even more difficult.

Statistical Data Variable Type: A variable that contains a countable number of distinct values would best be categorized as which of the following variable types? A) Ordinal B) Discrete C) Ratio D) Interval

Discrete

What is a patent?

Exclusive rights granted by a *country* to an inventor for a limited period of time in exchange for a disclosure of an invention

In data warehousing, the ETL process is defined as__:

Extract, Transform, Load.

Bill Inmon advocates the data mart bus architecture whereas Ralph Kimball promotes the hub-and-spoke architecture:

False.

Business Intelligence (BI) is a specific term that describes analytical tools only:

False.

Data warehouses are subsets of data marts:

False.

Subject oriented databases for data warehousing are organized by detailed subjects such as disk drive, computers, and networks:

False.

What recent factors have increased the popularity of data mining?

General recognition of the untapped value hidden in large data sources.

Which type of visualization tool can be very helpful when a data set contains location data?

Geographic map

Which of the following is a common way of visualizing the frequency distribution of data points over a range of possible values? A) Quantitative Easing Chart B) Histogram C) Skewness chart D) Scatterplot

Histogram
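
A histogram is built by grouping values into equal-width bins and counting how many fall into each. A minimal sketch of that counting step, using a hypothetical `ages` sample and an assumed bin width of 10:

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

ages = [21, 25, 29, 31, 34, 35, 38, 42, 47, 58]  # hypothetical sample
width = 10                                        # assumed bin width
bins = histogram_counts(ages, width)

# Print a simple text histogram, one row per bin.
for start, count in bins.items():
    print(f"{start}-{start + width - 1}: {'#' * count}")
```

Each row of the printed output plays the role of one bar: the longer the row, the more data points fall in that range.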

There is a web-based survey that asks you, "On a rating of 1 (hated it) to 5 (loved it), how much did you like the movie?" This value is stored in your database and you need to categorize the statistical variable type. Which of the following variable types would be best? A) Ordinal B) Discrete C) Interval D) Ratio

Interval

________ are typically used together with other charts and graphs, as opposed to by themselves, and show postal codes, country names, etc.

Maps

Which of the following measures of central location would be best to use when the skewness is approximately zero? A) Mean B) Median C) Mode D) 2nd Quartile

Mean
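
The reasoning behind this answer is that the mean is pulled toward extreme values, so it is a fair summary only when the data are roughly symmetric. A small sketch with two hypothetical data sets shows the difference:

```python
from statistics import mean, median

symmetric = [2, 4, 6, 8, 10]   # hypothetical data with zero skewness
skewed = [2, 4, 6, 8, 100]     # same data with one extreme value

sym_mean, sym_median = mean(symmetric), median(symmetric)
skew_mean, skew_median = mean(skewed), median(skewed)

print("symmetric:", sym_mean, sym_median)
print("skewed:   ", skew_mean, skew_median)
```

With the symmetric data the mean and median agree, so the mean is a safe choice; with the skewed data the mean moves well away from the median, which is when the median is usually preferred.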

________ management reports are used to manage business performance through outcome-oriented metrics in many organizations.

Metric

How did Infinity P&C improve customer service with data mining?

One out of five claims is fraudulent. Used data mining and predictive analytics to "fast-track" the other four and close those cases quickly. Improved customer service, combated fraud, and increased profit (higher ROI).

The full form of OLAP is:

Online Analytical Processing.

Which of the following is NOT a type of data warehouse?:

Operational database.

A data warehouse is which of the following?:

Organized around important subject areas.

Which of the following is not true about performance management systems and performance measurement systems?

Performance measurement system encompasses performance management system

Types of models Cabela's used

Prediction, clustering (segmentation) & association models

Cabela's used best-of-breed including ___ and ____.

SAS & Teradata

Which of the following is not an open-source tool?

SAS Analytics

Which of the following is NOT an example of Online transaction processing (OLTP) system?:

Sales trend report.

3 steps in text mining process

Step 1: Establish the corpus (collect all relevant unstructured data; digitize and standardize the collection; place the collection in a common place). Step 2: Create the term-by-document matrix (dimensionality). Step 3: Extract patterns/knowledge using analysis.
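
A minimal Python sketch of the three steps, assuming a tiny made-up corpus and plain whitespace tokenization. The document names and the "shared terms" pattern are illustrative only; a real project would add stemming, stop-word removal, and a proper analysis method.

```python
from collections import Counter

# Step 1: establish the corpus (three tiny, made-up documents).
corpus = {
    "doc1": "data mining finds patterns in data",
    "doc2": "text mining finds patterns in text",
    "doc3": "a data warehouse stores data",
}

# Step 2: build the term-by-document matrix.
# Each row is a document, each column is a term, each cell is a term count.
counts = {name: Counter(text.lower().split()) for name, text in corpus.items()}
terms = sorted(set().union(*counts.values()))
tdm = {name: [counts[name][t] for t in terms] for name in corpus}

# Step 3: extract a (very simple) pattern: terms used in more than one document.
shared_terms = [
    t for t in terms
    if sum(1 for c in counts.values() if c[t] > 0) > 1
]
```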

Which of the following is NOT a qualitative data type? Conversations; Surveys with numerical answers; Magazine articles; Media broadcasts

Surveys with numerical answers

data mining

The *nontrivial* process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data stored in structured databases

What is patent analysis?

The use of analytical techniques to extract valuable knowledge from patent databases.

"Revenue" is an example of a lagging performance indicator

True

Actionable intelligence is the primary goal of modern-day Business Intelligence (BI) systems vs. historical reporting that characterized Management Information Systems (MIS). T/F

True

Because of performance and data quality issues, most experts agree that the federated architecture should supplement data warehouses, not replace them. T/F

True

Data warehouse and BI initiatives typically follow a process similar to that used in military intelligence initiatives. T/F

True

If using a mining analogy, "knowledge mining" would be a more appropriate term than "data mining"

True

One way a data warehouse differs from an operational database is that a data warehouse contains historical data:

True.

The hub-and-spoke data warehouse model uses a centralized warehouse feeding dependent data marts:

True.

Volume, velocity, and variety of data characterize the Big Data paradigm:

True.

What is Watson? What is special about it?

Watson is a question answering (QA) computer system developed by an IBM Research team. What makes it special is that it is able to compete at the human champion level in real time on the TV quiz show, Jeopardy!

sentiment analysis

a technique used to detect favorable and unfavorable opinions toward specific products and services using large numbers of textual data sources

What type of analytics help did Cabela's get from their efforts?

a. Analyze high-volume sales transactions, market research, and demographic data b. SAS for prediction, clustering, and association models c. Scoring customers on a five-star system d. Optimize customer selection for all customer contacts

4 major types of DM patterns

a. Associations b. Predictions c. Cluster (segmentation) d. Sequential (or time series) relationships

[Cabela's] What are the top challenges for multi-channel retailers?

a. Constant changes in the retail industry b. Understanding customers' needs, wants, likes, & dislikes c. Creating a single view of the customer d. Increase in time to prepare and analyze the growing volume and complexity of data

Apriori Algorithm

a. Finds subsets that are common to at least a minimum number of the item sets. b. Uses a bottom-up approach c. Widely used for data mining
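
A simplified Python sketch of the bottom-up idea, assuming support is measured as a raw transaction count. The function name `apriori`, the `min_support` parameter, and the basket data are illustrative assumptions, not taken from any particular library.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Bottom-up step: join frequent k-itemsets into (k+1)-item candidates,
        # keeping only those whose k-item subsets are all frequent.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Made-up market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]
result = apriori(baskets, min_support=3)
```

With these baskets, each single item and the pair {diapers, beer} appear in three transactions, so they are the only itemsets that meet the minimum.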

What are the sources of data that retailers such as Cabela's use for their data mining projects?

a. Large & information-rich transaction and customer data b. Web Usage Mining to track clickstream patterns of customers shopping online

benefits of Customer Relationship Management (CRM)

a. Maximize return on marketing campaigns b. Improve customer retention (churn analysis) c. Maximize customer value (cross-selling, up-selling)

benefits of text mining in email

a. Spam filtering b. Email prioritization and categorization c. Automatic response generation

What does it mean to have "a single view of the customer"? How can it be accomplished?

a. To understand the customer across all retail channels: store, TV, catalog & website b. To better focus marketing efforts and drive increased sales

For a column in your dataset, your data analysis tool is telling you that the standard deviation is zero. What does this say about the data in that column? All values are zero; all values are the same; all values are different; the statistics are worthless

all values are the same
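
A quick Python check of this, using the standard library's population standard deviation on made-up columns: a constant column gives zero, and the constant does not have to be zero itself.

```python
import statistics

# Made-up columns: one constant (but non-zero), one with some variation.
constant_column = [7, 7, 7, 7]
varied_column = [7, 8, 7, 9]

constant_sd = statistics.pstdev(constant_column)  # no spread around the mean
varied_sd = statistics.pstdev(varied_column)      # some spread around the mean
```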

Market Segmentation

an analysis that aids in dividing customers into groups based upon demographics so that you can target those groups with different advertising campaigns.

decision trees

analysis procedure which classifies observations into distinct groups based upon the values of predictor/input variables

Corpus (corpora) means:

body of knowledge (the data)

Ratio data

business example: salary data

A(n) ________ is a communication artifact, concerning business matters, prepared with the specific intention of relaying information in a presentable form.

business report

Data mining tools are used to identify customer _______.

buying patterns

Tokenizing is:

categorizing a block of text in a sentence

Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes/categories?

classification

Travel and Transport created an online BI self-service system that allowed ________ to access information directly.

clients

Business performance management comprises a ________ set of processes that link strategy to execution with the goal of optimizing business performance.

closed-loop

Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features?

clustering

Ordinal data

codes assigned as rank order. Ex: credit score as (1) low, (2) medium, or (3) high

In the patent analysis case study, text mining of thousands of patents held by Kodak and its competitors helped improve:

competitive intelligence and identified complementary products.

scalability

construct a prediction model efficiently given a large amount of data

Striking it rich requires

creative thinking.

In evaluating a classifier, if the area under the ROC curve is 0.5, the classifier is better than the random chance of flipping a coin

false
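
A small Python sketch of why 0.5 means "no better than chance". It uses the rank interpretation of the area under the ROC curve: the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative case, with ties counted as half. The helper name `auc` and the scores are illustrative assumptions.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 0]

# A model that gives every case the same score cannot separate the classes.
uninformative = auc(labels, [0.5, 0.5, 0.5, 0.5])

# A model that scores every positive above every negative separates them fully.
perfect = auc(labels, [0.9, 0.1, 0.8, 0.2])
```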

When training a data mining model, the testing dataset is always larger than the training dataset

false

There are only a few categories of business report: informal, ________, and short.

formal

As a result of implementing the IBM SPSS analytics tools, Infinity P&C has doubled the accuracy of its ________, contributing to a return on investment.

fraud identification

DM extracts and identifies _______ from data.

hidden patterns

Cabela's has a _____ view of their customer across all retail shopping channels.

holistic

Which kind of data warehouse is created separately from the enterprise data warehouse by a department and acquires data directly from ETL process?:

independent data mart.

The ________ perspective of the organization suggested by the balanced scorecard focuses on business processes and how well they are running.

internal business process

another name for data mining

knowledge discovery

In the Delta Lloyd Group case study, the ________ is the stage of the reporting process in which consolidated figures are cited, formatted, and described to form the final text of the report.

last mile

Which one of the following is not one of the balanced scorecard's four generic perspectives?

marketing and advertising

Data mining tools use ________ for extracting hidden patterns for predictive purposes.

mathematical techniques

Interval data

measured on interval scales. Ex: temperature on the Celsius scale

With a dashboard, information on sources of the data being presented, the quality and currency of underlying data provide contextual ________ for users.

metadata

CRISP-DM

most comprehensive, common, and standardized data mining process

Prediction problems where the variables have ______ are most accurately defined as regressions.

numeric values

robustness

overcome noisy data to make somewhat accurate predictions

A strategically aligned metric is also known as a key ________.

performance indicator

Data mining tools use ______ in data to develop mathematical rules for predicting outcomes for future observations.

patterns

Visual analytics is widely regarded as the combination of visualization and ________ analytics.

predictive

What type of analytics seeks to determine what is likely to happen in the future?:

predictive.

novel

previously unknown patterns are discovered.

Web mining (or Web data mining) is the ______ of discovering intrinsic relationships from Web data (textual, linkage, or usage)

process

CRISP-DM process is most comprehensive, highly ____ and _____.

repetitive; experimental

Dashboards present visual displays of important information that are consolidated and arranged on a single ________.

screen

80% of customers open a checking account first, then open a savings account within a year. This is an example of a

sequential relationship

Nominal data

simple codes assigned as labels. Ex: marital status as S-Single, M-Married, or D-Divorced

Data is the most critical ingredient for data mining, which may include

soft/unstructured data.

51) Web ________ are used to automatically read through the contents of Web sites.

spiders

In the Mace case study, the IBM Cognos software enabled the rapid creation of integrated reports across 60 countries, replacing a large and complex ________.

spreadsheet

In the Blastrac case study, Tableau analytics software was used to replace massive ________ that were loaded with data from multiple ERP systems.

spreadsheets

Which of the following tools cannot be used to implement a data warehouse?

Tableau

In estimating the accuracy of data mining (or other) classification models, precision is

the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified positives
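
In confusion-matrix terms this is TP / (TP + FP). A short Python sketch, assuming binary labels where 1 is the positive class; the function name and sample data are made up for illustration.

```python
def precision(actual, predicted, positive=1):
    """Correctly classified positives divided by everything classified as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if p == positive and a == positive)  # true positives
    fp = sum(1 for a, p in pairs if p == positive and a != positive)  # false positives
    # Note: undefined (division by zero) if nothing was classified as positive.
    return tp / (tp + fp)

actual    = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0]

# Three cases were classified as positive; two of them really are positive.
p = precision(actual, predicted)
```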

An OLAP cube has measures and dimensions. Which of the following is an example of a measure?:

total sales.

In the Saudi Telecom company case study, information ________ software allowed managers to see trends and correct issues before they became problems.

visualization

List and describe the three major categories of business reports.

∙ Metric management reports--manage business performance through outcome-oriented metrics. ∙ Dashboard-type reports--graphical representation of performance indicators ∙ Balanced scorecard type reports--attempts to present an integrated view of success in an organization; includes financial and nonfinancial (customer, business process, and learning and growth perspectives)

