BI - Self Assessment - Chapter 4
Question 4 Which data mining process/methodology is thought to be the most comprehensive, according to kdnuggets.com rankings? Answers: A. proprietary organizational methodologies B. SEMMA C. CRISP-DM D. KDD Process
C. CRISP-DM
Question 9 In the Target case study, why did Target send maternity ads to a teen? Answers: A. Target was sending ads to all women in a particular neighborhood. B. Target was using a special promotion that targeted all teens in her geographical area. C. Target's analytic model suggested she was pregnant based on her buying habits. D. Target's analytic model confused her with an older woman with a similar name.
C. Target's analytic model suggested she was pregnant based on her buying habits.
Question 37 The data mining algorithm type used for classification somewhat resembling the biological neural networks in the human brain is Answers: A. cluster analysis. B. decision trees. C. artificial neural networks. D. association rule mining.
C. artificial neural networks.
Question 35 All of the following statements about data mining are true EXCEPT Answers: A. understanding the business goal is critical. B. data is typically preprocessed and/or cleaned before use. C. building the model takes the most time and effort. D. understanding the data, e.g., the relevant variables, is critical to success.
C. building the model takes the most time and effort.
Question 14 Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes? Answers: A. clustering B. visualization C. classification D. associations
C. classification
Question 17 Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications? Answers: A. computer hardware and software B. customer relationship management C. insurance D. retailing and logistics
C. insurance
Question 7 Ratio data is a type of categorical data. Answers: A. True B. False
B. False
Question 2 The data field "ethnic group" can be best described as Answers: A. interval data. B. ordinal data. C. ratio data. D. nominal data.
D. nominal data.
Question 12 Using data mining on data about imports and exports can help to detect tax avoidance and money laundering. Answers: A. True B. False
A. True
Question 15 During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality. Answers: A. True B. False
A. True
Question 21 Interval data is a type of numerical data. Answers: A. True B. False
A. True
Question 31 If using a mining analogy, "knowledge mining" would be a more appropriate term than "data mining." Answers: A. True B. False
A. True
Question 32 In the 2degrees case study, the main effectiveness of the new analytics system was in dissuading potential churners from leaving the company. Answers: A. True B. False
A. True
Question 36 In data mining, classification models help in prediction. Answers: A. True B. False
A. True
Question 39 When a problem has many attributes that impact the classification of different patterns, decision trees may be a useful approach. Answers: A. True B. False
A. True
Question 40 The number of users of free/open source data mining software now exceeds that of users of commercial software versions. Answers: A. True B. False
A. True
Question 6 The cost of data storage has plummeted recently, making data mining feasible for more firms. Answers: A. True B. False
A. True
Question 24 In data mining, finding an affinity of two products that are commonly found together in a shopping cart is known as Answers: A. association rule mining. B. cluster analysis. C. artificial neural networks. D. decision trees.
A. association rule mining.
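A minimal sketch of the shopping-cart affinity idea, not a full association rule miner. It counts how often pairs of items share a basket (support) and how often one item appears given another (confidence). The baskets, item names, and the helper names `pair_support` and `confidence` are hypothetical, made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping carts; the items are invented for this example.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def pair_support(baskets):
    """Return, for each item pair, the fraction of baskets containing both."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Estimate how often `consequent` appears in baskets that hold `antecedent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


support = pair_support(baskets)
bread_milk_support = support[("bread", "milk")]
bread_to_milk_confidence = confidence(baskets, "bread", "milk")
```

In this toy data, bread and milk share three of the five baskets, and three of the four baskets with bread also contain milk.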
Question 13 What is the main reason parallel processing is sometimes used for data mining? Answers: A. because of the massive data amounts and search efforts involved B. because most of the algorithms used for data mining require it C. because any strategic application requires parallel processing D. because the hardware exists in most organizations and it is available to use
A. because of the massive data amounts and search efforts involved
Question 30 In the Cabela's case study, what types of models helped the company understand the value of customers, using a five-point scale? Answers: A. clustering and association models B. simulation and geographical models C. simulation and regression models D. reporting and association models
A. clustering and association models
Question 1 The entire focus of the predictive analytics system in the Infinity P&C case was on detecting and handling fraudulent claims for the company's benefit. Answers: A. True B. False
B. False
Question 10 In the Cabela's case study, the SAS/Teradata solution enabled the direct marketer to better identify likely customers and market to them based mostly on external data sources. Answers: A. True B. False
B. False
Question 11 Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales. Answers: A. True B. False
B. False
Question 16 Market basket analysis is a useful and entertaining way to explain data mining to a technologically less savvy audience, but it has little business significance. Answers: A. True B. False
B. False
Question 19 Data that is collected, stored, and analyzed in data mining is often private and personal. There is no way to maintain individuals' privacy other than being very careful about physical data security. Answers: A. True B. False
B. False
Question 23 When training a data mining model, the testing dataset is always larger than the training dataset. Answers: A. True B. False
B. False
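As an illustration of why the statement is false, here is a sketch of a simple holdout split in which the training set is the larger part. The 70/30 ratio, the seed, and the function name `holdout_split` are assumptions chosen for the example, not values taken from the chapter.

```python
import random


def holdout_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and testing lists."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten hypothetical record ids stand in for real rows.
train, test = holdout_split(range(10))
```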
Question 25 Statistics and data mining both look for data sets that are as large as possible. Answers: A. True B. False
B. False
Question 33 In the cancer research case study, data mining algorithms that predict cancer survivability with high predictive power are good replacements for medical professionals. Answers: A. True B. False
B. False
Question 5 In the Memphis Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime. Answers: A. True B. False
B. False
Question 8 Data mining requires specialized data analysts to ask ad hoc questions and obtain answers quickly from the system. Answers: A. True B. False
B. False
Question 28 What does the robustness of a data mining method refer to? Answers: A. its ability to construct a prediction model efficiently given a large amount of data B. its ability to overcome noisy data to make somewhat accurate predictions C. its ability to predict the outcome of a previously unknown data set accurately D. its speed of computation and computational costs in using the model
B. its ability to overcome noisy data to make somewhat accurate predictions
Question 3 The data field "salary" can be best described as Answers: A. interval data. B. ratio data. C. ordinal data. D. nominal data.
B. ratio data.
Question 22 Which of the following is a data mining myth? Answers: A. Data mining is a multistep process that requires deliberate, proactive design and use. B. Newer Web-based tools enable managers of all educational levels to do data mining. C. The current state-of-the-art is ready to go for almost any business. D. Data mining requires a separate, dedicated database.
D. Data mining requires a separate, dedicated database.
Question 26 Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from Answers: A. developing a philosophy that is data analytics-centric. B. asking the customers what they want. C. collecting data about customers and transactions. D. analyzing the vast data amounts routinely collected.
D. analyzing the vast data amounts routinely collected.
Question 18 Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features? Answers: A. classification B. associations C. visualization D. clustering
D. clustering
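One way to picture partitioning objects into natural groupings is a small k-means sketch on one-dimensional data. k-means is only one of several clustering methods and is used here as an assumption for illustration; the points, the starting centroids, and the function name `kmeans_1d` are hypothetical.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Group 1-D points around centroids by repeated assign-and-average steps."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assign each point to the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            for p in points
        ]
        # Move each centroid to the mean of its members (unchanged if empty).
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            updated.append(sum(members) / len(members) if members else old)
        if updated == centroids:
            break  # Assignments have stabilized.
        centroids = updated
    return labels, centroids


points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
labels, centers = kmeans_1d(points, centroids=[1.0, 10.0])
```

No class labels are supplied up front. The two groups emerge only from how close the values are to each other, which is what separates clustering from classification.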
Question 29 What does the scalability of a data mining method refer to? Answers: A. its speed of computation and computational costs in using the model B. its ability to predict the outcome of a previously unknown data set accurately C. its ability to overcome noisy data to make somewhat accurate predictions D. its ability to construct a prediction model efficiently given a large amount of data
D. its ability to construct a prediction model efficiently given a large amount of data
Question 34 Prediction problems where the variables have numeric values are most accurately defined as Answers: A. classifications. B. associations. C. computations. D. regressions.
D. regressions.
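A short sketch of a numeric prediction problem, assuming the simplest case of one input variable and an ordinary least-squares line. The data values and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data that lies exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # Numeric prediction for a new x value.
```

The output is a number on a continuous scale, unlike a classification model, which would return a class label.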
Question 20 Third party providers of publicly available datasets protect the anonymity of the individuals in the data set primarily by Answers: A. letting individuals in the data know their data is being accessed. B. asking data users to use the data ethically. C. leaving in identifiers (e.g., name), but changing other variables. D. removing identifiers such as names and social security numbers.
D. removing identifiers such as names and social security numbers.
Question 38 All of the following statements about data mining are true EXCEPT Answers: A. the valid aspect means that the discovered patterns should hold true on new data. B. the novel aspect means that previously unknown patterns are discovered. C. the potentially useful aspect means that results should lead to some business benefit. D. the process aspect means that data mining should be a one-step process to results.
D. the process aspect means that data mining should be a one-step process to results.
Question 27 In estimating the accuracy of data mining (or other) classification models, the true positive rate is Answers: A. the ratio of correctly classified negatives divided by the total negative count. B. the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified positives. C. the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified negatives. D. the ratio of correctly classified positives divided by the total positive count.
D. the ratio of correctly classified positives divided by the total positive count.
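The true positive rate can be worked through on a small example. The sketch below counts the four outcomes of a binary classifier, then divides correctly classified positives by the total positive count. The labels and the function names `confusion_counts` and `true_positive_rate` are hypothetical; the false positive rate is included only for comparison.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for boolean labels."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    fn = sum(a and not p for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))
    return tp, fn, fp, tn


def true_positive_rate(actual, predicted):
    """Correctly classified positives divided by the total positive count."""
    tp, fn, _, _ = confusion_counts(actual, predicted)
    return tp / (tp + fn)


def false_positive_rate(actual, predicted):
    """Negatives wrongly classified as positive, divided by all negatives."""
    _, _, fp, tn = confusion_counts(actual, predicted)
    return fp / (fp + tn)


# Invented labels: four real positives and two real negatives.
actual = [True, True, True, True, False, False]
predicted = [True, True, True, False, True, False]
tpr = true_positive_rate(actual, predicted)
fpr = false_positive_rate(actual, predicted)
```

Here three of the four actual positives are caught, and one of the two actual negatives is wrongly flagged as positive, which is a false positive.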
