Chapter 6 Quiz (BANK), Big Data Exam 2, ISDS 2001 CH. 4 TEST BANK, Test 2 Chap 4, ISM4402 Exam 2

Natural language processing (NLP), a subfield of artificial intelligence and computational linguistics, is an important component of text mining. What is the definition of NLP?

The task of "understanding" natural human language and converting it into a more computer-friendly form (e.g., numbers).

Data and text mining is a promising application of AaaS. What additional capabilities can AaaS bring to the analytic world?

- large-scale optimization
- highly complex multicriteria decision problems
- distributed simulation models

C) cognitive map.

21) A more general form of an influence diagram is called a(n) A) forecast. B) environmental scan. C) cognitive map. D) static model.

B) influence diagram

22) A(n) ________ is a graphical representation of a model. A) multidimensional analysis B) influence diagram C) OLAP model D) Whisker plot

C) classes

23) Which of the following is NOT a component of a quantitative model? A) result variables B) decision variables C) classes D) parameters

A) mathematical models.

24) Intermediate result variables reflect intermediate outcomes in A) mathematical models. B) flowcharts. C) decision trees. D) ROI calculations.

C) risk.

25) When the decision maker must consider several possible outcomes for each alternative, each with a given probability of occurrence, this is decision making under A) certainty. B) uncertainty. C) risk. D) duress.

A) certainty.

26) When the decision maker knows exactly what the outcome of each course of action will be, this is decision making under A) certainty. B) uncertainty. C) risk. D) duress.

B) dynamic

27) A(n) ________ spreadsheet model represents behavior over time. A) static B) dynamic C) looped D) add-in

D) pivot tables.

28) Important spreadsheet features for modeling include all of the following EXCEPT A) what-if analysis. B) goal seeking. C) macros. D) pivot tables.

D) The problem is not bound by constraints.

29) Which of the following is NOT a characteristic displayed by a LP allocation problem? A) A limited quantity of economic resources is available for allocation. B) The resources are used in the production of products or services. C) There are two or more ways in which the resources can be used. D) The problem is not bound by constraints.

C) There is a single way in which the resources can be used.

30) Which of the following is NOT a characteristic displayed by a LP allocation problem? A) Each activity in which the resources are used yields a return in terms of the stated goal. B) The resources are used in the production of products or services. C) There is a single way in which the resources can be used. D) The allocation is usually restricted by several limitations and requirements.

D) All data are unknown with decision making under uncertainty.

31) Which of the following is NOT an assumption used by a LP allocation problem? A) Returns from different allocations can be compared. B) The return from any allocation is independent of other allocations. C) The total return is the sum of the returns yielded by the different activities. D) All data are unknown with decision making under uncertainty.

C) Total returns cannot be compared.

32) Which of the following is NOT an assumption used by a LP allocation problem? A) The resources are to be used in the most economical manner. B) The return from any allocation is independent of other allocations. C) Total returns cannot be compared. D) All data are known with certainty.

A) goal seek

33) This method calculates the values of the inputs necessary to achieve a desired level of an output. A) goal seek B) what-if C) sensitivity D) LP
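
A minimal sketch of goal seeking, assuming a made-up linear profit model (the price, unit cost, and fixed cost below are illustrative values, not from the question bank). The hypothetical `goal_seek` helper uses bisection to work backward from a desired output to the input that produces it; spreadsheet tools may use a different search method.

```python
def profit(units, price=25.0, unit_cost=15.0, fixed_cost=5000.0):
    """Hypothetical profit model: contribution margin times units, minus fixed cost."""
    return units * (price - unit_cost) - fixed_cost


def goal_seek(f, target, lo, hi, tol=1e-6, max_iter=200):
    """Find x in [lo, hi] with f(x) close to target, by bisection.

    Assumes f is continuous and increasing on [lo, hi] and that the
    target lies between f(lo) and f(hi).
    """
    if not (f(lo) <= target <= f(hi)):
        raise ValueError("target is not bracketed by f(lo) and f(hi)")
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        if f(mid) < target:
            lo = mid  # output too low: the input must be larger
        else:
            hi = mid  # output high enough: try a smaller input
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0


# Units needed to reach a desired profit of 10,000 under the assumed model.
units_for_goal = goal_seek(profit, target=10000.0, lo=0.0, hi=10000.0)
# Setting the target to 0 gives the zero-profit point for the same model.
units_for_zero_profit = goal_seek(profit, target=0.0, lo=0.0, hi=10000.0)
```

What-if analysis runs the model forward from inputs to outputs; goal seeking runs it backward from a desired output to the required input.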

D) break-even

34) This method calculates the values of the inputs necessary to generate a zero profit outcome. A) goal seek B) what-if C) sensitivity D) break-even

B) greatest expected value.

35) The most common method for solving a risk analysis problem is to select the alternative with the A) smallest expected value. B) greatest expected value. C) mean expected value. D) median expected value.
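
As an illustration of decision making under risk, the sketch below computes the expected value of each alternative and picks the largest. The payoff table, probabilities, and alternative names are invented for the example.

```python
# Hypothetical payoff table: each alternative lists (probability, payoff) pairs.
# The probabilities for each alternative sum to 1.
alternatives = {
    "bonds": [(0.5, 12.0), (0.3, 6.0), (0.2, 3.0)],
    "stocks": [(0.5, 15.0), (0.3, 3.0), (0.2, -2.0)],
    "cds": [(0.5, 6.5), (0.3, 6.5), (0.2, 6.5)],
}


def expected_value(outcomes):
    """Probability-weighted sum of the payoffs for one alternative."""
    return sum(probability * payoff for probability, payoff in outcomes)


def best_alternative(table):
    """Return the name of the alternative with the greatest expected value."""
    return max(table, key=lambda name: expected_value(table[name]))


expected = {name: expected_value(outcomes) for name, outcomes in alternatives.items()}
choice = best_alternative(alternatives)
```

With these assumed numbers, "bonds" wins even though "stocks" has the highest single payoff, because the rule weighs every outcome by its probability.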

C) many alternatives.

36) A decision tree can be cumbersome if there are A) uncertain results. B) few alternatives. C) many alternatives. D) pre-existing decision tables.

C) Simulation is often the only DSS modeling method that can readily handle relatively unstructured problems.

37) Which of the following is NOT a disadvantage of a simulation? A) An optimal solution cannot be guaranteed, but relatively good ones are generally found. B) Simulation software sometimes requires special skills because of the complexity of the formal solution method. C) Simulation is often the only DSS modeling method that can readily handle relatively unstructured problems. D) Simulation model construction can be a slow and costly process, although newer modeling systems are easier to use than ever.

D) Define the problem, Construct the simulation model, Test and validate the model, Design the experiment, Conduct the experiment, Evaluate the results, Implement the results.

38) Which of the following is the order of simulation methodology? A) Define the problem, Construct the simulation model, Test and validate the model, Design the experiment, Conduct the experiment, Implement the results, Evaluate the results. B) Construct the simulation model, Test and validate the model, Define the problem, Design the experiment, Conduct the experiment, Evaluate the results, Implement the results. C) Define the problem, Construct the simulation model, Test and validate the model, Evaluate the results, Implement the results, Design the experiment, Conduct the experiment. D) Define the problem, Construct the simulation model, Test and validate the model, Design the experiment, Conduct the experiment, Evaluate the results, Implement the results.

A) static

39) What type of VIM models display a visual image of the result of one decision alternative at a time? A) static B) dynamic C) DSS D) VIS

D) confidence gap

40) If a simulation result does NOT match the intuition or judgment of the decision maker, what can occur? A) read/write error B) visual distortion C) project failure D) confidence gap

23) All of the following statements about data mining are true EXCEPT A) the process aspect means that data mining should be a one-step process to results. B) the novel aspect means that previously unknown patterns are discovered. C) the potentially useful aspect means that results should lead to some business benefit. D) the valid aspect means that the discovered patterns should hold true on new data.

A

25) The data field "ethnic group" can be best described as A) nominal data. B) interval data. C) ordinal data. D) ratio data.

A

29) Clustering partitions a collection of things into segments whose members share A) similar characteristics. B) dissimilar characteristics. C) similar collection methods. D) dissimilar collection methods.

A

30) Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications? A) insurance B) retailing and logistics C) customer relationship management D) computer hardware and software

A

36) In estimating the accuracy of data mining (or other) classification models, the true positive rate is A) the ratio of correctly classified positives divided by the total positive count. B) the ratio of correctly classified negatives divided by the total negative count. C) the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified positives. D) the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified negatives.

A
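
A small worked example of the true positive rate, assuming binary labels where 1 marks the positive class (the label vectors are made up). The total positive count is the number of actual positives, i.e. true positives plus false negatives.

```python
def true_positive_rate(actual, predicted, positive=1):
    """Correctly classified positives divided by the total number of actual positives."""
    pairs = list(zip(actual, predicted))
    true_positives = sum(1 for a, p in pairs if a == positive and p == positive)
    false_negatives = sum(1 for a, p in pairs if a == positive and p != positive)
    total_positives = true_positives + false_negatives
    if total_positives == 0:
        raise ValueError("no actual positives; the rate is undefined")
    return true_positives / total_positives


# Illustrative labels: four actual positives, three of them predicted correctly.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 0]
tpr = true_positive_rate(actual, predicted)
```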

37) In data mining, finding an affinity of two products to be commonly together in a shopping cart is known as A) association rule mining. B) cluster analysis. C) decision trees. D) artificial neural networks.

A

All of the following statements about data mining are true EXCEPT A) the process aspect means that data mining should be a one-step process to results. B) the novel aspect means that previously unknown patterns are discovered. C) the potentially useful aspect means that results should lead to some business benefit. D) the valid aspect means that the discovered patterns should hold true on new data.

A

Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications? A) insurance B) retailing and logistics C) customer relationship management D) computer hardware and software

A

In data mining, finding an affinity of two products to be commonly together in a shopping cart is known as A) association rule mining. B) cluster analysis. C) decision trees. D) artificial neural networks.

A

In estimating the accuracy of data mining (or other) classification models, the true positive rate is A) the ratio of correctly classified positives divided by the total positive count. B) the ratio of correctly classified negatives divided by the total negative count. C) the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified positives. D) the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified negatives.

A

The data field "ethnic group" can be best described as A) nominal data. B) interval data. C) ordinal data. D) ratio data.

A

What is a data scientist and what does the job involve?

A data scientist is a role associated with Big Data or data science. In a very short time, it has become one of the most sought-after roles in the marketplace. The job involves writing code, storytelling with data, and combining business and technical skills to improve current business analytics practices and inform decisions about new business opportunities.

28) What do voice of the market (VOM) applications of sentiment analysis do?

A) They examine customer sentiment at the aggregate level.

30) In text analysis, what is a lexicon?

A) a catalog of words, their synonyms, and their meanings

37) In data mining, finding an affinity of two products to be commonly together in a shopping cart is known as

A) association rule mining.

22) In text mining, tokenizing is the process of

A) categorizing a block of text in a sentence.

23) All of the following are challenges associated with natural language processing EXCEPT

A) dividing up a text into individual words in English.

30) Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications?

A) insurance

25) The data field "ethnic group" can be best described as

A) nominal data.

23) All of the following statements about data mining are true EXCEPT

A) the process aspect means that data mining should be a one-step process to results.

36) In estimating the accuracy of data mining (or other) classification models, the true positive rate is

A) the ratio of correctly classified positives divided by the total positive count.

Provide some examples where a sensitivity analysis may be used.

- Adding details about sensitive variables or scenarios
- Obtaining better estimates of sensitive external variables
- Altering a real-world system to reduce actual sensitivities

Which of the following is NOT an assumption used by a LP allocation problem? A) Returns from different allocations can be compared. B) The return from any allocation is independent of other allocations. C) The total return is the sum of the returns yielded by the different activities. D) All data are unknown with decision making under uncertainty.

All data are unknown with decision making under uncertainty.

59) The ________ is the most commonly used algorithm to discover association rules. Given a set of itemsets, the algorithm attempts to find subsets that are common to at least a minimum number of the itemsets.

Apriori algorithm
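
A compact sketch of the Apriori idea, assuming support is counted as the number of transactions that contain an itemset. The basket data and the `apriori` function name are illustrative; this is a teaching version, not an optimized or library implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    # Level 1: frequent single items.
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=3)
```

The prune step is what makes the approach practical: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded without scanning the data.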

33) Prediction problems where the variables have numeric values are most accurately defined as A) classifications. B) regressions. C) associations. D) computations.

B

40) Which of the following is a data mining myth? A) Data mining is a multistep process that requires deliberate, proactive design and use. B) Data mining requires a separate, dedicated database. C) The current state-of-the-art is ready to go for almost any business. D) Newer Web-based tools enable managers of all educational levels to do data mining.

B

Prediction problems where the variables have numeric values are most accurately defined as A) classifications. B) regressions. C) associations. D) computations.

B

Which of the following is a data mining myth? A) Data mining is a multistep process that requires deliberate, proactive design and use. B) Data mining requires a separate, dedicated database. C) The current state-of-the-art is ready to go for almost any business. D) Newer Web-based tools enable managers of all educational levels to do data mining.

B

40) Which of the following is a data mining myth?

B) Data mining requires a separate, dedicated database.

26) In sentiment analysis, which of the following is an implicit opinion?

B) The customer service I got for my TV was laughable.

38) Which of the following statements about Web site conversion statistics is FALSE?

B) Visitors who begin a purchase on most Web sites must complete it.

34) Search engine optimization (SEO) is a means by which

B) Web site developers can increase Web site search rankings.

32) What does Web content mining involve?

B) analyzing the unstructured content of Web pages

33) Prediction problems where the variables have numeric values are most accurately defined as

B) regressions.

31) What types of documents are BEST suited to semantic labeling and aggregation to determine sentiment orientation?

B) small- to medium-sized documents

22) Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from A) collecting data about customers and transactions. B) developing a philosophy that is data analytics-centric. C) analyzing the vast data amounts routinely collected. D) asking the customers what they want.

C

24) What is the main reason parallel processing is sometimes used for data mining? A) because the hardware exists in most organizations, and it is available to use B) because most of the algorithms used for data mining require it C) because of the massive data amounts and search efforts involved D) because any strategic application requires parallel processing

C

27) Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes? A) associations B) visualization C) classification D) clustering

C

31) All of the following statements about data mining are true EXCEPT: A) The term is relatively new. B) Its techniques have their roots in traditional statistical analysis and artificial intelligence. C) The ideas behind it are relatively new. D) Intense, global competition makes its application more important.

C

35) What does the scalability of a data mining method refer to? A) its ability to predict the outcome of a previously unknown data set accurately B) its speed of computation and computational costs in using the model C) its ability to construct a prediction model efficiently given a large amount of data D) its ability to overcome noisy data to make somewhat accurate predictions

C

38) Third party providers of publicly available data sets protect the anonymity of the individuals in the data set primarily by A) asking data users to use the data ethically. B) leaving in identifiers (e.g., name), but changing other variables. C) removing identifiers such as names and social security numbers. D) letting individuals in the data know their data is being accessed.

C

39) In the Target case study, why did Target send maternity ads to a teen? A) Target's analytic model confused her with an older woman with a similar name. B) Target was sending ads to all women in a particular neighborhood. C) Target's analytic model suggested she was pregnant based on her buying habits. D) Target was using a special promotion that targeted all teens in her geographical area.

C

All of the following statements about data mining are true EXCEPT A) understanding the business goal is critical. B) understanding the data, e.g., the relevant variables, is critical to success. C) building the model takes the most time and effort. D) data is typically preprocessed and/or cleaned before use.

C

In the Target case study, why did Target send maternity ads to a teen? A) Target's analytic model confused her with an older woman with a similar name. B) Target was sending ads to all women in a particular neighborhood. C) Target's analytic model suggested she was pregnant based on her buying habits. D) Target was using a special promotion that targeted all teens in her geographical area.

C

Third party providers of publicly available datasets protect the anonymity of the individuals in the data set primarily by A) asking data users to use the data ethically. B) leaving in identifiers (e.g., name), but changing other variables. C) removing identifiers such as names and social security numbers. D) letting individuals in the data know their data is being accessed.

C

Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from A) collecting data about customers and transactions. B) developing a philosophy that is data analytics-centric. C) analyzing the vast data amounts routinely collected. D) asking the customers what they want.

C

What does the scalability of a data mining method refer to? A) its ability to predict the outcome of a previously unknown data set accurately B) its speed of computation and computational costs in using the model C) its ability to construct a prediction model efficiently given a large amount of data D) its ability to overcome noisy data to make somewhat accurate predictions

C

What is the main reason parallel processing is sometimes used for data mining? A) because the hardware exists in most organizations, and it is available to use B) because most of the algorithms used for data mining require it C) because of the massive data amounts and search efforts involved D) because any strategic application requires parallel processing

C

Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes? A) associations B) visualization C) classification D) clustering

C

40) What does advanced analytics for social media do?

C) It examines the content of online conversations.

39) In the Target case study, why did Target send maternity ads to a teen?

C) Target's analytic model suggested she was pregnant based on her buying habits.

39) What is one major way in which Web-based social media differs from traditional publishing media?

C) They have different costs to own and operate.

36) Web site usability may be rated poor if

C) Web site visitors download few of your offered PDFs and videos.

21) In the opening vignette, the architectural system that supported Watson used all the following elements EXCEPT

C) a core engine that could operate seamlessly in another domain without changes.

22) Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from

C) analyzing the vast data amounts routinely collected.

24) What is the main reason parallel processing is sometimes used for data mining?

C) because of the massive data amounts and search efforts involved

31) All of the following statements about data mining are true EXCEPT

C) building the model takes the most time and effort.

24) What data discovery process, whereby objects are categorized into predetermined groups, is used in text mining?

C) classification

27) Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes?

C) classification

27) In the Whirlpool case study, the company sought to better understand information coming from which source?

C) customer e-mails

35) What does the scalability of a data mining method refer to?

C) its ability to construct a prediction model efficiently given a large amount of data

35) What are the two main types of Web analytics?

C) off-site and on-site Web analytics

38) Third party providers of publicly available datasets protect the anonymity of the individuals in the data set primarily by

C) removing identifiers such as names and social security numbers.

45) ________ represent the labels of multiple classes used to divide a variable into specific groups, examples of which include race, sex, age group, and educational level.

Categorical data

60) ________ is a segmentation metric for social networks that measures the strength of the bonds between actors in a social network.

Cohesion

57) ________ statistics help you understand whether your specific marketing objective for a Web page is being achieved.

Conversion

46) At a very high level, the text mining process can be broken down into three consecutive tasks, the first of which is to establish the ________.

Corpus

Why are companies like IBM shifting to provide more services and consulting? A) Customers see that significant value can be created with the application of analytics, and need help completing these tasks. B) They can no longer compete in the software market. C) New regulations forced them into this market. D) None of these.

Customers see that significant value can be created with the application of analytics, and need help completing these tasks.

21) In the Influence Health case study, what was the goal of the system? A) locating clinic patients B) understanding follow-up care C) decreasing operational costs D) increasing service use

D

26) A data mining study is specific to addressing a well-defined business task, and different business tasks require A) general organizational data. B) general industry data. C) general economic data. D) different sets of data.

D

28) Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features? A) associations B) visualization C) classification D) clustering

D

32) Which data mining process/methodology is thought to be the most comprehensive, according to kdnuggets.com rankings? A) SEMMA B) proprietary organizational methodologies C) KDD Process D) CRISP-DM

D

34) What does the robustness of a data mining method refer to? A) its ability to predict the outcome of a previously unknown data set accurately B) its speed of computation and computational costs in using the model C) its ability to construct a prediction model efficiently given a large amount of data D) its ability to overcome noisy data to make somewhat accurate predictions

D

In the Cabela's case study, what types of models helped the company understand the value of customers using a 5-point scale? A) reporting and association models B) simulation and geographical models C) simulation and regression models D) clustering and association models

D

The data field "salary" can be best described as A) nominal data. B) interval data. C) ordinal data. D) ratio data.

D

The data mining algorithm type used for classification somewhat resembling the biological neural networks in the human brain is A) association rule mining. B) cluster analysis. C) decision trees. D) artificial neural networks.

D

What does the robustness of a data mining method refer to? A) its ability to predict the outcome of a previously unknown data set accurately B) its speed of computation and computational costs in using the model C) its ability to construct a prediction model efficiently given a large amount of data D) its ability to overcome noisy data to make somewhat accurate predictions

D

Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features? A) associations B) visualization C) classification D) clustering

D

Which data mining process/methodology is thought to be the most comprehensive, according to kdnuggets.com rankings? A) SEMMA B) proprietary organizational methodologies C) KDD Process D) CRISP-DM

D

32) Which data mining process/methodology is thought to be the most comprehensive, according to kdnuggets.com rankings?

D) CRISP-DM

29) How is objectivity handled in sentiment analysis?

D) It is identified and removed as facts are not sentiment.

29) The data mining algorithm type used for classification somewhat resembling the biological neural networks in the human brain is

D) artificial neural networks.

28) Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features?

D) clustering

21) In the Cabela's case study, what types of models helped the company understand the value of customers, using a five-point scale?

D) clustering and association models

37) Understanding which keywords your users enter to reach your Web site through a search engine can help you understand

D) how well visitors understand your products.

34) What does the robustness of a data mining method refer to?

D) its ability to overcome noisy data to make somewhat accurate predictions

33) Breaking up a Web page into its components to identify worthy words/terms and indexing them using a set of rules is called

D) parsing the documents.

The data field "salary" can be best described as

D) ratio data.

This model began with the notion that data quality could happen in a centralized place, cleansing and enriching data and offering it to different systems, applications, or users, irrespective of where they were in the organization, computers, or on the network. A) SaaS B) PaaS C) IaaS D) DaaS

DaaS

41) IBM's Watson utilizes a massively parallel, text mining-focused, probabilistic evidence-based computational architecture called ________.

DeepQA

Which of the following is the order of simulation methodology? A) Define the problem, Construct the simulation model, Test and validate the model, Design the experiment, Conduct the experiment, Implement the results, Evaluate the results. B) Construct the simulation model, Test and validate the model, Define the problem, Design the experiment, Conduct the experiment, Evaluate the results, Implement the results. C) Define the problem, Construct the simulation model, Test and validate the model, Evaluate the results, Implement the results, Design the experiment, Conduct the experiment. D) Define the problem, Construct the simulation model, Test and validate the model, Design the experiment, Conduct the experiment, Evaluate the results, Implement the results.

Define the problem, Construct the simulation model, Test and validate the model, Design the experiment, Conduct the experiment, Evaluate the results, Implement the results.

What are the three categories of social media analytics technologies and what do they do?

- Descriptive analytics: using statistical methods to identify activity characteristics and trends, such as counts of users, reviews, and followers.
- Social network analysis: identifying connections of influence through friend and fan groups, as well as the biggest sources of influence.
- Advanced analytics: using predictive and text analytics to examine the content of online conversations, with the goal of identifying hidden themes, sentiments, and connections.

A decision table shows the relationships of the problem graphically and can handle complex situations in a compact form. T/F

F

A model builder makes predictions and assumptions regarding input data, many of which deal with the assessment of certain futures. T/F

F

All quantitative models are typically made up of six basic components. T/F

F

Business analysis is the monitoring, scanning, and interpretation of collected environmental information. T/F

F

Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations. T/F

F

Connectivity is not a part of the IoT infrastructure. T/F

F

Consistent high quality, higher publishing frequency, and longer time lag are all attributes of industrial publishing when compared to Web publishing. T/F

F

Descriptive analytics for social media feature such items as your followers as well as the content in online conversations that help you to identify themes and sentiments. T/F

F

For cloud computing to be successful, users must have knowledge and experience in the control of the technology infrastructures. T/F

F

IaaS helps provide faster information, but provides information only to managers in an organization. T/F

F

In decision making under uncertainty, it is assumed that complete knowledge is available. T/F

F

In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's feelings. T/F

F

In the Great Clips case study, the company uses geospatial data to analyze, among other things, the types of haircuts most popular in different geographic locations. T/F

F

In the car insurance case study, text mining was used to identify auto features that caused injuries. T/F

F

In the classification of location-based analytic applications, examining geographic site locations falls in the consumer-oriented category. T/F

F

In the evolution of social media user engagement, the largest recent change is the growth of creators. T/F

F

Result variables are considered independent variables. T/F

F

SaaS combines aspects of cloud computing with Big Data analytics and empowers data scientists and analysts by allowing them to access centrally managed information data sets. T/F

F

Search engine optimization (SEO) techniques play a minor role in a Web site's search ranking because only well-written content matters. T/F

F

Search engines are only used in the context of the World Wide Web (WWW). T/F

F

Server virtualization is the pooling of physical storage from multiple network storage devices into a single storage device. T/F

F

Siemens utilizes data sensors to track failure rates in household appliances. T/F

F

Simulations are an experimental, expensive, error-prone method for gaining insight into complex decision-making situations. T/F

F

Since little can be done about visitor Web site abandonment rates, organizations have to focus their efforts on increasing the number of new visitors. T/F

F

Spreadsheets include all possible tools needed to deploy a custom DSS. T/F

F

Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining. T/F

F

Users definitely own their biometric data. T/F

F

Web-based e-mail such as Google's Gmail is not an example of cloud computing. T/F

F

Web-based media has nearly identical cost and scale structures as traditional media. T/F

F

While cloud services are useful for small and midsize analytic applications, they are still limited in their ability to handle Big Data applications. T/F

F

1) In the Cabela's case study, the SAS/Teradata solution enabled the direct marketer to better identify likely customers and market to them based mostly on external data sources.

FALSE

1) In the opening case, police detectives used data mining to identify possible new areas of inquiry.

FALSE

1) Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining.

FALSE

11) Statistics and data mining both look for data sets that are as large as possible.

FALSE

13) In the cancer research case study, data mining algorithms that predict cancer survivability with high predictive power are good replacements for medical professionals.

FALSE

13) Search engine optimization (SEO) techniques play a minor role in a Web site's search ranking because only well-written content matters.

FALSE

15) K-fold cross-validation is also called sliding estimation.

FALSE

15) Since little can be done about visitor Web site abandonment rates, organizations have to focus their efforts on increasing the number of new visitors.

FALSE

15) When training a data mining model, the testing dataset is always larger than the training dataset.

FALSE

16) Decentralization, the need for specialized skills, and immediacy of output are all attributes of Web publishing when compared to industrial publishing.

FALSE

17) Consistent high quality, higher publishing frequency, and longer time lag are all attributes of industrial publishing when compared to Web publishing.

FALSE

17) In the Dell case study, the largest issue was how to properly spend the online marketing budget.

FALSE

18) Market basket analysis is a useful and entertaining way to explain data mining to a technologically less savvy audience, but it has little business significance.

FALSE

19) Descriptive analytics for social media feature such items as your followers as well as the content in online conversations that help you to identify themes and sentiments.

FALSE

19) Open-source data mining tools include applications such as IBM SPSS Modeler and Dell Statistica.

FALSE

20) Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations.

FALSE

20) Data that is collected, stored, and analyzed in data mining is often private and personal. There is no way to maintain individuals' privacy other than being very careful about physical data security.

FALSE

3) Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales.

FALSE

4) In the patent analysis case study, text mining of thousands of patents held by the firm and its competitors helped improve competitive intelligence, but was of little use in identifying complementary products.

FALSE

5) The entire focus of the predictive analytics system in the Infinity P&C case was on detecting and handling fraudulent claims for the company's benefit.

FALSE

6) Data mining requires specialized data analysts to ask ad hoc questions and obtain answers quickly from the system.

FALSE

7) Ratio data is a type of categorical data.

FALSE

9) In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's feelings.

FALSE

9) In the Memphis Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime.

FALSE

9) In the Miami-Dade Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime.

FALSE

________ is/are used to capture, store, analyze, and manage data linked to a location using integrated sensor technologies, global positioning systems installed in smartphones, or through RFID deployments in the retail and healthcare industries.

GIS

A critical emerging trend in analytics is the incorporation of location data. ________ data is the static location data used by these location-based analytic applications.

Geospatial

________ is performed by indicating a target cell, its desired value, and a changing cell.

Goal seeking
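
A minimal sketch of goal seeking, assuming a hypothetical profit formula as the target cell and bisection as the search method (a spreadsheet tool may use a different numerical method internally). The names `profit` and `goal_seek` and all figures are illustrative assumptions.

```python
def profit(units_sold, price=25.0, unit_cost=15.0, fixed_cost=5000.0):
    """Hypothetical target-cell formula: profit as a function of units sold."""
    return units_sold * (price - unit_cost) - fixed_cost


def goal_seek(model, target, low, high, tol=1e-6, max_iter=200):
    """Find a changing-cell value x in [low, high] where model(x) is close to target.

    Uses bisection, so model(low) - target and model(high) - target
    must not have the same sign.
    """
    f_low = model(low) - target
    f_high = model(high) - target
    if f_low * f_high > 0:
        raise ValueError("target is not bracketed by [low, high]")
    for _ in range(max_iter):
        mid = (low + high) / 2.0
        f_mid = model(mid) - target
        if abs(f_mid) < tol:
            return mid
        if f_low * f_mid <= 0:
            high = mid                 # solution lies in the lower half
        else:
            low, f_low = mid, f_mid    # solution lies in the upper half
    return (low + high) / 2.0


# Target cell: profit; desired value: 0 (break-even); changing cell: units sold
break_even_units = goal_seek(profit, target=0.0, low=0.0, high=10_000.0)
print(round(break_even_units, 2))
```

With these assumed figures the search settles near 500 units, which is the fixed cost divided by the margin per unit.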

In this model, infrastructure resources like networks, storage, servers, and other computing resources are provided to client companies. SaaS PaaS IaaS DaaS

IaaS

________ provides resources like networks, storage, servers, and other computing resources to client companies.

IaaS

How would you describe information extraction in text mining?

Identifying key phrases and relationships in a text by looking for predefined objects and sequences using pattern matching

In the security domain, one of the largest and most prominent text mining applications is the highly classified ECHELON surveillance system. What is ECHELON assumed to be capable of doing?

Identifying the content of telephone calls, faxes, e-mails, and other types of data and intercepting information sent via satellites, public switched telephone networks, and microwave links

In lessons learned from the Target case, what legal warnings would you give another retailer using data mining for marketing?

If you look at this practice from a legal perspective, you would conclude that Target did not use any information that violates customer privacy; rather, they used transactional data that most every other retail chain is collecting and storing (and perhaps analyzing) about their customers. What was disturbing in this scenario was perhaps the targeted concept: pregnancy. There are certain events or concepts that should be off limits or treated extremely cautiously, such as terminal disease, divorce, and bankruptcy.

How are linear programming models vulnerable when used in complex situations?

In complex business environments, there is usually more than one simple goal like profit maximization.

What is Internet of Things (IoT) and how is it used?

Internet of Things (IoT) is the phenomenon of connecting the physical world to the Internet and to sensors that collect data on the operation, location, and state of a device. This data is processed using various analytics techniques for monitoring the device remotely from a central office or for predicting any upcoming faults in the device.

What does advanced analytics for social media do? It helps identify your followers. It identifies links between groups. It examines the content of online conversations. It identifies the biggest sources of influence online.

It examines the content of online conversations.

Why is there a trend to developing and using cloud-based tools for modeling?

It is a simpler way to apply models to real-world problems.

Why is the Monte Carlo simulation popular for solving business problems?

It is easy to use and it can be purchased from many vendors or built in Excel by hand.

How do the traditional location-based analytic techniques using geocoding of organizational locations and consumers hamper the organizations in understanding "true location-based" impacts?

Locations based on postal codes can miss the rapidly changing (growing) customer bases due to poor granularity.

________, like data, must be managed to maintain their integrity, and thus their applicability.

Models

The most common simulation method for business decision problems is the ________ simulation.

Monte Carlo
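
A minimal sketch of a Monte Carlo simulation, assuming a hypothetical profit model with two uncertain inputs. The distributions, the price, the profit target, and the function names are assumptions made for illustration only.

```python
import random
import statistics


def simulate_profit(rng):
    """One trial of a hypothetical profit model with uncertain inputs."""
    demand = rng.gauss(1000, 200)        # assumed: units demanded
    unit_cost = rng.uniform(6.0, 9.0)    # assumed: cost per unit
    price = 12.0                         # assumed: fixed selling price
    units = max(demand, 0.0)             # demand cannot be negative
    return units * (price - unit_cost)


def monte_carlo(trials=10_000, seed=42, target=3000.0):
    """Repeat the trial many times and summarize the outcome distribution."""
    rng = random.Random(seed)            # seeded so that runs are repeatable
    outcomes = [simulate_profit(rng) for _ in range(trials)]
    return {
        "mean": statistics.mean(outcomes),
        "stdev": statistics.stdev(outcomes),
        "p_below_target": sum(o < target for o in outcomes) / trials,
    }


summary = monte_carlo()
print(summary)
```

The point of the technique is the distribution of outcomes rather than a single number: the share of trials below the target is an estimate of the risk attached to the decision.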

________ is the splitting of available bandwidth into channels.

Network virtualization

________ Web analytics refers to measurement and analysis of data relating to your company that takes place outside your Web site.

Off-site

List and describe the most common approaches for treating uncertainty.

Optimistic approach - assume the best possible outcome of each alternative and select the best of those; Pessimistic approach - assume the worst possible outcome of each alternative and select the best of those; Neutral approach - assume all outcomes are equally likely and make a selection based on that.
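
The three approaches can be sketched against a payoff table. The table below is hypothetical (alternatives, states of nature, and payoffs are all assumed), and the function names are illustrative.

```python
# Hypothetical payoff table: alternative -> payoff under each state of nature
payoffs = {
    "bonds":    [12, 6, 3],
    "stocks":   [15, 3, -2],
    "deposits": [6.5, 6.5, 6.5],
}


def optimistic(table):
    """Assume the best outcome of each alternative, then pick the best of those."""
    return max(table, key=lambda alt: max(table[alt]))


def pessimistic(table):
    """Assume the worst outcome of each alternative, then pick the best of those."""
    return max(table, key=lambda alt: min(table[alt]))


def neutral(table):
    """Treat all outcomes as equally likely and pick the best average payoff."""
    return max(table, key=lambda alt: sum(table[alt]) / len(table[alt]))


print(optimistic(payoffs))   # stocks
print(pessimistic(payoffs))  # deposits
print(neutral(payoffs))      # bonds
```

The same table yields three different choices, which is why the decision maker's attitude matters under uncertainty.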

Using this model, companies can deploy their software and applications in the cloud so that their customers can use them. SaaS PaaS IaaS DaaS

PaaS

Which of the following allows companies to deploy their software and applications in the cloud so that their customers can use them? SaaS IaaS PaaS AaaS

PaaS

Why are the users' page views and time spent on your Web site important metrics?

Page view counts help identify problems with site structure or disconnect between the marketing and the actual contents. Time on site gives an understanding of whether the visitors are reviewing the content and interested in the site.

Describe your understanding of the emerging term people analytics. Are there any privacy issues associated with the application?

People analytics combine organizational IT impact, Big Data, and sensors, like using sensor-embedded badges that employees wear to track their movement and predict behavior. Some privacy issues arise with the emergence of people analytics, like whether companies should be so intrusive and whether information on any one employee should be accessible unaggregated.

________, also called homonyms, are syntactically identical words with different meanings.

Polysemes

________ is a connections metric for social networks that measures the ties that actors in a network have with others that are geographically close.

Propinquity

________ is a generic technology that refers to the use of radio-frequency waves to identify objects.

RFID

List and briefly discuss the major components of a quantitative model.

Result variables - reflect the level of effectiveness of a system; Decision variables - describe alternative courses of action; Uncontrollable variables (parameters) - factors that affect the result but are not under the control of the decision maker; Intermediate result variables - reflect intermediate outcomes in mathematical models.

What is search engine optimization (SEO) and why is it important for organizations that own Web sites?

SEO refers to affecting the visibility of a site in a search engine's natural search results. It is important because the higher ranking web pages get the most visits hence increasing traffic to the website. It helps the organization ensure they appear in front of exactly the right users at the right moment and be able to satisfy their search. It would be very unusual for a user to go to a second or third page of a search engine's results.

What new geometric data type in Teradata's data warehouse captures geospatial features? NAVTEQ ST_GEOMETRY GIS SQL/MM

ST_GEOMETRY

This model allows consumers to use applications and software that run on distant computers in the cloud infrastructure. SaaS PaaS IaaS DaaS

SaaS

________ analysis attempts to assess the impact of a change in the input data or parameters on the proposed solution.

Sensitivity
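
A minimal sketch of one-at-a-time sensitivity analysis, assuming a hypothetical profit model. Each input is raised by 10% in turn and the percentage change in the result is recorded; the model, the parameter names, and the base-case values are assumptions.

```python
def profit(params):
    """Hypothetical model: result variable computed from the input parameters."""
    margin = params["price"] - params["unit_cost"]
    return params["units"] * margin - params["fixed_cost"]


def sensitivity(model, base, change=0.10):
    """Raise each input by `change`, one at a time, and report the
    resulting percentage change in the model output."""
    base_result = model(base)
    report = {}
    for name in base:
        trial = dict(base)                      # copy so the base case is untouched
        trial[name] = base[name] * (1 + change)
        report[name] = (model(trial) - base_result) / abs(base_result) * 100
    return report


base_case = {"units": 1000, "price": 20.0, "unit_cost": 12.0, "fixed_cost": 3000.0}
for name, pct in sensitivity(profit, base_case).items():
    print(f"{name}: {pct:+.1f}% change in profit")
```

In this assumed base case the result is most sensitive to price, so that is the input whose estimate deserves the most scrutiny.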

Identify, with a brief description, each of the four steps in the sentiment analysis process.

Sentiment Detection - classifying text as objective or subjective; N-P Polarity Classification - classifying text as overall positive or negative; Target Identification - identifying the main target of the text; Collection and Aggregation - summing up polarities and strengths of the text or more complex aggregations.

________ is a technique used to detect favorable and unfavorable opinions toward specific products and services using large numbers of textual data sources.

Sentiment analysis

________ is the masking of physical servers from server users.

Server virtualization

How does Siemens use sensor data to help monitor equipment on trains?

Siemens uses an IoT model and sensors attached to several key components of trains and other railway equipment to help evaluate its current working condition, and predict the need for future repair.

Which of the following is NOT a disadvantage of a simulation? An optimal solution cannot be guaranteed, but relatively good ones are generally found. Simulation software sometimes requires special skills because of the complexity of the formal solution method. Simulation is often the only DSS modeling method that can readily handle relatively unstructured problems. Simulation model construction can be a slow and costly process, although newer modeling systems are easier to use than ever.

Simulation is often the only DSS modeling method that can readily handle relatively unstructured problems.

A decision made under risk is also known as a probabilistic or stochastic decision-making situation. T/F

T

Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out. T/F

T

Categorization and clustering of documents during text mining differ only in the preselection of categories. T/F

T

Clickstream analysis does not need users to enter their perceptions of the Web site or other feedback directly to be useful in determining their preferences. T/F

T

Current use of sentiment analysis in voice of the customer applications allows companies to change their products or services in real time in response to customer sentiment. T/F

T

Data as a service began with the notion that data quality could happen in a centralized place, cleansing and enriching data and offering it to different systems, applications, or users, irrespective of where they were in the organization, computers, or on the network. T/F

T

Decision situations that involve a finite and usually not too large number of alternatives are modeled through an approach called decision analysis. T/F

T

Every LP model has some internal intermediate variables that are not explicitly stated. T/F

T

From massive amounts of high-dimensional location data, algorithms that reduce the dimensionality of the data can be used to uncover trends, meaning, and relationships to eventually produce human-understandable representations. T/F

T

In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but easier to classify others, e.g., movie reviews, in the same way. T/F

T

In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document. T/F

T
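
The support measure in this card can be sketched as a simple count. The corpus below is hypothetical, and each document is assumed to be already reduced to the set of concepts it contains.

```python
def support(documents, concept_a, concept_b):
    """Fraction of documents in which both concepts appear together."""
    both = sum(1 for doc in documents if concept_a in doc and concept_b in doc)
    return both / len(documents)


# Hypothetical corpus: one set of extracted concepts per document
docs = [
    {"merger", "layoffs"},
    {"merger", "profit"},
    {"layoffs", "strike"},
    {"merger", "layoffs", "profit"},
]

print(support(docs, "merger", "layoffs"))  # 2 of 4 documents -> 0.5
```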

In the Quiznos case, the company employed location-based behavioral targeting to narrow the characteristics of users who were most likely to eat at a quick-service restaurant. T/F

T

In the School District of Philadelphia case, Excel and an add-in were used to evaluate different vendor options. T/F

T

In the Tito's Vodka case study, trends in cocktails were studied to create a quarterly recipe for customers. T/F

T

In the Wimbledon case study, designers balanced the needs of mobile and desktop computer users. T/F

T

Internet of Things (IoT) is the phenomenon of connecting the physical world to the Internet. T/F

T

Many quantitative models of decision theory are based on comparing a single measure of effectiveness, generally some form of utility to the decision maker. T/F

T

Modeling is a key element for prescriptive analytics. T/F

T

One reason the IoT is growing exponentially is because hardware is smaller and more affordable. T/F

T

Online commerce and communication have created an immense need for forecasting and an abundance of available information for performing it. T/F

T

RFID can be used in supply chains to manage product quality. T/F

T

Regional accents present challenges for natural language processing. T/F

T

Service-oriented DSS solutions generally offer individual or bundled services to the user as a service. T/F

T

Simulation is normally used only when a problem is too complex to be treated using numerical optimization techniques. T/F

T

Simulation is the appearance of reality. T/F

T

Social networking Web sites like Facebook, Twitter, and LinkedIn, are also examples of cloud computing. T/F

T

Spreadsheets are clearly the most popular developer modeling tool. T/F

T

The pessimistic approach assumes that the worst possible outcome for each alternative will occur and selects the best of these. T/F

T

The term cloud computing originates from a reference to the Internet as a "cloud" and represents an evolution of all of the previously shared/centralized computing trends. T/F

T

VIS uses animated computer graphic displays to present the impact of different managerial decisions. T/F

T

10) Current use of sentiment analysis in voice of the customer applications allows companies to change their products or services in real time in response to customer sentiment.

TRUE

10) In data mining, classification models help in prediction.

TRUE

11) In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but easier to classify others, e.g., movie reviews, in the same way.

TRUE

12) Generally, making a search engine more efficient makes it less effective.

TRUE

12) Using data mining on data about imports and exports can help to detect tax avoidance and money laundering.

TRUE

14) Clickstream analysis does not need users to enter their perceptions of the Web site or other feedback directly to be useful in determining their preferences.

TRUE

14) During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality.

TRUE
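
A minimal sketch of how false positives are counted alongside the other three outcomes of a binary classifier. The labels below are hypothetical, and `confusion_counts` is an illustrative name.

```python
def confusion_counts(actual, predicted):
    """Tally true/false positives and negatives for binary labels."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for is_true, predicted_true in zip(actual, predicted):
        if predicted_true and is_true:
            counts["TP"] += 1
        elif predicted_true and not is_true:
            counts["FP"] += 1      # classified as true, false in reality
        elif not predicted_true and is_true:
            counts["FN"] += 1      # classified as false, true in reality
        else:
            counts["TN"] += 1
    return counts


# Hypothetical ground truth and classifier output for five cases
actual    = [True, False, False, True, False]
predicted = [True, True,  False, False, False]

print(confusion_counts(actual, predicted))
```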

16) When a problem has many attributes that impact the classification of different patterns, decision trees may be a useful approach.

TRUE

17) In the 2degrees case study, the main effectiveness of the new analytics system was in dissuading potential churners from leaving the company.

TRUE

18) Web site visitors who critique and create content are more engaged than those who join networks and spectate.

TRUE

19) The number of users of free/open source data mining software now exceeds that of users of commercial software versions.

TRUE

2) Categorization and clustering of documents during text mining differ only in the preselection of categories.

TRUE

2) The cost of data storage has plummeted recently, making data mining feasible for more firms.

TRUE

3) Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out.

TRUE

4) If using a mining analogy, "knowledge mining" would be a more appropriate term than "data mining."

TRUE

5) Regional accents present challenges for natural language processing.

TRUE

6) In the Hong Kong government case study, reporting time was the main benefit of using SAS Business Analytics to generate reports.

TRUE

7) In the financial services firm case study, text analysis for associate-customer interactions was completely automated and could detect whether they met the company's standards.

TRUE

8) Converting continuous valued numerical variables to ranges and categories is referred to as discretization.

TRUE
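
A minimal sketch of discretization, assuming hypothetical age cut points and labels. A value equal to a cut point is placed in the upper range here, which is a design choice rather than a rule.

```python
from bisect import bisect_right


def discretize(value, cut_points, labels):
    """Map a continuous value to the label of the range it falls in.

    cut_points must be sorted, and labels needs one more entry than cut_points.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect_right(cut_points, value)]


# Hypothetical ranges: <18, 18-34, 35-64, 65+
age_cuts = [18, 35, 65]
age_labels = ["minor", "young adult", "middle-aged", "senior"]

ages = [17, 18, 40.5, 65, 80]
print([discretize(age, age_cuts, age_labels) for age in ages])
```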

8) In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document.

TRUE

8) Interval data is a type of numerical data.

TRUE

In what ways does the Web pose great challenges for effective and efficient knowledge discovery through data mining?

The Web is too big for effective data mining. The Web is too complex. The Web is too dynamic. The Web is not specific to a domain. The Web has everything.

In sentiment analysis, which of the following is an implicit opinion? The hotel we stayed in was terrible. The customer service I got for my TV was laughable. The cruise we went on last summer was a disaster. Our new mayor is great for the city.

The customer service I got for my TV was laughable.

Which of the following is NOT a characteristic displayed by a LP allocation problem? A limited quantity of economic resources is available for allocation. The resources are used in the production of products or services. There are two or more ways in which the resources can be used. The problem is not bound by constraints.

The problem is not bound by constraints.

In the data mining in Hollywood case study, how successful were the models in predicting the success or failure of a Hollywood movie?

The researchers claim that these prediction results are better than any reported in the published literature for this problem domain. Fusion classification methods attained up to 56.07% accuracy in correctly classifying movies and 90.75% accuracy in classifying movies within one category of their actual category. The SVM classification method attained up to 55.49% accuracy in correctly classifying movies and 85.55% accuracy in classifying movies within one category of their actual category.

Why is separating the impact of analytics from that of other computerized systems a difficult task? Businesses do not typically track the sources of successful projects. The trend is toward integrating systems. Software tools are not sophisticated enough. It is not an organizational priority.

The trend is toward integrating systems.

Which of the following is true of data-as-a-Service (DaaS) platforms? Knowing where the data resides is critical to the functioning of the platform. There are standardized processes for accessing data wherever it is located. Business processes can access local data only. Data quality happens on each individual platform.

There are standardized processes for accessing data wherever it is located.

Which of the following is true about the furtherance of homeland security? There is a lessening of privacy issues. There is a greater need for oversight. The impetus was the need to harvest information related to financial fraud after 2001. Most people regard analytic tools as mostly ineffective in increasing security.

There is a greater need for oversight.

Which of the following is NOT a characteristic displayed by a LP allocation problem? Each activity in which the resources are used yields a return in terms of the stated goal. The resources are used in the production of products or services. There is a single way in which the resources can be used. The allocation is usually restricted by several limitations and requirements.

There is a single way in which the resources can be used.

Why do many believe that making decisions under uncertainty is more difficult than making decisions under risk?

There is insufficient information, so the decision maker's attitude toward risk needs to be assessed.

Why are spreadsheet applications so commonly used for decision modeling?

They are cheap and easy to learn for new users.

What do voice of the market (VOM) applications of sentiment analysis do? They examine customer sentiment at the aggregate level. They examine employee sentiment in the organization. They examine the stock market for trends. They examine the "market of ideas" in politics.

They examine customer sentiment at the aggregate level.

What is one major way in which Web-based social media differs from traditional publishing media? Most Web-based media are operated by the government and large firms. They use different languages of publication. They have different costs to own and operate. Web-based media have a narrower range of quality.

They have different costs to own and operate.

Which of the following is NOT an assumption used by a LP allocation problem? The resources are to be used in the most economical manner. The return from any allocation is independent of other allocations. Total returns cannot be compared. All data are known with certainty.

Total returns cannot be compared.

The ________ approach can be used in conjunction with artificial intelligence.

VIM

Which of the following statements about Web site conversion statistics is FALSE? Web site visitors can be classed as either new or returning. Visitors who begin a purchase on most Web sites must complete it. The conversion rate is the number of people who take action divided by the number of visitors. Analyzing exit rates can tell you why visitors left your Web site.

Visitors who begin a purchase on most Web sites must complete it.

________ is mostly driven by sentiment analysis and is a key element of customer experience management initiatives, where the goal is to create an intimate relationship with the customer.

Voice of the customer (VOC)

Search engine optimization (SEO) is a means by which Web site developers can negotiate better deals for paid ads. Web site developers can increase Web site search rankings. Web site developers index their Web sites for search engines. Web site developers optimize the artistic features of their Web sites.

Web site developers can increase Web site search rankings.

Web site usability may be rated poor if the average number of page views on your Web site is large. the time spent on your Web site is long. Web site visitors download few of your offered PDFs and videos. users fail to click on all pages equally.

Web site visitors download few of your offered PDFs and videos.

________ analysis is structured as "What will happen to the solution if an input variable, an assumption, or a parameter value is changed?"

What-if

What is the difference between white hat and black hat SEO activities?

White hat SEO activities are those that search engine creators recommend; they are about ensuring that the most relevant content is shown to the most relevant user. Black hat SEO activities are those frowned upon by the search engine creators, and a Web site can be penalized for engaging in them.

In text analysis, what is a lexicon? a catalog of words, their synonyms, and their meanings a catalog of customers, their words, and phrases a catalog of letters, words, phrases, and sentences a catalog of customers, products, words, and phrases

a catalog of words, their synonyms, and their meanings

In the opening vignette, the architectural system that supported Watson used all the following elements EXCEPT massive parallelism to enable simultaneous consideration of multiple hypotheses. an underlying confidence subsystem that ranks and integrates answers. a core engine that could operate seamlessly in another domain without changes. integration of shallow and deep knowledge.

a core engine that could operate seamlessly in another domain without changes.

With RFID tags, a(n) ________ tag has a battery on board to energize it.

active

Spreadsheets use ________ to extend their functionality.

add-ins

The components of a quantitative model are linked by ________ expressions.

algebraic

Natural language processing (NLP) is associated with which of the following areas? text mining artificial intelligence computational linguistics all of these

all of these

Risk ________ is a decision-making method that analyzes the risk (based on assumed known probabilities) associated with different alternatives.

analysis

What does Web content mining involve? analyzing the universal resource locator in Web pages analyzing the unstructured content of Web pages analyzing the pattern of visits to a Web site analyzing the PageRank and other metadata of a Web page

analyzing the unstructured content of Web pages

The portion of the IoT technology infrastructure that focuses on controlling what and how information is captured is hardware. connectivity. software backend. applications.

applications.

Pokémon GO is an example of a location-sensing ________ reality-based game.

augmented

In what ways can communications companies use geospatial analysis to harness their data effectively?

They can better identify customer churn and formulate location-specific strategies for increasing operational efficiency, quality of service, and revenue.

________ represent the labels of multiple classes used to divide a variable into specific groups, examples of which include race, sex, age group, and educational level.

categorical data

In text mining, tokenizing is the process of categorizing a block of text in a sentence. reducing multiple words to their base or root. transforming the term-by-document matrix to a manageable size. creating new branches or stems of recorded paragraphs.

categorizing a block of text in a sentence.

When the decision maker knows exactly what the outcome of each course of action will be, this is decision making under certainty. uncertainty. risk. duress.

certainty

In the Lotte.com retail case, the company deployed SAS for Customer Experience Analytics to better understand the quality of customer traffic on their Web site, classify order rates, and see which ________ had the most visitors.

channels

58) In the Social Network Analysis (SNA) for Telecommunications case, SNA can be used to detect ________, i.e., those visitors who are about to leave the website, and persuade them to stay with you.

churners

Which of the following is NOT a component of a quantitative model? result variables decision variables classes parameters

classes

IaaS, AaaS and other ________-based offerings allow the rapid diffusion of advanced analysis tools among users, without significant investment in technology acquisition.

cloud

What is cloud computing? What is Amazon's general approach to the cloud computing services it provides?

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly and easily provisioned and released. Amazon.com has developed an impressive technology infrastructure that includes data centers. Other companies can use Amazon.com's cloud services on a pay-per-use basis without having to make similar investments.

A more general form of an influence diagram is called a(n) forecast. environmental scan. cognitive map. static model.

cognitive map.

If a simulation result does NOT match the intuition or judgment of the decision maker, what can occur? read/write error visual distortion project failure confidence gap

confidence gap

Multiple goals is a decision situation in which alternatives are evaluated with several, sometimes ________, goals.

conflicting

The portion of the IoT technology infrastructure that focuses on how to transmit data is hardware. connectivity. software backend. applications.

connectivity.

In the Tito's Vodka case, it was important that social media users all had a(n) ________ brand experience.

consistent

GPS Navigation is an example of which kind of location-based analytics? organization-oriented geospatial static approach organization-oriented location-based dynamic approach consumer-oriented geospatial static approach consumer-oriented location-based dynamic approach

consumer-oriented geospatial static approach

Web ________ are used to automatically read through the contents of Web sites.

crawlers/spiders

43) Knowledge extraction, pattern analysis, data archaeology, information harvesting, pattern searching, and data dredging are all alternative names for ________.

data mining

52) Data preparation, the third step in the CRISP-DM data mining process, is more commonly known as ________.

data preprocessing

44) Data are often buried deep within very large ________, which sometimes contain data from several years.

databases

60) One way to accomplish privacy and protection of individuals' rights when data mining is by ________ of the customer records prior to applying data mining applications, so that the records cannot be traced to an individual.

de-identification

Every LP model is composed of ________ variables whose values are unknown and are searched for.

decision

56) The basic idea behind a ________ is that it recursively divides a training set until each division consists entirely or primarily of examples from one class.

decision tree

Analytics can change the way in which many ________ are made by managers and can consequently change their jobs.

decisions

51) In the terrorist funding case study, an observed price ________ may be related to income tax avoidance/evasion, money laundering, or terrorist financing.

deviation

47) Because the term-document matrix is often very large and rather sparse, an important optimization step is to reduce the ________ of the matrix.

dimensionality
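
Study note (an illustrative sketch, not part of the test bank): one simple way to shrink a sparse term-document matrix is to drop terms that occur in very few documents. The sample documents, the function names, and the `min_docs` threshold below are all assumptions made for the example; singular value decomposition is another common option that is not shown here.

```python
from collections import Counter

# Hypothetical toy corpus, used only for illustration.
docs = [
    "data mining finds patterns",
    "text mining finds terms",
    "data quality matters",
]

def term_document_matrix(documents):
    """Return (terms, matrix) where matrix[i][j] counts term i in document j."""
    counts = [Counter(d.lower().split()) for d in documents]
    terms = sorted(set().union(*counts))
    return terms, [[c[t] for c in counts] for t in terms]

def prune_rare_terms(terms, matrix, min_docs=2):
    """Keep only terms that appear in at least min_docs documents."""
    kept = [
        (t, row)
        for t, row in zip(terms, matrix)
        if sum(1 for x in row if x > 0) >= min_docs
    ]
    return [t for t, _ in kept], [row for _, row in kept]

terms, matrix = term_document_matrix(docs)
small_terms, small_matrix = prune_rare_terms(terms, matrix)
```

Here the full matrix has eight term rows, and pruning keeps only the three terms shared by at least two documents.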

All of the following are challenges associated with natural language processing EXCEPT A) dividing up a text into individual words in English. B) understanding the context in which something is said. C) distinguishing between words that have more than one meaning. D) recognizing typographical or grammatical errors in texts.

dividing up a text into individual words in English.

A(n) ________ model can be constructed under assumed environments of certainty.

dynamic

A(n) ________ spreadsheet model represents behavior over time. A) static B) dynamic C) looped D) add-in

dynamic

Which of these is NOT a part of the IoT technology infrastructure? A) hardware B) connectivity C) electrical access D) software

electrical access

47) Patterns have been manually ________ from data by humans for centuries, but the increasing volume of data in modern times has created a need for more automatic approaches.

extracted

Smartbin has developed trash containers that include sensors to detect A) fill levels. B) types of trash. C) tip-over. D) weather.

fill levels.

48) While prediction is largely experience and opinion based, ________ is data and model based.

forecasting

This method calculates the values of the inputs necessary to achieve a desired level of an output. A) goal seek B) what-if C) sensitivity D) LP

goal seek
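
Study note (an illustrative sketch, not part of the test bank): goal seeking works backward from a desired output to the input that produces it. The version below uses bisection, which is an assumption of this example and not necessarily how any particular spreadsheet implements the feature. The profit model and its numbers are hypothetical.

```python
def goal_seek(f, target, lo, hi, tol=1e-9, max_iter=200):
    """Find x in [lo, hi] such that f(x) is close to target, by bisection.

    Assumes f is continuous and that f(lo) - target and f(hi) - target
    have opposite signs (or one of them is zero).
    """
    g_lo = f(lo) - target
    g_hi = f(hi) - target
    if g_lo * g_hi > 0:
        raise ValueError("target is not bracketed by lo and hi")
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        g_mid = f(mid) - target
        if abs(g_mid) < tol:
            return mid
        if g_lo * g_mid <= 0:
            hi = mid                 # the solution lies in the lower half
        else:
            lo, g_lo = mid, g_mid    # the solution lies in the upper half
    return (lo + hi) / 2.0

# Hypothetical model: price 25, unit cost 15, fixed cost 5000.
def profit(units):
    return (25 - 15) * units - 5000

units_for_target = goal_seek(profit, 2000, 0, 10_000)  # input giving a profit of 2000
break_even_units = goal_seek(profit, 0, 0, 10_000)     # input giving zero profit
```

Setting the target to zero profit turns the same search into a break-even calculation.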

This method calculates the values of the inputs necessary to generate a zero profit outcome. A) goal seek B) what-if C) sensitivity D) break-even

goal seek

The most common method for solving a risk analysis problem is to select the alternative with the A) smallest expected value. B) greatest expected value. C) mean expected value. D) median expected value.

greatest expected value.
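
Study note (an illustrative sketch, not part of the test bank): under risk, each alternative has several possible outcomes with known probabilities, so the alternatives can be ranked by expected value. The payoff table and names below are hypothetical.

```python
# Hypothetical payoff table: alternative -> list of (probability, payoff).
alternatives = {
    "bonds":  [(0.5, 12.0), (0.3, 6.0), (0.2, 3.0)],
    "stocks": [(0.5, 15.0), (0.3, 3.0), (0.2, -2.0)],
    "cds":    [(0.5, 6.5), (0.3, 6.5), (0.2, 6.5)],
}

def expected_value(outcomes):
    """Probability-weighted sum of the payoffs of one alternative."""
    return sum(p * payoff for p, payoff in outcomes)

def best_by_expected_value(table):
    """Return the name of the alternative with the greatest expected value."""
    return max(table, key=lambda name: expected_value(table[name]))

choice = best_by_expected_value(alternatives)
```

With these numbers the expected values are about 8.4 for bonds, 8.0 for stocks, and 6.5 for CDs, so `choice` is `"bonds"`.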

Today, most smartphones are equipped with various instruments to measure jerk and orientation and to sense motion. One of these instruments is an accelerometer, and the other is a(n) A) potentiometer. B) gyroscope. C) microscope. D) oscilloscope.

gyroscope.

The portion of the IoT technology infrastructure that focuses on the sensors themselves is A) hardware. B) connectivity. C) software backend. D) applications.

hardware.

Understanding which keywords your users enter to reach your Web site through a search engine can help you understand A) the hardware your Web site is running on. B) the type of Web browser being used by your Web site visitors. C) most of your Web site visitors' wants and needs. D) how well visitors understand your products.

how well visitors understand your products.

52) A(n) ________ is one or more Web pages that provide a collection of links to authoritative Web pages.

hub

50) Web pages contain both unstructured information and ________, which are connections to other Web pages.

hyperlinks

A(n) ________ is a graphical representation of a model. A) multidimensional analysis B) influence diagram C) OLAP model D) Whisker plot

influence diagram

55) In ________, a classification method, the complete data set is randomly split into mutually exclusive subsets of approximately equal size and tested multiple times on each left-out subset, using the others as a training set.

k-fold cross-validation
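
Study note (an illustrative sketch, not part of the test bank): the splitting step described in the question can be written in a few lines. The function name, the fixed seed, and the sample sizes are assumptions made for the example; a real project would then train and score a model on each split.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    The indices are shuffled once, then dealt into k mutually exclusive
    folds of approximately equal size. Each fold is the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(10, 5))
```

Each of the five splits here holds out two records for testing and trains on the other eight.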

54) Fayyad et al. (1996) defined ________ in databases as a process of using data mining methods to find useful information and patterns in the data.

knowledge discovery

By using ________, businesses can collect and analyze data to discern large-scale patterns of movement and identify distinct classes of behaviors in specific contexts.

location-enabled services

A decision tree can be cumbersome if there are A) uncertain results. B) few alternatives. C) many alternatives. D) pre-existing decision tables.

many alternatives.

58) Because of its successful application to retail business problems, association rule mining is commonly called ________.

market-basket analysis
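
Study note (an illustrative sketch, not part of the test bank): association rules are usually judged by support and confidence. The transactions below are made up for the example, and the function names are hypothetical.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Support of the whole rule divided by support of the antecedent.

    Assumes the antecedent appears in at least one basket.
    """
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

s = support({"diapers", "beer"}, transactions)       # 3 of 5 baskets
c = confidence({"diapers"}, {"beer"}, transactions)  # 3 of the 4 diaper baskets
```

For the rule "diapers implies beer" the support is 0.6 and the confidence is about 0.75.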

Intermediate result variables reflect intermediate outcomes in A) mathematical models. B) flowcharts. C) decision trees. D) ROI calculations.

mathematical models.

45) In the Mining for Lies case study, a text based deception-detection method used by Fuller and others in 2008 was based on a process known as ________, which relies on elements of data and text mining techniques.

message feature mining

Location information from ________ phones can be used to create profiles of user behavior and movement.

mobile

42) There has been an increase in data mining to deal with global competition and customers' more sophisticated ________ and wants.

needs

What are the two main types of Web analytics? A) old-school and new-school Web analytics B) Bing and Google Web analytics C) off-site and on-site Web analytics D) data-based and subjective Web analytics

off-site and on-site Web analytics

Of the available solutions, at least one is the best, in the sense that the degree of goal attainment associated with it is the highest; this is called a(n) ________ solution.

optimal

The ________ approach assumes that the best possible outcome of each alternative will occur and then selects the best of the best.

optimistic
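
Study note (an illustrative sketch, not part of the test bank): the optimistic approach takes the best payoff of each alternative and then picks the best of those, ignoring probabilities. The payoff table and names below are hypothetical.

```python
# Hypothetical payoffs per alternative, one value per possible state of nature.
payoffs = {
    "bonds":  [12.0, 6.0, 3.0],
    "stocks": [15.0, 3.0, -2.0],
    "cds":    [6.5, 6.5, 6.5],
}

def optimistic_choice(table):
    """Pick the alternative whose best-case payoff is the largest."""
    best_case = {name: max(values) for name, values in table.items()}
    return max(best_case, key=best_case.get)

choice_optimistic = optimistic_choice(payoffs)
```

Stocks win here because their best case (15.0) beats the best case of every other alternative, even though they also carry the worst possible outcome.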

What kind of location-based analytics is a real-time marketing promotion? A) organization-oriented geospatial static approach B) organization-oriented location-based dynamic approach C) consumer-oriented geospatial static approach D) consumer-oriented location-based dynamic approach

organization-oriented location-based dynamic approach

Factors that are not under the control of the decision maker but can be fixed are called ________.

parameters

Breaking up a Web page into its components to identify worthy words/terms and indexing them using a set of rules is called A) preprocessing the documents. B) document analysis. C) creating the term-by-document matrix. D) parsing the documents.

parsing the documents.

With RFID tags, a(n) ________ tag receives energy from the electromagnetic field created by the interrogator.

passive

For individual decision makers, ________ values constitute a major factor in the issue of ethical decision making.

personal

Important spreadsheet features for modeling include all of the following EXCEPT A) what-if analysis. B) goal seeking. C) macros. D) pivot tables.

pivot tables.

49) When viewed as a binary feature, ________ classification is the binary classification task of labeling an opinionated document as expressing either an overall positive or an overall negative opinion.

polarity

41) In the opening vignette, Cabela's uses SAS data mining tools to create ________ models to optimize customer selection for all customer contacts.

predictive

46) In the Memphis Police Department case study, shortly after all precincts embraced Blue CRUSH, ________ became one of the most potent weapons in the Memphis police department's crime-fighting arsenal.

predictive analytics

In general, ________ is the right to be left alone and the right to be free from unreasonable personal intrusion.

privacy

Predictive analytics is beginning to enable development of software that is directly used by a consumer. One key concern in employing these technologies is the loss of ________.

privacy

A(n) ________ is operated solely for a single organization having a mission critical workload and security concerns.

private cloud

In ________ simulation, one or more of the independent variables (e.g., the demand in an inventory problem) are subject to chance variation.

probabilistic

In a(n) ________ the subscriber uses the resources offered by service providers over the Internet.

public cloud

56) A ________ Web site contains links that send traffic directly to your Web site.

referral

50) Customer ________ management extends traditional marketing by creating one-on-one relationships with customers.

relationship

53) The data mining in cancer research case study explains that data mining methods are capable of extracting patterns and ________ hidden deep in large and complex medical databases.

relationships

A probabilistic decision-making situation is a decision made under ________.

risk

When the decision maker must consider several possible outcomes for each alternative, each with a given probability of occurrence, this is decision making under A) certainty. B) uncertainty. C) risk. D) duress.

risk.

53) A(n) ________ engine is a software program that searches for Web sites or files based on keywords.

search

In the Wimbledon case study, the tournament used data for each match in real time to highlight A) winners and losers. B) player histories. C) significant events. D) advertiser content.

significant events.

Conventional ________ generally reports statistical results at the end of a set of experiments.

simulation

Services that let consumers permanently enter a profile of information along with a password and use this information repeatedly to access services at multiple sites are called A) consumer access applications. B) information collection portals. C) single-sign-on facilities. D) consumer information sign on facilities.

single-sign-on facilities.

What types of documents are BEST suited to semantic labeling and aggregation to determine sentiment orientation? A) medium- to large-sized documents B) small- to medium-sized documents C) large-sized documents D) collections of documents

small- to medium-sized documents

The portion of the IoT technology infrastructure that focuses on how to manage incoming data and analyze it is hardware. connectivity. software backend. applications.

software backend.

What type of VIM models display a visual image of the result of one decision alternative at a time? A) static B) dynamic C) DSS D) VIS

static

49) Whereas ________ starts with a well-defined proposition and hypothesis, data mining starts with a loosely defined discovery statement.

statistics

Describe the query-specific clustering method as it relates to clustering.

The most relevant documents to the posed query appear in small, tight clusters that are nested in larger clusters containing less-similar documents, creating a spectrum of relevance levels among the documents.

In the research literature case study, the researchers analyzing academic papers extracted information from which source? A) the paper abstract B) the paper keywords C) the main body of the paper D) the paper references

the paper abstract

A major structural change that can occur when analytics are introduced into an organization is the creation of new organizational ________.

units

Sentiment analysis projects require a lexicon for use. If a project in English is undertaken, you must generally make sure to A) use only the single, approved English lexicon. B) use any general English lexicon. C) use an English lexicon appropriate to the project at your discretion. D) create an English lexicon for the project.

use an English lexicon appropriate to the project at your discretion.

Identification of a model's variables (e.g., decision, result, uncontrollable) is critical, as are the relationships among the ________.

variables

Selecting the best ________ to work with is a laborious yet important task for companies and government organizations.

vendors

AaaS in the cloud has economies of scale and scope by providing many ________ analytical applications with better scalability and higher cost savings.

virtual

43) When a word has more than one meaning, selecting the meaning that makes the most sense can only be accomplished by taking into account the context within which the word is used. This concept is known as ________.

word sense disambiguation

myths associated with data mining

∙ Data mining provides instant, crystal-ball-like predictions.
∙ Data mining is not yet viable for business applications.
∙ Data mining requires a separate, dedicated database.
∙ Only those with advanced degrees can do data mining.
∙ Data mining is only for large firms that have lots of customer data.

common data mining mistakes

∙ Selecting the wrong problem for data mining
∙ Ignoring what your sponsor thinks data mining is and what it really can and cannot do
∙ Leaving insufficient time for data preparation
∙ Looking only at aggregated results and not at individual records
∙ Being sloppy about keeping track of the data mining procedure and results
∙ Ignoring suspicious findings and quickly moving on
∙ Running mining algorithms repeatedly and blindly
∙ Believing everything you are told about the data
∙ Believing everything you are told about your own data mining analysis
∙ Measuring your results differently from the way your sponsor measures them

six steps of the CRISP-DM data mining process

∙ Step 1: Business Understanding
∙ Step 2: Data Understanding
∙ Step 3: Data Preparation
∙ Step 4: Model Building
∙ Step 5: Testing and Evaluation
∙ Step 6: Deployment

