ISM3540 Exam 2
Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes?
Classification
Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features?
Clustering
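The card does not name a specific clustering algorithm; k-means is one common choice. A minimal sketch of k-means on one-dimensional data, assuming the starting centroids are supplied by the caller (the function name kmeans_1d and the sample values are hypothetical):

```python
def kmeans_1d(values, centroids, iterations=10):
    """Minimal k-means (Lloyd's algorithm) on 1-D data from the given starting centroids."""
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest centroid
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty)
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centroids=[0, 5])
print(clusters)   # [[1, 2, 3], [10, 11, 12]]
print(centroids)  # [2.0, 11.0]
```

No class labels are given up front; the groups emerge from similarity alone, which is what separates clustering from classification.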
Which of the following is a data mining myth?
Data mining requires a separate, dedicated database.
In a dataset where every value in an observation is supposed to be populated, you encounter several that are empty (NULL). It is always best to replace these NULL values with the average of that column of data. (T/F)
False
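Mean imputation is only one option among several (dropping the record, using the median or mode, or predicting the value from other fields), and it has a side effect: every filled-in value sits exactly at the mean, so the column's spread shrinks. A minimal Python sketch with made-up values shows the effect:

```python
from statistics import mean, pvariance

# Hypothetical column with missing entries represented as None
column = [12.0, None, 15.0, 30.0, None, 9.0]

observed = [v for v in column if v is not None]
col_mean = mean(observed)

# Mean imputation: every None becomes the same value
imputed = [col_mean if v is None else v for v in column]

print(col_mean)             # 16.5
print(pvariance(observed))  # 65.25
print(pvariance(imputed))   # 43.5 (smaller: imputed points add no spread)
```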
In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's feelings. (T/F)
False
In the Dell case study, the largest issue was how to properly spend the online marketing budget. (T/F)
False
In the Miami-Dade Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime. (T/F)
False
In the Salesforce case study, streaming data is used to identify services that customers use most. (T/F)
False
In the cancer research case study, data mining algorithms that predict cancer survivability with high predictive power are good replacements for medical professionals. (T/F)
False
In the car insurance case study, text mining was used to identify auto features that caused injuries. (T/F)
False
In the evolution of social media user engagement, the largest recent change is the growth of creators. (T/F)
False
In the opening case, police detectives used data mining to identify possible new areas of inquiry. (T/F)
False
K-fold cross-validation is also called sliding estimation. (T/F)
False
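K-fold cross-validation is also known as rotation estimation, not sliding estimation: the data are split into k roughly equal folds, and each fold serves once as the test set while the other k-1 folds are used for training. A minimal index-splitting sketch in plain Python (the function name k_fold_indices is hypothetical; real tools usually shuffle the records first):

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation over n records."""
    # Distribute n records into k folds whose sizes differ by at most one
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

for train, test in k_fold_indices(n=10, k=3):
    print("test:", test, "train size:", len(train))
```

Every record lands in exactly one test fold, so the k accuracy estimates can be averaged into a single estimate.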
Market basket analysis is a useful and entertaining way to explain data mining to a technologically less savvy audience, but it has little business significance. (T/F)
False
Open-source data mining tools include applications such as IBM SPSS Modeler and Dell Statistica. (T/F)
False
Ratio data is a type of categorical data. (T/F)
False
Search engine optimization (SEO) techniques play a minor role in a Web site's search ranking because only well-written content matters. (T/F)
False
Search engines are only used in the context of the World Wide Web (WWW). (T/F)
False
Since little can be done about visitor Web site abandonment rates, organizations have to focus their efforts on increasing the number of new visitors. (T/F)
False
Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining. (T/F)
False
The entire focus of the predictive analytics system in the Infinity P&C case was on detecting and handling fraudulent claims for the company's benefit. (T/F)
False
Understanding which keywords your users enter to reach your Web site through a search engine can help you understand
How well visitors understand your products
Allowing Big Data to be processed in memory and distributed across a dedicated set of nodes can solve complex problems in near-real time with highly accurate insights. What is this process called?
In-memory analytics
In the Influence Health case study, what was the goal of the system?
Increasing service use
Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications?
Insurance
The data field "ethnic group" can be best described as
Nominal data
What are the two main types of Web analytics?
Off-site and on-site Web analytics
In a Hadoop "stack," what node periodically replicates and stores data from the Name Node should it fail?
Secondary node
In the Wimbledon case study, the tournament used data for each match in real time to highlight
Significant events
Clustering partitions a collection of things into segments whose members share
Similar characteristics
In sentiment analysis, which of the following is an implicit opinion?
The customer service I got for my TV was laughable.
All of the following statements about data mining are true EXCEPT:
The ideas behind it are relatively new.
In the research literature case study, the researchers analyzing academic papers extracted information from which source?
The paper abstract
Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out. (T/F)
True
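Filtering these low-value words is usually done with a stop list. A minimal sketch, assuming a deliberately short, hypothetical STOP_WORDS set (real text-mining tools ship much longer lists):

```python
# Hypothetical stop list: a few articles and auxiliary verbs
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "be", "has", "have", "do", "does"}

def remove_stop_words(text):
    # Lowercase, split on whitespace, and drop any token found in the stop list
    return [w for w in text.lower().split() if w not in STOP_WORDS]

print(remove_stop_words("The battery was a disappointment"))
# ['battery', 'disappointment']
```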
Big Data is being driven by the exponential growth, availability, and use of information. (T/F)
True
Categorization and clustering of documents during text mining differ only in the preselection of categories. (T/F)
True
Clickstream analysis does not need users to enter their perceptions of the Web site or other feedback directly to be useful in determining their preferences. (T/F)
True
Converting continuous valued numerical variables to ranges and categories is referred to as discretization. (T/F)
True
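A minimal discretization sketch, assuming hypothetical cut points and labels for an age variable; each range includes its lower bound and excludes its upper bound:

```python
import bisect

# Hypothetical boundaries: [0, 18), [18, 35), [35, 65), [65, ...)
CUT_POINTS = [18, 35, 65]
LABELS = ["minor", "young adult", "middle-aged", "senior"]

def discretize(age):
    # bisect_right counts how many cut points are <= age, which is the bin index
    return LABELS[bisect.bisect_right(CUT_POINTS, age)]

print([discretize(a) for a in [12, 18, 40, 70]])
# ['minor', 'young adult', 'middle-aged', 'senior']
```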
Current use of sentiment analysis in voice of the customer applications allows companies to change their products or services in real time in response to customer sentiment. (T/F)
True
During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality. (T/F)
True
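The four outcome types of a binary classifier can be counted directly from actual and predicted labels. A minimal sketch with made-up labels (True = positive class; the function name confusion_counts is hypothetical):

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN for boolean labels (True = positive class)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)      # predicted True, actually False
    fn = sum(1 for a, p in pairs if a and not p)      # predicted False, actually True
    tn = sum(1 for a, p in pairs if not a and not p)
    return tp, fp, fn, tn

actual    = [True, False, False, True, False]
predicted = [True, True,  False, False, False]
print(confusion_counts(actual, predicted))  # (1, 1, 1, 2)
```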
For low latency, interactive reports, a data warehouse is preferable to Hadoop. (T/F)
True
Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel. (T/F)
True
If you have many flexible programming languages running in parallel, Hadoop is preferable to a data warehouse. (T/F)
True
In data mining, classification models help in prediction. (T/F)
True
In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but easier to classify others, e.g., movie reviews, in the same way. (T/F)
True
In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document. (T/F)
True
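A minimal sketch of that support calculation, assuming each document has already been reduced to a hypothetical set of extracted concepts:

```python
# Hypothetical corpus: each document reduced to its set of extracted concepts
documents = [
    {"battery", "screen"},
    {"battery", "price"},
    {"screen", "price"},
    {"battery", "screen", "price"},
]

def support(docs, concept_a, concept_b):
    # Fraction of documents in which both concepts appear together
    both = sum(1 for d in docs if concept_a in d and concept_b in d)
    return both / len(docs)

print(support(documents, "battery", "screen"))  # 0.5, i.e., 50% support
```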
In the Wimbledon case study, designers balanced the needs of mobile and desktop computer users. (T/F)
True
In the opening vignette, Access Telecom (AT) built a system to better visualize customers who were unhappy before they canceled their service. (T/F)
True
It is important for Big Data and self-service business intelligence to go hand in hand to get maximum value from analytics. (T/F)
True
Regional accents present challenges for natural language processing. (T/F)
True
The cost of data storage has plummeted recently, making data mining feasible for more firms. (T/F)
True
Using data mining on data about imports and exports can help to detect tax avoidance and money laundering. (T/F)
True
When a problem has many attributes that impact the classification of different patterns, decision trees may be a useful approach. (T/F)
True
Which of the following statements about Web site conversion statistics is FALSE?
Visitors who begin a purchase on most Web sites must complete it.
Search engine optimization (SEO) is a means by which
Web site developers can increase Web site search rankings.
In text analysis, what is a lexicon?
a catalog of words, their synonyms, and their meanings
In the opening vignette, the architectural system that supported Watson used all the following elements EXCEPT
a core engine that could operate seamlessly in another domain without changes.
What does the robustness of a data mining method refer to?
its ability to overcome noisy data to make somewhat accurate predictions
All of the following statements about data mining are true EXCEPT
the process aspect means that data mining should be a one-step process to results.
The 2002 Eckerson survey estimated the total yearly cost of dirty data to the US economy to be approximately:
$600 Billion USD
A company/organization can encounter dirty data in the form of
All of these
Natural language processing (NLP) is associated with which of the following areas?
All of these
In data mining, finding an affinity of two products to be commonly together in a shopping cart is known as
Association rule mining
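Association rules are usually scored by support (how often the items occur together) and confidence (how often the right-hand item appears when the left-hand item does). A minimal sketch that scores one rule over hypothetical baskets; it is not the Apriori algorithm itself, which searches for all frequent itemsets:

```python
# Hypothetical shopping baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"milk", "butter"},
]

def rule_metrics(baskets, lhs, rhs):
    """Support and confidence for the association rule lhs -> rhs."""
    n = len(baskets)
    lhs_count = sum(1 for b in baskets if lhs <= b)           # baskets containing lhs
    both_count = sum(1 for b in baskets if (lhs | rhs) <= b)  # baskets containing lhs and rhs
    support = both_count / n
    confidence = both_count / lhs_count if lhs_count else 0.0
    return support, confidence

print(rule_metrics(baskets, {"bread"}, {"milk"}))  # (0.5, 0.6666666666666666)
```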
You are tasked with accumulating survey data on a web page and are responsible for it being free from dirty data once you close the survey and get the data to the research team. Which is the best way to handle the possibility of dirty data?
Build a website that validates data as the survey participant takes the survey.
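Validating at entry time means each response is checked before it is stored. A minimal server-side sketch in Python, assuming hypothetical fields and rules (a required email containing "@" and an integer age from 18 to 120); the same checks could also run in the browser:

```python
def validate_response(response):
    """Return a list of error messages for one survey response (empty list = valid)."""
    errors = []
    # Hypothetical rule: email is required and must contain '@'
    email = str(response.get("email", "")).strip()
    if not email or "@" not in email:
        errors.append("email is missing or malformed")
    # Hypothetical rule: age must be an integer between 18 and 120
    age = response.get("age")
    if not isinstance(age, int) or not 18 <= age <= 120:
        errors.append("age must be an integer between 18 and 120")
    return errors

print(validate_response({"email": "pat@example.com", "age": 34}))  # []
print(validate_response({"email": "", "age": 250}))
```

Rejecting a response while the participant can still correct it avoids cleaning or discarding records after the survey closes.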
Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations. (T/F)
False
In a Hadoop "stack," what is a slave node?
a node where data is stored and processed
What does Web content mining involve?
analyzing the unstructured content of Web pages
Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from
analyzing the vast data amounts routinely collected.
In text mining, tokenizing is the process of
categorizing a block of text in a sentence.
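A token is a block of text labeled with the function it performs. A minimal regex-based sketch, assuming three hypothetical categories (NUMBER, WORD, PUNCT); production NLP tokenizers use far richer rules:

```python
import re

# Hypothetical token categories, tried left to right at each position
TOKEN_PATTERN = re.compile(r"(?P<NUMBER>\d+(?:\.\d+)?)|(?P<WORD>[A-Za-z']+)|(?P<PUNCT>[^\w\s])")

def tokenize(sentence):
    # finditer skips characters no category matches (e.g., spaces);
    # lastgroup is the name of the category that matched
    return [(m.group(), m.lastgroup) for m in TOKEN_PATTERN.finditer(sentence)]

print(tokenize("The TV cost 499.99 dollars!"))
# [('The', 'WORD'), ('TV', 'WORD'), ('cost', 'WORD'), ('499.99', 'NUMBER'), ('dollars', 'WORD'), ('!', 'PUNCT')]
```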