ISM 3540 Exam 2 Study Guide
T or F: In the car insurance case study, text mining was used to identify auto features that caused injuries.
False
T or F: Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out.
True
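The filtering described above can be sketched as a stop-word filter. The tiny `STOP_WORDS` set below is a hypothetical sample of articles and auxiliary verbs chosen for illustration. It is not the list used by any particular text-mining tool, and real tools typically tokenize more carefully than a plain whitespace split.

```python
# Hypothetical, deliberately small stop-word list: a few articles and auxiliary verbs.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "be", "been",
              "has", "have", "had", "do", "does", "did"}

def remove_stop_words(text: str) -> list[str]:
    """Lowercase the text, split it on whitespace, and drop any stop words."""
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words("The car was hit by a truck"))
# ['car', 'hit', 'by', 'truck']
```

The remaining words ("car", "hit", "truck") carry the meaning that later mining steps work with.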
T or F: Categorization and clustering of documents during text mining differ only in the preselection of categories.
True
T or F: In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but easier to classify others, e.g., movie reviews, in the same way.
True
T or F: In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document or sample.
True
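The support figure above is just a count divided by the total number of documents. Here is a minimal sketch, assuming each document has already been reduced to a set of extracted concepts. The concept names and the 100-document corpus are hypothetical.

```python
def support(documents: list[set[str]], concept_a: str, concept_b: str) -> float:
    """Share of documents that contain both concepts."""
    both = sum(1 for doc in documents if concept_a in doc and concept_b in doc)
    return both / len(documents)

# Hypothetical corpus: 100 documents, 7 of which mention both concepts.
docs = [{"brakes", "recall"}] * 7 + [{"brakes"}] * 60 + [{"recall"}] * 33
print(support(docs, "brakes", "recall"))  # 0.07, i.e. 7% support
```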
In the Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse case study, what was the analytic goal? a. determine if diseases are accurately diagnosed b. determine probabilities of diseases that are comorbid c. determine differences in rates of diseases in urban & rural populations d. determine differences in rates of disease in males v. females
C
In the opening vignette, the architectural system that supported Watson used all the following elements EXCEPT A. massive parallelism to enable simultaneous consideration of multiple hypotheses. B. an underlying confidence subsystem that ranks and integrates answers. C. a core engine that could operate seamlessly in another domain without changes. D. integration of shallow and deep knowledge.
C
Web site usability may be rated poor if: a. the average number of page views on your Web site is large. b. the time spent on your Web site is long. c. Web site visitors download few of your offered PDFs and videos. d. users fail to click on all pages equally.
C
Which Big Data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources? a. in-memory analytics b. in-database analytics c. grid computing d. appliances
C
Which of these is NOT a part of the IoT technology infrastructure? A. Hardware B. Connectivity C. Electrical access D. Software
C. Electrical access
What does the scalability of a data mining method refer to? A. Its ability to predict the outcome of a previously unknown data set accurately B. Its speed of computation & computational costs in using the model C. Its ability to construct a prediction model efficiently given a large amount of data D. Its ability to overcome noisy data to make somewhat accurate predictions
C. Its ability to construct a prediction model efficiently given a large amount of data
Which of the following allows companies to deploy their software and applications in the cloud so that their customers can use them? A. SaaS B. IaaS C. PaaS D. AaaS
C. PaaS
Which of the following sources is likely to produce Big Data the fastest? a. order entry clerks b. cashiers c. RFID tags d. online customers
C. RFID tags
The portion of the IoT technology infrastructure that focuses on how to manage incoming data and analyze it is A. Hardware B. Connectivity C. Software Backend D. Applications
C. Software Backend
T or F: Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations.
False
T or F: Consistent high quality, higher publishing frequency, and longer time lag are all attributes of industrial publishing when compared to Web publishing.
False
T or F: Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales.
False
T or F: Data mining requires specialized data analysts to ask ad hoc questions and obtain answers quickly from the system.
False
T or F: Data that is collected, stored, and analyzed in data mining is often private and personal. There is no way to maintain individuals' privacy other than being very careful about physical data security.
False
T or F: Hadoop and MapReduce require each other to work.
False
T or F: IaaS helps provide faster information, but provides information only to managers in an organization.
False
T or F: In a dataset where all values on an observation are supposed to be populated you encounter several which are empty (NULL). It is always best to replace these NULL values with the average of that column of data.
False
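This sketch illustrates why mean imputation is not always best. It uses a hypothetical income column with one missing value and one extreme outlier. The outlier drags the mean far from the typical values, while the median stays close to them. Neither strategy is universally correct: depending on the data, dropping the row, flagging it, or asking the source to correct it may be better choices.

```python
from statistics import mean, median
from typing import Optional

def impute(values: list[Optional[float]], strategy: str) -> list[float]:
    """Replace None entries with the mean or median of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

# Hypothetical income column with one missing value and one extreme outlier.
incomes = [30_000, 32_000, 35_000, None, 1_000_000]
print(impute(incomes, "mean"))    # the gap is filled with 274250
print(impute(incomes, "median"))  # the gap is filled with 33500.0
```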
T or F: In the Dell case study, the largest issue was how to properly spend the online marketing budget.
False
T or F: In the Miami-Dade Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime.
False
T or F: In the cancer research case study, data mining algorithms that predict cancer survivability with high predictive power are good replacements for medical professionals.
False
T or F: In the evolution of social media user engagement, the largest recent change is the growth of creators.
False
T or F: In the opening case, police detectives used data mining to identify possible new areas of inquiry.
False
T or F: K-fold cross-validation is also called sliding estimation.
False
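K-fold cross-validation is also known as rotation estimation, not sliding estimation. It splits the data into k roughly equal folds, and each fold serves exactly once as the test set while the other k - 1 folds are used for training. The minimal sketch below only produces the index splits. It does not shuffle the data first, which real tools usually do, and the function name is illustrative.

```python
def k_fold_indices(n: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..n-1 into k folds; return (train, test) index lists per fold."""
    if not 1 < k <= n:
        raise ValueError("k must be between 2 and n")
    # The first n % k folds get one extra item so every index is used exactly once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits

print([test for _, test in k_fold_indices(10, 3)])
# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

A model would be trained and scored once per split, and the k scores averaged.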
T or F: Ratio data is a type of categorical data.
False
T or F: Search engine optimization (SEO) techniques play a minor role in a Web site's search ranking because only well-written content matters.
False
T or F: Search engines are only used in the context of the World Wide Web (WWW).
False
T or F: Since little can be done about visitor Web site abandonment rates, organizations have to focus their efforts on increasing the number of new visitors.
False
T or F: Web-based e-mail services such as Google's Gmail are not examples of cloud computing.
False
T or F: While cloud services are useful for small and midsize analytic applications, they are still limited in their ability to handle Big Data applications.
False
T or F: Converting continuous valued numerical variables to ranges and categories is referred to as discretization.
True
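Here is a minimal sketch of discretization, assuming hypothetical age bins. Each edge is the inclusive lower bound of the next bin, so there is always one more label than there are edges.

```python
from bisect import bisect_right

# Hypothetical age bins: each edge is the inclusive lower bound of the next bin.
AGE_EDGES = [18, 35, 60]
AGE_LABELS = ["minor", "young adult", "middle-aged", "senior"]

def discretize(value: float, edges: list[float], labels: list[str]) -> str:
    """Return the label of the range that value falls into."""
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    # For sorted edges, bisect_right counts how many edges are <= value,
    # which is exactly the index of the bin.
    return labels[bisect_right(edges, value)]

ages = [12, 18, 34.5, 35, 71]
print([discretize(a, AGE_EDGES, AGE_LABELS) for a in ages])
# ['minor', 'young adult', 'young adult', 'middle-aged', 'senior']
```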
T or F: For low latency, interactive reports, a data warehouse is preferable to Hadoop.
True
T or F: From massive amounts of high-dimensional location data, algorithms that reduce the dimensionality of the data can be used to uncover trends, meaning, and relationships to eventually produce human-understandable representations.
True
T or F: In data mining, classification models help in prediction.
True
T or F: In the opening vignette, Access Telecom (AT) built a system to better visualize customers who were unhappy before they canceled their service.
True
T or F: Internet of Things (IoT) is the phenomenon of connecting the physical world to the Internet.
True
T or F: One reason the IoT is growing exponentially is because hardware is smaller and more affordable.
True
T or F: Service-oriented DSS solutions generally offer individual or bundled services to the user as a service.
True
T or F: Social media mentions can be used to chart and predict flu outbreaks.
True
The 2017 survey estimated the yearly cost of dirty data to the US economy at approximately: A. $3.1 Trillion USD B. $600 Million USD C. $3.1 Billion USD D. $600 Trillion USD
A. $3.1 Trillion USD
A company/organization can encounter dirty data in the form of A. All of these B. Invalid mailing address C. Invalid email address D. Duplicated data
A. All of these
In data mining, finding that two products commonly appear together in a shopping cart is known as A. Association rule mining B. Cluster analysis C. Decision trees D. Artificial neural networks
A. Association rule mining
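This sketch shows how the affinity behind a single rule such as {bread} → {butter} is commonly measured: support, confidence, and lift. The baskets are invented for illustration. Algorithms such as Apriori search over many candidate rules and prune them by minimum support; this sketch scores only one given rule.

```python
def rule_metrics(baskets: list[set[str]], antecedent: str, consequent: str) -> dict[str, float]:
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    n = len(baskets)
    count_a = sum(1 for b in baskets if antecedent in b)
    count_c = sum(1 for b in baskets if consequent in b)
    count_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    if count_a == 0 or count_c == 0:
        raise ValueError("both items must appear in at least one basket")
    support = count_both / n               # share of all baskets containing both items
    confidence = count_both / count_a      # share of antecedent baskets that also hold the consequent
    lift = confidence / (count_c / n)      # > 1 means the items co-occur more than chance predicts
    return {"support": support, "confidence": confidence, "lift": lift}

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"milk"},
]
print(rule_metrics(baskets, "bread", "butter"))
# support 0.5, confidence about 0.67, lift about 1.33
```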
All of the following statements about data mining are true EXCEPT A. The process aspect means that data mining should be a one-step process to results B. The novel aspect means that previously unknown patterns are discovered C. The potentially useful aspect means that the results should lead to some business benefit D. The valid aspect means that the discovered patterns should hold true on new data.
A
Which of the following is NOT one of the "3 V's of Big Data"? a. veracity b. volume c. variety d. velocity
A
You are tasked with accumulating survey data on a web page and are responsible for it being free from dirty data once you close the survey and get the data to the researching team. Which is the best way to handle the possibility of dirty data? A. build a website that validates data as the survey participant takes the survey B. let your friend throw a survey site together that accumulates the data & you export it into a spreadsheet & fix the data manually. C. Have the survey site email you when it encounters data that is not formatted correctly. D. Have the survey accumulate the data & then email the survey participant after the survey is processed asking them to retake it due to invalid data.
A. Build a website that validates data as the survey participant takes the survey.
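Here is a minimal sketch of validating each response as it is submitted, so bad values are rejected and the participant can fix them on the spot. The field names (`email`, `age`, `zip`) and their rules are hypothetical. On a real survey site the same checks would typically run in the browser and again on the server; this Python function stands in for the server-side check.

```python
import re

# Hypothetical rules for a hypothetical survey; adjust to the real form's fields.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_response(response: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the response can be accepted."""
    errors = []
    email = response.get("email", "").strip()
    if not EMAIL_PATTERN.match(email):
        errors.append("email: not a valid address")
    age = response.get("age", "").strip()
    # isdecimal() guarantees int() will succeed before the range check runs.
    if not age.isdecimal() or not 18 <= int(age) <= 120:
        errors.append("age: must be a whole number from 18 to 120")
    zip_code = response.get("zip", "").strip()
    if not re.fullmatch(r"\d{5}", zip_code):
        errors.append("zip: must be 5 digits")
    return errors

print(validate_response({"email": "jane@example.com", "age": "34", "zip": "47150"}))
# []
```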
Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications? A. Insurance B. Retailing & logistics C. Customer relationship management D. Computer hardware & software
A. Insurance
In text mining, tokenizing is the process of A. categorizing a block of text in a sentence. B. reducing multiple words to their base or root. C. transforming the term-by-document matrix to a manageable size. D. creating new branches or stems of recorded paragraphs.
A. categorizing a block of text in a sentence.
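Tokenizing breaks a sentence into tokens and, in the textbook's sense, labels each token by its function. This minimal regex-based sketch uses three hypothetical categories (WORD, NUMBER, PUNCT). Production NLP tokenizers handle far more cases, such as hyphenation, URLs, and abbreviations.

```python
import re

# Each alternative is a named group; the group that matches gives the token's category.
# Characters covered by no group (whitespace, underscores) are skipped.
TOKEN_PATTERN = re.compile(
    r"(?P<NUMBER>\d+(?:\.\d+)?)|(?P<WORD>[A-Za-z]+(?:'[A-Za-z]+)?)|(?P<PUNCT>[^\w\s])"
)

def tokenize(sentence: str) -> list[tuple[str, str]]:
    """Split a sentence into (category, token) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_PATTERN.finditer(sentence)]

print(tokenize("Watson won 2 games."))
# [('WORD', 'Watson'), ('WORD', 'won'), ('NUMBER', '2'), ('WORD', 'games'), ('PUNCT', '.')]
```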
In the research literature case study, the researchers analyzing academic papers extracted information from which source? A. paper abstract B. paper keywords C. main body of the paper D. paper references
A. paper abstract
Search engine optimization (SEO) is a means by which: a. Web site developers can negotiate better deals for paid ads. b. Web site developers can increase Web site search rankings. c. Web site developers index their Web sites for search engines. d. Web site developers optimize the artistic features of their Web sites.
B
What does Web content mining involve? a. analyzing the uniform resource locator in Web pages b. analyzing the unstructured content of Web pages c. analyzing the pattern of visits to a Web site d. analyzing the PageRank and other metadata of a Web page
B
What is Big Data's relationship to the cloud? a. Hadoop cannot be deployed effectively in the cloud just yet b. Amazon & Google have working Hadoop cloud offerings c. IBM's homegrown Hadoop platform is the only option d. only MapReduce works in the cloud; Hadoop does not
B
Which of the following is a data mining myth? A. Data mining is a multistep process that requires deliberate, proactive design and use. B. Data mining requires a separate, dedicated database. C. The current state-of-the-art is ready to go for almost any business. D. Newer Web-based tools enable managers of all educational levels to do data mining.
B. Data mining requires a separate, dedicated database.
In sentiment analysis, which of the following is an implicit opinion? A. The hotel we stayed in was terrible. B. The customer service I got for my TV was laughable. C. The cruise we went on last summer was a disaster. D. Our new mayor is great for the city.
B. The customer service I got for my TV was laughable.
In the Target case study, why did Target send maternity ads to a teen? A. Target's analytical model confused her with an older woman with a similar name B. Target was sending ads to all women in a particular neighborhood C. Target's analytic model suggested she was pregnant based on her buying habits D. Target was using a special promotion that targeted all teens in her geographical area.
C. Target's analytic model suggested she was pregnant based on her buying habits
All of the following statements about data mining are true EXCEPT: A. The term is relatively new B. Its techniques have their roots in traditional statistical analysis C. The ideas behind it are relatively new D. Intense, global competition makes its application more important
C. The ideas behind it are relatively new
What is the main reason parallel processing is sometimes used for data mining? A. because the hardware exists in most organizations, & it is available to use B. because most of the algorithms used for data mining require it C. because of the massive amounts of data & search efforts involved D. because any strategic application requires parallel processing
C. because of the massive amounts of data & search efforts involved
How does Hadoop work? a. it integrates big data into a whole so large data elements can be processed as a whole on one computer b. it integrates big data into a whole so large data elements can be processed as a whole on multiple computers c. it breaks up big data into multiple parts so each part can be processed & analyzed at the same time on one computer d. it breaks up big data into multiple parts so each part can be processed & analyzed at the same time on multiple computers.
D
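The idea in answer D (split the data into parts, process the parts at the same time, then combine the results) can be sketched as a toy map-reduce word count. This is not Hadoop's API. It runs in a single Python process, and the threads only stand in for the separate machines a real Hadoop cluster would use. It also skips Hadoop's shuffle-and-sort phase: merging the partial counts plays the role of the reducer.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_chunk(lines: list[str]) -> Counter:
    """Map step: count the words in one chunk of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def map_reduce_word_count(lines: list[str], workers: int = 3) -> Counter:
    # Split the input into roughly equal chunks, one per worker ("node").
    chunk_size = max(1, -(-len(lines) // workers))  # ceiling division
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Process all chunks concurrently; threads stand in for separate machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks))
    # Reduce step: merge the partial results into one total.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total

print(map_reduce_word_count(["big data big", "data mining", "big hadoop"]))
```

The result is the same no matter how many chunks the data is split into, which is what lets Hadoop spread the work across as many machines as are available.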
Understanding which keywords your users enter to reach your Web site through a search engine can help you understand: a. the hardware your Web site is running on. b. the type of Web browser being used by your Web site visitors. c. most of your Web site visitors' wants and needs. d. how well visitors understand your products.
D
As discussed in class, which data mining process/methodology is the most widely used and generally regarded as the most comprehensive? A. SEMMA B. Proprietary organizational methodologies C. KDD Process D. CRISP-DM
D. CRISP-DM
Natural language processing (NLP) is associated with which of the following areas? A. artificial intelligence B. text mining C. computational linguistics D. all of these
D. all of these