Test 2 Chap 4, ISM4402 Exam 2

29) Clustering partitions a collection of things into segments whose members share A) similar characteristics. B) dissimilar characteristics. C) similar collection methods. D) dissimilar collection methods.

A

27) Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes? A) associations B) visualization C) classification D) clustering

C

31) All of the following statements about data mining are true EXCEPT: A) The term is relatively new. B) Its techniques have their roots in traditional statistical analysis and artificial intelligence. C) The ideas behind it are relatively new. D) Intense, global competition makes its application more important.

C

________ is a segmentation metric for social networks that measures the strength of the bonds between actors in a social network.

Cohesion

________ statistics help you understand whether your specific marketing objective for a Web page is being achieved.

Conversion

28) Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features?

D) clustering

37) In data mining, finding an affinity of two products to be commonly together in a shopping cart is known as A) association rule mining. B) cluster analysis. C) decision trees. D) artificial neural networks.

A
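
The shopping-cart affinity in the question above can be made concrete with a minimal sketch. It counts how often two products appear together (support) and how often one product appears given the other (confidence). The baskets and the function names `pair_support` and `confidence` are invented for illustration and do not come from the course text.

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets):
    """Return {pair: fraction of baskets that contain both items}."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[frozenset(pair)] += 1
    total = len(baskets)
    return {pair: count / total for pair, count in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

# Hypothetical shopping carts, invented for illustration only.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

print(pair_support(baskets)[frozenset({"bread", "butter"})])  # 0.5
print(confidence(baskets, "butter", "bread"))                 # 1.0
```

Here every cart with butter also has bread, while only two of the three carts with bread have butter, so the rule "butter implies bread" is stronger than the reverse.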

33) Prediction problems where the variables have numeric values are most accurately defined as A) classifications. B) regressions. C) associations. D) computations.

B

24) What is the main reason parallel processing is sometimes used for data mining? A) because the hardware exists in most organizations, and it is available to use B) because most of the algorithms used for data mining require it C) because of the massive data amounts and search efforts involved D) because any strategic application requires parallel processing

C

40) Which of the following is a data mining myth? A) Data mining is a multistep process that requires deliberate, proactive design and use. B) Data mining requires a separate, dedicated database. C) The current state-of-the-art is ready to go for almost any business. D) Newer Web-based tools enable managers of all educational levels to do data mining.

B

22) Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from A) collecting data about customers and transactions. B) developing a philosophy that is data analytics-centric. C) analyzing the vast data amounts routinely collected. D) asking the customers what they want.

C

35) What does the scalability of a data mining method refer to? A) its ability to predict the outcome of a previously unknown data set accurately B) its speed of computation and computational costs in using the model C) its ability to construct a prediction model efficiently given a large amount of data D) its ability to overcome noisy data to make somewhat accurate predictions

C

38) Third party providers of publicly available data sets protect the anonymity of the individuals in the data set primarily by A) asking data users to use the data ethically. B) leaving in identifiers (e.g., name), but changing other variables. C) removing identifiers such as names and social security numbers. D) letting individuals in the data know their data is being accessed.

C

39) In the Target case study, why did Target send maternity ads to a teen? A) Target's analytic model confused her with an older woman with a similar name. B) Target was sending ads to all women in a particular neighborhood. C) Target's analytic model suggested she was pregnant based on her buying habits. D) Target was using a special promotion that targeted all teens in her geographical area.

C

At a very high level, the text mining process can be broken down into three consecutive tasks, the first of which is to establish the ________.

Corpus

21) In the Influence Health case study, what was the goal of the system? A) locating clinic patients B) understanding follow-up care C) decreasing operational costs D) increasing service use

D

26) A data mining study is specific to addressing a well-defined business task, and different business tasks require A) general organizational data. B) general industry data. C) general economic data. D) different sets of data.

D

32) Which data mining process/methodology is thought to be the most comprehensive, according to kdnuggets.com rankings? A) SEMMA B) proprietary organizational methodologies C) KDD Process D) CRISP-DM

D

34) What does the robustness of a data mining method refer to? A) its ability to predict the outcome of a previously unknown data set accurately B) its speed of computation and computational costs in using the model C) its ability to construct a prediction model efficiently given a large amount of data D) its ability to overcome noisy data to make somewhat accurate predictions

D

23) All of the following statements about data mining are true EXCEPT A) the process aspect means that data mining should be a one-step process to results. B) the novel aspect means that previously unknown patterns are discovered. C) the potentially useful aspect means that results should lead to some business benefit. D) the valid aspect means that the discovered patterns should hold true on new data.

A

25) The data field "ethnic group" can be best described as A) nominal data. B) interval data. C) ordinal data. D) ratio data.

A

30) Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications? A) insurance B) retailing and logistics C) customer relationship management D) computer hardware and software

A

36) In estimating the accuracy of data mining (or other) classification models, the true positive rate is A) the ratio of correctly classified positives divided by the total positive count. B) the ratio of correctly classified negatives divided by the total negative count. C) the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified positives. D) the ratio of correctly classified positives divided by the sum of correctly classified positives and incorrectly classified negatives.

A
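
As a small worked illustration of the definition above, the sketch below computes the true positive rate (correctly classified positives over all actual positives) and, for contrast, the false positive rate. The labels and the function names are hypothetical, and 1 is assumed to mark the positive class.

```python
def true_positive_rate(actual, predicted):
    """Correctly classified positives divided by the total positive count."""
    predictions_for_positives = [p for a, p in zip(actual, predicted) if a == 1]
    if not predictions_for_positives:
        raise ValueError("no actual positives in the data")
    return sum(p == 1 for p in predictions_for_positives) / len(predictions_for_positives)

def false_positive_rate(actual, predicted):
    """Negatives wrongly classified as positive divided by the total negative count."""
    predictions_for_negatives = [p for a, p in zip(actual, predicted) if a == 0]
    if not predictions_for_negatives:
        raise ValueError("no actual negatives in the data")
    return sum(p == 1 for p in predictions_for_negatives) / len(predictions_for_negatives)

# Hypothetical labels: 1 = positive class, 0 = negative class.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

print(true_positive_rate(actual, predicted))   # 0.75 (3 of 4 positives found)
print(false_positive_rate(actual, predicted))  # 0.25 (1 of 4 negatives flagged)
```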

How would you describe information extraction in text mining?

Identifying key phrases and relationships in a text by looking for predefined objects and sequences using pattern matching

In the security domain, one of the largest and most prominent text mining applications is the highly classified ECHELON surveillance system. What is ECHELON assumed to be capable of doing?

Identifying the content of telephone calls, faxes, e-mails, and other types of data and intercepting information sent via satellites, public switched telephone networks, and microwave links

Categorization and clustering of documents during text mining differ only in the preselection of categories. T/F

T

Clickstream analysis does not need users to enter their perceptions of the Web site or other feedback directly to be useful in determining their preferences. T/F

T

In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but easier to classify others, e.g., movie reviews, in the same way. T/F

T

In the Wimbledon case study, designers balanced the needs of mobile and desktop computer users. T/F

T

Regional accents present challenges for natural language processing. T/F

T

10) In data mining, classification models help in prediction.

TRUE

12) Using data mining on data about imports and exports can help to detect tax avoidance and money laundering.

TRUE

14) During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality.

TRUE

16) When a problem has many attributes that impact the classification of different patterns, decision trees may be a useful approach.

TRUE

2) The cost of data storage has plummeted recently, making data mining feasible for more firms.

TRUE

4) If using a mining analogy, "knowledge mining" would be a more appropriate term than "data mining."

TRUE

8) Converting continuous valued numerical variables to ranges and categories is referred to as discretization.

TRUE
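
Discretization as described above can be sketched in a few lines. The cut points and category names below are assumptions picked for the example, not values from the course text, and `discretize` is a hypothetical helper built on the standard-library `bisect` module.

```python
from bisect import bisect_right

def discretize(value, edges, labels):
    """Map a numeric value to the label of the range it falls in.

    `edges` are ascending cut points; `labels` has one more entry than `edges`.
    A value equal to a cut point goes into the higher range.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect_right(edges, value)]

# Hypothetical age ranges: under 30, 30 to 59, 60 and over.
edges = [30, 60]
labels = ["young", "middle", "senior"]

ages = [22, 30, 45, 71]
print([discretize(age, edges, labels) for age in ages])
# ['young', 'middle', 'middle', 'senior']
```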

In what ways does the Web pose great challenges for effective and efficient knowledge discovery through data mining?

The Web is too big for effective data mining. The Web is too complex. The Web is too dynamic. The Web is not specific to a domain. The Web has everything.

What do voice of the market (VOM) applications of sentiment analysis do? A) They examine customer sentiment at the aggregate level. B) They examine employee sentiment in the organization. C) They examine the stock market for trends. D) They examine the "market of ideas" in politics.

They examine customer sentiment at the aggregate level.

What is one major way in which Web-based social media differs from traditional publishing media? A) Most Web-based media are operated by the government and large firms. B) They use different languages of publication. C) They have different costs to own and operate. D) Web-based media have a narrower range of quality.

They have different costs to own and operate.

Which of the following statements about Web site conversion statistics is FALSE? A) Web site visitors can be classed as either new or returning. B) Visitors who begin a purchase on most Web sites must complete it. C) The conversion rate is the number of people who take action divided by the number of visitors. D) Analyzing exit rates can tell you why visitors left your Web site.

Visitors who begin a purchase on most Web sites must complete it.
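
The conversion rate defined in the question above (people who take action divided by visitors) is simple arithmetic. A minimal sketch, with made-up visitor counts and a hypothetical function name:

```python
def conversion_rate(actions, visitors):
    """Number of visitors who took the desired action divided by all visitors."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    if not 0 <= actions <= visitors:
        raise ValueError("actions must be between 0 and visitors")
    return actions / visitors

# Hypothetical day of traffic: 1,200 visitors, 30 completed purchases.
rate = conversion_rate(30, 1200)
print(f"{rate:.1%}")  # 2.5%
```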

________ is mostly driven by sentiment analysis and is a key element of customer experience management initiatives, where the goal is to create an intimate relationship with the customer.

Voice of the customer (VOC)

Search engine optimization (SEO) is a means by which A) Web site developers can negotiate better deals for paid ads. B) Web site developers can increase Web site search rankings. C) Web site developers index their Web sites for search engines. D) Web site developers optimize the artistic features of their Web sites.

Web site developers can increase Web site search rankings.

Web site usability may be rated poor if A) the average number of page views on your Web site is large. B) the time spent on your Web site is long. C) Web site visitors download few of your offered PDFs and videos. D) users fail to click on all pages equally.

Web site visitors download few of your offered PDFs and videos.

What is the difference between white hat and black hat SEO activities?

White hat SEO activities are those that search engine creators recommend; they aim to ensure that the most relevant content is shown to the most relevant user. Black hat SEO activities are those frowned upon by the search engine creators, and a Web site can be penalized for engaging in them.

What does Web content mining involve? A) analyzing the universal resource locator in Web pages B) analyzing the unstructured content of Web pages C) analyzing the pattern of visits to a Web site D) analyzing the PageRank and other metadata of a Web page

analyzing the unstructured content of Web pages

In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document. T/F

T
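
The 7% support figure above can be reproduced with a minimal sketch. Each document is represented as a set of concepts already extracted from it; the corpus and the function name `concept_support` are invented for illustration.

```python
def concept_support(documents, concept_a, concept_b):
    """Fraction of documents in which both concepts appear together."""
    if not documents:
        raise ValueError("empty document collection")
    both = sum(1 for doc in documents if concept_a in doc and concept_b in doc)
    return both / len(documents)

# Hypothetical corpus of 100 documents; 7 of them mention both concepts.
documents = (
    [{"fraud", "claim"}] * 7
    + [{"fraud"}] * 13
    + [{"policy"}] * 80
)

print(f"{concept_support(documents, 'fraud', 'claim'):.0%}")  # 7%
```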

In the Tito's Vodka case study, trends in cocktails were studied to create a quarterly recipe for customers. T/F

T

Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out. T/F

T
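
A minimal sketch of the filtering described above: split text into lowercase tokens and drop articles and auxiliary verbs. The stop list here is a short assumed sample for the example; real stop lists are longer and usually domain specific.

```python
import re

# Assumed sample stop list: articles and common auxiliary verbs only.
STOP_WORDS = {
    "a", "an", "the",
    "is", "are", "was", "were", "be", "been",
    "has", "have", "had", "do", "does", "did",
    "will", "would", "can", "could", "may", "might", "should", "must",
}

def content_terms(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]

print(content_terms("The claim was filed and the adjuster has reviewed it."))
# ['claim', 'filed', 'and', 'adjuster', 'reviewed', 'it']
```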

Understanding which keywords your users enter to reach your Web site through a search engine can help you understand A) the hardware your Web site is running on. B) the type of Web browser being used by your Web site visitors. C) most of your Web site visitors' wants and needs. D) how well visitors understand your products.

how well visitors understand your products.

A(n) ________ is one or more Web pages that provide a collection of links to authoritative Web pages.

hub

Web pages contain both unstructured information and ________, which are connections to other Web pages.

hyperlinks

Since little can be done about visitor Web site abandonment rates, organizations have to focus their efforts on increasing the number of new visitors. T/F

F

Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining. T/F

F

Web-based media has nearly identical cost and scale structures as traditional media. T/F

F

1) In the opening case, police detectives used data mining to identify possible new areas of inquiry.

FALSE

11) Statistics and data mining both look for data sets that are as large as possible.

FALSE

13) In the cancer research case study, data mining algorithms that predict cancer survivability with high predictive power are good replacements for medical professionals.

FALSE

15) K-fold cross-validation is also called sliding estimation.

FALSE
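
For reference, k-fold cross-validation is usually called rotation estimation, because each of the k folds takes a turn as the test set while the remaining folds are used for training. A minimal sketch of that rotation, using a hypothetical helper `k_fold_indices` that splits without shuffling to keep the example short:

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

In practice the records are normally shuffled before splitting, and the k accuracy estimates are averaged.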

17) In the Dell case study, the largest issue was how to properly spend the online marketing budget.

FALSE

18) Market basket analysis is a useful and entertaining way to explain data mining to a technologically less savvy audience, but it has little business significance.

FALSE

20) Data that is collected, stored, and analyzed in data mining is often private and personal. There is no way to maintain individuals' privacy other than being very careful about physical data security.

FALSE

In the Mining for Lies case study, a text based deception-detection method used by Fuller and others in 2008 was based on a process known as ________, which relies on elements of data and text mining techniques.

message feature mining

A(n) ________ engine is a software program that searches for Web sites or files based on keywords.

search

In the Wimbledon case study, the tournament used data for each match in real time to highlight A) winners and losers. B) player histories. C) significant events. D) advertiser content.

significant events.

What types of documents are BEST suited to semantic labeling and aggregation to determine sentiment orientation? A) medium- to large-sized documents B) small- to medium-sized documents C) large-sized documents D) collections of documents

small- to medium-sized documents

Describe the query-specific clustering method as it relates to clustering.

The most relevant documents to the posed query appear in small, tight clusters that are nested in larger clusters containing less-similar documents, creating a spectrum of relevance levels among the documents.

In the research literature case study, the researchers analyzing academic papers extracted information from which source? A) the paper abstract B) the paper keywords C) the main body of the paper D) the paper references

the paper abstract

What are the two main types of Web analytics? A) old-school and new-school Web analytics B) Bing and Google Web analytics C) off-site and on-site Web analytics D) data-based and subjective Web analytics

off-site and on-site Web analytics

In sentiment analysis, which of the following is an implicit opinion? A) The hotel we stayed in was terrible. B) The customer service I got for my TV was laughable. C) The cruise we went on last summer was a disaster. D) Our new mayor is great for the city.

The customer service I got for my TV was laughable.

7) Ratio data is a type of categorical data.

FALSE

19) Open-source data mining tools include applications such as IBM SPSS Modeler and Dell Statistica.

FALSE

3) Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales.

FALSE

A(n) ________ Web site contains links that send traffic directly to your Web site.

referral

5) The entire focus of the predictive analytics system in the Infinity P&C case was on detecting and handling fraudulent claims for the company's benefit.

FALSE

6) Data mining requires specialized data analysts to ask ad hoc questions and obtain answers quickly from the system.

FALSE

When a word has more than one meaning, selecting the meaning that makes the most sense can only be accomplished by taking into account the context within which the word is used. This concept is known as ________.

word sense disambiguation

9) In the Miami-Dade Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime.

FALSE

Search engine optimization (SEO) techniques play a minor role in a Web site's search ranking because only well-written content matters. T/F

F

Search engines are only used in the context of the World Wide Web (WWW). T/F

F

IBM's Watson utilizes a massively parallel, text mining-focused, probabilistic evidence-based computational architecture called ________.

DeepQA

What are the three categories of social media analytics technologies and what do they do?

Descriptive analytics: using statistical methods to identify activity characteristics and trends, such as counts of users, reviews, and followers. Social network analysis: identifying connections of influence through friend and fan groups, as well as the biggest sources of influence. Advanced analytics: using predictive and text analytics to examine the content of online conversations, with the goal of identifying hidden themes, sentiments, and connections.

Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations. T/F

F

Descriptive analytics for social media feature such items as your followers as well as the content in online conversations that help you to identify themes and sentiments. T/F

F

In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's feelings. T/F

F

In the car insurance case study, text mining was used to identify auto features that caused injuries. T/F

F

In the evolution of social media user engagement, the largest recent change is the growth of creators. T/F

F

What does advanced analytics for social media do? A) It helps identify your followers. B) It identifies links between groups. C) It examines the content of online conversations. D) It identifies the biggest sources of influence online.

It examines the content of online conversations.

________ Web analytics refers to measurement and analysis of data relating to your company that takes place outside your Web site.

Off-site

Why are the users' page views and time spent on your Web site important metrics?

Page view counts help identify problems with site structure or disconnect between the marketing and the actual contents. Time on site gives an understanding of whether the visitors are reviewing the content and interested in the site.

________, also called homonyms, are syntactically identical words with different meanings.

Polysemes

________ is a technique used to detect favorable and unfavorable opinions toward specific products and services using large numbers of textual data sources.

Sentiment analysis

________ is a connections metric for social networks that measures the ties that actors in a network have with others that are geographically close.

Propinquity

Identify, with a brief description, each of the four steps in the sentiment analysis process.

Sentiment detection: classifying text as objective or subjective. N-P polarity classification: classifying text as overall positive or negative. Target identification: identifying the main target of the text. Collection and aggregation: summing up the polarities and strengths of the text, or performing more complex aggregations.

In text mining, tokenizing is the process of A) categorizing a block of text in a sentence. B) reducing multiple words to their base or root. C) transforming the term-by-document matrix to a manageable size. D) creating new branches or stems of recorded paragraphs.

categorizing a block of text in a sentence.

In the Lotte.com retail case, the company deployed SAS for Customer Experience Analytics to better understand the quality of customer traffic on their Web site, classify order rates, and see which ________ had the most visitors.

channels

In the Tito's Vodka case, it was important that social media users all had a(n) ________ brand experience.

consistent

Web ________ are used to automatically read through the contents of Web sites.

crawlers/spiders

Because the term document matrix is often very large and rather sparse, an important optimization step is to reduce the ________ of the matrix.

dimensionality
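
To show why the matrix is sparse and what reducing its dimensionality can look like, the sketch below builds a small count matrix (documents as rows, terms as columns) and then drops terms that occur in too few documents. This frequency cutoff is only one simple reduction approach, swapped in for brevity; it is not the singular value decomposition often used for this step. The documents and function names are invented for illustration.

```python
from collections import Counter

def term_document_matrix(docs):
    """Return (vocabulary, rows) where rows[i][j] counts term j in document i."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    rows = [[count[term] for term in vocab] for count in counts]
    return vocab, rows

def drop_rare_terms(vocab, rows, min_docs=2):
    """Keep only the columns whose term occurs in at least `min_docs` documents."""
    keep = [
        j for j in range(len(vocab))
        if sum(1 for row in rows if row[j] > 0) >= min_docs
    ]
    return [vocab[j] for j in keep], [[row[j] for j in keep] for row in rows]

# Hypothetical three-document corpus.
docs = ["claim fraud claim", "fraud policy", "policy renewal claim"]

vocab, rows = term_document_matrix(docs)
print(vocab)  # ['claim', 'fraud', 'policy', 'renewal']
print(rows)   # [[2, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 1]]

small_vocab, small_rows = drop_rare_terms(vocab, rows, min_docs=2)
print(small_vocab)  # ['claim', 'fraud', 'policy']
```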

When viewed as a binary feature, ________ classification is the binary classification task of labeling an opinionated document as expressing either an overall positive or an overall negative opinion.

polarity
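
Polarity classification as defined above can be illustrated with a very small lexicon-based sketch: count positive and negative words and label the text by whichever dominates. The word lists are tiny assumed samples, the function name is hypothetical, and the "undecided" fallback for ties is an addition for the example rather than part of the binary definition.

```python
import re

# Tiny hypothetical word lists, chosen only for this illustration.
POSITIVE = {"great", "good", "excellent", "love"}
NEGATIVE = {"terrible", "bad", "disaster", "laughable"}

def polarity(text):
    """Label text 'positive' or 'negative' by counting lexicon words."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "undecided"  # tie or no lexicon words found

print(polarity("Our new mayor is great for the city."))  # positive
print(polarity("The hotel we stayed in was terrible."))  # negative
```

Word counting of this kind misses sarcasm, negation, and implicit opinions, which is why practical systems rely on larger lexicons or trained classifiers.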

