ISM3540 Exam 2


Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features?

Clustering

Which of the following is a data mining myth?

Data mining requires a separate, dedicated database.

Data that is collected, stored, and analyzed in data mining is often private and personal. There is no way to maintain individuals' privacy other than being very careful about physical data security. (T/F)

False

Descriptive analytics for social media feature such items as your followers as well as the content in online conversations that help you to identify themes and sentiments. (T/F)

False

In a dataset where all values on an observation are supposed to be populated, you encounter several that are empty (NULL). It is always best to replace these NULL values with the average of that column of data. (T/F)

False

In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's feelings. (T/F)

False

In the Dell case study, the largest issue was how to properly spend the online marketing budget. (T/F)

False

In the Miami-Dade Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime. (T/F)

False

In the Salesforce case study, streaming data is used to identify services that customers use most. (T/F)

False

In the cancer research case study, data mining algorithms that predict cancer survivability with high predictive power are good replacements for medical professionals. (T/F)

False

In the car insurance case study, text mining was used to identify auto features that caused injuries. (T/F)

False

In the evolution of social media user engagement, the largest recent change is the growth of creators. (T/F)

False

In the opening case, police detectives used data mining to identify possible new areas of inquiry. (T/F)

False

K-fold cross-validation is also called sliding estimation. (T/F)

False
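As a rough illustration of what k-fold cross-validation does (the technique is more commonly also called rotation estimation), the sketch below splits sample indices into k folds and rotates which fold is held out for testing. The function name `k_fold_splits` is hypothetical and the data is not shuffled; this is a minimal sketch, not a library implementation.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) once per fold.

    Hypothetical helper for illustration: no shuffling, folds are
    contiguous blocks, and any remainder goes to the earliest folds.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                  # held-out fold
        train = indices[:start] + indices[start + size:]    # remaining k-1 folds
        yield train, test
        start += size


# Each sample lands in the test set exactly once across the k rounds.
splits = list(k_fold_splits(6, 3))
```

With 6 samples and k = 3, every round trains on 4 samples and tests on the other 2.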

Market basket analysis is a useful and entertaining way to explain data mining to a technologically less savvy audience, but it has little business significance. (T/F)

False

Open-source data mining tools include applications such as IBM SPSS Modeler and Dell Statistica. (T/F)

False

Ratio data is a type of categorical data. (T/F)

False

Search engine optimization (SEO) techniques play a minor role in a Web site's search ranking because only well-written content matters. (T/F)

False

Search engines are only used in the context of the World Wide Web (WWW). (T/F)

False

Since little can be done about visitor Web site abandonment rates, organizations have to focus their efforts on increasing the number of new visitors. (T/F)

False

Statistics and data mining both look for data sets that are as large as possible. (T/F)

False

Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining. (T/F)

False

The entire focus of the predictive analytics system in the Infinity P&C case was on detecting and handling fraudulent claims for the company's benefit. (T/F)

False

Allowing Big Data to be processed in memory and distributed across a dedicated set of nodes can solve complex problems in near-real time with highly accurate insights. What is this process called?

In-memory analytics

In a Hadoop "stack," what node periodically replicates and stores data from the Name Node should it fail?

Secondary node

Clustering partitions a collection of things into segments whose members share

Similar characteristics

In the research literature case study, the researchers analyzing academic papers extracted information from which source?

The paper abstract

Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out. (T/F)

True
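A minimal sketch of stop-word filtering, assuming a tiny hand-made stop list (the `STOP_WORDS` set and `remove_stop_words` function are hypothetical; real text mining tools ship much longer lists):

```python
# Hypothetical stop list: a few articles and auxiliary verbs only.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "has", "have", "had", "will", "be"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


filtered = remove_stop_words("The claim was filed after an accident")
```

Here `filtered` keeps only the content-bearing words: "claim", "filed", "after", "accident".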

Big Data is being driven by the exponential growth, availability, and use of information. (T/F)

True

Categorization and clustering of documents during text mining differ only in the preselection of categories. (T/F)

True

Clickstream analysis does not need users to enter their perceptions of the Web site or other feedback directly to be useful in determining their preferences. (T/F)

True

Converting continuous valued numerical variables to ranges and categories is referred to as discretization. (T/F)

True
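Discretization can be sketched as mapping each numeric value to the label of the range it falls in. The bin boundaries and labels below are assumptions chosen for illustration, and `discretize` is a hypothetical helper:

```python
def discretize(value, bins):
    """Return the label of the first bin whose upper bound exceeds value."""
    for upper_bound, label in bins:
        if value < upper_bound:
            return label
    raise ValueError(f"no bin covers value {value!r}")


# Assumed age ranges: [0, 18) minor, [18, 65) adult, [65, inf) senior.
AGE_BINS = [(18, "minor"), (65, "adult"), (float("inf"), "senior")]

ages = [12, 18, 40, 70]
labels = [discretize(age, AGE_BINS) for age in ages]
```

The continuous ages become the categories "minor", "adult", "adult", "senior".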

Current use of sentiment analysis in voice of the customer applications allows companies to change their products or services in real time in response to customer sentiment. (T/F)

True

During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality. (T/F)

True
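To make the definition concrete, the sketch below tallies the four possible outcomes of a binary classifier from paired actual and predicted labels. The sample labels are made up for illustration and `count_outcomes` is a hypothetical helper:

```python
def count_outcomes(actual, predicted):
    """Count true/false positives and negatives for boolean labels."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for is_true, said_true in zip(actual, predicted):
        if said_true and is_true:
            counts["TP"] += 1   # predicted true, actually true
        elif said_true and not is_true:
            counts["FP"] += 1   # predicted true, actually false
        elif not said_true and not is_true:
            counts["TN"] += 1   # predicted false, actually false
        else:
            counts["FN"] += 1   # predicted false, actually true
    return counts


# Made-up labels: the second case is the false positive.
actual = [True, False, False, True, False]
predicted = [True, True, False, False, False]
outcomes = count_outcomes(actual, predicted)
```

In this sample there is exactly one false positive: the second case, which the algorithm classified as true although it is false in reality.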

For low latency, interactive reports, a data warehouse is preferable to Hadoop. (T/F)

True

Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel. (T/F)

True

If you have many flexible programming languages running in parallel, Hadoop is preferable to a data warehouse. (T/F)

True

In data mining, classification models help in prediction. (T/F)

True

In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but easier to classify others, e.g., movie reviews, in the same way. (T/F)

True

In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document. (T/F)

True
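Support as described here can be computed as the fraction of documents that contain both concepts. This is a minimal sketch in which each document is represented as a set of concepts; the sample documents and the `support` function are made up for illustration:

```python
def support(documents, concept_a, concept_b):
    """Fraction of documents in which both concepts appear together."""
    both = sum(1 for doc in documents if concept_a in doc and concept_b in doc)
    return both / len(documents)


# Each document is modeled as the set of concepts extracted from it.
docs = [
    {"fraud", "claim"},
    {"fraud"},
    {"claim", "injury"},
    {"fraud", "claim", "injury"},
]
fraud_claim_support = support(docs, "fraud", "claim")
```

Two of the four sample documents mention both "fraud" and "claim", so the support is 0.5, or 50%.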

In the Wimbledon case study, designers balanced the needs of mobile and desktop computer users. (T/F)

True

In the opening vignette, Access Telecom (AT) built a system to better visualize customers who were unhappy before they canceled their service. (T/F)

True

It is important for Big Data and self-service business intelligence to go hand in hand to get maximum value from analytics. (T/F)

True

Regional accents present challenges for natural language processing. (T/F)

True

The cost of data storage has plummeted recently, making data mining feasible for more firms. (T/F)

True

Using data mining on data about imports and exports can help to detect tax avoidance and money laundering. (T/F)

True

When a problem has many attributes that impact the classification of different patterns, decision trees may be a useful approach. (T/F)

True

Which of the following statements about Web site conversion statistics is FALSE?

Visitors who begin a purchase on most Web sites must complete it.

Search engine optimization (SEO) is a means by which

Web site developers can increase Web site search rankings.

What does the robustness of a data mining method refer to?

its ability to overcome noisy data to make somewhat accurate predictions

In data mining, finding an affinity of two products to be commonly together in a shopping cart is known as

Association rule mining

Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations. (T/F)

False

In a Hadoop "stack," what is a slave node?

a node where data is stored and processed

The Eckerson survey of 2002 estimated the total cost (to the US yearly economy) of dirty data to be approximately: a. $600 Billion USD b. $600 Million USD c. $6 Billion USD d. $600 Trillion USD

a. $600 Billion USD

You are tasked with accumulating survey data on a web page and are responsible for it being free from dirty data once you close the survey and get the data to the researching team. Which is the best way to handle the possibility of dirty data? a. Build a website that validates data as the survey participant takes the survey. b. Let your friend throw a survey site together that accumulates the data and you export it into a spreadsheet and fix the data manually. c. Have the survey site email you when it encounters data that is not formatted correctly. d. Have the survey accumulate the data and then email the survey participant after the survey is processed asking them to retake it due to invalid data.

a. Build a website that validates data as the survey participant takes the survey.

In text analysis, what is a lexicon? a. a catalog of words, their synonyms, and their meanings b. a catalog of customers, their words, and phrases c. a catalog of letters, words, phrases, and sentences d. a catalog of customers, products, words, and phrases

a. a catalog of words, their synonyms, and their meanings

A company/organization can encounter dirty data in the form of a. all of these b. invalid mailing address c. invalid email address d. duplicated data

a. all of these

Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications? a. insurance b. retailing and logistics c. customer relationship management d. computer hardware and software

a. insurance

The data field "ethnic group" can be best described as a. nominal data b. interval data c. ordinal data d. ratio data

a. nominal data

All of the following statements about data mining are true EXCEPT a. the process aspect means that data mining should be a one-step process to results. b. the novel aspect means that previously unknown patterns are discovered. c. the potentially useful aspect means that results should lead to some business benefit. d. the valid aspect means that the discovered patterns should hold true on new data.

a. the process aspect means that data mining should be a one-step process to results.

What does Web content mining involve?

analyzing the unstructured content of Web pages

Understanding customers better has helped Amazon and others become more successful. The understanding comes primarily from

analyzing the vast data amounts routinely collected.

In sentiment analysis, which of the following is an implicit opinion? a. The hotel we stayed in was terrible. b. The customer service I got for my TV was laughable. c. The cruise we went on last summer was a disaster. d. Our new mayor is great for the city.

b. The customer service I got for my TV was laughable.

Prediction problems where the variables have numeric values are most accurately defined as a. classifications b. regressions c. associations d. computations

b. regressions

Web site usability may be rated poor if a. the average number of page views on your Web site is large. b. the time spent on your Web site is long. c. Web site visitors download few of your offered PDFs and videos. d. users fail to click on all pages equally.

c. Web site visitors download few of your offered PDFs and videos.

In the opening vignette, the architectural system that supported Watson used all the following elements EXCEPT a. massive parallelism to enable simultaneous consideration of multiple hypotheses. b. an underlying confidence subsystem that ranks and integrates answers. c. a core engine that could operate seamlessly in another domain without changes. d. integration of shallow and deep knowledge.

c. a core engine that could operate seamlessly in another domain without changes.

What is the main reason parallel processing is sometimes used for data mining? a. because the hardware exists in most organizations, and it is available to use b. because most of the algorithms used for data mining require it c. because of the massive data amounts and search efforts involved d. because any strategic application requires parallel processing

c. because of the massive data amounts and search efforts involved

Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes? a. associations b. visualization c. classification d. clustering

c. classification

What are the two main types of Web analytics? a. old-school and new-school Web analytics b. Bing and Google Web analytics c. off-site and on-site Web analytics d. data-based and subjective Web analytics

c. off-site and on-site Web analytics

Third party providers of publicly available data sets protect the anonymity of the individuals in the data set primarily by a. asking data users to use the data ethically. b. leaving in identifiers (e.g., name), but changing other variables. c. removing identifiers such as names and social security numbers. d. letting individuals in the data know their data is being accessed.

c. removing identifiers such as names and social security numbers.

In the Wimbledon case study, the tournament used data for each match in real time to highlight a. winners and losers b. player histories c. significant events d. advertiser content

c. significant events

All of the following statements about data mining are true EXCEPT: a. the term is relatively new. b. its techniques have their roots in traditional statistical analysis and artificial intelligence. c. the ideas behind it are relatively new. d. intense, global competition makes its application more important.

c. the ideas behind it are relatively new.

In text mining, tokenizing is the process of

categorizing a block of text in a sentence.

Natural language processing (NLP) is associated with which of the following areas? a. text mining b. artificial intelligence c. computational linguistics d. all of these

d. all of these

A data mining study is specific to addressing a well-defined business task, and different business tasks require a. general organizational data b. general industry data c. general economic data d. different sets of data

d. different sets of data

Understanding which keywords your users enter to reach your Web site through a search engine can help you understand a. the hardware your Web site is running on. b. the type of Web browser being used by your Web site visitors. c. most of your Web site visitors' wants and needs. d. how well visitors understand your products

d. how well visitors understand your products

In the Influence Health case study, what was the goal of the system? a. locating clinic patients b. understanding follow-up care c. decreasing operational costs d. increasing service use

d. increasing service use

