Big Data Exam 2 (ISM3540)


Understanding which keywords your users enter to reach your Web site through a search engine can help you understand

how well visitors understand your products

In the Influence Health case study, what was the goal of the system?

increasing service use

Breaking up a Web page into its components to identify worthy words/terms and indexing them using a set of rules is called

parsing the documents
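
As a rough illustration of what parsing and indexing involve, the sketch below splits page text into terms, drops a few common words, and records which pages contain each term. The stop-word list, the page ids, and the function names are assumptions made for this example, not how any particular search engine does it.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; real search engines use much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}

def parse_document(text):
    """Split text into lowercase terms and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_index(pages):
    """Map each term to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in parse_document(text):
            index[term].add(page_id)
    return index

# Made-up pages for illustration.
pages = {
    "p1": "The basics of data mining",
    "p2": "Text mining and data warehouses",
}
index = build_index(pages)
```

Looking up `index["mining"]` returns both page ids, while a stop word such as "the" never enters the index.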

Third-party providers of publicly available data sets protect the anonymity of the individuals in the data set primarily by

removing identifiers such as names and social security numbers.

In the Wimbledon case study, the tournament used data for each match in real time to highlight

significant events

Companies with the largest revenues from Big Data tend to be

the largest computer and IT services firms.

Under which of the following requirements would it be more appropriate to use Hadoop over a data warehouse?

unrestricted, ungoverned sandbox explorations

What is the Hadoop Distributed File System (HDFS) designed to handle?

unstructured and semistructured non-relational data

You are tasked with collecting survey data on a web page and are responsible for ensuring that the data is free of dirty data once you close the survey and deliver it to the research team. Which is the best way to handle the possibility of dirty data?

Build a website that validates data as the survey participant takes the survey.
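
A minimal sketch of validating at entry time, assuming a survey with three hypothetical fields (`email`, `age`, `rating`) and made-up rules. A real survey site would usually run checks like these in the browser and again on the server.

```python
import re

# Hypothetical survey fields and rules, chosen only for illustration.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_response(response):
    """Return a list of problems; an empty list means the response is clean."""
    errors = []
    email = response.get("email", "")
    if not isinstance(email, str) or not EMAIL_PATTERN.match(email):
        errors.append("email is not valid")
    age = response.get("age")
    if not isinstance(age, int) or not 18 <= age <= 120:
        errors.append("age must be a whole number from 18 to 120")
    if response.get("rating") not in (1, 2, 3, 4, 5):
        errors.append("rating must be between 1 and 5")
    return errors
```

A response is stored only when `validate_response` returns an empty list, so dirty rows are rejected before they reach the data set.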

In the Dell case study, the largest issue was how to properly spend the online marketing budget.

False

In the cancer research case study, data mining algorithms that predict cancer survivability with high predictive power are good replacements for medical professionals.

False

In the car insurance case study, text mining was used to identify auto features that caused injuries.

False

In the evolution of social media user engagement, the largest recent change is the growth of creators.

False

In the opening case, police detectives used data mining to identify possible new areas of inquiry.

False

K-fold cross-validation is also called sliding estimation.

False
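
For context, k-fold cross-validation is commonly also called rotation estimation: the data is split into k folds, and each fold takes one turn as the test set while the rest is used for training. The sketch below only generates the index splits; the function name is an assumption, and shuffling and model fitting are left out.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_splits(10, 3))
```

Every sample lands in exactly one test fold, so each observation is used for testing once and for training k - 1 times.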

Market basket analysis is a useful and entertaining way to explain data mining to a technologically less savvy audience, but it has little business significance.

False

Open-source data mining tools include applications such as IBM SPSS Modeler and Dell Statistica.

False

Search engines are only used in the context of the World Wide Web (WWW).

False

Since little can be done about visitor Web site abandonment rates, organizations have to focus their efforts on increasing the number of new visitors.

False

Statistics and data mining both look for data sets that are as large as possible.

False

What do voice of the market (VOM) applications of sentiment analysis do?

They examine customer sentiment at the aggregate level.

Big Data is being driven by the exponential growth, availability, and use of information.

True

Categorization and clustering of documents during text mining differ only in the preselection of categories.

True

Converting continuous valued numerical variables to ranges and categories is referred to as discretization.

True
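
A small sketch of discretization, assuming a hypothetical age variable with made-up cut points and labels:

```python
from bisect import bisect_right

# Hypothetical cut points and labels for an age variable.
CUT_POINTS = [18, 35, 65]
LABELS = ["under 18", "18-34", "35-64", "65 and over"]

def discretize(value, cut_points=CUT_POINTS, labels=LABELS):
    """Map a continuous value to the label of the range it falls in."""
    return labels[bisect_right(cut_points, value)]

ages = [12.5, 18, 34.9, 65, 80.2]
categories = [discretize(a) for a in ages]
```

Here a value equal to a cut point falls into the higher range, so 18 is labeled "18-34" and 65 is labeled "65 and over".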

Current total storage capacity lags behind the digital information being generated in the world.

True

Current use of sentiment analysis in voice of the customer applications allows companies to change their products or services in real time in response to customer sentiment.

True

During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality.

True
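
To make the definition concrete, the sketch below tallies the four outcomes of a binary classifier on made-up labels. The function name and data are assumptions for illustration.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for boolean labels."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p and a:
            counts["TP"] += 1
        elif p and not a:
            counts["FP"] += 1  # predicted true, actually false
        elif not p and a:
            counts["FN"] += 1  # predicted false, actually true
        else:
            counts["TN"] += 1
    return counts

# Made-up labels for five cases.
actual = [True, False, True, False, False]
predicted = [True, True, False, False, True]
counts = confusion_counts(actual, predicted)
```

In this data the second and fifth cases are false positives: the algorithm says true while the real label is false.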

Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel.

True

If using a mining analogy, "knowledge mining" would be a more appropriate term than "data mining."

True

In data mining, classification models help in prediction.

True

In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document.

True
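
Support can be computed directly as the share of documents that contain both concepts. The sketch below assumes each document has already been reduced to a set of concepts; the data is made up.

```python
def support(documents, concept_a, concept_b):
    """Fraction of documents in which both concepts appear."""
    both = sum(1 for doc in documents if concept_a in doc and concept_b in doc)
    return both / len(documents)

# Each document is represented by the set of concepts found in it (made-up data).
documents = [
    {"hadoop", "hdfs"},
    {"hadoop", "mapreduce"},
    {"hdfs", "storage"},
    {"hadoop", "hdfs", "mapreduce"},
]
s = support(documents, "hadoop", "hdfs")
```

Two of the four documents mention both concepts, so the support is 0.5, or 50%.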

In the opening vignette, Access Telecom (AT) built a system to better visualize customers who were unhappy before they canceled their service.

True

Regional accents present challenges for natural language processing.

True

Social media mentions can be used to chart and predict flu outbreaks.

True

The cost of data storage has plummeted recently, making data mining feasible for more firms.

True

Which of the following is NOT one of the "3 V's of Big Data"?

Veracity

Search engine optimization (SEO) is a means by which

Web site developers can increase Web site search rankings.

Web site usability may be rated poor if

Web site visitors download few of your offered PDFs and videos.

In text analysis, what is a lexicon?

a catalog of words, their synonyms, and their meanings

In the opening vignette, the architectural system that supported Watson used all the following elements EXCEPT

a core engine that could operate seamlessly in another domain without changes.

Natural language processing (NLP) is associated with which of the following areas?

all of these (text mining, artificial intelligence, computational linguistics)

In data mining, finding that two products tend to appear together in a shopping cart is known as

association rule mining.
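
A minimal sketch of the three measures usually reported for an association rule (support, confidence, and lift), computed for a single rule over made-up shopping baskets. The function name is an assumption, and the antecedent and consequent are assumed to occur in at least one basket each.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    n_a = sum(1 for b in baskets if antecedent in b)
    n_c = sum(1 for b in baskets if consequent in b)
    n_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = n_both / n            # share of baskets with both items
    confidence = n_both / n_a       # share of antecedent baskets that also hold the consequent
    lift = confidence / (n_c / n)   # confidence relative to the consequent's base rate
    return support, confidence, lift

# Made-up baskets for illustration.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]
support, confidence, lift = rule_metrics(baskets, "bread", "butter")
```

A lift above 1 suggests the two products appear together more often than would be expected if they were bought independently.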

What is the main reason parallel processing is sometimes used for data mining?

because of the massive data amounts and search efforts involved

Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features?

clustering

In the Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse case study, what was the analytic goal?

determine differences in rates of disease in urban and rural populations

Which of the following is a data mining myth?

Data mining requires a separate, dedicated database.

Data mining requires specialized data analysts to ask ad hoc questions and obtain answers quickly from the system.

False

The Eckerson survey of 2002 estimated the total yearly cost of dirty data to the US economy to be approximately:

$600 Billion USD

A company/organization can encounter dirty data in the form of

All of these (invalid mailing address, invalid email address, duplicated data)

Which data mining process/methodology is thought to be the most comprehensive, according to kdnuggets.com rankings?

CRISP-DM

In a dataset where every value in an observation is supposed to be populated, you encounter several that are empty (NULL). It is always best to replace these NULL values with the average of that column of data.

False
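
One reason the mean is not always the best fill value is that it is sensitive to extreme values. The sketch below compares mean and median imputation on a made-up income column with one outlier; the function name and data are assumptions for illustration.

```python
from statistics import mean, median

def impute(values, strategy):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

# Made-up incomes with one extreme value and one missing entry.
incomes = [30, 32, 35, None, 400]
mean_filled = impute(incomes, "mean")
median_filled = impute(incomes, "median")
```

The mean fill is 124.25, far from the typical value, while the median fill is 33.5. Depending on the data, dropping the row or using a model-based estimate may be more appropriate than either.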

Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations.

False

