IDC 3931 Final

Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales.

False

Hadoop and MapReduce require each other to work.

False

In a dataset where every value in an observation is supposed to be populated, you encounter several that are empty (NULL). It is best to just replace these NULL values with the average of that column of data.

False

In most cases, Hadoop is used to replace data warehouses.

False

Ratio data is a type of categorical data.

False

Statistics and data mining both look for data sets that are as large as possible.

False

How does Hadoop work?

It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.
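
The split-and-process-in-parallel idea can be illustrated with a minimal single-machine sketch. This is not Hadoop itself: threads stand in for separate computers, and the helper names (`split_into_parts`, `map_part`, `reduce_counts`, `word_count`) are hypothetical, chosen only for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_parts(records, n_parts):
    """Break the input into roughly equal parts (hypothetical helper)."""
    size = max(1, -(-len(records) // n_parts))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_part(part):
    """Map step: count the words inside one part only."""
    counts = Counter()
    for line in part:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts."""
    return left + right


def word_count(records, n_parts=3):
    parts = split_into_parts(records, n_parts)
    # Each part is handled by its own worker; in a real cluster the
    # workers would be separate machines rather than threads.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(map_part, parts))
    return reduce(reduce_counts, partials, Counter())


lines = ["big data big", "data mining", "big clusters", "mining data"]
totals = word_count(lines)
print(totals["big"], totals["data"], totals["mining"])
```

Each part is counted independently, so adding more workers does not change the merged result, only how the work is divided.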

Which of the following would NOT be considered Master Data?

Product delivery details

Which of the following sources is likely to produce Big Data the fastest?

RFID tags

What is one major way in which Web-based social media differs from traditional publishing media?

They have different costs to own and operate.

For low latency, interactive reports, a data warehouse is preferable to Hadoop.

True

Generally, making a search engine more efficient makes it less effective.

True

Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel.

True

In data mining, classification models help in prediction.

True

Interval data is a type of numerical data.

True

Many analytics tools are too complex for the average user, and this is one justification for Big Data.

True

MapReduce can be easily understood by skilled programmers due to its procedural nature.

True

Regional accents present challenges for natural language processing.

True

The term "Big Data" is relative as it depends on the size of the using organization.

True

When a problem has many attributes that impact the classification of different patterns, decision trees may be a useful approach.

True

Search engine optimization (SEO) is a means by which

Web site developers can increase Web site search rankings.

In text analysis, what is a lexicon?

a catalog of words, their synonyms, and their meanings

Which Big Data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources?

grid computing

Allowing Big Data to be processed in memory and distributed across a dedicated set of nodes can solve complex problems in near-real time with highly accurate insights. What is this process called?

in-memory analytics

A company/organization can encounter dirty data in the form of

invalid mailing address, invalid email address format, duplicated data
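
Two of these forms of dirty data, invalid email format and duplicated records, can be flagged with a short check. The sketch below is illustrative only: the record layout, the `find_dirty_rows` helper, and the deliberately simple email pattern are assumptions, and real address validation is considerably stricter.

```python
import re

# Simplified pattern (an assumption for illustration): something@something.something
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def find_dirty_rows(rows):
    """Return (row index, problem) pairs for invalid or duplicated emails."""
    seen = set()
    problems = []
    for index, row in enumerate(rows):
        email = row.get("email", "").strip().lower()
        if not EMAIL_PATTERN.match(email):
            problems.append((index, "invalid email format"))
        elif email in seen:
            # Same address (ignoring case) already appeared in an earlier row.
            problems.append((index, "duplicate record"))
        else:
            seen.add(email)
    return problems


customers = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "Bob", "email": "bob-at-example.com"},
    {"name": "Ann", "email": "Ann@Example.com"},
]
issues = find_dirty_rows(customers)
print(issues)
```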

Breaking up a Web page into its components to identify worthy words/terms and indexing them using a set of rules is called

parsing the documents.

All of the following statements about data mining are true EXCEPT

the process aspect means that data mining should be a one-step process to results.

The Eckerson survey of 2002 estimated the total yearly cost of dirty data to the US economy to be approximately:

$600 billion USD

Data that is collected, stored, and analyzed in data mining is often private and personal. There is no way to maintain individuals' privacy other than being very careful about physical data security.

False

Decentralization, the need for specialized skills, and immediacy of output are all attributes of Web publishing when compared to industrial publishing.

False

Market basket analysis is a useful and entertaining way to explain data mining to a technologically less savvy audience, but it has little business significance.

False

Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out.

True
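
Filtering out articles and auxiliary verbs is often called stop-word removal. A minimal sketch follows; the `STOP_WORDS` set is a small hand-picked sample rather than a standard list, and `remove_stop_words` is a hypothetical helper.

```python
# Small sample of articles and auxiliary verbs (not an exhaustive list).
STOP_WORDS = {
    "a", "an", "the",
    "is", "are", "was", "were", "be", "been",
    "has", "have", "had", "do", "does", "did",
    "will", "would", "can", "could",
}


def remove_stop_words(text):
    """Lowercase the text, strip punctuation, and drop stop words."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    return [w for w in words if w and w not in STOP_WORDS]


terms = remove_stop_words("The model is trained and the results are stored.")
print(terms)
```

Only the terms that carry meaning for the analysis remain; conjunctions such as "and" survive here simply because they are not in the sample set.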

Current total storage capacity lags behind the digital information being generated in the world.

True

It is important for Big Data and self-service business intelligence to go hand in hand to get maximum value from analytics.

True

The cost of data storage has plummeted recently, making data mining feasible for more firms.

True

All of the following statements about data mining are true EXCEPT

building the model takes the most time and effort.

Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes?

classification

Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features?

clustering

You are tasked with accumulating survey data on a web page and are responsible for it being free from dirty data once you close the survey and get the data to the researching team. Which is the best way to handle the possibility of dirty data?

Build a website that validates data as the survey participant takes the survey.
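
Validating at entry time means each answer is checked before it is stored, so bad values never reach the research team. The sketch below shows the kind of check a survey page might run on submission; the field names, the age range, and the `validate_answer` helper are all hypothetical.

```python
def validate_answer(field, value):
    """Return an error message, or None if the answer is acceptable."""
    value = value.strip()
    if not value:
        return f"{field} is required"
    if field == "age":
        # Assumed rule for illustration: whole number between 18 and 120.
        if not value.isdecimal() or not 18 <= int(value) <= 120:
            return "age must be a whole number between 18 and 120"
    return None


# The participant is asked to correct an answer as soon as it fails a check.
for field, value in [("name", "  "), ("age", "abc"), ("age", "35")]:
    print(field, repr(value), "->", validate_answer(field, value))
```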

Big Data simplifies data governance issues, especially for global firms.

False

Big Data uses commodity hardware, which is expensive, specialized hardware that is custom built for a client or application.

False

Using data to understand customers/clients and business operations to sustain and foster growth and profitability is

an increasingly challenging task for today's enterprises.

In text mining, tokenizing is the process of

categorizing a block of text in a sentence.

What data discovery process, whereby objects are categorized into predetermined groups, is used in text mining?

classification

