ISM 3540 Exam 2

The portion of IoT technology infrastructure that focuses on the sensors themselves is

hardware

A company/organization can encounter dirty data in the form of

All of these: invalid email address, invalid mailing address, duplicated data

What are the two main types of Web analytics?

off-site and on-site Web analytics

You are tasked with accumulating survey data on a web page and are responsible for it being free from dirty data once you close the survey and get the data to the researching team. Which is the best way to handle the possibility of dirty data?

Build a website that validates data as the survey participant takes the survey.
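
A minimal sketch of what validating at entry time could look like, in Python. The field names `email` and `mailing_address`, the deliberately simple regular expression, and the `validate_response` helper are all assumptions made for illustration; they are not from the course material.

```python
import re

# Hypothetical, deliberately simple pattern; real-world email rules are more involved.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_response(response, seen_emails):
    """Return a list of problems found in one survey response (empty if clean)."""
    problems = []
    email = response.get("email", "").strip().lower()
    if not EMAIL_PATTERN.match(email):
        problems.append("invalid email address")
    elif email in seen_emails:
        problems.append("duplicate submission")
    if not response.get("mailing_address", "").strip():
        problems.append("missing mailing address")
    return problems

already_submitted = {"ann@example.com"}
print(validate_response({"email": "Ann@Example.com", "mailing_address": "1 Main St"}, already_submitted))
print(validate_response({"email": "not-an-email", "mailing_address": ""}, already_submitted))
```

Rejecting a bad response while the participant is still on the page is what keeps the invalid and duplicated records from reaching the research team in the first place.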

In most cases, Hadoop is used to replace data warehouses.

False

In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's feelings.

False

In the Dell case study, the largest issue was how to properly spend the online marketing budget.

False

In the Miami-Dade Police Department case study, predictive analytics helped to identify the best schedule for officers in order to pay the least overtime.

False

In the cancer research study, data mining algorithms that predict cancer survivability with high predictive power are good replacements for medical professionals.

False

In the car insurance case study, text mining was used to identify auto features that caused injuries.

False

In the evolution of social media user engagement, the largest recent change is the growth of creators.

False

In the opening case, police detectives used data mining to identify possible new areas of inquiry.

False

K-fold cross-validation is also called sliding estimation.

False
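
K-fold cross-validation is also called rotation estimation, because each fold takes one turn as the test set. A rough Python sketch of that rotation follows; the helper name `k_fold_indices` is hypothetical, and the sketch skips the shuffling that is usually done first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds, without shuffling."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

for train, test in k_fold_indices(6, 3):
    print("train:", train, "test:", test)
```

Every sample lands in a test set exactly once, and the k accuracy figures are then averaged.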

Market basket analysis is a useful and entertaining way to explain data mining to a technologically less savvy audience, but it has little business significance.

False

Open-source data mining tools include applications such as IBM SPSS Modeler and Dell Statistica.

False

Ratio data is a type of categorical data.

False

SaaS combines aspects of cloud computing with Big Data analytics and empowers data scientists and analysts by allowing them to access centrally managed information data sets.

False

Search engine optimization (SEO) techniques play a minor role in a Web site's search ranking because only well-written content matters.

False

Search engines are only used in the context of the World Wide Web (WWW).

False

Server virtualization is the pooling of physical storage from multiple network storage devices into a single storage device.

False

Since little can be done about visitor Web site abandonment rates, organizations have to focus their efforts on increasing the number of new visitors.

False

Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining.

False

Users definitely own their biometric data.

False

Web-based media has nearly identical cost and scale structures as traditional media.

False

Which Big Data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources?

Grid Computing

Understanding which keywords your users enter to reach your Web site through a search engine can help you understand

How well visitors understand your products.

In this model, infrastructure resources like networks, storage, servers, and other computing resources are provided to client companies.

IaaS

Identifying and preventing incorrect claim payments and fraudulent activities falls under which type of data mining applications?

Insurance

How does Hadoop work?

It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.
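
The split-then-combine idea can be sketched in plain Python. This is not the Hadoop API: threads on one machine stand in for separate computers, and the function names are assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(lines, n_chunks):
    """Break the input into roughly equal parts."""
    return [lines[i::n_chunks] for i in range(n_chunks)]

def map_chunk(chunk):
    """Count words in one chunk; each call is independent of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def word_count(lines, n_chunks=3):
    """Process the chunks concurrently, then merge the partial counts."""
    chunks = split_into_chunks(lines, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total

print(word_count(["big data", "big deal", "data data"]))
```

Because no chunk depends on any other, adding more workers (or, in Hadoop's case, more nodes) lets more of the data be processed at the same time.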

What does the scalability of a data mining method refer to?

Its ability to construct a prediction model efficiently given a large amount of data

What does the robustness of a data mining method refer to?

Its ability to overcome noisy data to make somewhat accurate predictions

The data field "ethnic group" can be best described as:

Nominal data

In the Twitter case study, how did influential users support their tweets?

Objective Data

Which of the following allows companies to deploy their software and applications in the cloud so that their customers can use them?

PaaS

Breaking up a Web page into its components to identify worthy words/terms and indexing them using a set of rules is called

Parsing the documents

Which of the following sources is likely to produce Big Data the fastest?

RFID tags

Prediction problems where the variables have numeric values are most accurately defined as?

Regressions

Third party providers of publicly available data sets protect the anonymity of the individuals in the data set primarily by

Removing identifiers such as names and social security numbers.

This model allows consumers to use applications and software that run on distant computers in the cloud infrastructure.

SaaS

Clustering partitions a collection of things into segments whose members share

Similar Characteristics

The portion of the IoT technology infrastructure that focuses on how to manage incoming data and analyze it is?

Software Backend

In the Target case study, why did Target send maternity ads to a teen?

Target's analytic model suggested she was pregnant based on her buying habits.

Natural language processing (NLP) is associated with which of the following areas?

All of these: text mining, artificial intelligence, computational linguistics

In sentiment analysis, which of the following is an implicit opinion?

The customer service I got for my TV was laughable.

All of the following statements about data mining are true EXCEPT:

The ideas behind it are relatively new

In the research literature case study, the researchers analyzing academic papers extracted information from which source?

The paper abstract

All of the following statements about data mining are true EXCEPT:

The process aspect means that data mining should be a one-step process to results.

In estimating the accuracy of data mining (or other) classification models, the true positive rate is:

The ratio of correctly classified positives divided by the total positive count.
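
As a small worked example in Python (the labels below are made up, and the function name is just for illustration), the true positive rate is TP / (TP + FN):

```python
def true_positive_rate(actual, predicted):
    """Correctly classified positives divided by the total number of actual positives."""
    true_positives = sum(1 for a, p in zip(actual, predicted) if a and p)
    false_negatives = sum(1 for a, p in zip(actual, predicted) if a and not p)
    total_positives = true_positives + false_negatives
    if total_positives == 0:
        raise ValueError("no actual positives, so the rate is undefined")
    return true_positives / total_positives

actual    = [1, 1, 1, 1, 0, 0]   # four real positives
predicted = [1, 1, 1, 0, 1, 0]   # three of them were caught
print(true_positive_rate(actual, predicted))  # 0.75
```

The false positive in the fifth position does not affect the result, because the denominator counts only the actual positives.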

Traditional data warehouses have not been able to keep up with

The variety and complexity of data

What do voice of the market (VOM) applications of sentiment analysis do?

They examine customer sentiment at the aggregate level.

Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out.

True
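
A minimal Python sketch of that filtering step. The stop list here is a tiny illustrative assumption; real stop lists are longer and often domain-specific.

```python
# Small illustrative stop list (an assumption, not a standard one).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "has", "have", "had", "will", "be"}

def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop articles and auxiliary verbs."""
    return [token for token in text.lower().split() if token not in STOP_WORDS]

print(remove_stop_words("The claim was denied"))  # ['claim', 'denied']
```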

Big data is being driven by the exponential growth, availability, and use of information.

True

Categorization and clustering of documents during text mining differ only in the preselection of categories.

True

Clickstream analysis does not need users to enter their perceptions of the Web site or other feedback directly to be useful in determining their preferences.

True

Converting continuous-valued numerical variables to ranges and categories is referred to as discretization.

True
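
For instance, ages can be turned into labeled ranges. The boundaries and labels below are arbitrary assumptions chosen for illustration; the sketch uses the standard-library `bisect` module.

```python
from bisect import bisect_right

# Hypothetical cut points: values below 18, 18 to 34, 35 to 64, 65 and over.
AGE_BOUNDARIES = [18, 35, 65]
AGE_LABELS = ["minor", "young adult", "middle-aged", "senior"]

def discretize(value, boundaries=AGE_BOUNDARIES, labels=AGE_LABELS):
    """Map a continuous value to the label of the range it falls in."""
    return labels[bisect_right(boundaries, value)]

print([discretize(age) for age in (17, 18, 64.5, 65)])
# ['minor', 'young adult', 'middle-aged', 'senior']
```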

Current total storage capacity lags behind the digital information being generated in the world.

True

Current use of sentiment analysis in voice of the customer applications allows companies to change their products or services in real time in response to customer sentiment.

True

Data as a service began with the notion that data quality could happen in a centralized place, cleansing and enriching data and offering it to different systems, applications, or users, irrespective of where they were in the organization, computers, or on the network.

True

Despite their potential, many current NoSQL tools lack mature management and monitoring tools.

True

During classification in data mining, a false positive is an occurrence classified as true by the algorithm while being false in reality.

True

For low latency, interactive reports, a data warehouse is preferable to Hadoop.

True

From massive amounts of high-dimensional location data, algorithms that reduce the dimensionality of data can be used to uncover trends, meaning, and relationships to eventually produce human-understandable representations.

True

Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel.

True

If using a mining analogy, "knowledge mining" would be a more appropriate term than "data mining."

True

If you have many flexible programming languages running in parallel, Hadoop is preferable to a data warehouse.

True

In data mining, classification models help in prediction.

True

In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but easier to classify others, e.g., movie reviews, in the same way.

True

In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document or sample.

True
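
A short Python sketch of the calculation. The documents are made up, and each one is modeled as a set of concept names, which is an assumption made for illustration.

```python
def support(documents, concept_a, concept_b):
    """Fraction of documents that contain both concepts."""
    both = sum(1 for doc in documents if concept_a in doc and concept_b in doc)
    return both / len(documents)

# 100 toy documents: 7 mention both concepts, 43 mention only one, 50 mention neither.
documents = [{"fraud", "claim"}] * 7 + [{"fraud"}] * 43 + [{"weather"}] * 50
print(support(documents, "fraud", "claim"))  # 0.07
```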

Internet of Things (IoT) is the phenomenon of connecting the physical world to the Internet.

True

It is important for big data and self-service business intelligence to go hand-in-hand to get maximum value from analytics.

True

MapReduce can be easily understood by skilled programmers due to its procedural nature.

True

One reason the IoT is growing exponentially is because hardware is smaller and more affordable.

True

Regional accents present challenges for natural language processing.

True

Satellite data can be used to evaluate the activity at retail locations as a source of alternative data.

True

Service-oriented DSS solutions generally offer individual or bundled services to the user as a service.

True

Social media mentions can be used to chart and predict flu outbreaks.

True

The cost of data storage has plummeted recently, making data mining feasible for more firms.

True

The quality and objectivity of information disseminated by influential users of Twitter is higher than that disseminated by noninfluential users.

True

The term "Big Data" is relative as it depends on the size of the using organization.

True

Using data mining on data about imports and exports can help to detect tax avoidance and money laundering.

True

When a problem has many attributes that impact the classification of different patterns, decision trees may be a useful approach.

True

Under which of the following requirements would it be more appropriate to use Hadoop over a data warehouse?

Unrestricted, ungoverned sandbox explorations

Which of the following is not one of the "3 V's of big data"?

Veracity

Which of the following statements about Web site conversion statistics is FALSE?

Visitors who begin a purchase on most Web sites must complete it

Search engine optimization (SEO) is a means by which

Web site developers can increase Web site search rankings.

Web site usability may be rated poor if

Web site visitors download few of your offered PDFs and videos.

In data mining, finding an affinity of two products to be commonly together in a shopping cart is known as

association rule mining

Hadoop and MapReduce require each other to work.

False

In text mining, tokenizing is the process of:

categorizing a block of text in a sentence.

Data mining requires specialized data analysts to ask ad hoc questions and obtain answers quickly from the system.

False

Data that is collected, stored, and analyzed in data mining is often private and personal. There is no way to maintain individuals' privacy other than being very careful about physical data security.

False

Descriptive analytics for social media feature such items as your followers as well as the content in online conversations that help you to identify themes and sentiments.

False

In a dataset where all values on an observation are supposed to be populated, you encounter several that are empty (NULL). It is always best to replace these NULL values with the average of that column of data.

False
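
One reason the blanket rule fails is that a column average is sensitive to outliers. A small Python sketch with made-up income figures (the `impute` helper is hypothetical):

```python
from statistics import mean, median

def impute(values, strategy):
    """Replace None entries with a fill value computed from the observed entries."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]

incomes = [30_000, 32_000, None, 35_000, 1_000_000]
print(impute(incomes, mean))    # the fill value is pulled far upward by the single extreme income
print(impute(incomes, median))  # the fill value stays close to the typical incomes
```

Which strategy is appropriate (mean, median, a model-based estimate, or dropping the record) depends on the data and on why the values are missing.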

Statistics and data mining both look for data sets that are as large as possible.

False

There is a clear difference between the type of information support provided by influential users versus the others on Twitter.

True

What is the Hadoop Distributed File System (HDFS) designed to handle?

Unstructured and semistructured non-relational data

Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of Big Data called?

Variability

Data mining can be very useful in detecting patterns such as credit card fraud, but is of little help in improving sales.

False

IaaS helps provide faster information, but provides information only to managers in an organization.

False

The Survey of 2017 estimated the total cost (to the US yearly economy) of dirty data to be approximately:

$3.1 Trillion USD

A newly popular unit of data in the big data era is the petabyte (PB), which is

10^15 bytes

What is Big Data's relationship to the cloud?

Amazon and Google have working Hadoop cloud offerings

What does Web content mining involve?

Analyzing the unstructured content of Web pages

The portion of the IoT technology infrastructure that focuses on controlling what and how information is captured is:

Applications

What is the main reason parallel processing is sometimes used for data mining?

Because of the massive data amounts and search efforts involved

As discussed in class, which data mining process/methodology is the most widely-used and generally regarded as the most comprehensive?

CRISP-DM

Which broad area of data mining applications analyzes data, forming rules to distinguish between defined classes?

Classification

Which broad area of data mining applications partitions a collection of objects into natural groupings with similar features?

Clustering

The portion of the IoT technology infrastructure that focuses on how to transmit data is?

Connectivity

GPS navigation is an example of which kind of location-based analytics?

Consumer-oriented geospatial static approach

Why are companies like IBM shifting to provide more services and consulting?

Customers see that significant value can be created with the application of analytics, and need help completing these tasks

This model began with the notion that data quality could happen in a centralized place, cleansing and enriching data and offering it to different systems, applications, or users, irrespective of where they were in the organization, computers, or on the network.

DaaS

Which of the following is a data mining myth?

Data mining requires a separate, dedicated database

In the Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse case study, what was the analytic goal?

Determine differences in rates of disease in urban and rural populations

Which of these is NOT a part of the IoT technology infrastructure?

Electrical access

Big Data simplifies data governance issues (like who owns the data or who is in charge of it), especially for global firms.

False

Big data uses commodity hardware, which is expensive, specialized hardware that is custom built for a client or application.

False

Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations.

False

Connectivity is not a part of the IoT infrastructure.

False

Consistent high quality, higher publishing frequency, and longer time lag are all attributes of industrial publishing when compared to Web publishing.

False

In text analysis, what is a lexicon?

A catalog of words, their synonyms, and their meanings

In the opening vignette, the architectural system that supported Watson used all the following elements EXCEPT:

A core engine that could operate seamlessly in another domain without changes.

In a Hadoop "stack," what is a slave node?

A node where data is stored and processed

