CSE 160

The term "loss" is used across data science as a general term for error penalty.

True

Tree induction will keep growing the tree to fit the training data until it creates pure leaf nodes.

True

Understanding data science does not mean that you will be able to tell whether a data mining project will succeed.

True

Understanding data science is important because data analysis is so critical to business strategy, and because data analytics projects reach into all business units.

True

Using the list of U.S. states illustrates how a non-normal distribution has a normal sampling distribution of means.

True
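
The claim in this card can be checked with a small simulation. The sketch below is only an illustration: the right-skewed exponential population, the sample size of 50, and the 2,000 repetitions are assumptions standing in for the list of U.S. states used in class.

```python
import random
from statistics import mean, stdev

random.seed(160)  # fixed seed so the run is repeatable

# Hypothetical right-skewed population standing in for any non-normal data.
population = [random.expovariate(1.0) for _ in range(10_000)]


def skewness(values):
    """Mean cubed z-score; close to 0 for a symmetric distribution."""
    m, s = mean(values), stdev(values)
    return mean(((v - m) / s) ** 3 for v in values)


# Draw many samples of size 50 and record the mean of each one.
sample_means = [mean(random.sample(population, 50)) for _ in range(2_000)]

print("population skewness: ", round(skewness(population), 2))
print("sample-mean skewness:", round(skewness(sample_means), 2))
print("population mean:     ", round(mean(population), 3))
print("mean of sample means:", round(mean(sample_means), 3))
```

The sample means should be far less skewed than the population and should center on the population mean, which is the behavior the card describes.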

We should study industries like online advertising for hints about big data and data science that subsequently will be adopted by other industries.

True

Tree induction has been a very popular data mining procedure for all reasons EXCEPT: a. It is always right b. It is computationally inexpensive c. It is easy to understand d. It is easy to implement

a. It is always right

Name the function in R that concatenates data elements together into a vector:

c()

What is the name for a list of vectors where each vector has the exact same number of elements as the others?

data frame

The 25% of cases with the smallest value is known as the:

first quartile
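
As a rough illustration of the first quartile, the sketch below finds the cut point for a small, made-up data set and lists the cases that fall at or below it. The data values are hypothetical, and `method="inclusive"` is one of several quartile conventions.

```python
from statistics import quantiles

# Hypothetical sample of eight values.
data = [7, 1, 9, 4, 12, 3, 8, 5]

# quantiles(..., n=4) returns the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# The first quartile is the 25% of cases with the smallest values.
bottom_quarter = [x for x in data if x <= q1]

print("first-quartile cut point:", q1)
print("cases in the first quartile:", bottom_quarter)
```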

What kind of graph or curve shows the accuracy of a model as a function of complexity?

fitting

A main purpose of creating _______ regions is so that we can predict the target variable of a new, unseen instance by determining which segment it falls into.

homogeneous

The situation in which a variable collected in historical data gives information on the target variable but is not actually available when the decision has to be made is called what?

leak

What is the name of the value that occurs most often in a sample of data?

mode

What is the term that Gauss used to describe the common bell-shaped distribution?

normal

The distinction between classification and regression is whether the target variable is categorical or

numeric

Entities that R creates and manipulates are called:

objects

Looking too hard at a set of data might result in finding something that does not generalize to unseen data. This is called what?

overfitting

The definition for ______ is homogeneous with respect to the target variable.

pure

The general method for reining in model complexity to avoid overfitting is called model

regularization

Name the R function that can repeat an activity. (Do not include parentheses, just the name of the function.)

replicate

The larger the sample size, the _________ the standard error.

smaller
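
The relationship follows from the usual formula for the standard error of the mean, the standard deviation divided by the square root of the sample size. A minimal sketch, assuming a made-up standard deviation of 10:

```python
from math import sqrt


def standard_error(std_dev, n):
    """Standard error of the mean for a sample of size n."""
    return std_dev / sqrt(n)


# Hypothetical standard deviation of 10; only the sample size changes.
for n in (25, 100, 400):
    print(n, standard_error(10.0, n))
```

Quadrupling the sample size halves the standard error, so larger samples give tighter estimates of the mean.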

Name the function in R that reveals the structure of a data object:

str()

Supervised learning is model creation where the model describes a relationship between a set of selected variables and a predefined variable called the

target variable

In certain fields of statistics and econometrics, the bare model with unspecified parameters is called:

the model

Data comes from the Latin word "datum", meaning:

thing given

A natural measure of impurity for numeric values is

variance
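
To see why variance works as an impurity measure for a numeric target, the sketch below compares the variance of a parent segment with the size-weighted variance of two child segments. The target values and the split are invented for illustration.

```python
from statistics import pvariance


def weighted_variance(segments):
    """Size-weighted average of the population variance of each segment."""
    total = sum(len(s) for s in segments)
    return sum(len(s) / total * pvariance(s) for s in segments)


# Hypothetical numeric target values for six instances.
values = [10, 12, 11, 50, 52, 51]

# A hypothetical split that groups similar target values together.
split = [[10, 12, 11], [50, 52, 51]]

print("parent variance:          ", pvariance(values))
print("weighted child variance:  ", weighted_variance(split))
```

A split that lowers the weighted variance has produced segments that are more homogeneous with respect to the numeric target.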

Data science can apply to the farming profession.

True

Data scientists play active roles in the four A's of data: data architecture, data acquisition, data analysis, and data archiving.

True

Decomposing a data analytics problem into recognized tasks is a critical skill.

True

Information gain measures how much an attribute improves (decreases) entropy due to new information being added.

True
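
A minimal sketch of the calculation, assuming the standard definitions: entropy is computed from class proportions, and information gain is the parent's entropy minus the size-weighted entropy of the child segments. The labels and splits below are made up.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child segments."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical toy data: four "yes" and four "no" instances.
parent = ["yes"] * 4 + ["no"] * 4

# A split that yields two pure segments.
pure_split = [["yes"] * 4, ["no"] * 4]

# A split that leaves each segment as mixed as the parent.
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print("gain from pure split:   ", information_gain(parent, pure_split))
print("gain from useless split:", information_gain(parent, useless_split))
```

The pure split removes all of the parent's entropy, while the useless split removes none of it.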

R is case-sensitive.

True

Walmart data miners found that strawberry pop tarts sell at ________ times their normal rate ahead of a hurricane.

7

A simplified representation of reality created for a specific purpose

Model

Select the mathematician who did not work on the ideas of "the law of large numbers" and the central limit theorem.

Archimedes

The data mining procedure that produces a model that, given a new individual, determines the category to which that individual belongs:

Classification

The data mining procedure that attempts to find associations between entities based on transactions involving them:

Co-occurrence grouping

At a high level, data mining is a set of fundamental principles that guide the extraction of knowledge from data.

False

Cross-validation specifies a systematic way of splitting up a single dataset such that it generates one single performance measure.

False
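
The statement is false because cross-validation produces one performance estimate per fold, which can then be summarized by a mean and a spread. The sketch below illustrates this with 5 folds; the labels and the majority-class "model" are hypothetical stand-ins, and `k_fold_indices` is a helper written for this example.

```python
from collections import Counter
from statistics import mean, stdev


def k_fold_indices(n_items, k):
    """Yield (train, test) index lists, using each of k folds once as the test set."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# Hypothetical class labels for ten instances.
labels = ["a", "a", "b", "a", "b", "a", "a", "b", "a", "a"]

accuracies = []
for train_idx, test_idx in k_fold_indices(len(labels), k=5):
    # Stand-in "model": always predict the majority class of the training folds.
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    correct = sum(labels[i] == majority for i in test_idx)
    accuracies.append(correct / len(test_idx))

print("per-fold accuracy:", accuracies)
print("mean:", mean(accuracies), "spread:", stdev(accuracies))
```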

Deduction is a term from philosophy that refers to generalizing from specific cases to general rules.

False

If you run a statistical process a large number of times, it does not converge on a stable result.

False

In data science, prediction means to forecast a future event.

False

In data science, the key to success is to "follow the money"

False

SVM stands for separate vector machines.

False

The SVM's objective function incorporates the idea that a thinner bar is better.

False

The primary purpose of descriptive modeling is to predict a future event.

False

There is only one manual online for R.

False

What is the actual last name of the person who invented the Student's t-Test?

Gosset

An instance represents a fact or a data point.

True

Which of the following is NOT necessarily part of the data mining process presented in class?

Interviewing potential customers

Name the function in R that returns the average:

mean()

If the "tail" on the high side is slightly longer than it should be, then we have a:

Rightward Skew

What is the name of the bank that spun out the Capital One credit card company?

Signet

The data mining procedure that produces a model that, given a new individual, finds those individuals that are most like the new individual:

Similarity Matching

When we analyze a set of data with knowledge of the correct prediction for each item, what kind of model are we building?

Supervised

