ML exam


Suppose a single perceptron with three inputs x1, x2, and x3. Can this perceptron be trained (or assigned weights and a threshold) to learn the functions y1 and/or y2 below? y1 = At least two out of three of x1, x2, and x3. y2 = Exactly two out of three of x1, x2, and x3.

A single perceptron can learn only y1
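
A minimal sketch (not from the exam itself) of why only y1 is learnable: weights (1, 1, 1) with threshold 2 are one illustrative choice that realizes y1, while y2 ("exactly two") is not linearly separable: (1,1,0) must fire but (1,1,1) must not, forcing a negative weight, which by symmetry contradicts the other "exactly two" cases.

```python
def perceptron(x1, x2, x3, w=(1, 1, 1), threshold=2):
    """Fire (output 1) iff the weighted sum of the inputs reaches the threshold."""
    return int(w[0] * x1 + w[1] * x2 + w[2] * x3 >= threshold)

# With weights (1, 1, 1) and threshold 2, this computes y1 ("at least two of three"):
for bits in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(bits, "->", perceptron(*bits))   # 0, 0, 1, 1
```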

Ensemble Learning

•Combine simple (ineffective) rules into one effective complex rule

Bagging & Boosting

•Combining simple learners into a complex learner is very effective

We say that a learning problem is _____ if the hypothesis space contains the true function

realizable

hypothesis space

set of all possible hypotheses

Training Set

subset of a dataset from which the machine learning algorithm uncovers or "learns" relationships between the features and the target variable

supervised learning

the agent observes some example input-output pairs and learns a function that maps from input to output

feature selection

the process of selecting attributes which are most predictive of the class we are predicting

Bagging

•Generate K data sets by sampling with replacement from the original training set, each of size N
•Run our learning algorithm on each individual data set; we now have K learners/hypotheses
•Run new inputs through all of the learners
•Classification: take majority vote across all K learners
•Regression: take the mean across all K learners (see the code sketch below)
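
A minimal sketch of the procedure above, assuming numpy arrays, integer class labels, and decision trees as the base learner (any learner would do):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def bagging_fit(X, y, K):
    """Train K trees, each on a size-N bootstrap sample drawn with replacement."""
    N = len(X)
    return [
        DecisionTreeClassifier().fit(X[idx], y[idx])
        for idx in (rng.integers(0, N, size=N) for _ in range(K))
    ]

def bagging_predict(learners, X):
    """Classification: majority vote across all K learners (integer labels assumed).
    For regression, replace the vote with np.mean over the K predictions."""
    votes = np.stack([m.predict(X) for m in learners])   # shape (K, n_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```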

Random Forest

•Specifically for decision trees
•With bagging, we get a lot of similar trees because information gain will be similar
•Random forest allows for a more diverse collection of trees
•For K trees: at each split, pick a random subset of attributes and choose the attribute from that subset with the highest information gain
•Take majority vote across trees
•Resistant to overfitting; no need for pruning (see the sketch below)
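
For reference, scikit-learn's RandomForestClassifier implements this recipe; max_features controls the size of the random attribute subset considered at each split. A sketch on toy data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# max_features="sqrt": each split considers only a random subset of attributes,
# which decorrelates the trees and gives a more diverse forest.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)
print(forest.predict(X[:5]))   # majority vote across the 100 trees
```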

VC dimension

•The largest set of inputs that the hypothesis class can shatter (label in all possible combinations). For example, linear separators in the plane can shatter 3 points in general position but not 4, so their VC dimension is 3.

Which of these best describes what Haussler's theorem provides?

A lower bound on the number of samples required for a learner to PAC-learn the hypothesis space. (Haussler's bound: m ≥ (1/ε)(ln|H| + ln(1/δ)) samples are enough to guarantee error ≤ ε with probability ≥ 1 − δ.)

Supervised learning is best described as

learning an approximation of a function from known inputs to known outputs, so that it can be applied to new inputs

Eager Learners

Decision trees and neural networks

An attribute should never appear in a decision tree more than once.

False

When we have no domain knowledge about a classification problem, which method is considered the best first algorithm to employ?

Random forests

Regression

When y is a number (such as tomorrow's temperature), the learning problem is called regression

Boosting

Wrong answers get more weight in the next round, and vice versa (correct answers get less)
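
A minimal AdaBoost-style sketch of that weight update, assuming labels in {-1, +1} and decision stumps as the weak learner:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, rounds):
    """y must be in {-1, +1}. Misclassified points are up-weighted each round."""
    n = len(X)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # guard division by zero
        w *= np.exp(-alpha * y * pred)   # wrong answers grow, right answers shrink
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas
```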

model selection

Model selection defines the hypothesis space, and then optimization finds the best hypothesis within that space.

Consider a learning problem where each instance has 3 inputs and a single output. All inputs and outputs are continuous. How many columns would be in the X matrix (representing the inputs) when trying to find the best fit quadratic (degree 2) function?

7 (assuming no cross terms: one intercept/bias column, three linear terms x1, x2, x3, and three squared terms x1², x2², x3²)
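
A quick check of that count under the convention above (bias + linear + squared terms, no cross terms):

```python
import numpy as np

X_raw = np.random.rand(5, 3)   # 5 instances, 3 continuous inputs

# Columns: [1, x1, x2, x3, x1^2, x2^2, x3^2] -> 1 + 3 + 3 = 7
X = np.column_stack([np.ones(len(X_raw)), X_raw, X_raw ** 2])
print(X.shape)   # (5, 7)
```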

entropy

A measure of disorder or randomness.
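
Concretely, for class proportions p_i the entropy is H = -Σ p_i log2 p_i; a minimal sketch:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["+", "+", "-", "-"]))   # 1.0 -> maximally mixed
print(entropy(["+", "+", "+", "+"]))   # -0.0 -> pure (no disorder)
```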

Which of the following best describes the process of building a single tree during the random forest approach?

At each new node, select a subset of features to consider.

In class, we have discussed the preference and restriction bias of many algorithms we have discussed. For example, we discussed that our algorithm prefers shorter decision trees, and we've discussed that a neural network with one hidden layer is restricted to representing continuous functions (and therefore cannot represent discontinuous functions). We did not discuss the preference and restriction bias of boosting. What are the biases of boosting?

Boosting's biases are whatever the biases of the underlying learners are. (If we use decision trees, we inherit the biases of decision trees. If we use SVMs, we inherit the biases of SVMs, and so on.)

decision tree pruning

Combats overfitting by eliminating nodes that are not clearly relevant.

validation set

subset of a dataset to which we apply the machine learning algorithm to see how accurately it identifies relationships between the known outcomes for the target variable and the dataset's other features

Suppose that I run a business, where the sales amounts change drastically day-to-day. Which of the following is the best reason that I might use a decision tree (rather than other supervised learning techniques we have covered) to predict how much my customers will spend? (Select all answers that apply.)

Decision trees will ignore potentially irrelevant features.

Sample complexity is the only valuable measure for comparing the complexity of training different learners.

False

When the size of the hypothesis space is infinite, an infinite number of samples are required to PAC-learn the hypothesis space.

False

In KNN, what do you do when you find multiple neighbors that are tied (the same distance)?

Give both (include all of the tied neighbors)

Lazy learner

KNN

When to use KNN

Works well with lots of training data and few features; however, as the number of features grows, the amount of data required increases drastically (the curse of dimensionality)

What is the purpose of the kernel trick?

Make data linearly separable by mapping the data to a higher dimension.
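
A minimal sketch of the mapping idea (the actual trick computes inner products in the higher-dimensional space via a kernel function, without ever building the mapping phi explicitly):

```python
import numpy as np

# 1-D data that is NOT linearly separable: the positive class sits between the negatives.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 1, 1, 1, 0, 0])

# Map to 2-D with phi(x) = (x, x^2); the line x^2 = 2.5 now separates the classes.
phi = np.column_stack([x, x ** 2])
print((phi[:, 1] < 2.5).astype(int))   # [0 0 1 1 1 0 0] == y
```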

Which of the following best describes the calculation of error for linear regression?

The sum of the squared vertical distances (residuals) between each data point and the line of best fit
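
A minimal sketch on made-up points, using numpy's polyfit for the line of best fit:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

slope, intercept = np.polyfit(x, y, deg=1)   # line of best fit
residuals = y - (slope * x + intercept)      # vertical distances to the line
print(np.sum(residuals ** 2))                # sum of squared errors
```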

Issues maybe present in decision trees

Missing data; multivalued attributes; continuous and integer-valued input attributes (a naive split gives infinitely many branches, so instead find the split point that gives the highest information gain); and continuous-valued output attributes (more applicable in regression)

Suppose that I run a business, where the sales amounts change drastically day-to-day. Which of the following is the best reason that I might use a neural network (rather than other supervised learning techniques we have covered) to predict how much my customers will spend? (Select all answers that apply.)

Neural networks are a better match for regression problems.

reinforcement learning

Perform an action, then learn based on a reward or punishment

KNN with classification

Plurality vote

KNN with regression

Return the mean of the neighbors' y values
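
A minimal sketch covering both of the KNN cards above, with Euclidean distance assumed:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k, task="classification"):
    """Find the k nearest training points; vote for classification, average for regression."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to each point
    nearest = y_train[np.argsort(dists)[:k]]
    if task == "classification":
        values, counts = np.unique(nearest, return_counts=True)
        return values[np.argmax(counts)]          # plurality vote
    return nearest.mean()                         # mean of the neighbors' y values
```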

stationarity assumption

States that transition probabilities do not change with time

overfitting

The process of fitting a model too closely to the training data for the model to be effective on other data.

Which of the following best describes bagging?

Train n learners, with each one trained on a subset of the data (which may overlap).

Which of these is NOT a bias of decision trees? (a) Trees with high information gain attributes at the root are preferred; (b) correct trees are preferred; (c) shorter trees are preferred; (d) trees that use all attributes are preferred

Trees that use all attributes are preferred

A hypothesis space H is PAC-learnable if and only if the VC dimension is finite

True

information gain

a measure of the predictive power of one or more attributes
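
Concretely, gain = H(parent) − Σ (|child_i|/|parent|) · H(child_i); a minimal sketch:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent node minus the weighted entropy of its children."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

# A split that produces two pure children recovers the full 1 bit of entropy:
print(information_gain(["+", "+", "-", "-"], [["+", "+"], ["-", "-"]]))   # 1.0
```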

regularization

This process of explicitly penalizing complex hypotheses is called regularization (because it looks for a function that is more regular, or less complex).
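
A minimal ridge-regression sketch: the L2 penalty lam·||w||² is the explicit price paid for complex (large-weight) hypotheses, and lam = 0 recovers ordinary least squares:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Least squares with an L2 penalty on the weights (closed form)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])
y = np.array([2.0, 4.1, 5.9, 8.2])
print(ridge_fit(X, y, lam=0.0))    # ordinary least squares
print(ridge_fit(X, y, lam=10.0))   # penalized: weights shrink toward zero
```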

neural networks

interconnected neural cells. With experience, networks can learn, as feedback strengthens or inhibits connections that produce certain results. Computer simulations of neural networks show analogous learning.

Ockham's razor

prefer the simplest hypothesis consistent with the data

