AI Midterm exam review: Search + Machine Learning

How to determine how many clusters to use in K-Means?

- Domain knowledge
- Minimize distortion (note: k = N gives distortion = 0, so distortion alone is not enough)
- Minimize the Schwarz criterion
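
A minimal sketch of the distortion-vs-k comparison, assuming scikit-learn is available (an assumption; the course may expect a hand-rolled K-Means). Scikit-learn exposes distortion as the fitted model's inertia_ attribute:

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.rand(200, 2)  # toy data; replace with your feature vectors

    # Distortion decreases as k grows (reaching 0 at k = N), so look for an
    # "elbow" in this curve rather than its minimum.
    for k in range(1, 10):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        print(k, km.inertia_)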

Hill Climbing algorithm outline

1. Pick a starting state s
2. Pick t in neighbors(s) with the largest f(t)
3. If f(t) <= f(s), stop and return s
4. s = t; go to step 2
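
A minimal Python sketch of this outline; neighbors() and f() are hypothetical problem-specific functions you would supply:

    def hill_climb(s, neighbors, f):
        """Vanilla hill climbing: move to the best neighbor until none improves."""
        while True:
            t = max(neighbors(s), key=f)   # neighbor with the largest f(t)
            if f(t) <= f(s):               # no improvement: local optimum or plateau
                return s
            s = t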

What is a continuous label?

A real value

A* search vs A search?

A* search is A search with an admissible h()

How to measure performance in a classification task?

Accuracy or error rate

Define admissible

A heuristic is admissible if 0 <= h(s) <= h*(s) for every state s, where h*(s) is the true cost from s to the goal

KNN input

- All training samples
- k (the number of neighbors)
- A distance function
- The testing sample x*

Result of HAC?

A binary tree (called a dendrogram): a hierarchy of data point groups

Supervised Learning common tasks

- Classification (if the label is discrete)
- Regression (if the label is continuous)

Unsupervised Learning common tasks

- Clustering (separate n instances into groups)
- Novelty detection (find instances that are very different from the rest)
- Dimensionality reduction (represent each instance with a lower-dimensional feature vector)

Simulated Annealing

Continue even when you don't find a better neighbor. Less likely to get stuck in a local optimum (though still not guaranteed to avoid one).

How to get k clusters using HAC?

Cut the tree (the dendrogram) at a level that leaves k groups

What is the challenging part of Hill Climbing? What is a drawback of Hill Climbing?

Designing the neighborhood is the challenging part. Drawback: it gets stuck easily in a local optimum or on a plateau, and it is very greedy.

Disadvantages of KNN?

- Heavy storage cost
- Heavy computation cost (the KNN predictor is essentially the whole training set)

HAC stands for

Hierarchical Agglomerative Clustering

Name two advanced search algorithms

Hill climbing and simulated annealing

Name some variations of Hill Climbing algorithm

- Hill climbing with random restarts
- Stochastic hill climbing
- First-choice hill climbing
- WALKSAT
- Simulated annealing

Is IDA* complete? Optimal?

IDA* is both complete and optimal.

Compare the cost of IDA* and A* search

IDA* is more costly in time than A* (it re-expands nodes across iterations), but it uses far less memory.

How do you represent things in machine learning?

An instance x represents a specific thing; x is represented by a feature vector x = (x1, ..., xd).

K-Means is a coordinate descent problem... what does this mean?

It will find a local minimum. (multiple restarts might be required)

IDA* stands for?

Iterative Deepening A* search

Supervised Learning Classification examples of algorithms

KNN, SVM, decision tree

Supervised Learning Regression examples of algorithms

Linear regression; decision tree (regression trees handle continuous labels)

IDA* search is...

A memory-bounded search. Don't expand nodes with f(n) greater than the current threshold; raise the threshold each iteration.
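
A minimal Python sketch of the threshold idea, assuming a tree-structured search space (no cycle checking); successors(s), h, and is_goal are hypothetical placeholders:

    def ida_star(start, successors, h, is_goal):
        """IDA*: depth-first search bounded by f = g + h, raising the bound each pass."""
        bound = h(start)
        while True:
            t = _bounded_dfs(start, 0, bound, successors, h, is_goal)
            if t == "FOUND":
                return bound               # cost of an optimal solution
            if t == float("inf"):
                return None                # no solution exists
            bound = t                      # smallest f that exceeded the old bound

    def _bounded_dfs(s, g, bound, successors, h, is_goal):
        f = g + h(s)
        if f > bound:
            return f                       # don't expand nodes with f(n) > bound
        if is_goal(s):
            return "FOUND"
        smallest = float("inf")
        for child, step_cost in successors(s):   # successors yields (state, cost)
            t = _bounded_dfs(child, g + step_cost, bound, successors, h, is_goal)
            if t == "FOUND":
                return "FOUND"
            smallest = min(smallest, t)
        return smallest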

Examples of some Advanced Search problems?

- N-Queens: f(s) = number of conflicting queens
- Traveling salesman (visit each city once and return to the first): state = order visited, f(s) = total mileage

Does the path matter for Advanced Search?

No; you can't enumerate the states, and only the final state matters.

A search Which node expanded first?

Node with least g(s) + h(s)

Best First greedy search Which node expanded first?

Node with least h(s) first

A* search with admissible h() is guaranteed to find...?

Optimal path

What is the goal of Advanced Search?

Optimization problem. Goal: find the state with the highest 'score' f(s), or at least a reasonably high score.

What problem does advanced search seek to solve, in general?

Optimization problem

Simulated Annealing algorithm:

1. Pick a starting state s
2. Randomly pick t in neighbors(s)
3. If f(t) > f(s), accept s = t; else accept s = t with a small probability p
4. Go to step 2
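
A minimal Python sketch of this loop, assuming a Boltzmann-style acceptance probability and a geometric cooling schedule (both assumptions; see the card on choices of p below). neighbors() and f() are hypothetical placeholders:

    import math, random

    def simulated_annealing(s, neighbors, f, temp=1.0, cooling=0.995, steps=10000):
        """Maximize f; accept a worse neighbor with probability e^((f(t)-f(s))/temp)."""
        best = s
        for _ in range(steps):
            t = random.choice(neighbors(s))
            if f(t) > f(s) or random.random() < math.exp((f(t) - f(s)) / temp):
                s = t                      # accept: always if better, sometimes if worse
            if f(s) > f(best):
                best = s
            temp *= cooling                # temperature drops, so p decreases with time
        return best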

A search data structure?

Priority Queue

Best First greedy search data structure?

Priority Queue

Beam Search basics:

Puts a limit on the amount of memory used. Either:
- Keep only the top k nodes in the priority queue, or
- Keep only nodes at most e worse than the best node in the queue (e = beam width)
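
A minimal Python sketch of the top-k variant; successors() and score() are hypothetical placeholders:

    import heapq

    def beam_search(start, successors, score, k, depth):
        """Keep only the k best-scoring states at each level."""
        beam = [start]
        for _ in range(depth):
            candidates = [t for s in beam for t in successors(s)]
            if not candidates:
                break
            beam = heapq.nlargest(k, candidates, key=score)  # only top k survive
        return max(beam, key=score)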

Data structure used by BFS?

Queue

First choice hill climbing

Randomly generate neighbors one by one; if a neighbor is better, move to it; if not, generate another random neighbor. It sometimes works and sometimes does not, depending on luck.

Stochastic Hill Climbing

Randomly select the next state from among the better neighbors, with better neighbors more likely. There is no guarantee of reaching the global optimum, and the neighborhood might be too large to enumerate.

Which node does BFS search choose to expand?

Shallowest node (node closest to the root)

Drunk rabbit does...

Simulated annealing, stochastic hill climbing, first-choice hill climbing

HAC how to define closest groups?

- Single-linkage: the shortest distance from any point in one group to any point in the other group
- Complete-linkage: the greatest distance from any point in one group to any point in the other group
- Average-linkage: the average distance over all pairs of points, one from each group
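
For illustration, SciPy implements all three linkage definitions, and fcluster performs the "cut the tree" step from the earlier card; a minimal sketch on toy data (assuming SciPy is available):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.random.rand(20, 2)          # toy data

    # method='single', 'complete', or 'average' selects the linkage definition.
    Z = linkage(X, method='average')   # Z encodes the dendrogram

    # Cut the tree so that k = 3 groups remain.
    labels = fcluster(Z, t=3, criterion='maxclust')
    print(labels)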

How to choose k for the KNN algorithm?

1. Split the data into training and tuning sets
2. Classify the tuning set with different values of k
3. Pick the k that produces the least tuning-set error
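
A minimal sketch of this procedure, assuming scikit-learn and its built-in iris data (assumptions for illustration only):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_tune, y_train, y_tune = train_test_split(X, y, random_state=0)

    # Least tuning-set error = highest tuning-set accuracy.
    best_k = max(range(1, 16),
                 key=lambda k: KNeighborsClassifier(n_neighbors=k)
                               .fit(X_train, y_train).score(X_tune, y_tune))
    print(best_k)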

Uninformed Search What information do you know?

The goal test + successor function

SVM what is a linear SVM?

The linear classifier with the maximum margin

What is the margin?

The width by which the decision boundary can be increased before touching a data point

Steps of a decision tree classification task

1. A training data set is given
2. Learn a model with a tree induction algorithm (model = decision tree)
3. Apply the model to the test data set
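
A minimal sketch of these three steps, assuming scikit-learn's tree induction (the course may have a specific induction algorithm in mind, e.g. ID3):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)                       # 1. training data given
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = DecisionTreeClassifier().fit(X_train, y_train)  # 2. learn the tree
    print(model.predict(X_test)[:5])                        # 3. apply to test data
    print(model.score(X_test, y_test))                      # accuracy on the test set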

Unsupervised Learning input

An unlabeled training sample set

HAC what type of algorithm

Unsupervised learning

Sober rabbit does...

Vanilla hill climbing (reaches local max and stops)

Does the path matter for Informed Search?

Yes

Will K-Means stop?

Yes: there are only finitely many possible cluster assignments, and the distortion never increases, so it must terminate.

The farther h(n) is from h*(n) in A* search leads to...

Expanding more nodes; A* search is slower

KNN algorithm

Find the k training instances x_i1, ..., x_ik closest to x*; output y* as the majority class/label of x_i1, ..., x_ik.
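
A minimal from-scratch sketch of this algorithm, assuming Euclidean distance (in general the distance function is an input, per the "KNN input" card):

    import math
    from collections import Counter

    def knn_predict(train, k, x_star):
        """train: list of (feature_vector, label) pairs.
        Returns the majority label of the k instances closest to x_star."""
        nearest = sorted(train, key=lambda xy: math.dist(xy[0], x_star))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    # Example: classify (1, 1) from four labeled points with k = 3.
    data = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((1, 0), "a")]
    print(knn_predict(data, 3, (1, 1)))    # -> "a"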

Informed Search What information do you know?

g(s) = cost from the start state to state s
h(s) = estimate of the cost from state s to the goal state (the heuristic)
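
A minimal Python sketch of A* built on these two quantities, always expanding the node with the least g(s) + h(s) from a priority queue; successors() and h() are hypothetical placeholders:

    import heapq, itertools

    def a_star(start, goal, successors, h):
        """A*: expand the frontier node with the least f(s) = g(s) + h(s)."""
        tie = itertools.count()            # tie-breaker so states are never compared
        frontier = [(h(start), next(tie), 0, start, [start])]
        best_g = {start: 0}
        while frontier:
            _, _, g, s, path = heapq.heappop(frontier)
            if s == goal:
                return path, g             # optimal if h is admissible
            for t, step_cost in successors(s):   # yields (state, cost) pairs
                g_t = g + step_cost
                if g_t < best_g.get(t, float("inf")):
                    best_g[t] = g_t
                    heapq.heappush(frontier, (g_t + h(t), next(tie), g_t, t, path + [t]))
        return None, float("inf")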

KNN output

label of the testing sample

Supervised learning input

labeled training sample set

WALKSAT

Only applies to Boolean satisfiability problems (e.g., 3-SAT). Idea: sometimes you must step backwards.

Options for defining probability p for simulated annealing

- p = 0.1 (a small constant)
- p decreases with time
- p decreases with time and as the 'badness' f(s) - f(t) increases
- p follows the Boltzmann distribution: p = e^((f(t) - f(s)) / Temp), with Temp decreasing over time

