AI Midterm exam review: Search + Machine Learning
How to determine how many clusters to use in K-Means?
- Domain knowledge
- Minimize distortion (but note that k = N gives distortion = 0, so distortion alone always favors more clusters)
- Minimize the Schwarz criterion, which adds a penalty that grows with k to the distortion
Hill Climbing algorithm outline
1. Pick a starting state s
2. Pick the t in neighbors(s) with the largest f(t)
3. If f(t) <= f(s), stop and return s
4. s = t; GOTO 2
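A minimal Python sketch of this outline; neighbors(s) and f(s) are hypothetical problem-specific functions:

```python
def hill_climbing(start, neighbors, f):
    """Greedy hill climbing: repeatedly move to the best neighbor.

    `neighbors(s)` and `f(s)` are problem-specific placeholders.
    """
    s = start
    while True:
        # Step 2: pick the neighbor t with the largest f(t)
        t = max(neighbors(s), key=f)
        # Step 3: if no neighbor strictly improves on s, stop
        if f(t) <= f(s):
            return s
        # Step 4: move to t and repeat
        s = t
```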
What is a continuous label?
A real value
A* search vs A search?
A* search is A search with an admissible h()
How to measure performance in a classification task?
Accuracy or error rate
Define admissible
An admissible heuristic satisfies 0 <= h(s) <= h*(s) for every state s, where h*(s) is the true cost from s to the goal
KNN input
- All training samples
- k
- A distance function
- The testing sample x*
Result of HAC?
A binary tree (called a dendrogram): a hierarchy of data point groups
Supervised Learning common tasks
- Classification (if the label is discrete)
- Regression (if the label is continuous)
Unsupervised Learning common tasks
- Clustering (separate n instances into groups)
- Novelty detection (find instances that are very different from the rest)
- Dimensionality reduction (represent each instance with a lower-dimensional feature vector)
Simulated Annealing
- Continue even when you don't find a better neighbor
- Less likely to get stuck in a local optimum (though still not guaranteed not to)
How to get k clusters using HAC?
Cut the tree at the level where exactly k groups remain
What is the challenging part of Hill Climbing? What is a drawback of Hill Climbing?
The challenging part is designing the neighborhood. The drawback: it is very greedy and easily gets stuck in a local optimum or on a plateau.
Disadvantages of KNN?
- Heavy storage cost
- Heavy computation cost (the predictor of KNN is basically the whole training set)
HAC stands for
Hierarchical Agglomerative Clustering
Name two advanced search algorithms
- Hill Climbing
- Simulated Annealing
Name some variations of Hill Climbing algorithm
- Hill climbing with random restarts
- Stochastic hill climbing
- First choice hill climbing
- WALKSAT
- Simulated annealing
Is IDA* complete? Optimal?
IDA* is both complete and optimal
Compare the cost of IDA* and A* search
IDA* is more costly in time than A* because it re-expands nodes on every iteration, but it uses far less memory
How do you represent things in machine learning?
An instance x represents a specific thing; x is represented by a feature vector x = (x1, ..., xd)
K-Means is a coordinate descent problem... what does this mean?
It alternates between optimizing the cluster assignments and the cluster centers, so it converges to a local minimum of the distortion (multiple random restarts may be required to find a good one)
IDA* stands for?
Iterative Deepening A* search
Supervised Learning Classification examples of algorithms
- KNN
- SVM
- Decision Tree
Supervised Learning Regression examples of algorithms
- Linear regression
- Regression trees (decision trees adapted to continuous labels)
IDA* search is...
- A memory-bounded search
- On each iteration, don't expand nodes with f(n) greater than the current threshold; if the goal isn't found, raise the threshold and restart
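A compact sketch of that threshold loop; is_goal, successors (yielding (child, step_cost) pairs), and the admissible heuristic h are hypothetical placeholders:

```python
def ida_star(start, is_goal, successors, h):
    """Iterative Deepening A*: depth-first search bounded by an f-threshold."""
    def dfs(s, g, bound):
        f = g + h(s)
        if f > bound:                # don't expand nodes with f(n) > threshold
            return None, f
        if is_goal(s):
            return s, f
        smallest = float("inf")      # smallest f seen above the bound
        for child, cost in successors(s):
            found, new_f = dfs(child, g + cost, bound)
            if found is not None:
                return found, new_f
            smallest = min(smallest, new_f)
        return None, smallest

    bound = h(start)
    while True:
        found, next_bound = dfs(start, 0, bound)
        if found is not None:
            return found
        if next_bound == float("inf"):
            return None              # no solution exists
        bound = next_bound           # raise the threshold and restart
```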
Examples of some Advanced Search problems?
- N-Queens: f(s) = number of conflicting queens
- Traveling Salesman: visit each city once and return to the first; state = the order visited; f(s) = total mileage
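For instance, the N-Queens score can be computed directly from the state. A sketch that counts attacking pairs (one common way to score conflicts), assuming queens[c] holds the row of the queen in column c:

```python
def num_conflicts(queens):
    """Count attacking queen pairs; queens[c] = row of the queen in column c."""
    n = len(queens)
    conflicts = 0
    for c1 in range(n):
        for c2 in range(c1 + 1, n):
            same_row = queens[c1] == queens[c2]
            same_diag = abs(queens[c1] - queens[c2]) == c2 - c1
            if same_row or same_diag:
                conflicts += 1
    return conflicts

# Hill climbing would minimize this, i.e. maximize f(s) = -num_conflicts(s).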
Does the path matter for Advanced Search?
No: only the final state matters, not the path taken to reach it (the state space is usually too large to enumerate anyway)
A search Which node expanded first?
Node with least g(s) + h(s)
Best First greedy search Which node expanded first?
Node with least h(s) first
A* search with admissible h() is guaranteed to find...?
Optimal path
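The three search cards above share one priority-queue template; only the priority changes. A minimal A* sketch, assuming hashable states and a hypothetical successors(s) yielding (child, step_cost) pairs:

```python
import heapq
import itertools

def a_star(start, is_goal, successors, h):
    """Expand the node with the least g(s) + h(s) first.

    With an admissible h this is A* and the returned path is optimal.
    Greedy best-first search would prioritize by h alone instead.
    """
    counter = itertools.count()  # tie-breaker so states are never compared
    frontier = [(h(start), next(counter), 0, start, [start])]
    best_g = {start: 0}          # cheapest known cost to each state
    while frontier:
        _, _, g, s, path = heapq.heappop(frontier)
        if is_goal(s):
            return path
        for child, cost in successors(s):
            g2 = g + cost
            if g2 < best_g.get(child, float("inf")):
                best_g[child] = g2
                heapq.heappush(
                    frontier,
                    (g2 + h(child), next(counter), g2, child, path + [child]))
    return None
```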
What is the goal of Advanced Search?
An optimization problem. Goal: find the state with the highest 'score' f(s), or at least a reasonably high score
What problem does advanced search seek to solve, in general?
Optimization problem
Simulated Annealing algorithm:
1. Pick a starting state s
2. Randomly pick t in neighbors(s)
3. If f(t) > f(s), accept (s = t); else accept s = t with a small probability p
4. GOTO 2 (until a stopping criterion is met)
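A minimal sketch of this loop using the simplest option for p (a small constant; other choices appear later in this deck); neighbors(s) and f(s) are hypothetical placeholders:

```python
import random

def simulated_annealing(start, neighbors, f, steps=10_000, p=0.01):
    """Simulated annealing with a fixed small acceptance probability p."""
    s = start
    best = s
    for _ in range(steps):
        t = random.choice(neighbors(s))        # randomly pick a neighbor
        if f(t) > f(s) or random.random() < p:
            s = t                              # accept: better, or worse w.p. p
        if f(s) > f(best):
            best = s                           # remember the best state seen
    return best
```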
A search data structure?
Priority Queue
Best First greedy search data structure?
Priority Queue
Beam Search basics:
Puts a limit on the amount of memory used. Either:
- Keep only the top k nodes in the priority queue, or
- Keep only nodes at most e worse than the best node in the queue (e = beam width)
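A sketch of the top-k variant; successors(s) and the score f(s) are hypothetical placeholders:

```python
import heapq

def beam_search(start, successors, f, k=10, steps=100):
    """Beam search keeping only the top-k states each round."""
    beam = [start]
    for _ in range(steps):
        # Expand every state in the beam and pool the candidates...
        candidates = [t for s in beam for t in successors(s)]
        if not candidates:
            break
        # ...then keep only the k highest-scoring ones.
        beam = heapq.nlargest(k, candidates, key=f)
    return max(beam, key=f)
```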
Data structure used by BFS?
Queue
First choice hill climbing
- Randomly generate neighbors one by one
- If a neighbor is better, move to it; if not, generate another random neighbor
- Sometimes works, sometimes doesn't, depending on luck
Stochastic Hill Climbing
- Randomly select the next state from among the better neighbors: the better the neighbor, the more likely it is chosen
- Not guaranteed to find the local optimum, let alone the global optimum
- The neighborhood might be too large to enumerate
Which node does BFS search choose to expand?
Shallowest node (node closest to the root)
Drunk rabbit does...
Simulated annealing, stochastic hill climbing, first choice hill climbing
HAC how to define closest groups?
- Single-linkage: the shortest distance from any point in one group to any point in the other group
- Complete-linkage: the greatest distance from any point in one group to any point in the other group
- Average-linkage: the average distance over all pairs of points, one from each group
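The three linkages differ only in how the cross-group distances are aggregated. A sketch, assuming groups A and B and a pairwise distance function d:

```python
def single_link(A, B, d):
    """Shortest distance between any point in A and any point in B."""
    return min(d(a, b) for a in A for b in B)

def complete_link(A, B, d):
    """Greatest distance between any point in A and any point in B."""
    return max(d(a, b) for a in A for b in B)

def average_link(A, B, d):
    """Average distance over all cross-group pairs."""
    return sum(d(a, b) for a in A for b in B) / (len(A) * len(B))
```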
How to choose k for the KNN algorithm?
1. Split the data into training and tuning sets
2. Classify the tuning set with different values of k
3. Pick the k that produces the least tuning-set error
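A sketch of that tuning loop; classify(train, k, x) stands in for a hypothetical KNN classifier:

```python
def choose_k(train, tune, candidate_ks, classify):
    """Return the k with the least tuning-set error.

    `train` and `tune` are lists of (x, y) pairs;
    `classify(train, k, x)` is a placeholder KNN classifier.
    """
    def tuning_error(k):
        wrong = sum(classify(train, k, x) != y for x, y in tune)
        return wrong / len(tune)

    return min(candidate_ks, key=tuning_error)
```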
Uninformed Search What information do you know?
The goal test + successor function
SVM what is a linear SVM?
The linear classifier with the maximum margin
What is the margin?
The width the decision boundary can be increased to before touching a data point
Steps of a decision tree classification task
1. A training data set is given
2. Learn a model using a tree induction algorithm (the model is a decision tree)
3. Apply the model to the test data set
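A minimal sketch of these steps using scikit-learn's tree induction (one possible library choice), with toy placeholder data:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy placeholder data: feature vectors and discrete labels.
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = [0, 1, 1, 0]
X_test = [[0, 1], [1, 1]]

model = DecisionTreeClassifier()    # tree induction algorithm
model.fit(X_train, y_train)         # step 2: learn the decision tree
print(model.predict(X_test))        # step 3: apply the model to test data
```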
Unsupervised Learning input
Unlabeled training sample set
HAC what type of algorithm
Unsupervised learning
Sober rabbit does...
Vanilla hill climbing (reaches local max and stops)
Does the path matter for Informed Search?
Yes
Will K-Means stop?
Yes: the distortion never increases, and there are only finitely many possible cluster assignments of the points, so the algorithm must stop
The farther h(n) is below h*(n) in A* search, this leads to...
A* expands more nodes and the search becomes slower
KNN algorithm
1. Find the k training instances x_i1, ..., x_ik closest to x*
2. Output y* as the majority class/label of x_i1, ..., x_ik
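A direct sketch of those two steps; train is a list of (x, y) pairs and distance is whatever metric was chosen:

```python
from collections import Counter

def knn_classify(train, k, distance, x_star):
    """Classify x_star by majority vote among its k nearest training points."""
    # Step 1: find the k training instances closest to x_star.
    nearest = sorted(train, key=lambda xy: distance(xy[0], x_star))[:k]
    # Step 2: output the majority label among those k neighbors.
    labels = [y for _, y in nearest]
    return Counter(labels).most_common(1)[0][0]
```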
Informed Search What information do you know?
- g(s) = cost from the start state to state s
- h(s) = estimate of the cost from state s to the goal state (the heuristic)
KNN output
label of the testing sample
Supervised learning input
labeled training sample set
WALKSAT
- Only applies to boolean satisfiability problems (e.g., 3-SAT)
- IDEA: sometimes you must step backwards
Options for defining probability p for simulated annealing
- p = 0.1 (a small constant)
- p decreases with time
- p decreases with time and as the 'badness' f(s) - f(t) increases
- p given by the Boltzmann distribution (see the sketch below)
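The Boltzmann option makes both dependencies explicit; a sketch for a maximization problem where the neighbor t is worse (f(t) <= f(s)):

```python
import math

def boltzmann_p(f_s, f_t, T):
    """Probability of accepting a worse neighbor t from s at temperature T.

    Larger 'badness' f(s) - f(t) lowers p; so does a smaller (later) T,
    since T is decreased over time.
    """
    return math.exp(-(f_s - f_t) / T)
```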