CS 343 AI Exam #1
How do you learn by problem solving in ML?
- Basic idea is parameter adjustment - Change parameters to figure out what works and what doesn't - Problem with this is credit assignment; hard to know which changes to which features are responsible for the outcome
How do you learn by discovery in ML?
- Create new knowledge, beyond what humans know - AM tried to discover concepts in number theory - It was given rules that defined what was interesting - Works well when the semantics depend on the syntax (such as in number theory) but mostly useless in other domains
DFS
- Explores deepest paths first
- Maintain an OPEN & CLOSED list - The OPEN list is a stack (LIFO)
- Does not necessarily find an optimal solution
Advantages
- Low storage requirement - don't need to keep track of failed paths
Disadvantages
- Could go down the left-most path forever
BFS
- Explores shallowest paths first
- Maintain an OPEN & CLOSED list - The OPEN list is a queue (FIFO)
- Optimal search (with uniform edge costs) since we consider shallow paths first
Advantages
- Will never get trapped exploring a useless path forever
- If there is more than one solution, BFS finds the one that requires the fewest steps
Disadvantages
- High storage requirement - you cannot throw away previous states since they are needed to re-create the path from the root to the goal node
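A minimal sketch of both blind searches, assuming hypothetical `successors(state)` and `is_goal(state)` functions and hashable states: the only difference between BFS and DFS is whether the OPEN list is used as a queue or a stack.

```python
from collections import deque

def blind_search(start, is_goal, successors, mode="bfs"):
    """Generic blind search; OPEN is a queue (BFS) or a stack (DFS)."""
    open_list = deque([(start, [start])])   # (state, path from the root)
    closed = set()
    while open_list:
        # FIFO for BFS (shallowest first), LIFO for DFS (deepest first)
        state, path = open_list.popleft() if mode == "bfs" else open_list.pop()
        if is_goal(state):
            return path
        if state in closed:
            continue
        closed.add(state)
        for child in successors(state):
            if child not in closed:
                open_list.append((child, path + [child]))
    return None  # no solution found
```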
Convolutional NN
- Feedforward NN - Mainly used for vision - Does not have the vanishing gradients problem - The point of convolution is to apply the same small filter (array of weights) at many positions across the input -- Allows it to identify the same pattern wherever it appears
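A minimal sketch of the convolution idea in NumPy, using a hypothetical 3x3 edge-detecting kernel (the kernel values are illustrative assumptions): the same small weight array is applied at every position of the image.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide one small kernel over every position of the image (valid padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# The same 3x3 kernel is reused at every location, so the pattern it
# detects (here, vertical edges) is recognized anywhere in the image.
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]])
```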
Autoencoder NN
- Feedforward NN - Network with bottleneck layer - Used to do preprocessing for internal representations that other networks can then consume
Multilayer Perceptron NN
- Feedforward NN - Uses a nonlinear activation function Problems - Easy to overfit - Vanishing gradients
How do you learn from macros in ML?
- Find out which macro-operators are useful - You want to generalize MACROPs by replacing constants with variables - Problem with this is the utility problem; macros take up space, so you attach a utility value to each macro and only keep the good ones
Block-Serializable Sub-Goals
- Grouping multiple sub-goals to make the problem serializable
- Maybe it is possible to formalize the process of coming up with serializable solutions to problems
a. General Problem Solver (GPS)
-- Its designers interviewed people about how they solve problems and formalized that strategy
-- The algorithm finds operators that achieve the goal, then recursively calls itself on the unsatisfied preconditions
DFID (Depth First Iterative Deepening)
- Guess the solution depth d and perform DFS with a bound of d
- Overestimating d means it takes a long time to get to the solution
- Underestimating d means we will never get to the solution
- Instead we can iteratively increase the cutoff d by one and re-do DFS
- Solution is optimal since we consider nodes at level n before level n+1
Advantages
- Low storage requirement - we only keep the direct ancestors in memory
Disadvantages
- We must perform DFS up to d times. However, this is not that expensive: because the tree grows exponentially, the final iteration dominates the total cost
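A minimal DFID sketch, assuming the same hypothetical `successors` and `is_goal` functions as above; the depth cutoff is increased by one on each iteration.

```python
def depth_limited_dfs(state, is_goal, successors, limit):
    """DFS that gives up below a fixed depth bound."""
    if is_goal(state):
        return [state]
    if limit == 0:
        return None
    for child in successors(state):
        result = depth_limited_dfs(child, is_goal, successors, limit - 1)
        if result is not None:
            return [state] + result
    return None

def dfid(start, is_goal, successors, max_depth=50):
    """Re-run depth-limited DFS with cutoff d = 0, 1, 2, ... until a solution appears."""
    for d in range(max_depth + 1):
        result = depth_limited_dfs(start, is_goal, successors, d)
        if result is not None:
            return result
    return None
```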
Sub-Goaling
- If you have a sub-goal, you can search from your current state to the sub-goal and then restart your search from the sub-goal to the goal state, which reduces the overall search space - You can have multiple sub-goals -- The search cost is dominated by the longest distance between any two consecutive sub-goals - It's possible that your sub-goals are scattered across the space and your solution is poor, but as humans we have an idea of what constitutes progress, so that usually won't happen
Independent Sub-Goals
- If you have many independent sub-goals, do things separately not together. This reduces the branching factor
Reinforcement Learning
- In between supervised and unsupervised learning - Used in game-playing where you are unsure of what the optimal action is - No gradient information; does random exploration to see what works - Credit assignment problem is the core of RL: how do we know which actions were good/bad?
Q-Learning
- Incremental dynamic programming
- A Q-value is the expected utility of taking a given action in a given state
- Stored in a Q-value table
- Deep Q-learning builds a deep network to estimate the Q-values
Downsides:
- Need a huge table of Q-values -- The challenge is to approximate a large set of Q-values with some kind of function
- Adaptation problem - it cannot adapt back; once it decides something is not worth exploring, it won't really go back to it
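A minimal sketch of one tabular Q-learning episode, assuming a hypothetical environment object with `reset()`, `actions(state)`, and `step(action)` methods (these names are illustrative, not from the notes):

```python
import random
from collections import defaultdict

def q_learning_episode(env, Q, alpha=0.1, gamma=0.9, epsilon=0.1):
    """One episode of tabular Q-learning; Q maps (state, action) -> value."""
    state = env.reset()
    done = False
    while not done:
        actions = env.actions(state)
        # epsilon-greedy: mostly exploit, occasionally explore a random action
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward, done = env.step(action)
        # incremental DP update toward reward + discounted best future value
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions(next_state))
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

Q = defaultdict(float)  # the (potentially huge) table of Q-values
```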
What is the growth rate for a search tree?
- It is exponential - There are b^d nodes at level d, where b is the branching factor and d is the solution depth
How do you learn by explanation in ML & what are some problems with this approach?
- Learn from a single example with a lot of knowledge
- "Strong" approach; uses intrinsic knowledge about the object
- Goal is to explain operationally why the example satisfies the goal concept and to generalize the example as much as possible
Problems with this:
- Requires strong domain theory
- Hard to do with incomplete, vague concepts
Macro-Operators
- Macro-operators are sequences of operator actions that get you from one state to another
- The goal of macro-operators and sub-goaling is to find solutions in a practical manner, not optimally; they use patterns to find solutions
- Macro-tables can exist when there is an ordering of the state variables such that the effect of an operator on a state variable depends only on earlier variables
3 Strategies
1. Single Common Goal State
- Find path to nearest macro state
- Follow path from macro to goal
- These macro states are called "hubs" and are designed to increase efficiency
2. Arbitrary Initial/Goal States
- a. Find path from initial state to a common goal
- b. Find path from actual goal to a common goal (requires invertible operators)
- c. Combine the two paths
3. Abstraction
- Instead of using hubs, search an abstract space with abstract operators and abstract states
- You can abstract the abstractions to get hierarchical levels of abstraction
Neural Networks
- New computational paradigm that attempts to understand how the brain works - Massively parallel, knowledge is distributed across units -- Each unit computes very simple things - These machines learn, you don't just write rules - They are robust, you can have noise and it will still work Other things to note - Only works if you have a lot of data - Works best when there are plenty of examples but hard to identify rules (e.g. pattern recognition in vision, cognitive processing, control in self-driving)
Serializable Sub-Goals
- Once a sub-goal is satisfied, you never need to violate it; that part of the problem is done - Serializable orderings are not obvious to find in most cases
What are static evaluation functions?
- Once you're done searching up until a certain depth, you need to evaluate the state that you're in to see if it's favorable - For chess, you can add up the value of the pieces on the board -- Also must consider position, pawn structure, etc. -- But you're missing dynamics of the play
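A minimal sketch of a material-only static evaluator for chess; the piece codes and values here are illustrative assumptions, not part of the course notes.

```python
# Positive scores favor White, negative favor Black.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

def static_eval(board):
    """board: iterable of piece codes like 'P' (white pawn) or 'p' (black pawn)."""
    score = 0
    for piece in board:
        value = PIECE_VALUES.get(piece.upper(), 0)
        score += value if piece.isupper() else -value
    # A real evaluator would also weigh position, pawn structure, king safety, etc.
    return score
```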
Boltzmann Machine
- Origins of deep learning are here
- Hidden neuron units make the model more powerful
- Learning rule is the difference between what the model produces and what is expected
A restricted Boltzmann machine (RBM) can be interpreted as a belief network: a single layer of hidden units fully connected to the input, trained by maximum likelihood. A deep belief net stacks multiple RBMs (each RBM's hidden layer serves as the input to the next).
Non-Serializable Sub-Goals
- Previously achieved sub-goals must be violated - It does not decrease the branching factor and in general the solution length increases - People use it though because it seems to get you closer to the goal
LSTM NN
- Recurrent NN - Gates learn when to write to, read from, and clear the memory cell state - The Gated Recurrent Unit (GRU) NN is a simpler version of this
Heuristics
- Simplified model of the problem (delete some constraints on operator applicability) - Used to estimate distance towards the goal - It must never overestimate the actual distance to the goal, otherwise we may overlook the optimal solution
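A minimal sketch of a heuristic obtained by relaxing the problem, assuming an 8-puzzle whose states are dicts mapping tiles to (row, col) positions (an illustrative representation): dropping the constraint that tiles block each other gives the admissible Manhattan-distance estimate.

```python
def manhattan_distance(state, goal):
    """Sum of each tile's distance to its goal square; never overestimates
    the true number of moves, so it is admissible."""
    return sum(abs(state[t][0] - goal[t][0]) + abs(state[t][1] - goal[t][1])
               for t in state if t != 0)  # skip the blank tile
```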
What are 2 ways to make minimax method more efficient?
- Some subtrees cannot affect the final value; we don't need to keep searching below a node in these two cases
1. The current value of a MIN node is <= the current value of a MAX ancestor (alpha cutoff)
2. The current value of a MAX node is >= the current value of a MIN ancestor (beta cutoff)
- This is known as alpha-beta pruning; a search technique that decreases the number of nodes the minimax algorithm has to evaluate in its search tree
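A minimal alpha-beta sketch, assuming hypothetical `children` and `evaluate` functions; the two `break` statements correspond to the two cutoff cases above.

```python
def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax with alpha-beta cutoffs; children/evaluate are problem-specific."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        value = float("-inf")
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, children, evaluate))
            alpha = max(alpha, value)
            if value >= beta:      # MAX node's value >= MIN ancestor's bound: prune
                break
        return value
    else:
        value = float("inf")
        for child in kids:
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True, children, evaluate))
            beta = min(beta, value)
            if value <= alpha:     # MIN node's value <= MAX ancestor's bound: prune
                break
        return value
```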
Artificial Neural Network
- Starting point is the neuron - You add layers (single or multiple), recurrence (feeding output back as input), and modularity (e.g. a language module activates a vision module) to build more complex networks - A neuron takes multiple inputs and passes them through an activation function to decide its firing rate
IDA*
- This is depth-first iterative deepening, but the cutoff is defined by the heuristic cost f = g + h instead of the depth
- Low storage requirement since it's DFS, which is an improvement over A*
- Solution is optimal (given an admissible heuristic)
- Time complexity = O(b^(ad)) where a < 1
- Space complexity is linear in the solution depth, since only the current path is kept in memory
- This is considered the best possible algorithm of its kind because space is linear and time, while exponential, is reduced by the heuristic
Hill Climbing Search
- Type of heuristic search - Expand unvisited nodes with the best heuristic value Disadvantages - May lead you indefinitely deep since we only account for an estimate of how far we must go, not how far we've come - May lead you to local maxima rather than the goal
A* Search
- Type of heuristic search
- Expansion of Best First Search
- Cost of a node is the cost of how far we've come plus an estimate of how far we have to go, i.e. f(node) = g(node) + h(node)
- Avoids the scenario where the heuristic leads you infinitely deep: the deeper we go, the higher the g(node) cost becomes, so eventually that path is no longer prioritized
- If the heuristic is always zero, this reduces to breadth first search (assuming uniform edge costs)
- If the heuristic is always exact, there's no need to search
Conditions for Optimality
- A* is only optimal if the heuristic never overestimates the cost
-- If this is the case, the heuristic is known as admissible
-- Also means A* is guaranteed to find the optimal solution
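A minimal A* sketch, assuming a hypothetical `successors(state)` that yields (child, edge_cost) pairs and an admissible heuristic `h`.

```python
import heapq, itertools

def a_star(start, is_goal, successors, h):
    """A*: always expand the node with the lowest f = g (cost so far) + h (estimate to go)."""
    counter = itertools.count()             # tie-breaker so states are never compared directly
    open_list = [(h(start), next(counter), 0, start, [start])]
    best_g = {start: 0}
    while open_list:
        f, _, g, state, path = heapq.heappop(open_list)
        if is_goal(state):                  # goal popped off OPEN -> optimal path found
            return path, g
        for child, cost in successors(state):
            new_g = g + cost
            if new_g < best_g.get(child, float("inf")):   # found a shorter path to child
                best_g[child] = new_g
                heapq.heappush(open_list, (new_g + h(child), next(counter), new_g, child, path + [child]))
    return None
```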
Best First Search
- Type of heuristic search - Maintain OPEN & CLOSED lists - Expand unvisited nodes with the best heuristic value
Problem Reduction Search
- Used in game search; reduce problem into set of easier problems, whose solutions can be found, and then combine them into a solution - Resulting graph is an AND/OR graph -- AND Node - all subproblems must be solved in order to solve the main problem -- OR Node - solution of any subproblem will solve the main problem - A Solution Graph is a subgraph of solved nodes which demonstrate that the start node is solved
Other Brute Force Algorithms
1. Backward Chaining - Search from goal towards initial state - Advantages: works better if backward branching factor < forward branching factor - Disadvantages: requires explicit goal state & invertible operators 2. Bidirectional Search - Search forward from initial state and backwards from goal state at same time - Solution is optimal - Time/Space complexity is O(b^(d/2)) since it meets halfway
The Two Types of Search Strategies & Examples
1. Brute Force / Blind - Searches all possible states in a fixed order - Only works for small spaces - Examples: BFS, DFS 2. Heuristic Search - Attempts to use knowledge about the problem space to choose more promising operators first - Examples: A*, IDA*
How do you learn from examples in ML & what are some problems with learning from examples?
1. Concept learning (learn definitions of concepts as logical features)
- 2 ways to do this
a. Neural nets / parameter learning -- This is considered "weak", statistical learning
b. Structural descriptions -- This is "strong", knowledge-based, and is based on an understanding of the actual object
However, it is important to note you are biased to learn from certain examples over others
2. Decision Trees
- Each level branches based on a feature
- Classify one feature at a time
- Leaves are results
- Simpler trees are better predictors
- Select questions in the decision tree so that each question gains as much information as possible (most informative question is at the top)
-- You do this by calculating the gain of each feature, and that's the order you should ask questions in (see the gain sketch below)
Problems:
1. Noisy examples create branches that shouldn't exist
2. Could be branches without examples; leads to guessing
3. Might not work on real world data
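A minimal sketch of the information-gain calculation used to order decision-tree questions, assuming examples are dicts with a 'label' key (an illustrative format, not from the notes).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, feature):
    """Gain of splitting the examples on one feature; the feature with the
    highest gain is the most informative question and goes at the top."""
    labels = [e["label"] for e in examples]
    before = entropy(labels)
    after = 0.0
    for value in set(e[feature] for e in examples):
        subset = [e["label"] for e in examples if e[feature] == value]
        after += (len(subset) / len(examples)) * entropy(subset)
    return before - after
```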
2 Types of Deep Neural Networks
1. Feedforward - multilayer perceptron, convolutional, autoencoder 2. Recurrent - LSTM, gated recurrent unit (used for speech, language, time-series data)
What are 2 theoretical problems with gradient descent?
1. It may find a local minimum rather than the global minimum. However, in practice this rarely happens 2. Oscillations - the steps can bounce back and forth across a valley instead of settling into the minimum
What do we do in the context of real-time game playing, what problems can we run into as a result, and what are potential solutions?
1. Iterative Deepening
- Limited time means limited search depth, so we use iterative deepening
- However this runs into the horizon problem, i.e. the search may be cut off in the middle of an important exchange (e.g. while pieces are being traded), which misleads the evaluation
Two solutions
a. Quiescence Search - keep searching until you reach a stable value
b. Secondary Search - if you're about to make a move, check the crucial values for that move just in case
2. Selective Search - Monte-Carlo Tree Search
-- Search deep in a few promising directions rather than covering the whole space (by using a heuristic function)
-- Works well e.g. in the game Go
What are the 5 approaches to knowledge acquisition in machine learning?
1. Learning by problem solving 2. Learning from macros 3. Learning from examples 4. Learning by explanation 5. Learning by discovery
What are the five ways to construct a plan under the modal truth criterion?
1. Step addition - to establish P 2. Promotion - we move our desired state S to before the clobbering happens 3. Declobbering - add steps that reassert whatever was previously violated 4. Simple establishment - assign values to prevent clobbering 5. Separation - prevent certain assignments
What are 5 tips for improving deep learning?
1. Stochastic gradient descent - faster way to do gradient descent; only looks at a sample subset of the data when doing backpropagation instead of all the data
2. Rectified linear activation units (ReLU) - a piecewise-linear activation function used as an alternative to the sigmoid; less likely to get paralyzed
3. Dropout - randomly drop out some of the units during training so they aren't part of the network for that pass; the network learns more general representations that are more robust
4. Residual connections - can skip part of the network if it is doing poorly
5. Max Pooling - gradually form higher level representations; take a region of the network, find the largest value in each square, and use that instead of all of the units together
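A minimal NumPy sketch of tips 2 and 3 (ReLU and dropout), as an illustration rather than a prescribed implementation.

```python
import numpy as np

def relu(x):
    """Piecewise-linear activation; its gradient is 1 for positive inputs,
    so it is less prone to the saturation/paralysis of the sigmoid."""
    return np.maximum(0.0, x)

def dropout(activations, rate=0.5, training=True):
    """Randomly zero out a fraction of the *units* during training
    (inverted dropout: rescale so the expected activation is unchanged)."""
    if not training or rate == 0.0:
        return activations
    mask = (np.random.rand(*activations.shape) >= rate)
    return activations * mask / (1.0 - rate)
```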
Problem Space
A PROBLEM SPACE is used to formalize a problem and has 4 components:
1. A data structure that represents the state
2. The initial state
3. The operators which transform the state
4. The terminal/goal state
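A minimal sketch of the 4 components, using the classic two-jug puzzle as a hypothetical example problem (not one from the notes).

```python
from typing import NamedTuple

class State(NamedTuple):       # 1. data structure that represents the state
    jug3: int                  # litres in the 3-litre jug
    jug4: int                  # litres in the 4-litre jug

initial_state = State(0, 0)    # 2. the initial state

def operators(s):              # 3. operators that transform the state
    yield State(3, s.jug4)                        # fill the 3-litre jug
    yield State(s.jug3, 4)                        # fill the 4-litre jug
    yield State(0, s.jug4)                        # empty the 3-litre jug
    yield State(s.jug3, 0)                        # empty the 4-litre jug
    pour = min(s.jug3, 4 - s.jug4)
    yield State(s.jug3 - pour, s.jug4 + pour)     # pour 3-litre into 4-litre
    pour = min(s.jug4, 3 - s.jug3)
    yield State(s.jug3 + pour, s.jug4 - pour)     # pour 4-litre into 3-litre

def is_goal(s):                # 4. the terminal/goal state
    return s.jug4 == 2
```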
Backpropagation
A common method of training a neural net in which the initial system output is compared to the desired output, and the system is adjusted until the difference between the two is minimized This is gradient descent ("Gradient descent iteratively adjusts parameters, gradually finding the best combination of weights and bias to minimize loss.")
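A minimal sketch of one gradient-descent step for a single sigmoid neuron with squared error; the shapes, learning rate, and loss choice are illustrative assumptions.

```python
import numpy as np

def gradient_descent_step(w, b, X, y, lr=0.1):
    """Compare the neuron's output to the desired output y, then adjust
    the weights w and bias b a small step downhill along the gradient."""
    z = X @ w + b                           # X: (n, d), w: (d,), y: (n,)
    out = 1.0 / (1.0 + np.exp(-z))          # sigmoid activation
    error = out - y                         # difference from the desired output
    grad_out = error * out * (1.0 - out)    # backpropagate through the sigmoid
    grad_w = X.T @ grad_out / len(y)
    grad_b = grad_out.mean()
    return w - lr * grad_w, b - lr * grad_b # move parameters against the gradient
```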
Temporal Difference Learning
A form of learning that modulates behavior according to the difference between an obtained reward and an estimate, compiled over the recent past, of an expected reward.
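A minimal sketch of the TD(0) value update, with V assumed to be a dict mapping states to value estimates (the interface is an assumption for illustration).

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Nudge the value estimate toward the obtained reward plus the discounted
    estimate of the next state; the gap is the 'temporal difference' error."""
    td_error = reward + gamma * V.get(next_state, 0.0) - V.get(state, 0.0)
    V[state] = V.get(state, 0.0) + alpha * td_error
    return td_error
```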
How to solve a solution graph in problem reduction search?
AO* - Extension of A*
-- In A*, the open list is a subset of possible solution paths
-- In AO*, the open list is a subset of possible solution trees
-- Successors are generated by choosing among OR nodes
-- The open list is stored as one single graph
- AO* is hard to use because it is difficult to know whether your heuristic is admissible
Also, it is difficult to search the entire tree of future moves due to limited time/computation, so instead we explore the search tree up to a certain number of moves (depth), gather the information, and use that to inform our current decision. One method of doing so is the minimax method.
Constraint Satisfaction
Constraint Satisfaction Problem (CSP)
- You have a set of n variables, each of which takes on a set of possible values
- You also have constraints: if one variable has a certain value, other variables have their values limited (e.g. the 8 queens problem)
- A binary CSP graph shows you which variables constrain which others
- Can be used to solve 3-coloring problems, vision problems, etc.
Constraint Search Graph
- State = partial labeling
- Operator = labeling a node (consistently with respect to other variables)
- Goal = complete labeling
- Each level labels a new variable
- Nodes at depth n are goals
Solved by (see the backtracking sketch below):
1. Preprocessing - reduce the problem space as much as possible
a. Arc Consistency - remove a label if it is inconsistent with all labels of any adjacent node
b. Path Consistency - record pairs of labels that are inconsistent with some 3rd node
2. Search - DFS + backtracking
-- Why DFS? Because the solution can only be at a certain depth bound which is known to us, and not before that (e.g. an 8-queens solution can only be found at depth d = 8)
3. Optimization Tricks
a. Look-Ahead
-- Variable ordering - we know we need to assign n variables, so put at the top of the tree the variable with the largest number of constraints on future values/variables to limit the search space
-- Value ordering - play safe; leave the most options open to avoid backtracking
b. Look-Back
- Go to the source of failure, find out why you failed, and then back-jump as far back as needed to avoid that failure (also known as dependency-directed backtracking)
- Constraint recording - remember the failures in a particular part of the tree and avoid them in the future on a different part of the tree (aka learning)
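A minimal backtracking sketch for the search step, assuming a hypothetical `consistent(var, value, assignment)` constraint check, followed by a tiny 3-coloring example.

```python
def backtrack(assignment, variables, domains, consistent):
    """DFS + backtracking for a CSP: each level of the tree labels one more variable."""
    if len(assignment) == len(variables):
        return assignment                      # complete, consistent labeling
    var = variables[len(assignment)]           # fixed variable ordering for simplicity
    for value in domains[var]:
        if consistent(var, value, assignment):
            assignment[var] = value
            result = backtrack(assignment, variables, domains, consistent)
            if result is not None:
                return result
            del assignment[var]                # undo and try the next value
    return None                                # failure: back up to the previous variable

# Example use: 3-coloring a tiny graph (a binary CSP)
neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
colors = ["red", "green", "blue"]

def different_colors(var, value, assignment):
    return all(assignment.get(n) != value for n in neighbors[var])

solution = backtrack({}, list(neighbors), {v: colors for v in neighbors}, different_colors)
```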
Deep Backpropagation
Convolutional Layer - local connectivity from input Max pooling layer - winner-takes-all in a region Fully connected layers - classification Preprocessing - image adjustment, etc.
How does non-linear planning handle interactions between sub-goals?
It avoids ordering the operators until necessary
For example NOAH
- Creates a network of procedures and expands them to achieve the goal
- Many critics (scripts) examine your current planning state for interactions/problems and then try to resolve them
-- 1. Ordering - resolve conflicts and promote certain goals
-- 2. Remove redundant preconditions
However, this is difficult because you have to write the critics/scripts which check for interactions. If you have too many critics, you'll never get a solution. If you have too few critics, your solution won't even work.
What is planning and what are the 3 ways to plan?
It is the highest level of search problems, and the goal for AI agents is to plan rather than just react to problems
1. State-Space Search
- Simplest approach
- Could use A* or something similar
- You only need to know the rules of the game, not much other knowledge
- You always consider all operators, though, so you waste time; it's not /really/ planning
2. Plan-Space Search
- This planning involves knowledge about the operators (their effects, preconditions, etc.)
- It is more means-ends analysis, such as serializable sub-goals or GPS
3. Knowledge-based
- More knowledge required to work
- Fewer steps applied
- Produces suboptimal paths because so much knowledge is involved
What is learning theory & the concept of learnability?
Learning theory covers what can be learned (e.g. by examples) and under what conditions (e.g. is there noise) Learnability describes whether it is possible to generate an algorithm that can learn via examples. Something is learnable if the required number of examples is polynomial with respect to error tolerance, # of features, and size of the rule
What is one method of searching game trees?
Minimax method - Search forward to a fixed depth - Evaluate frontier nodes with a static evaluation function and propagate the values back up the tree - The maximizing player (player 1) takes the maximum of its children's values; the minimizing player (player 2) takes the minimum
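A minimal minimax sketch, assuming hypothetical `children` and `evaluate` functions (alpha-beta pruning, covered above, is the more efficient version of this).

```python
def minimax(node, depth, maximizing, children, evaluate):
    """Search forward to a fixed depth, then back values up the tree:
    MAX levels take the max of their children, MIN levels take the min."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)            # static evaluation at the frontier
    values = [minimax(c, depth - 1, not maximizing, children, evaluate) for c in kids]
    return max(values) if maximizing else min(values)
```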
Deep Learning
Multiple layers in the network; each layer learns to detect a certain/specific level of feature
Can have a bottleneck layer with few neurons, which forces representations to be meaningful/generalizable
However, multiple layers cause the problem of vanishing gradients - if you backpropagate the error signal across many layers, it shrinks layer by layer until the information is lost
When does RL work well / not work well?
RL does not work well when the entire state cannot be seen at any given time, i.e. the state is only partially observable (e.g. video games where players can go behind walls)
RL works well when the full state is observable (e.g. chess)
What is the internal model & critics explanation of RL?
RL splits the learning problem into two supervised problems
1. Decision-making - build a learning agent who makes decisions
2. Modeling - build a learning agent who predicts what kind of reinforcement you will get if you perform certain actions (turning the raw, unsupervised reinforcement signal into a supervised prediction problem)
A critic is a component that smooths out the reinforcement signal, which solves the credit assignment problem
Search Terminology
Root Node - initial state
Leaves - the frontier nodes; show how deep the search has come so far
Branching Factor (b) - how many operators we can apply to each state on average; tells us how widely the search space expands
Solution Depth (d) - where the shallowest solution is; we need to take at least d steps to reach the goal state
Expanding a Node - taking a node and generating all of its children to explore it
Generating a Node - taking a node and generating only one of its children
Uniform Edge Cost - assumption that each edge cost is the same
What is STRIPS and ABSTRIPS?
STRIPS is a planner that uses goal-stacking; it solves high level goals by recursing on the preconditions. There are heuristics for choosing potential actions. The disadvantage is we spend a lot of time on unimportant details rather than the bigger problems
ABSTRIPS is an extension of STRIPS that plans at several hierarchical levels. It assigns a criticality number to each goal to take into account how to prioritize goals. With this hierarchy, we use iterative deepening on the criticality number (looking at the most important goals first). A problem with this is it can't handle interacting subproblems, e.g. painting a ladder before the ceiling
Why planning over search & what are 2 problems of planning?
Some problems have too big of a space to be searched; instead we need problem decomposition (solving subproblems via planning) The biggest problem with planning is handling interactions i.e. what if you unsolve a previously solved goal Also there is the frame problem - some things could change in addition to those that we are explicitly considering e.g. moving a table probably moves the things on it too
How to make an admissible, non-consistent heuristic consistent?
Along a path, keep the highest cost estimate seen so far: if the heuristic gives a node a lower value than its parent's, don't accept it; take the higher value from before (see the sketch below)
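One way to express this adjustment (sometimes called the pathmax trick) as a small sketch; the function names are illustrative.

```python
def pathmax_f(parent_f, child_g, child_h):
    """Never let f drop along a path: if the child's raw f = g + h is lower
    than the parent's f, keep the parent's (higher) value instead."""
    return max(parent_f, child_g + child_h)
```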
What is the Modal Truth Criterion?
There is a formal solution to planning known as TWEAK
- When a plan is incomplete, you have a set of constraints that all possible completions must satisfy
- Planning is repeatedly asking what additional constraints are necessary for the desired final constraints to be satisfied in all completions
- We know if the conditions are met in the current state via the Modal Truth Criterion
Modal Truth Criterion
- Essentially a way to formalize the process of planning
- For a proposition P to be true in some state S, there must be a step T that precedes S and establishes P
- For every step C (clobberer) that precedes S and denies P, there must be a white knight W between C and S that asserts R, which unifies with P
- For a given incomplete plan, pick a state S and proposition P and check all triplets T, C, W; this can be done in O(n^3) time, which makes it possible to prove the plan correct. Now you can do a systematic search to find the plan. However, this is largely inefficient since it is a search, and it also results in inefficient plans
Heuristic Algorithms
Three aspects of a heuristic algorithm:
1. The heuristic must be admissible - guaranteed to never overestimate
2. Delayed termination - once a goal is found we don't stop; we only stop when the goal node is popped off the open list
3. Pointer revision - admissibility only guarantees we never overestimate the distance to the goal; the current best-known path to an intermediate node may not be the shortest, so we need to revise parent pointers when we find shorter paths to intermediate nodes
Two properties of a heuristic algorithm (not always true but usually are):
1. Monotonicity - the cost function never decreases as you get farther along the path, i.e. we always underestimate the cost to a node
2. Consistency - the estimated cost from N to the goal is never more than the cost of going from N to N' plus the estimated cost from N' to the goal
- These two properties are considered equivalent
What neural networks are used for vision & language?
Vision - CNN Language - RNN
How do you solve the credit assignment problem?
You use critics to smooth out the reinforcement signal and determine which actions are good/bad
What is a problem with planning?
You want to be reactive rather than endlessly plan sometimes (e.g. self driving) Also planning doesn't consider optimizing resources while achieving the goal