Artificial Intelligence

What is the ideal rational agent?

for each possible percept sequence (input), does whatever action is expected to maximize its performance measure, using evidence provided by the percept sequence and any built-in knowledge

What does a back-propagation learning algorithm do?

divide error "locally" among contributing weights and update layer-by-layer backwards

How does one determine how deep and wide to make a neural network?

use cross-validation

What is a problem-solving agent?

a kind of goal-based agent that decides what to do by finding sequences of actions that lead to desirable states

What is spatial pooling?

"Pooling" (e.g., taking max of) filter responses within local area provides robustness to slight shifting of image features (e.g., if images or content slightly misaligned)

What is the Gradient Descent formula? (VERY IMPORTANT)

*p_new = p_old - α x ∇pE* - p_new: updated parameter value - p_old: previous parameter value - α: step size ("learning rate") - ∇pE: Gradient/derivative of error function wrt. p
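
A minimal sketch of the update rule above, assuming a hypothetical one-parameter error function E(p) = (p - 3)^2 (not from the original notes). The names `gradient_descent`, `grad`, and `p0` are illustrative only.

```python
def gradient_descent(grad, p0, alpha=0.1, iterations=100):
    """Repeatedly apply p_new = p_old - alpha * grad(p_old)."""
    p = p0
    for _ in range(iterations):
        p = p - alpha * grad(p)
    return p


# Assumed example: E(p) = (p - 3)^2, so the gradient is 2 * (p - 3)
# and the minimum sits at p = 3.
p_min = gradient_descent(lambda p: 2.0 * (p - 3.0), p0=0.0)
print(p_min)
```

With this α each step shrinks the distance to the minimum by a constant factor, so the estimate settles very close to 3.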

Summarize the argument against Weak AI from "Informality of Behavior." What's the caveat now?

- Argument is characterized as • Human behavior cannot be boiled down into simple rules • All that computers do is follow rules • Therefore, computers cannot behave as intelligently as humans - However, this critique is more about older AI techniques than newer ones, like machine learning and probabilistic models, which allow for more than just simple logic, and may allow for behavior closer to human capabilities

What are 3 main performance issues w/Gradient descent algorithms?

- Can be misled by local minima • Get stuck at smaller valleys - Can get stuck on a flat plain • Flat region in state-space function - Can overshoot back and forth • Oscillate from side to side • Learning rate too large

What is the "Bold driver" approach to choosing the learning rate α?

- Check value of error/loss function using estimated parameters at end of each iteration - If error was reduced, try increasing α by 5% (go further) for next iteration. - If instead error was increased (skipped the optimal location), reset parameters to previous iteration and decrease α by 50%.
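
The steps above can be sketched as follows. This is only an illustration under assumptions: a single scalar parameter, a hypothetical error function E(p) = (p - 3)^2, and a deliberately oversized starting α. The function name `bold_driver` and its arguments are not from the original notes.

```python
def bold_driver(error, grad, p0, alpha, iterations=50):
    """Gradient descent where alpha grows 5% after a good step
    and is halved (with the step undone) after a bad one."""
    p, prev_error = p0, error(p0)
    for _ in range(iterations):
        candidate = p - alpha * grad(p)
        new_error = error(candidate)
        if new_error < prev_error:
            # Error was reduced: keep the step and go a bit further next time.
            p, prev_error = candidate, new_error
            alpha *= 1.05
        else:
            # Error increased: stay at the previous parameters, halve alpha.
            alpha *= 0.5
    return p, alpha


# Assumed example: E(p) = (p - 3)^2 with a starting alpha that overshoots.
p_final, alpha_final = bold_driver(
    error=lambda p: (p - 3.0) ** 2,
    grad=lambda p: 2.0 * (p - 3.0),
    p0=0.0,
    alpha=1.5,
)
print(p_final, alpha_final)
```

The first step is rejected because α = 1.5 overshoots, after which the halved rate makes progress and is then nudged upward again.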

Describe the k-Nearest Neighbor (k-NN) algorithm

- Compute distance from test sample to labeled samples in the training set - Assign test sample the label most common across the first "k nearest neighbors" from the training data • k typically small and odd numbered (no ties)
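
A small sketch of the procedure, assuming Euclidean distance and a toy 2-D training set made up for illustration. `knn_predict` and the data are hypothetical names and values.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (feature_tuple, label) pairs.
    Returns the most common label among the k nearest training samples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up labeled samples: class "a" near the origin, class "b" near (1, 1).
train = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"),
]
print(knn_predict(train, (0.05, 0.05)))
print(knn_predict(train, (1.0, 0.9)))
```

Using an odd k with two classes avoids tied votes, as noted above.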

What are the main linear and nonlinear methods of model parameter estimation?

- Linear: least squares - Nonlinear: gradient descent

What is the "Least squares" approximation of minimizing error?

- Look for the best x in (Ax-b) that makes total squared error E as small as possible: - E = Sum of the squared-error (SSE) - E = Sum to i [(m*xi + b - yi)^2] ... (substitute in data for these then take partial derivatives)
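
For the straight-line case, setting the two partial derivatives of E (wrt. m and b) to zero gives a closed-form solution. A sketch under that assumption, with made-up data and a hypothetical function name `fit_line`:

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b,
    obtained from dE/dm = 0 and dE/db = 0."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b


# Made-up points lying exactly on y = 2x + 1.
m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)
```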

Accg. to Minsky, what must we show to explain the mind?

- Must show how minds are built from mindless stuff - Using parts simpler than what we would consider "smart"

What is "mini-batch" gradient descent?

- Perform an update using mini-batch of N data examples (e.g., 50-256 examples instead of just 1) - Can have more stable/smoother convergence, more efficient implementations

What does an autonomous creature do?

- able to maintain long-term dynamic w/ enviro.t w/o intervention - once switched on, it does what is in its nature to do

Describe the traditional, GOFAI approach?

- abstract out enviro.ts and possible unexpected parameters to identify the essence of something - study that, expecting to generalize back to full concept later

What is emergence?

- intelligence of the system emerges from the system's interactions w/the world and from sometimes indirect interactions b/t its components -- it is sometimes hard to point to one event/place w/in the system and say that is why some external action was manifested

What is a robot?

- physical agents, artificially constructed, that perform tasks by manipulating the physical world - should be autonomous

What are some general uses for robots?

- replacements/supplements for human labor - enhancements for human capabilities - going where ppl can't go

What essentially is the simple "threshold" activation fcn for perceptrons? What then does it represent?

- threshold activation is essentially an equation for a line - most similar to tanh bc has same range - Threshold perceptrons represent functions that are linearly separable; very simple classifier

How does one tune the algorithm parameters?

-Use "Validation" data • Train classifier on a subset of the training data • Examine the classifier on the remaining training data (called the "validation set") • Tune the classifier to minimize the error on the validation set

What are the 3 laws of robotics?

1. "A robot may not injure a human being or through inaction allow a human being to come to harm" 2. "A robot must obey orders given to it by human beings, except where such orders would conflict with the First Law" 3. "A robot must protect its own existence, as long as such protection does not violate the First or Second Laws"

What are 4 key points of the embodied intelligence approach to AI?

1. Intelligent systems operate in a world that is detailed and not fully perceivable 2. It carries out tasks involving perception and motion (ie. agent) -- must survive in its own world, not solve prob.s built in by rsrchrs 3. Artificial creature is embodied -- is a physical agent that interacts w/ and is affected by the world 4. It must have internal drive to direct operations in the world -- motivation for actions

Name and describe the two different types of sensors that robots use

1. Passive sensors - capture signals generated by elements of the enviro.t (eg. cameras, microphones) 2. Active sensors - send energy into the enviro.t and observe/receive the result (eg. sonar, radar) - provide more info, but use more energy

What are the 2 classic activation functions? What are their shapes and ranges?

1. Sigmoid: Sig(x) = 1 / (1 + e^-x) - (0,1) - like a smooth step threshold fcn 2. TANH: tanh(x) = 2sig(2x) - 1 - (-1,1) - steeper sigmoid x = weighted input
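
A quick numerical check of the two definitions above, comparing the sigmoid-based form of tanh against Python's built-in `math.tanh`. The function names are illustrative.

```python
import math


def sigmoid(x):
    """Sig(x) = 1 / (1 + e^-x), with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x):
    """tanh(x) = 2 * Sig(2x) - 1, with outputs in (-1, 1)."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, sigmoid(x), tanh_via_sigmoid(x), math.tanh(x))
```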

What are the two components of node computation?

1. linear - sum up all the inputs to the node 2. nonlinear - activation function, rough threshold

What is the earliest conception of AI?

4th century BC, Aristotle considered ideas that the mind is in some ways like a machine

What does "Unsupervised Learning" mean?

Do not have the known (ground-truth) clusters (or criterion) to evaluate against

What is Spatial Pyramid Pooling?

Handles varying sized images by pooling into a spatial hierarchy (e.g., 4x4, 2x2, 1x1), rasterizing (e.g., 16x1, 4x1, 1x1), then concatenating into a single vector (e.g., 21x1)

What is the product rule for conditional probabilities?

P(A ∧ B) = P(A | B)P(B) thus P(A | B) = P(A ∧ B) / P(B)

What is Bayes' Rule?

P(h | D) = P(D | h)P(h) / P(D) where P(D) = P(D | h)P(h) + P(D | ~h)P(~h)
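
A worked example of the rule with made-up numbers (P(h) = 0.01, P(D | h) = 0.9, P(D | ~h) = 0.05). The values and the function name `posterior` are assumptions for illustration only.

```python
def posterior(prior_h, like_d_given_h, like_d_given_not_h):
    """P(h | D) = P(D | h)P(h) / P(D), where
    P(D) = P(D | h)P(h) + P(D | ~h)P(~h)."""
    evidence = like_d_given_h * prior_h + like_d_given_not_h * (1.0 - prior_h)
    return like_d_given_h * prior_h / evidence


# Rare hypothesis, fairly reliable evidence (hypothetical numbers).
p_h_given_d = posterior(prior_h=0.01, like_d_given_h=0.9, like_d_given_not_h=0.05)
print(p_h_given_d)
```

Even with a high likelihood, the small prior keeps the posterior well below one half here.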

What's the general approach/process to evolutionary algorithms?

Start > Initial or next population input > Assess fitness w/performance test >> Finish if good enough > Select out fit individuals (probability injected) > Reproduce fit individuals (probability injected) > Next population input

What is an agent? As in, a mental agent (Minsky's definition)

by itself can only do some simple thing that needs no mind or thought at all

What is the reductionist approach to AI?

shift from "solving problems" to "existing w/in a world and maintaining of goals"

What is Deep Learning? What's the conventionally held threshold to be considered deep learning?

• "Algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations" - Exploits hierarchical explanatory factors where higher level, more abstract concepts are learned from lower level ones • Deep neural network (DNN), with many hidden layers - G. Hinton: >1 hidden layer

What is the notation of the prior, likelihood, and posterior probabilities?

• *Prior probabilities* - Prior of hypothesis h: P(h) - Prior of data D: P(D) • Initial probability that data D will be observed (given no knowledge about which hypothesis holds) • *Likelihood probability* - Probability of observing data D given that hypothesis h holds: P(D | h) • *Posterior probability* - Probability that h holds given data D: P(h | D)

What is the difference b/t replication and reproduction (EA)?

• *Replication* - Chromosome merely reproduced unaltered, pass onward/through • *Reproduction* can be affected by application of genetic operators - Applied to pair of chromosomes selected for reproduction (eg. cross-over and mutation, which correspond to idealized versions of the biological operators)

What is ResNet?

• An ultra-deep 152-layer CNN • Built on "residual blocks" - Skip connections • It's much easier to learn to push F(x) to 0 and leave the output as x, than to learn an identity transformation from scratch. • State-of-the-art model, many new CNNs are based on this approach

W/in the context of evaluation, how does one determine the accuracy, precision, recall, and Fβ-Measure of classifiers? How should precision and recall be considered?

• Accuracy = (# of correct classifications) / (# of classifications) • Precision = (# of correctly detected events) / (# of detected events) = TP/(TP+FP) • Recall = (# of correctly detected events) / (true # of events) = TP/(TP+FN) • Fβ-Measure = (1 + β^2) x [(Precision x Recall) / (β^2 x Precision + Recall)], where β = 1 -> harmonic mean b/t precision and recall • Precision and recall must be considered 2gthr, balance classifier and ground truth against e.o.
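
A sketch of these measures computed from raw counts. The counts (TP = 8, FP = 2, FN = 4) are made up, and `precision_recall_fbeta` is a hypothetical helper name.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN),
    F-beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    fbeta = (1.0 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


# Hypothetical detector: 8 correct detections, 2 false alarms, 4 misses.
precision, recall, f1 = precision_recall_fbeta(tp=8, fp=2, fn=4)
print(precision, recall, f1)
```

With β = 1 the result is the harmonic mean of precision and recall, so it sits between the two and closer to the smaller one.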

Define Learning w/in the context of machine learning

• Agent improves its performance by adjusting its own models - Goal of discovering "relationships" (patterns) between input and output • What features are best in mapping input to output? • Need to recognize what's important and what is not - Discover properties of the environment

What is the formula for calculating depth from stereo?

• Assume parallel optical axes, known camera parameters (i.e., calibrated cameras). • Z = f * (T/(xl - xr)) - where f = focal length T = camera baseline xl - xr = disparity
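
A minimal numeric sketch of the formula, assuming the focal length and image coordinates are in pixels and the baseline is in meters (so Z comes out in meters). The specific numbers and the name `depth_from_disparity` are hypothetical.

```python
def depth_from_disparity(f, T, x_left, x_right):
    """Z = f * T / (x_left - x_right) for calibrated cameras
    with parallel optical axes."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return f * T / disparity


# Assumed values: f = 700 px, baseline T = 0.1 m, disparity = 20 px.
z = depth_from_disparity(f=700.0, T=0.1, x_left=420.0, x_right=400.0)
print(z)
```

Depth is inversely proportional to disparity, so nearby points shift more between the two views than distant ones.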

What is the best hypothesis accg to Bayes theorem, and what does it use to calculate the probability of a hypothesis h?

• Best hypothesis: - Most probable hypothesis, given data D and any initial knowledge (prior probabilities of various hypotheses) • Bayes theorem provides direct method for calculating probability of a hypothesis h using - Prior probability of h - Probability of observing data D given the hypothesis h (likelihood probability) - Probability of the data D itself

In CNNs, how can one turn local into "convolution"?

• Can think of this as each location in the "Feature Map" as a node in a neural network layer, with 3x3=9 weights coming into each node - but here all of these nodes have the same "shared" weights (i.e., the feature map has a total of 9 weights associated with it to learn)

What is the general framework for creating decision trees? Give a general overview of the questions it asks in its process

• Classification and Regression Trees (CART) - How many splits? - How determine the split? - When to stop? - How to assign categories to leaf nodes?

What is supervised learning under the context of ML? What is given, and what is the objective?

• Classifier eg. tell which objects belong to category, but not why (leave AI to figure this out) • *Given*: set of training data with corresponding class labels representative of entire dataset • *Objective*: build a classifier to predict output labels (classes) of data in unseen test set - Need to infer a function that separates the data into desirable classes - May need to tune algorithm parameters - Feature representation is important

How is the split determined w/decision trees or CART?

• Prefer decisions that lead to simplest tree (Occam's Razor) - Want property to split data into "purest" groups possible -- use impurity measure • Choose decision at node N that decreases impurity the most

What is Features from Accelerated Segment Test (FAST)? Describe the steps

• Compare local neighborhood around each pixel to determine if pixel is a good feature point • For each pixel x - Look at the pixels on the border of a circle with radius r around x - Let n be the number of contiguous pixels whose intensities are either 1) all > I(x) + T OR 2) all < I(x) - T • I(x) is the intensity at pixel x and T is a threshold - If n ≥ n* then the pixel is considered a feature - Results suggest using r = 3 (yields 16 pixels on border) and n* = 9 provides best detector

What is naive connection in CNNs? What's the issue w/this?

• Connect every pixel to each node in the first hidden layer - Too many weights! - Do not have enough training examples!

How can "if-then" rules be written as bit strings?

• Consider constraint on mutually exclusive values of a single attribute 'Outlook' - Possible values: Sunny, Overcast, or Rain • Represent Outlook as "or" bit string of length 3 - Each bit position corresponds to a possible (unique) value > 001 : Outlook = Sunny > 010 : Outlook = Overcast > 100 : Outlook = Rain > 110 : Outlook = Rain ∨ Overcast ("or") > 111 : Outlook = Sunny ∨ Rain ∨ Overcast

What are atrous convolutions?

• Convolution with upsampled filters ("Atrous convolution") • Method to effectively enlarge the field-of-view of filters (larger context) without increasing the number of parameters or the amount of computation

What is "Batch" Gradient Descent? W/in this, what is the difference b/t iteration based and convergence based BGD?

• Define error/loss function of parameters (p) & all data (x): E(p, x) - Compute derivative/gradient of E(p, x) for each p_i: ▽_piE(p, x) • *Iteration based*: - for fixed number of iterations - >> for each parameter pi ∈ p - >>>> compute gradient using all data ▽_piE(p, x) - >>>> update pi with pi = pi - α*▽_piE(p, x) - do simultaneous update of all parameters p_i at end of each iteration • *Convergence based*: - while loss function decreasing (or reaches tolerance value) - >> for each parameter pi ∈ p - >>>> compute gradient using all data ▽piE(p, x) - >>>> update pi with pi = pi - α*▽piE(p, x) - >> Evaluate loss function E(p, x) with new p

How does mutation work in evolutionary algorithms?

• Each gene can be altered by random mutation to a different value - A bit in a single chromosome is flipped (e.g., 0 → 1) • Typically use a small probability rate of mutation

How was MYCIN evaluated?

• Evaluated using a form of the Turing Test - 10 randomly selected case histories of meningitis - Re-diagnosed by MYCIN and 8 practitioners at Stanford Med School of varying levels of expertise - evaluated by 8 infectious-disease experts away from Stanford -- blind review - RESULTS: MYCIN performed at least as well as Stanford experts -- pry bc KB represents combined expertise of best medical minds

Within least squares, how do you minimize the error?

• Find a set of parameters (m, b) that "best" fit the data - Ax = b --> Ax-b = 0 - For a specific (m, b) and imperfect "line" data, Ax-b = ±e - Goal is to make the error ( e=|Ax-b| or e=(Ax-b)^2) as small as possible (aka. "Loss Function") - Usually minimized using calc (2 partial derivatives for the 2 unknowns (m, b))

How are individuals represented in evolutionary algorithms?

• Fundamental representation of an individual is called a chromosome • Typically, chromosome represented with a string over a finite alphabet - Bit string 0111010101110000110101 • Each element of string is called a gene

How does stereo help to extract 3D information?

• Stereo: - Shape from difference between two views - Infer 3-D shape of scene from two (multiple) images from different viewpoints • Need information on camera pose (calibration) and image point correspondences

What is K-Means clustering?

• Given initial set of K centroids/means (generally obtained through initialization with random data points or locations): - Assign each point to closest (generally Euclidean) centroid - Re-compute centroid/mean locations based on current assignments - Repeat until convergence or maximum number of iterations
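
The loop above can be sketched in plain Python. Assumptions: Euclidean distance, initial centroids supplied by the caller (here two of the data points), and an empty cluster simply keeping its old centroid. The name `k_means` and the toy points are illustrative.

```python
import math


def k_means(points, centroids, max_iterations=100):
    """points and centroids are lists of equal-length tuples."""
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point goes to its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters


# Two compact, well-separated toy groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(centroids)
```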

What is the definition/goal of computer vision? What does this require then?

• Goal of computer vision is to make useful decisions about real physical objects and scenes based on sensed images - Process of discovering from images what is present in the world, where it is, and what it is doing • Construction of scene descriptions from images • Require representations of shape, motion, color, context, etc.

What are the benefits of locally connected CNNs?

• Good when input image is registered - Certain visual features/patterns are in same area of the image • e.g., aligned "mug shot" for face recognition (nose, mouth, eyes are in same image locations)

What is clustering? What outcome should it achieve? Finally, what kind of learning is it referred to as, and what does this mean?

• Grouping together items that are similar (similarity determined via given measurement of "closeness" or proximity) • Items from same cluster should be more similar to each other than to items from different clusters (makes selecting appropriate data representation and proximity measurement imperative) • Referred to as *"Unsupervised Learning"* - Do not have the known (ground-truth) clusters to evaluate against

What is L2 Regularization? What does it do?

• Helps to reduce overfitting • Preference to learn smaller weights wi - Though larger weights can be found if they noticeably improve the E_0 • Also spreads weights as not to have only a few weights with large values and the rest with zero - Keeps the network from focusing on specific non-generalized items

What is SoftMax output?

• If target output is a vector of dependent multi-class membership, it can be desirable to train the network to output a "probability distribution" across the classes - Multinomial "hard targeting" [1 0 0], [0 1 0], [0 0 1] • Also called "1-hot" vectors, used as ground-truth targets - Network outputs: [.99 0 .01], [.15 .85 0], [.2 .1 .7] • Use normalized exponential of outputs out_oi - Sums to 1 (probability) - "Peaky" distribution
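
A sketch of the normalized exponential. Subtracting the maximum before exponentiating is a common numerical-stability step added here as an assumption; it does not change the result. The raw outputs are made up.

```python
import math


def softmax(outputs):
    """Normalized exponential: exp(o_i) / sum_j exp(o_j)."""
    shift = max(outputs)  # stability trick; cancels out in the ratio
    exps = [math.exp(o - shift) for o in outputs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw network outputs for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```

The exponential exaggerates differences between outputs, which is what makes the resulting distribution "peaky".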

What is the Maximum A Posteriori (MAP) hypothesis? What does it maximize? How does one determine it and to what end?

• In many learning scenarios, *seek most probable hypothesis h ∈ H, given the observed data D* • Maximize P(D | h)P(h), which is proportional to the posterior P(h | D) (P(D) is the same for every h, so it can be dropped) • Determine MAP hypotheses using Bayes theorem to calculate posterior probability of each candidate hypothesis

What is a Maximum Likelihood (ML) hypothesis?

• In some cases, assume all h ∈ H have equal priors: P(hi) = P(hj) • Thus can simplify hMAP by only considering the term P(D | h) (likelihood) to find the most probable hypothesis • Any hypothesis that maximizes P(D | h) is called a maximum likelihood (ML) hypothesis

What is the perceptron learning algorithm? (2 main steps)

• Initially assign random weights - guess of line equation parameters • Update network to try to make consistent with examples - make small adjustments in weights to reduce diff b/t observed and predicted values - updating process divided into "epochs" (updating all weights for all examples)
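
A sketch of the two steps for a threshold perceptron learning logical AND. Two deviations from the notes, both assumptions made for reproducibility: weights start at zero rather than random values, and the step size `rate` and epoch count are arbitrary. `train_perceptron` and `predict` are hypothetical names.

```python
def predict(weights, bias, inputs):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(examples, epochs=100, rate=0.1):
    """examples: list of (inputs, target) pairs with target 0 or 1."""
    weights = [0.0] * len(examples[0][0])  # zeros instead of random, for determinism
    bias = 0.0
    for _ in range(epochs):  # one epoch = one pass over all examples
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)
            # Small adjustment that reduces the observed/predicted difference.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Logical AND is linearly separable, so the perceptron can learn it.
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_examples)
print(weights, bias)
```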

What is the input and output of decision trees? How do they learn and classify? What are the advantages of this method?

• Input: Features per example • Output: Classification label • Learns by subdividing the data into clusters with same properties • Classify pattern through sequence of questions • Good at determining which features are good discriminators; easy to interpret

How does K-Medoids differ from K-Means? What is it better at?

• Instead of using mean to represent a cluster, use a particular example in the cluster • Method: - Select initial medoids (random examples) - Repeat until convergence or max # of iterations: • (Re)assign each point to the cluster having the closest medoid • In each cluster, make the example that minimizes the sum of distances within the cluster the medoid • More robust to noise and outliers (as compared to K-Means)

How does Minsky define intelligence? What's the caveat?

• Intelligence = the ability to "solve hard problems", "quickly and efficiently" • "intelligence" is our name for whichever of those processes we don't yet understand • BUT meaning of intelligence changes as we learn more about psych,biology

What is Brooks' alternate view of intelligence and prob-solving?

• Intelligence is all about making judgments when there are large #s of messy details all around - details are integral to solving prob.s

What is an evolutionary algorithm? What's it used for? What's important about it? What's the general process?

• Learning method based loosely on simulated evolution • Used to find best "individual" (model, code) that optimizes a numerical measure for the problem • Important: Can learn both the model and its parameters simultaneously (very attractive for difficult prob.s) • Start with set of individuals (initial population) - Apply selection: choose the best individuals - "Evolve" those individuals that are most successful

What is the main idea of the gradient descent algorithm? 4 main steps

• Main idea: 1) Evaluate error/loss function for current guess of parameters 2) Determine change in parameters that decrease error (from gradient of error function) 3) Update parameters in this new change 4) Repeat process until converge (or hit max iterations)

What are 3 methods of weight initialization for neural networks?

• Method 1 - Sample from Gaussian with σ^2=1, then scale • Scale with a small value (e.g., .001) • Method 2: (Xavier initialization) - For the set of weights that go into a node (N_in of them), sample from Gaussian with σ^2 = 1/N_in • The value σ^2 = 2/N_in was derived for ReLU activations • Method 3: - Use the average number IN and OUT to give σ^2 = 2/(N_in+N_out)

How does one determine when to stop growing decision trees in ML?

• Methods exist to determine when to stop splitting - but may declare a node a terminal too early • Alternative: grow tree out entirely (each leaf perfectly pure) and then prune • Pruning: - Work bottom-up - Compute the increase in impurity if two child nodes linked to common parent node are eliminated - Merge if increase in impurity is negligible

How does cross-over work in evolutionary algorithms?

• Mixing (mating) two chromosomes - Controlled by probability rate • A split point (cross-over point) is chosen randomly along length of chromosome • First part of chromosome A is spliced to last part of chromosome B (and vice versa) - Yields two new chromosomes • For example, if cross-over point is 10 - One offspring will get genes 1-10 from one parent and genes 11-END from another parent
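
A sketch of single-point cross-over on bit-string chromosomes. The fixed `point=10` mirrors the example above; the all-zero and all-one parents are made up so the splice is easy to see. The function name and the optional `rng` argument are assumptions.

```python
import random


def crossover(parent_a, parent_b, point=None, rng=random):
    """Splice the first part of one parent onto the last part of the other.
    Returns the two offspring."""
    if len(parent_a) != len(parent_b):
        raise ValueError("chromosomes must have the same length")
    if point is None:
        # Random split point somewhere strictly inside the chromosome.
        point = rng.randrange(1, len(parent_a))
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


parent_a = "0" * 16
parent_b = "1" * 16
child_1, child_2 = crossover(parent_a, parent_b, point=10)
print(child_1)
print(child_2)
```

Here `child_1` takes genes 1-10 from the first parent and genes 11-END from the second, and `child_2` is the mirror image.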

What does Naive Bayes assume? How well does this perform?

• Naïve Bayes classifier assumes data items are conditionally independent given the hypothesis - Probability of observing conjunction d1,d2,...,dn is just the product of the probabilities for individual data items - hMAP = argmax P(d1,d2 ,...,dn | h)P(h) • Not true for many/most domains, but can work well in practice (e.g., spam filter)

With no "best" clustering, what do you need to specify? Define the internal and external criteria w/in this context

• Need to specify objective function/evaluation criteria to compare clusterings -*Internal Criteria*: quantify quality of clustering based on data itself -*External Criteria*: quantify quality of clustering based on ground-truth labels (but usually do not have these)

Describe neural networks as comprehensively as possible (at a relatively broad level) -- composed of what, connected by what, etc.

• Neural net is composed of nodes (units) - Some connected to outside world as input or output units • Nodes are connected by (input and output) links • Each link has numeric weight associated with it - Primary means of long-term storage/memory - Weights are modified to bring network's input/output behavior to goal response • Nodes have activation function inside - Given its inputs and weights - Local computation based on inputs from neighbors (no global control)

What type of phenomenon is emergence? What does this allow for?

• Not linear -- behavior produced by the system is more than the sum of its parts (Gestalt) • Do not necessarily need to build an explicit behavior into the system itself

What are 3 features of Bayesian reasoning?

• Observed data can be combined with prior knowledge to determine final probability • Each training example incrementally decreases/increases estimated probability of a hypothesis • New instances can be classified by combining predictions of multiple hypotheses (weighted by their probabilities)

What is "Stochastic" Gradient Descent? Go through the process. How does it perform?

• Perform a parameter update for each training data example at a time (instead of all data together) - Usually much faster, can learn online > while not converged >> shuffle/randomize data x >> for each parameter pi ∈ p >>>> for each training data example xj >>>>>> compute gradient (using only xj) ▽piE(p, xj) >>>>>> update pi with pi = pi - α*▽piE(p, xj) - do simultaneous update of all parameters pi at end of each iteration! • Performance fluctuates - Can jump to better local minima - However can keep overshooting (better if slowly decrease learning rate)

What does the pinhole camera model result from? What is the problem w/this single view model tho?

• Pinhole camera model results from reducing aperture diameter to infinitesimally small point - Simple model and all objects in focus • Structure and depth are inherently ambiguous from single view

What is ReLU? How does it work? What are its Pros?

• ReLU(x) = max(0,x) • if you're above the threshold keep going, if you're not you're done • Pros: - Non-saturating - Faster convergence - Efficient computation - Resulting representation is "sparse" (not all nodes have non-zero activations), removing redundancies

What is momentum (gradient descent)?

• Remember the update at each iteration, and determine next update as combination of current gradient and previous update: - ∆pi = (1)α∗▽piE(p, xj) + (2)β* ∆p where (1) current, (2) remember last time - update p_i with p_i = p_i - ∆p • Result: Tends to keep traveling in same direction, preventing oscillations
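
A one-parameter sketch of the momentum update, assuming a hypothetical error function E(p) = (p - 3)^2 and arbitrary values α = 0.1, β = 0.9. `momentum_descent` is an illustrative name.

```python
def momentum_descent(grad, p0, alpha=0.1, beta=0.9, iterations=500):
    """delta = alpha * grad(p) + beta * previous_delta;  p = p - delta."""
    p, delta = p0, 0.0
    for _ in range(iterations):
        delta = alpha * grad(p) + beta * delta  # current gradient + remembered update
        p = p - delta
    return p


# Assumed example: E(p) = (p - 3)^2, gradient 2 * (p - 3), minimum at p = 3.
p_mom = momentum_descent(lambda p: 2.0 * (p - 3.0), p0=0.0)
print(p_mom)
```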

What is selection a measure of w/in evolutionary algorithms?

• Selection as a measure of fitness - Need fitness function to evaluate how successful - Numeric value of "how good" the individual solves the problem

What does the selection process specify? W/in this, what is Fitness-proportional selection?

• Selection process specifies which chromosomes from one generation will be sources for chromosomes in the next generation • Chromosomes are ranked and selected in order of their fitness (benefit of pushing population toward higher and higher fitness scores) • *Fitness-proportional selection* - Probability of being selected is proportional to fitness; sometimes low-fitness chromosomes are selected to preserve some diversity

Describe the process of m-Fold Cross-Validation for tuning classifiers

• Set classifier options (number of parameters, model form, training time, input features, etc.) • Estimate generalized classifier performance - Randomly divide training set into m disjoint sets of equal size - Train using (m-1) subsets and validate on the remaining subset - Repeat m times, using different validation set each time, then average the results • Repeat entire process for different classifier options and choose the options which maximize the average results
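
The splitting step can be sketched as below. It only produces the train/validation index sets; training and scoring a classifier on each split is left out. The helper name `m_fold_splits` and the `seed` argument are assumptions, and the folds are exactly equal in size only when the sample count is divisible by m.

```python
import random


def m_fold_splits(n_samples, m, seed=0):
    """Yield (training_indices, validation_indices) for each of the m folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random division of the data
    folds = [indices[i::m] for i in range(m)]     # m disjoint subsets
    for i in range(m):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(m_fold_splits(n_samples=10, m=5))
for training, validation in splits:
    print(sorted(validation), len(training))
```

Each sample lands in the validation set exactly once, so averaging the m validation scores uses every training example for evaluation.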

How does genetic programming differ from genetic algorithms? Name and define its 4 principal operators

• Shares same algorithmic structure as GA, but chromosomes consist of "computer programs" (not bit strings) (ie. Mathematical operators and variables) • Four principal operators: 1. Replication - Code snippet reproduces unchanged 2. Cross-over - Mating of two code snippets 3. Mutation - Syntax change (compatible with total snippet) • Replace number with another number • Change math operator with another math operator 4. Insertion - Replace single element in snippet with another (short) snippet (randomly chosen from a set)

What issue is there with the sigmoid activation function?

• Sigmoid can have "saturation" problems when input is composed of high (+/-) weights - output = 1/(1+e^-input), where input = x1*w1 - If |w1| is large, then output is near 0 or 1 --> dout/dnet = out x (1 - out) - Thus the gradient will be near zero... Bad for learning with Gradient Descent (slows down or makes no changes)

What is Dropout?

• Simple way to prevent overfitting to training data - Samples a network within the full network, and only updates the parameters of the sampled network based on the input data • Keeps a neuron active with some probability (p=.5), otherwise set it to zero • For each training example a different set of units to drop is chosen • No dropout at test time, only during training - Approx. avg of dropped-out networks by using the full network with each node's output weighted by p - Interpretation: evaluate an averaged prediction across ensemble of sub-networks

How are categories assigned to leaf nodes?

• Simplest approach is to take majority vote of class labels at leaf node - Ideally there will be one dominant class • Potential options when tie occurs: - Random assignment - Take into account priors - Take into account classification risks (cost of misdetections or false alarms of categories)

What are perceptrons?

• Single-layer, feed-forward network • Bunch of inputs, one output • Simplest neural network -- where it all started

What time-scales should good theories of mind span?

• Slow - for the bil. years of brain evolution/adapt.n • Fast - for the fleeting wks/mths of infancy/child • In-b/t - for the centuries of growth of our ideas thru history

What are the 6 main steps in the supervised learning process?

• Split data into training and testing sets • Determine features to employ • Select a classifier • Train the classifier using the training set • Classify the test set • Evaluate the results

How does Minsky characterize "Common sense" knowledge?

• Term conceals almost countless different skills - multitudes of life-learned rules and exceptions, dispositions and tendencies, balances and checks • Obvious and natural, layered over time - layers becoming increasingly remote • Seems so simple due to amnesia of infancy -- learn about such things as children but rarely think about them years l8r

What are Inception Style Networks?

• The Inception network is all about going wider • Run multiple filters over the same input map in parallel, then concatenating their results into a single output • The next layer of the model gets to decide if (and how) to use each piece of information

How do traditional AI and embodied AI differ in terms of task complexity and environment complexity?

• Traditional AI: - keeps task difficulty, then tries to make enviro.t more complex - need to make enviro.t more complex to reach target • Embodied AI: - starts w/most complex world ever to be encountered, then takes up challenge of task to perform in that enviro.t - must take up more complex tasks to reach target

With what goal should training data be selected, and for what reasons is testing on unseen data crucial?

• Training data must be selected so as to reflect the global data pool • Testing on unseen data is crucial to prevent overfitting to the training data - Unintended correlations between input and output - Correlations specific to the set of training data

What are Convolutional Neural Networks (CNNs)? What are they composed of? What can they be trained w/?

• Type of neural network designed to take advantage of the 2D structure of "image" data • Composed of one or more "convolutional" (with activations) and "pooling" layers, and may have fully-connected layers (as in typical artificial neural networks) at end for classification - Uses tied/shared weights at convolutional layers • Can be trained with standard backpropagation - Easier than with regular, deep, feed-forward neural networks (CNNs have many fewer parameters to estimate)

What are the practical difficulties of Bayesian Reasoning?

• Typically require initial knowledge of many probabilities - Often estimated based on background knowledge, prev.ly available data, assumptions about form of underlying distributions • Computational cost can be high

What is a "Fully-Convolutional" CNN (FCN)?

• Use convolutions (perhaps with pooling) all the way to the end - No fully-connected 1-D classification layer at end • With multiple feature maps at the end, learn to output a "vector of values" per spatial location - Semantic Segmentation (vector of class probabilities at each pixel) - *Maps down to small, then reproduces back up to big*

What is "Transfer Learning" for Classification? Go through the process of it

• Use trained model (on a specific task) • Push new dataset through, and take outputs before final classifier • Treat those outputs as input features for this new classification task • Train a new classifier with these features • Example: classify "scene type" vs. "semantic segmentation" • Both use image features

What is Gradient Descent useful for, and what does it determine?

• Useful for solving systems of non-linear equations - Non-linear in the model parameters (not data) • Method determines local minimum for a multi-parameter error function - Search through "parameter space" to find minimum error
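
A minimal sketch of the update rule p_new = p_old - α x ∇pE for a single parameter. The error function E(p) = (p - 3)^2, the step size, and the number of steps are assumptions chosen for illustration.

```python
def gradient_descent(grad, p0, alpha=0.1, steps=100):
    # Repeatedly step against the gradient: p_new = p_old - alpha * grad(p_old)
    p = p0
    for _ in range(steps):
        p = p - alpha * grad(p)
    return p

# Hypothetical error function E(p) = (p - 3)^2, so dE/dp = 2 * (p - 3)
p_min = gradient_descent(lambda p: 2 * (p - 3), p0=0.0)
print(p_min)  # converges toward the minimum at p = 3
```

With a much larger α (e.g. above 1.0 for this E) the same loop overshoots and oscillates, which is the "learning rate too large" failure mode.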

When does K-Means clustering work well, and what are its results dependent on? Furthermore, what does one need to know a priori?

• Works well when clusters are compact and well-separated (but does not always perform well) • Results dependent on initial conditions - Often run multiple times and keep clustering minimizing sum of squared distances (points to centroids) • Need to know number of clusters K a priori
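
A minimal 1-D sketch of K-Means with restarts, keeping the run with the smallest sum of squared distances. The function names, the toy points, and the seed are hypothetical; K must still be supplied up front.

```python
import random

def kmeans_1d(points, k, rng, max_iter=100):
    centroids = rng.sample(points, k)          # initial conditions: k random data points
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in points:                       # assign each point to its nearest centroid
            nearest = min(range(k), key=lambda i: (x - centroids[i]) ** 2)
            clusters[nearest].append(x)
        # recompute centroids; an empty cluster keeps its old centroid
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    sse = sum(min((x - c) ** 2 for c in centroids) for x in points)
    return centroids, sse

def best_of_restarts(points, k, restarts=5, seed=0):
    # Run several times and keep the clustering with the lowest SSE
    rng = random.Random(seed)
    return min((kmeans_1d(points, k, rng) for _ in range(restarts)), key=lambda r: r[1])

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]       # two compact, well-separated clusters
centroids, sse = best_of_restarts(points, k=2)
print(sorted(centroids), sse)
```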

How does "Depth-Limited" Search search? What does it avoid from DFS and how? How does it implement the queue? What is it guaranteed to find? Finally, how does it do on completeness, optimality, time, and space?

• same as depth-first search, just w/prespecified depth limit of l - in romania eg., know there are 20 cities, so can make l = 20 • avoids pitfalls of DFS by imposing cutoff/stop • implementation: DF queue w/nodes at depth l having *no* successors • guaranteed to find sol'n (if exists), but not guaranteed to find shortest sol'n - complete (if l big enough), but not optimal • Same time and space complexity as DFS
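
A minimal sketch of depth-limited search on a small hypothetical graph (the graph and function name are assumptions for illustration). It shows both the cutoff and the lack of optimality: with l = 3 the search returns a longer path than the one found with l = 2.

```python
def depth_limited_search(graph, node, goal, limit, path=None):
    path = (path or []) + [node]
    if node == goal:
        return path
    if limit == 0:
        return None                      # cutoff: nodes at depth l get no successors
    for child in graph.get(node, []):
        if child not in path:            # avoid looping back along the current path
            result = depth_limited_search(graph, child, goal, limit - 1, path)
            if result is not None:
                return result
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": ["G"]}
print(depth_limited_search(graph, "A", "G", 1))  # limit too small: no solution found
print(depth_limited_search(graph, "A", "G", 2))
print(depth_limited_search(graph, "A", "G", 3))  # finds a solution, but not the shortest
```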

For S1 and S2, when is S1 <=> S2 true?

• true IFF S1 => S2 is true AND S2 => S1 is true • Basically, both S1 and S2 have to be true OR both have to be false -- have to be same

What does an optimal solution minimize?

the path cost

What is Game Playing?

"Game playing is idealization of worlds in which hostile agents act so as to diminish one's well-being"

What is a proof? When is an inference procedure complete?

- Proof: record of inference procedure operations - Inference procedure is complete if can find proof for any sentence that is entailed

When was AI birthed as a discipline? In other words, when was the term "Artificial Intelligence" first conceived?

*1956* at Dartmouth workshop

What is the difference b/t autonomous and non-autonomous behavior, and which should a rational agent be?

*autonomous behavior* - behavior is determined by its own experience - eg. clock that detects and sets to atomic clock, or adjusts to diff.t time zones *non-autonomous behavior* - no use of percepts, uses only built-in knowledge - eg. standard clock - all of its assumptions must hold rational agent should be *autonomous*

What is a definition of the "acting humanly" category of AI? What is a common measure of this? What capabilities does this demand of computers?

- "art of creating machines that perform functions that require intelligence when performed by ppl" - turing test - computers need lang. proc.g, reasoning, learning, knowledge

What is a definition of the "thinking humanly" category of AI? What does it aim to do? What does it require?

- "automation of activities we associate w/human thinking, activities such as decision-making, problem solving, learning..." - get inside the actual workings of the human mind (psychology) - requires scientific theories of brain/mind

What are some definitions of the "thinking rationally" category of AI, and what field did it initiate?

- "study of the computations that make it possible to perceive, reason, and act" - "study of mental faculties thru use of computational models" - laws of thought, logic

What is the strong AI vs the weak AI view in regards to putting the human mind into a computer?

- *Strong AI*: build/upload a mind into a computer - *Weak AI*: not a mind, but good intelligent process; no uploading

What do algorithms do and not do before searching?

- DO define states and operators - DO NOT generate entire graph of states and operators -- would require too much time/space/memory, generate and explore only promising paths

What is pruning? What improves its effectiveness/reduces its time complexity?

- Eliminating a branch of search tree from consideration w/o looking at it - good node ordering improves effectiveness, reduces time complexity (ie. small to large)

What do quantifiers express? What are the 2 standard ones? For each, which connectives should be used with them?

- Express properties of entire collections of objects - "There exists" (∃), always use ∧ - "For all" (∀), always use =>

What is Unit Resolution? (prop. logic)

- From disjunction, if one of the disjuncts is false, can infer the other is true - know A or B is true, know ~B is true, therefore know B is false, therefore A has to be true for A or B to be true

What is Double-Negation Elimination? (prop. logic)

- From doubly negated sentence, can infer a positive sentence - ~~B => B

What is Modus Ponens?

- From implication and premise of implication, can infer conclusion - known that A implies B, known that A is true, therefore also know that B is true

What is the "Verbal" data hypothesis?

- Info used during performance of task is reportable in verbal protocols - Info reported in verbal protocols is actually used in prob-solving - confounds exist

What is a performance measure?

- a way to evaluate an agent's success; embodies the criterion for success of an agent's behavior - specifies numerical value for any enviro.t history twd the goals

What is an agent function and agent program?

- agent function: specifies which action to take in response to any given percept sequence; maps percepts to actions - agent program: implements the agent function for an agent

What is the definition of an agent? (first, earlier one)

- an entity that *perceives* (its enviro.t through *sensors*) and *acts* (upon that enviro.t through *effectors*)

What are 2 problems with Hill Climbing? What is a solution to these?

- can be misled by local maxima (stuck at smaller peaks) - can get stuck on plateaus - solution: random-restart climbing
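
A minimal sketch of both failure modes and the random-restart fix, on a hypothetical 1-D landscape where each index is a state and its neighbors are the adjacent indices. The landscape, function names, and seed are assumptions for illustration.

```python
import random

def hill_climb(values, start):
    current = start
    while True:
        neighbors = [i for i in (current - 1, current + 1) if 0 <= i < len(values)]
        best = max(neighbors, key=lambda i: values[i])
        if values[best] <= values[current]:
            return current               # stuck: local maximum or plateau
        current = best

def random_restart(values, restarts=20, seed=0):
    # Climb from several random starting states and keep the best result
    rng = random.Random(seed)
    results = [hill_climb(values, rng.randrange(len(values))) for _ in range(restarts)]
    return max(results, key=lambda i: values[i])

landscape = [1, 3, 2, 2, 2, 5, 9, 4]
print(hill_climb(landscape, 0))      # stops at index 1: a smaller peak
print(hill_climb(landscape, 3))      # stops at index 3: a plateau
print(random_restart(landscape))     # restarts make reaching the global peak very likely
```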

What is path cost?

- cost associated with each action along sequence of actions (or path)

What is a goal test? What are the two difft. kinds?

- determines whether a given state is a goal state 1. explicit: given list of states and just have to check it 2. implicit: a property, eg. "checkmate"

What is And-Elimination? (prop. logic)

- from conjunction, can infer any of the conjuncts - know whole conjunction statement is true, therefore know all individual conjuncts also true

What is And-Introduction? (prop. logic)

- from list of sentences, can infer their conjunction - know all separate statements are true, therefore can AND them all together into one true conjunction

What is Or-Introduction? (prop. logic)

- from sentence, can infer its disjunction with anything else - can OR a known true statement with everything else in the knowledge base bc only need >= 1 true statement for whole sentence to be true

What is offline problem solving?

- have complete knowledge of problem and solution beforehand - search space, states, and transitions are all known

Why is AI difficult?

- intelligence itself is not very well defined/understood - easy to recognize intelligent behavior when seen, but difficult to define it specifically enough to evaluate computer program as such

In FOL, what are Complex Sentences made from? Give an eg.

- made from atomic sentences using logical connectives - eg. Older(John,30) => ~Younger(John,30)

What does equality mean in FOL?

- make statements to the effect that 2 terms *refer to the same object* - eg. "Henry is the Father of John": Father(John) = Henry

What is Resolution? (prop. logic)

- most difficult bc B cannot be both true and false - One of the other disjuncts must be true in one of the premises (implication is transitive) - know (A OR B) is true, know (~B OR C) is true, ie. ~A => B and B => C, therefore ~A => C, therefore (A OR C) is true

What is a percept and a percept "sequence"?

- percept: perceptual inputs at any given instant - percept "sequence": complete history of everything agent has perceived

What is Alpha-beta pruning?

- prunes away branches that cannot possibly influence final minimax decision - returns EXACT same move as general minimax
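
A minimal sketch of alpha-beta pruning on a small game tree written as nested lists, with leaves as utility values. The tree and the `visited` bookkeeping are assumptions added for illustration; they show which leaves are actually evaluated.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    if not isinstance(node, list):           # leaf: return its utility
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:                # MIN above will never allow this branch
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:                    # MAX above already has something better
            break
    return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]  # MAX at root, MIN at the second level
seen = []
best = alphabeta(tree, True, visited=seen)
print(best, seen)   # leaves 4 and 6 are never examined
```

Plain minimax on the same tree would evaluate all 9 leaves and return the same root value.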

What is the "total" turing test?

- put things in room w/subject, then ask them what was put in - testing their perception, manipulation, and sensation capabilities

What does it mean to think or act rationally?

- think/do "the right/correct thing" - no mistakes

What does it mean to act rationally? What does rational behavior not necessarily involve?

- rational behavior = doing the right thing - right thing maximizes some goal given avail. info, correct inferences - doesn't n.ly involve thinking -- can be reflexive

In FOL, what are the 2 main kinds of rules?

1. *Diagnostic* rule - Lead from observed effects to hidden causes -- "infer cause from effect" 2. *Causal* "model-based" rule - hidden world properties causes certain percepts -- "infer effect from cause"

What are the five main steps that problem-solving agents go through?

1. *Formulate goal* - based on current situation 2. *Formulate problem* - decide what actions and states to consider given goal, then organize them (into graph structure when completely filled out); ask how to find best path to goal 3. *Search* - process of looking for best action sequence to reach goal 4. *Execution phase* - perform rec.d actions 5. *find new goal* (repeat as nec.)

What are the 6 binarily varying properties of environments? Define each

1. *Static vs dynamic* - whether or not enviro.t changes while agent is "thinking"/perceiving 2. *Fully vs partially observable* - if sensors give access to complete state of enviro.t or not 3. *Discrete vs continuous* - distinct, clearly defined percepts and actions or not 4. *Deterministic vs stochastic* - if next state of enviro.t is completely determined by current state and action executed by the agent (can't predict enviro.t in stochastic -- unpredictable enviro.t can affect next state) 5. *Single vs multi-agent* 6. *Episodic vs sequential* - episodic means next episode does not depend on previous episodes (completely atomic), sequential means next state depends at least partially on previous ones

In what two ways is knowledge representation defined by?

1. *Syntax*: defines the possible "well-formed configurations" of sentences in the lang. 2. *Semantics*: defines the "meaning" of sentences (requires interpretation) - defines the truth of a sentence in a world (or model)

What are the 2 main types of search?

1. *uninformed*: given no info about problem other than info available in problem def'n - no info on # of steps or path cost from "current state to goal state" 2. *informed*: given "some idea" of where to look for sol'ns; use prob-specific knowledge beyond def'n itself - has *guess* on how far away goal sits from each state

What 5 things does a simple Knowledge-Based Agent need to know?

1. Current state of world 2. How to infer unseen properties of world from percepts 3. How world evolves over time 4. What it wants to achieve 5. What its own actions do in various circumstances

Define a game as a search problem (4 components)

1. Initial State - board position, whose move it is 2. Operators - (successor fcn), defines legal moves and resulting states 3. Terminal (goal) test - determines when game is over 4. Utility (objective, payoff) fcn - gives numeric value for the game outcome at terminal states

What are the 2 types of added constraints? (knowledge systems)

1. Limited time to solve problems - provide insights into expert's search strategy - allows extraction of core, vital aspects of solving prob. 2. W/holding information - provides evidence about the extent to which history data are needed in the interpretation process - but more left out is less reliable as evidence

What is the PEAS description of agents?

1. Performance measure 2. Environment (it's in) 3. Actuators 4. Sensors

What four things does rationality depend on? What does this all lead to?

1. The *performance measure* 2. The *percept sequence* 3. What the agent *knows about the enviro.t* 4. The *actions* that the agent can perform - leads to the ideal rational agent

What are the four categories of AI?

1. Thinking humanly 2. Acting humanly 3. Thinking rationally 4. Acting rationally

What are the 3 types of cases? (knowledge systems)

1. Typical cases - use during prob. identification stage; training data 2. Random cases - random cases selected from case file - check for gaps in coverage; validation/evaluation data 3. Extreme and tough cases - don't know what these are - involves unusual circumstances - reveals hidden assumptions and exceptions from typical cases

What are 4 common ways of measuring performance?

1. completeness: does it find a sol'n when one exists? 2. optimality: is it the sol'n w/lowest path cost? 3. time complexity: how long does it take to find sol'n? 4. space complexity: how much memory needed to perform the search?

What are 3 knowledge acquisition problems?

1. human skill is practice-based - skills are highly integrated and often not explicitly/individually retrievable 2. Individual's expertise can also be influenced by social processes, hidden agendas, etc. 3. Expertise changes over time -- lifelong learning

What four items define a problem?

1. initial state 2. actions/operators available to the agent 3. goal test 4. path cost

What are 3 goals of acquiring/extracting expert knowledge?

1. learn about steps in solving prob.s 2. identify knowledge used for prob.s 3. gain insights about nature of mental processing

What three things generally characterize expert systems?

1. open to inspection - present intermediate steps and answers Q's about the sol'n process - NOT a black box 2. easily modified - w/adding/deleting from KB 3. heuristic - using often imperfect knowledge (employs "tricks", "rules-of-thumb")

Describe the 3 nested quantifier rules

1. ∀x ∀y is same as ∀y ∀x 2. ∃x ∃y is same as ∃y ∃x 3. ∃x ∀y is NOT same as ∀y ∃x

What are two important relations of quantifiers? What are the 3 relations involving connectives and nots? (maybe easier to write answer than think/speak through)

Quantifier relations: 1. ∃x P(x) = ~∀x ~P(x) 2. ∀x P(x) = ~∃x ~P(x) Connective relations: 1. P(x) => Q(x) is same as ~P(x) ∨ Q(x) 2. ~(P(x) ∧ Q(x)) is same as ~P(x) ∨ ~Q(x) 3. ~(P(x) ∨ Q(x)) is same as ~P(x) ∧ ~Q(x)

What is entailment?

one fact follows logically from another

What is the term for the general approach to informed search? What are nodes selected for expansion based on? How is the queue implemented?

• "Best First" search - try to expand node that is *guessed to be* "closest" to goal • Node selected for expansion based on an *evaluation function* (measuring dist. to goal) - node w/*lowest* evaluation is selected • Insert expanded nodes in dec.g order of desirability

What is Local Search, and what does it use?

• "Local search" starts in a state and moves only to neighboring states (path not retained) - little memory usage - can find reasonable sol'ns in large spaces • Uses an objective function to find best neighbor - iterative improvement

What is the PEAS description of Wumpus World? (look over slides for practice w/this)

• (P)erformance measure - +1000 for finding gold, -1000 for falling into pit/getting eaten - -1 for each action taken, -10 for using up arrow • (E)nvironment - grid of rooms, start in [1,1]; P(pit) = .2 - wumpus and gold in random locations • (A)ctuators - move fwd, turn L/R - grab gold, shoot arrow • (S)ensors - nose: squares adj. to wumpus are "smelly" - skin/hair: squares adj. to pit are "breezy" - eyes: "glittery" iff gold is in same square

In First-Order Logic, what does the world consist of? What do objects have?

• World consists of objects - things w/identities • Objects have properties/relations that distinguish them from other objects

What are the properties of Greedy Search? (accg. to 4 common performance measures)

• *Completeness* - can be bad - can get stuck in loops - complete if check for repeated states • *Time* - can be bad - O(b^m), m = max depth (like DFS) • *Space* - can be bad - O(b^m), keeps all nodes in memory (worst case) • *Optimality* - can be bad - heuristic is estimate - improved by quality of heuristic

What are the properties of A* Search? (accg. to 4 common performance measures)

• *Completeness* - good - unless infinitely many nodes • *Time and space* - not good - still exponential (keeps all nodes in memory) - can do "Iterative Deepening A*" to conserve memory • *Optimality* - good - expands fewest nodes

What are the properties of Uniform-Cost Search? (accg. to 4 common performance measures)

• *Completeness* - good • *Time and Space* - can be bad - can be much greater than b^d: can explore large subtrees of small steps before exploring large (and perhaps useful) steps • *Optimality* - good

What are the properties of Iterative Deepening Search? (accg. to 4 common performance measures) What does it combine from two other search methods?

• *Completeness* - good • *Time* - not too bad - O(b^d), where d is depth of shallowest sol'n - same as BFS • *Space* - good - O(bd) - same as DFS • *Optimality* - good - combines benefits of DFS and BFS

What are the properties of Breadth-First Search? (accg. to 4 common performance measures)

• *Completeness* - good (if branching factor b is finite) • *Space* (nodes generated) - exponential, bad - major limitation of BFS, demands lots of space - O(b^(d+1)): b + b^2 + ... + b^d + (b^(d+1) - b) nodes, where d = goal depth • *Time* - bad, same as space • *Optimality* - conditional - not optimal in general, shallowest goal may not have lowest path cost - optimal if path cost is non-decreasing function of node depth

What are the properties of Depth-First Search? (accg. to 4 common performance measures)

• *Completeness* - potentially not, can be bad - fails in infinite-depth spaces or w/loops • *Time* - bad - O(b^m), m = maximum depth - Bad if m is larger than depth of goal (d) - Good if multiple sol'ns (will hit one) • *Space* - better - O(mb) -- linear space, keeps only leaf nodes - need only store single path from root to leaf node, along w/remaining unexpanded siblings nodes for each node on path - *Takeaway*: linear fcn for O means tight memory, queue only has to keep fringe nodes, unlike BFS which keeps nodes below goal depth • *Optimality* - bad - returns first deepest sol'n, could miss shallower sol'n hasn't yet seen (even at low depth)

What are the properties of Minimax? (accg. to 4 common performance measures)

• *Completeness* - yes, if tree is finite • *Time* - can be bad - DFS exploration - O(b^m), m = max depth, b = # of legal moves @ each point (impractical for real games) • *Space* - good - DFS exploration -- O(bm) • *Optimality* - yes against an optimal opponent - even better if MIN doesn't play optimally

Who are the three archetypal members of a design team? (knowledge systems)

• *Domain expert* - Experience solving prob.s in domain - Knows little about creation of KS's - Spells out needed skills to knowledge eng.r • *Knowledge engineer* - AI expert, has experience building KS's - May know little about the expert's domain • *End user* - Understanding of job, determines major design constraints

What is the difference b/t domain and task knowledge?

• *Domain knowledge* - general terminology and facts of domain w/o focus on a particular task • *Task knowledge* - terminology, computational models, and facts assoc.d w/performing a specific kind of task

What is an expert system? What does "expert" knowledge mean? What's the distinction b/t a knowledge and an expert system? (see pic on pg. 3 of "Knowledge and Expert Systems" slides)

• A computer program whose performance is guided by specific, expert knowledge in solving problems - focus on prob-solving; knowledge is that which can guide a search for sol'ns • "Expert" knowledge means *narrow* specialization but *substantial* competence -- IMPORTANT - but less breadth and flexibility than human experts -- like PhD in specific area of study • KS vs ES: - focuses attn on the knowledge that systems carry, r/t if it constitutes expertise - KS is more general, broad KB, while ES is a subset of KS

According to the principle of inference, when is a sentence valid? When is a sentence satisfiable?

• A sentence is *valid* IFF it is true under all possible interpretations in all possible worlds - Also called tautologies - eg. "There is a stench in (1,1) or there is not a stench in (1,1)" • A sentence is *satisfiable* IFF there is some interpretation in some world for which it is true - could be true in some situation - eg. "There is a stench in (1,1)"

What are a couple advantages of the backward state-space search over the forward?

• Advantage is that consider only *relevant* actions - those actions that achieve one of the conjuncts of the goal • Typically much lower branching factor than forward search

Describe the knowledge system design process

• After distilling knowledge used for tasks, begin actual design of system - select ways to rep.t knowledge - determine the search strat. - design the UI • Build a prototype to test • Refine by progressive approx.n - prob-solving mistakes lead to correction or additions to KB • Never really considered "finished" (bc of bugs, shortcomings, etc.)

How does Depth-First Search search? How does it implement the queue?

• Always expand *deepest* unexpanded node (on the fringe) first - L->R or R->L - only when hit "dead-end" (leaf) does search go back and expand nodes at next shallower level • Inserts expanded nodes at *front* of queue
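
A minimal sketch of DFS where new nodes go to the front of the queue, implemented here with a Python list used as a stack. The graph and function name are hypothetical.

```python
def depth_first_search(graph, start, goal):
    frontier = [[start]]                 # stack of paths; the top of the stack is the queue front
    while frontier:
        path = frontier.pop()            # always take the most recently added (deepest) path
        node = path[-1]
        if node == goal:
            return path
        # push children in reverse so the leftmost child is expanded first (L->R)
        for child in reversed(graph.get(node, [])):
            if child not in path:        # avoid cycles along the current path
                frontier.append(path + [child])
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": ["G"]}
print(depth_first_search(graph, "A", "G"))  # goes deep via B and D before ever trying C
```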

What does the Knowledge Base (KB) contain? How do you add info to it, and how do you find out what info is in it?

• Contains set of "sentences" or factual statements - some assertions about the world expressed w/a knowledge representation lang. - initially contains some background knowledge -- innate knowledge • add new info w/ TELL fcn or by Inference -- deriving new sentences from old ones • query what is known w/ ASK fcn

What does Minimax determine, decide, and serve as?

• Determines the best moves for MAX, assuming that MAX and opponent (MIN) play perfectly • Decides best opening first move for MAX • Serves as basis for analysis of games and algorithms

How does Uniform-Cost Search search? What is it guaranteed to give? How does it implement the queue?

• Expand least-cost unexpanded leaf node first (modified BFS, r/t lowest-depth) - general additive cost fcn (NOT from current state to goal!) • guaranteed to yield cheapest sol'n • insert nodes in order of inc.g path cost
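
A minimal sketch of Uniform-Cost Search using a priority queue ordered by path cost so far. The weighted graph and function name are assumptions for illustration.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    frontier = [(0, start, [start])]     # (path cost g, node, path), ordered by g
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)   # least-cost unexpanded node
        if node == goal:
            return cost, path
        if node in explored:
            continue
        explored.add(node)
        for neighbor, step in graph.get(node, {}).items():
            if neighbor not in explored:
                heapq.heappush(frontier, (cost + step, neighbor, path + [neighbor]))
    return None

graph = {"S": {"A": 1, "G": 10}, "A": {"B": 2}, "B": {"G": 3}}
print(uniform_cost_search(graph, "S", "G"))  # the 3-step path (cost 6) beats the direct edge (cost 10)
```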

How does *Breadth-First Search* search? What does it find? How does it implement the queue?

• Expand root node, then all successors, shallowest unexpanded node first (L -> R or R -> L) - all nodes at level d are expanded before level d+1 (root node being d = 0) • Method finds *shallowest* goal state • Inserts expanded nodes at *end* of queue
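
A minimal sketch of BFS where new nodes go to the end of a FIFO queue, so every node at depth d is expanded before any node at depth d+1. The graph and function name are hypothetical.

```python
from collections import deque

def breadth_first_search(graph, start, goal):
    frontier = deque([[start]])          # FIFO queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()        # shallowest unexpanded node first
        node = path[-1]
        if node == goal:
            return path
        for child in graph.get(node, []):
            if child not in visited:
                visited.add(child)
                frontier.append(path + [child])   # insert at the end of the queue
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": ["G"]}
print(breadth_first_search(graph, "A", "G"))  # finds the shallowest goal path
```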

How does Greedy Search search? What does it use to do this, and how does this implement the queue? Why is it not guaranteed to be optimal (caveat)?

• Expands the node that *appears* to be closest to the goal - takes the "biggest bite" out of the remaining cost to reach the goal (each step tries to get as close to goal as possible) • Uses heuristic evaluation function - estimates cost from node n to goal - orders heuristic cost of each node in queue from smallest to largest • Doesn't consider if actions will be best in long run -- hence greedy

What are explanation systems? What does this look like in MYCIN?

• Explanation systems - mechanism to explain or justify conclusions - "transparent box" r/t "black box" - user can discard conclusion if disagree • MYCIN explanation program built from collection of specialist subprograms - designed to answer part.r kinds of queries - examine diagnosis rule trace of system

Describe the 4 general steps in the Minimax Algorithm. What does the outcome of this determine?

• Generate whole game tree (or from current state downward -- online DFS process) • Apply utility fcn to terminal states - get payoff for diff.t final moves of the game • Use utilities at terminal states to determine utility of nodes one level higher in tree - ie. find MIN's best attempt to minimize high payoff for MAX at terminal level • Continue backing up the values to the root (one layer at a time) • Value @ root determines best payoff and opening move for MAX
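
The steps above can be sketched on a tiny game tree written as nested lists, with leaves as utilities at terminal states. The tree and names are assumptions for illustration.

```python
def minimax(node, maximizing):
    if not isinstance(node, list):       # terminal state: apply the utility function
        return node
    # back up values from the level below
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # MAX moves at the root, MIN replies
root_value = minimax(tree, True)
# MAX's opening move is the child whose backed-up (MIN) value is largest
best_move = max(range(len(tree)), key=lambda i: minimax(tree[i], False))
print(root_value, best_move)
```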

What is MYCIN? What does it do? How does it work?

• Goal-driven, backward-chaining expert system that consults on diagnosis of infectious diseases - especially bacterial infections of the blood - use of explicit KB -- search for sol'ns guided by KB - looks for goal w/certainty of 1; when confidence measures for rule get below threshold, search is terminated • Translates pseudo-English rules to and from internal production-rule rep.n - uses keywords and templates to guide translation into internal symbol structures - implemented as production rules in LISP

What is Local Beam Search? What does it keep track of? What is its process? What is useful info passed among and what is this different from?

• Keeps track of k states (not paths), rather than just one - begin w/ k randomly generated states - at each step, all successors of all k states are generated - if any one is a goal, then finished - else select k "best" successors from complete list and repeat • Useful info is passed among k parallel search threads - diff.t than random-restart approach

What does knowledge refer to? Define its constituents. What's another definition for it under this context?

• Knowledge refers to the "codified experience" of agents - *Codified* means knowledge has been formulated, recorded, and made ready for use - *Experience* is source of info for solving prob.s - *Codified experience* must be organized and generalized to guide future action • "Knowledge is what we have learned from our experiments"

What does A* Search minimize and avoid? What is its evaluation function? What kind of heuristic does it use?

• Minimizes *total* estimated path cost - avoids expanding paths already expensive • Evaluation function: f(n) = g(n) + h(n) - *g(n)*: *actual* cost *to* reach node n so far - *h(n)*: *estimated* cost *from* n to goal • Uses *admissible* heuristic - ie. h(n) <= h*(n) where h*(n) is true cost from n to goal - never overestimates cost
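
A minimal sketch of A* with f(n) = g(n) + h(n). The graph, the heuristic table `h`, and the function name are assumptions; the h values were picked so that h(n) never exceeds the true remaining cost (admissible).

```python
import heapq

def a_star(graph, h, start, goal):
    frontier = [(h[start], 0, start, [start])]   # (f, g, node, path), ordered by f
    best_g = {}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if node in best_g and best_g[node] <= g:
            continue                              # already expanded via a path at least as cheap
        best_g[node] = g
        for neighbor, step in graph.get(node, {}).items():
            g2 = g + step                         # actual cost to reach neighbor
            heapq.heappush(frontier, (g2 + h[neighbor], g2, neighbor, path + [neighbor]))
    return None

graph = {"S": {"A": 1, "B": 4}, "A": {"B": 2, "G": 6}, "B": {"G": 2}}
h = {"S": 4, "A": 3, "B": 2, "G": 0}              # hypothetical admissible estimates
print(a_star(graph, h, "S", "G"))
```

Setting every h value to 0 turns this into Uniform-Cost Search, since f(n) then equals g(n).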

What are two roles of knowledge-based logic agents?

• Play crucial role in "Partially Observable" environments - Combine general knowledge w/current percepts to infer hidden aspects before acting • Aids in agent flexibility - Learn new knowledge for new tasks - Adapt to changes in enviro.t by updating relevant knowledge

What is AI Game Play? What is the approach to this, and what does this require? What allows for a more efficient search?

• Set up search space, then define the optimal move (in order to best the hostile agent) -- need algorithm to find this • Ignore portions of search tree that make no difference to final choice -- pruning - allows for more efficient search while still getting correct answer

Characterize Shakey's world. What was (he? -- gender normative considerations here) capable of doing?

• Shakey's world: - 4 rooms along a corridor, door and light switch - Move from room to room - Push movable objects - "Climb" on and off of rigid objects (like boxes) - Turn light switches on and off • Capable of moving, grabbing, and pushing things, based on planning by STRIPS

How does Iterative Deepening Search expand upon "Depth-Limited" Search and why? What is the level of wastefulness and why? When is it preferred?

• Sidesteps issue of choosing best l (don't really know good depth limit for most prob.s) - tries all possible depth limits (0, then 1, then 2, etc.) • May seem wasteful, but overhead is not very costly - bc most of nodes are twd bottom of tree - not costly to generate upper nodes multiple times • Preferred w/large search spaces and unknown depth of sol'n
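
A minimal sketch of iterative deepening: run depth-limited search with l = 0, 1, 2, ... until a solution appears. The graph, the `max_depth` safety cap, and the function names are assumptions for illustration.

```python
def depth_limited(graph, path, goal, limit):
    node = path[-1]
    if node == goal:
        return path
    if limit == 0:
        return None                       # cutoff reached
    for child in graph.get(node, []):
        if child not in path:
            found = depth_limited(graph, path + [child], goal, limit - 1)
            if found is not None:
                return found
    return None

def iterative_deepening_search(graph, start, goal, max_depth=50):
    for limit in range(max_depth + 1):    # try depth limits 0, 1, 2, ...
        result = depth_limited(graph, [start], goal, limit)
        if result is not None:
            return limit, result
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": ["G"]}
print(iterative_deepening_search(graph, "A", "G"))  # shallowest solution, found at limit 2
```

The upper levels are regenerated on every pass, but they hold few nodes compared with the bottom level, which is why the overhead stays small.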

In inference in FOL, what is forward chaining, and when is it used? Describe it in some more detail

• Start w/sentences in KB and generate new conclusions - "Used when a new fact is added to database and want to generate its consequences" - what can you generate new that wasn't already explicitly told to you? - directionless: stops when nothing is left to generate, can give a bunch of useless facts
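
A minimal sketch of forward chaining, simplified to propositional rules of the form (premises, conclusion) rather than full FOL with variables. The rule set and names are hypothetical.

```python
def forward_chain(facts, rules):
    known = set(facts)
    changed = True
    while changed:                        # stop when nothing new can be generated
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)     # new sentence derived from old ones
                changed = True
    return known

rules = [(("A", "B"), "C"), (("C",), "D"), (("E",), "F")]
derived = forward_chain({"A", "B"}, rules)
print(sorted(derived))   # C and D are generated; F is not, since E is unknown
```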

In inference in FOL, what is backward chaining, and when is it used?

• Start w/something want to *prove*, find implication sentences that allow to conclude it, then attempt to establish their premises in turn - "Used when there is a goal to be proved"
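
A minimal sketch of backward chaining, again simplified to propositional rules of the form (premises, conclusion) rather than full FOL. The rule set and names are hypothetical.

```python
def backward_chain(goal, facts, rules, seen=frozenset()):
    if goal in facts:
        return True
    if goal in seen:                      # already trying to prove this goal: avoid looping
        return False
    for premises, conclusion in rules:
        # find rules that conclude the goal, then try to establish their premises in turn
        if conclusion == goal and all(
            backward_chain(p, facts, rules, seen | {goal}) for p in premises
        ):
            return True
    return False

rules = [(("A", "B"), "C"), (("C",), "D"), (("E",), "F")]
facts = {"A", "B"}
print(backward_chain("D", facts, rules))  # provable via C
print(backward_chain("F", facts, rules))  # not provable: E cannot be established
```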

What are Genetic Algorithms? What is maintained? How are successor states generated?

• Stochastic hill-climbing searches • Large pop'n of states is maintained • Successor states are generated by combining 2 parents states (crossover) and changing (mutating) • Best "offspring" make up new pop'n

How does ExpectiMiniMax work?

• TERMINAL, MAX, MIN nodes work same as Minimax • CHANCE nodes are evaluated by taking *weighted average* of values resulting from all possible chance outcomes • Process is backed-up recursively all the way back to the root (as w/Minimax)
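
A minimal sketch of ExpectiMiniMax. The tree encoding (tuples tagged "max", "min", or "chance", with chance children given as (probability, subtree) pairs) is an assumption made for this example.

```python
def expectiminimax(node):
    if isinstance(node, (int, float)):    # TERMINAL: utility value
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":                  # weighted average over chance outcomes
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind}")

tree = ("max", [
    ("chance", [(0.5, ("min", [2, 4])), (0.5, ("min", [6, 8]))]),    # 0.5*2 + 0.5*6
    ("chance", [(0.9, ("min", [1, 3])), (0.1, ("min", [20, 30]))]),  # 0.9*1 + 0.1*20
])
print(expectiminimax(tree))
```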

What are atomic sentences?

• consist of a single proposition symbol (prop. logic) • Collection of terms and relation(s) that together state facts (FOL)

What is heuristic dominance?

• if h2(n) >= h1(n) for all n (both being admissible) - then h2 dominates h1, and h2 is better for search • as the larger h2 is closer to the optimal/true total cost h*, then A* using h2 will expand fewer nodes (on average) -- cannot overestimate the true cost

For S1 and S2, when is S1 => S2 true? When is it false? What claim can be made if S1 is true?

• is true IFF S1 is false OR S2 is true • is false IFF S1 is true AND S2 is false - if S1 is true, then claiming that S2 is true, otherwise make no claim
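
A minimal sketch that builds the full truth table for S1 => S2 from the definition above; the helper name `implies` is hypothetical.

```python
def implies(s1, s2):
    # S1 => S2 is true iff S1 is false OR S2 is true
    return (not s1) or s2

table = {(s1, s2): implies(s1, s2) for s1 in (True, False) for s2 in (True, False)}
for (s1, s2), value in table.items():
    print(s1, s2, value)   # only the row S1 = True, S2 = False comes out False
```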

