Artificial Intelligence Midterm


1. What does each of the lettered items in the following diagram represent? [Diagram: items labeled A through F]

A - agent, B - sensors, C - effectors, D - percepts, E - actions, F - environment

Soccer: Fully / partial observable deterministic / stochastic episodic / sequential static / dynamic discrete / continuous

Partially observable, stochastic, sequential, dynamic, continuous

What is the difference between forward propagation and backward propagation in Neural network and what mathematical techniques are used for each?

In a neural network, forward propagation computes the output, which is compared with the true value to get the error. To minimize the error, backward propagation computes the derivative of the error with respect to each weight, and each weight is updated by subtracting that derivative scaled by a learning rate.
Feed forward: Given its inputs x from the previous layer, each unit computes an affine transformation z = W^T x + b and then applies an activation function g(z), such as ReLU, element-wise. During the process, all variables computed and used on each layer are stored (cached) for use in back propagation.
Back propagation: Allows the information to flow from the cost backward through the network in order to compute the gradient, using the chain rule. Loop over the nodes in reverse topological order, starting at the final node, to compute the derivative of the final node's output with respect to each edge's tail node. This shows which parameters are responsible for the most error, so the parameters can be changed in the direction that reduces it.
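The two passes described above can be sketched for a very small network. This is a minimal illustration, not a reference implementation: the layer sizes, the squared-error loss, the parameter values, and the learning rate are all assumptions chosen for the example.

```python
# Tiny network (assumed for illustration): 2 inputs -> 2 ReLU hidden units -> 1 linear output.

def relu(z):
    return max(0.0, z)

def forward(x, W1, b1, w2, b2):
    """Forward pass: affine transform, activation, and a cache for backprop."""
    z1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    a1 = [relu(z) for z in z1]
    y_hat = sum(w * a for w, a in zip(w2, a1)) + b2
    return y_hat, {"x": x, "z1": z1, "a1": a1}

def loss(y_hat, y):
    """Squared-error loss (an assumed choice of cost)."""
    return 0.5 * (y_hat - y) ** 2

def backward(y_hat, y, cache, w2):
    """Backward pass: chain rule from the loss back to every parameter."""
    d_y = y_hat - y                                   # dL/dy_hat
    d_w2 = [d_y * a for a in cache["a1"]]             # dL/dw2
    d_b2 = d_y                                        # dL/db2
    d_a1 = [d_y * w for w in w2]                      # dL/da1
    d_z1 = [da * (1.0 if z > 0 else 0.0)              # ReLU derivative
            for da, z in zip(d_a1, cache["z1"])]
    d_W1 = [[dz * xi for xi in cache["x"]] for dz in d_z1]
    d_b1 = list(d_z1)
    return d_W1, d_b1, d_w2, d_b2

# Hypothetical parameters and one training example.
x, y = [1.0, 0.5], 1.0
W1, b1 = [[0.9, 0.3], [0.2, 0.8]], [0.0, 0.0]
w2, b2 = [0.5, -0.4], 0.1
lr = 0.1  # learning rate (assumed)

y_hat, cache = forward(x, W1, b1, w2, b2)
d_W1, d_b1, d_w2, d_b2 = backward(y_hat, y, cache, w2)

# One gradient-descent update: subtract the scaled gradient from each parameter.
new_W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, d_W1)]
new_b1 = [b - lr * g for b, g in zip(b1, d_b1)]
new_w2 = [w - lr * g for w, g in zip(w2, d_w2)]
new_b2 = b2 - lr * d_b2

loss_before = loss(y_hat, y)
loss_after = loss(forward(x, new_W1, new_b1, new_w2, new_b2)[0], y)
print(loss_before, loss_after)
```

The cache returned by `forward` is what lets `backward` avoid recomputing the intermediate values, which is the point of storing them during the forward pass.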

what does beta represent

Beta - Best already explored option along path to the root for minimizer

what does alpha represent

Alpha - best already explored option along path to the root for maximizer

List the steps needed to teach a neural network to recognize handwritten digits ( like the MNIST database)?

1. Start with values (often random) for the network parameters (the weights w_i and the biases b_i).
2. Take a set of examples of input data and pass them through the network to obtain their predictions.
3. Compare these predictions with the expected labels and calculate the loss.
4. Perform back propagation to propagate this loss to each of the parameters that make up the model of the neural network.
5. Use this propagated information to update the parameters with gradient descent, so that the total loss is reduced and a better model is obtained.
6. Keep iterating over the previous steps until we consider that we have a good model.

Apply a 3×3 max pooling on Fig. 3 and show the resultant feature map.
3 3 2 1 0
0 0 1 3 1
3 1 2 2 3
2 0 0 2 2
2 0 0 0 1

3 3 3
3 3 3
3 2 3

Apply a 3×3 max pooling with a stride of 2 on the following matrix and show the resultant feature map.
3 1 1 2 4
1 3 3 1 2
0 2 1 2 0
1 1 3 4 0
2 5 4 2 1

3 4
5 4
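The stride-2 answer above can be checked with a short sketch. The function name `max_pool` and its signature are made up for this illustration; it assumes no padding, so only windows that fit entirely inside the matrix are used.

```python
def max_pool(matrix, size, stride=1):
    """Slide a size x size window over the matrix and keep the maximum of each window."""
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            window = [matrix[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled

matrix = [
    [3, 1, 1, 2, 4],
    [1, 3, 3, 1, 2],
    [0, 2, 1, 2, 0],
    [1, 1, 3, 4, 0],
    [2, 5, 4, 2, 1],
]
print(max_pool(matrix, size=3, stride=2))  # [[3, 4], [5, 4]]
```

With a 5×5 input, a 3×3 window, and a stride of 2, the window fits at two positions along each axis, which is why the feature map is 2×2.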

Using alpha-beta pruning and minimax optimization, complete the following tree. Levels from the root: max, min, max. Leaves: 3 5 6 9 1 2 0 -1

max: 5
min: 5 0
max: 5 9 2 0
leaves: 3 5 6 9 1 2 0 -1
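A minimal sketch of minimax with alpha-beta pruning on this tree follows. It assumes the tree is binary with the leaves read left to right, since the original diagram is not available; the nested-list encoding and the `visited` list are conventions invented for this example.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Return the minimax value of node, skipping branches that cannot change the result."""
    if not isinstance(node, list):      # a leaf holds its utility value
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)   # best option so far for the maximizer
            if alpha >= beta:
                break                   # the minimizer above will never allow this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)         # best option so far for the minimizer
        if alpha >= beta:
            break                       # the maximizer above already has something better
    return value

# Assumed shape: max root -> 2 min nodes -> 4 max nodes -> 8 leaves.
tree = [[[3, 5], [6, 9]], [[1, 2], [0, -1]]]
visited = []
root_value = alphabeta(tree, maximizing=True, visited=visited)
print(root_value, visited)  # 5 [3, 5, 6, 1, 2]
```

Under that assumed shape, the leaves 9, 0, and -1 are never evaluated: the root value is the same as with plain minimax, but fewer nodes are examined.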

Explain the SVM algorithm giving one example

A Support Vector Machine model generates the optimal separator to classify a given set of data. The separator takes the form of a multidimensional hyperplane that divides the data into separate classes. The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space (N is the number of features) that distinctly classifies the data points. Given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. Example: classification of images.

Describe the Turing Test?

A computer passes the test if a human interrogator, after posing some written questions, cannot tell whether the written responses come from a person or from a computer.

Explain the structure of a decision tree along with the concept of splitting, and explain the two different types of problems decision trees can solve. What is the purpose of pruning?

A decision tree is similar to an upside-down tree. The root node is the topmost node in the tree and branches out into more nodes (splitting). Splitting is the process of dividing a node into two or more sub-nodes: the dataset is split into subsets, and each split is based on a particular variable. The leaf nodes are the nodes that do not branch to any other node. Each leaf node holds a prediction value, either numeric or categorical. Decision trees can solve both regression and classification problems. Regression problems have continuous (quantitative) targets with real-number values, and classification problems have discrete (qualitative) targets, where data is classified into two or more classes. Pruning is used to mitigate overfitting, which occurs when a decision tree is too complex and cannot make accurate predictions on new, unseen samples. Pruning reduces the size (complexity) of the decision tree so the model can generalize better. The result of pruning is branches turning into leaf nodes.

In the context of A*, what does it mean for a heuristic to be admissible? What is a good method for determining heuristics?

A heuristic h(n) is admissible if, for every node n, h(n) ≤ h*(n), where h*(n) is the true cost to reach the goal state from n. An admissible heuristic never overestimates the cost to reach the goal, i.e., it is optimistic. A* combines greedy search and uniform-cost search: its evaluation function is f(n) = g(n) + h(n) [cost so far + estimated future cost], so f(n) is the estimated cost of the cheapest solution through n. A good method for determining heuristics is to use the exact solution cost of a relaxed version of the problem (one with fewer restrictions on the actions).

If we want to find the shortest solutions using A* to solve the 15-puzzle above, what are two possible heuristics we could use? Find the value of the two heuristics you chose at the start state of the 15-puzzle above.

One heuristic is the number of tiles that are misplaced, and a second heuristic is the sum of the distances of the tiles from their goal positions. At the start state: h1 (number of misplaced tiles) = 14; h2 (sum of the distances of the tiles from their goal positions) = 35.
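Both heuristics can be sketched in a few lines. The puzzle figure is not available here, so the start state below is hypothetical and does not reproduce the values 14 and 35; the board is stored row by row as a flat list with 0 for the blank, which is also an assumption of this example.

```python
def misplaced_tiles(state, goal):
    """h1: number of tiles (ignoring the blank) that are not in their goal position."""
    return sum(1 for tile, target in zip(state, goal) if tile != 0 and tile != target)

def manhattan_distance(state, goal, width):
    """h2: sum over tiles of horizontal plus vertical distance to the goal position."""
    goal_position = {tile: divmod(index, width) for index, tile in enumerate(goal)}
    total = 0
    for index, tile in enumerate(state):
        if tile == 0:
            continue                      # the blank is not counted
        row, col = divmod(index, width)
        goal_row, goal_col = goal_position[tile]
        total += abs(row - goal_row) + abs(col - goal_col)
    return total

goal = [1, 2, 3, 4,
        5, 6, 7, 8,
        9, 10, 11, 12,
        13, 14, 15, 0]

# Hypothetical start state: tiles 7, 11, and 15 are each one move from home.
start = [1, 2, 3, 4,
         5, 6, 0, 8,
         9, 10, 7, 12,
         13, 14, 11, 15]

print(misplaced_tiles(start, goal), manhattan_distance(start, goal, width=4))  # 3 3
```

Both functions are admissible for sliding-tile puzzles because each move shifts exactly one tile by one square, so neither count can exceed the true number of moves remaining.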

a. What is an activation function? Why are activation functions so important in artificial neural networks? What is an example of an activation function?

An activation function is a function that is applied to the weighted sum of a neuron's inputs to produce its final output. They are important because a neural network without activation functions can only represent a linear function. An example of an activation function is the rectifier, which leaves nonnegative inputs unchanged and sets negative inputs to zero.

Compare and contrast an intelligent agent, a rational agent, and an agent.

An agent is something that can perceive its surrounding environment through sensors and can act upon the environment through actuators. A rational agent is an agent that performs a positive action (one that increases its performance measure), based on what it has sensed and what inherent knowledge it possesses. An intelligent agent is an agent that performs the best possible action in a situation. A rational agent can be intelligent, but not all rational agents are intelligent, as they may not be able to sense all of the right things or think far enough ahead to do what is best in the long run.

1. Go through Turing's list of alleged "disabilities" of machines, identifying which have been achieved, which are achievable in principle by a program, and which are still problematic because they require conscious mental states. a. Be kind; b. Be resourceful; c. Be beautiful; d. Be friendly; e. Have initiative; f. Have a sense of humor; g. Tell right from wrong; h. Make mistakes; i. Fall in love; j. Enjoy strawberries and cream; k. Make someone fall in love with it; l. Learn from experience; m. Use words properly; n. Be the subject of its own thought; o. Have as much diversity of behavior as man; p. Do something really new.

Be kind: There are programs that are helpful, but being kind requires some internal state. Problematic.
Resourceful: Many programs are clever at finding ways of doing things, and many people agree machines are or seem to be clever. Achieved.
Beautiful: Many industrial artifacts are beautiful objects. Achieved.
Friendly: Same as kind; internal state is required. Problematic.
Have initiative: Achievable in principle, as a machine can be programmed to respond to incidents and take periodic actions to prevent them.
Sense of humor: Achievable in principle.
Tell right from wrong: Problematic because of the conscious aspect. AI is already helping lawyers make decisions, but morality and ethics weigh in.
Make mistakes: Achieved; many programs make mistakes.
Fall in love: Problematic; internal state required.
Enjoy strawberries and cream: Machines have no sense of taste, and enjoyment requires internal state. Problematic.
Make someone fall in love with it: Achieved; teddy bears and dolls have already been doing this.
Learn from experience: Achieved.
Use words properly: Natural language processors are able to use words properly and effectively within given domains.
Be the subject of its own thought: Depends on the definition of thought. Many machines "think" within their instructions but lack any self-image.
Have as much diversity of behavior as man: Problematic; not yet achieved.

What is the difference between a convolutional neural network and a recurrent neural network? Give an example for each.

CNN: A CNN takes fixed-size inputs and generates fixed-size outputs. A CNN is a type of feed-forward artificial neural network, a variation of the multilayer perceptron designed to use minimal amounts of preprocessing. The connectivity pattern between a CNN's neurons is inspired by the organization of the animal visual cortex, whose individual neurons are arranged so that they respond to overlapping regions tiling the visual field. CNNs are ideal for image and video processing. Example: image recognition.
RNN: An RNN can handle arbitrary input and output lengths. Unlike feed-forward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. Recurrent neural networks use time-series information, i.e., what I said last will affect what I say next. RNNs are ideal for text and speech analysis. Example: translating from one language to another.

Describe how the alpha-beta strategy relates to minimax. What do alpha and beta represent?

Alpha-beta pruning improves minimax search by reducing the size of the game tree that must be examined, without changing the result. If a move is determined to be worse than another move already examined, there is no need to examine that node further. Alpha - best already explored option along the path to the root for the maximizer. Beta - best already explored option along the path to the root for the minimizer.

Give the following properties for Min-Max algorithm: Complete? Optimal? Time complexity? Space Complexity?

Complete? - yes, if tree is finite Optimal? - yes against an optimal opponent Time complexity? - O(b^m) Space Complexity? - O(bm) (depth-first exploration, if it generates all successors at once)

Backgammon: Fully / partial observable deterministic / stochastic episodic / sequential static / dynamic discrete / continuous

Fully observable, stochastic, sequential, static, discrete

Wumpus World: Fully / partial observable deterministic / stochastic episodic / sequential static / dynamic discrete / continuous

Partially observable, deterministic, sequential, static, discrete

Explain the Decision tree algorithm giving a one example

A decision tree is a classifier in the form of a tree structure that compares the different possible outcomes.
Decision node: specifies a test on a single attribute
Leaf node: indicates the value of the target attribute
Arc/edge: split of one attribute
Path: a conjunction of tests that leads to the final decision
Example: play tennis. One node tests outlook (sunny, overcast, rain), another tests humidity (high, normal), and the leaves are no and yes.

How does a linear regression algorithm work? Be sure to explain the cost function as part of your answer.

Linear regression finds a linear function that captures the trend in the data and can be used to make predictions. The hypothesis is y = θ0 + θ1x, where θ0 and θ1 are the coefficients of the linear function. These values are found by minimizing a cost function, which takes the sum of all squared errors, where each error is the difference between a predicted y (computed with the current θ0 and θ1) and the actual y. The greater the number of training examples, the more data the cost function has for fitting θ0 and θ1.
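As a rough sketch of how the cost function drives the fit, the example below minimizes the squared-error cost with gradient descent. Gradient descent, the 1/(2m) scaling of the cost, the learning rate, the step count, and the toy data are all assumptions for this illustration; other solvers give the same θ0 and θ1.

```python
def cost(theta0, theta1, xs, ys):
    """Squared-error cost, scaled by 1/(2m) (a common convention, assumed here)."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def fit(xs, ys, learning_rate=0.05, steps=5000):
    """Minimize the cost by repeatedly stepping against its gradient."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                              # d(cost)/d(theta0)
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # d(cost)/d(theta1)
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1

# Toy data generated from y = 1 + 2x, so the fit should recover those coefficients.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

theta0, theta1 = fit(xs, ys)
print(round(theta0, 3), round(theta1, 3), cost(theta0, theta1, xs, ys))
```

A prediction for a new x is then simply `theta0 + theta1 * x`.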

Distinguish between Linear regression and Logistic regression giving an example each (Linear)

Linear regression finds a linear function that captures the trend in the data and can be used to make predictions. The hypothesis is y = θ0 + θ1x, where θ0 and θ1 are the coefficients of the linear function. These values are found by minimizing a cost function, which takes the sum of all squared errors, where each error is the difference between a predicted y (computed with the current θ0 and θ1) and the actual y. The greater the number of training examples, the more data the cost function has for fitting θ0 and θ1. Example: predicting the price of a house.

Distinguish between Linear regression and Logistic regression giving an example each (Logistic)

Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. Examples of classification problems are email (spam or not spam), online transactions (fraud or not fraud), and tumors (malignant or benign). Logistic regression transforms its output using the logistic sigmoid function to return a probability value. The hypothesis of logistic regression limits its output to between 0 and 1. A linear function cannot represent it, because a linear function can take a value greater than 1 or less than 0, which is not possible under the hypothesis of logistic regression. Logistic regression is used in applications such as: 1. Classifying customers as returning or non-returning (classification). 2. Finding factors that differentiate between male and female top executives (profiling). 3. Predicting the approval or disapproval of a loan based on information such as credit scores (classification).

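Breadth-first search: time complexity, space complexity, complete, optimal

Time: O(b^(d+1)). Space: O(b^(d+1)). Complete: yes. Optimal: yes (when all step costs are equal).

Iterative deepening: time complexity, space complexity, complete, optimal

Time: O(b^d). Space: O(bd). Complete: yes. Optimal: yes (when all step costs are equal).

Depth-limited: time complexity, space complexity, complete, optimal

Time: O(b^l). Space: O(bl). Complete: no. Optimal: no.

Depth-first search: time complexity, space complexity, complete, optimal

Time: O(b^m). Space: O(bm). Complete: no. Optimal: no.

Uniform cost: time complexity, space complexity, complete, optimal

Time: O(b^⌈C*/ε⌉). Space: O(b^⌈C*/ε⌉). Complete: yes. Optimal: yes.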

In general, without reference to specific types of models or analysis, what causes overfitting, how can it be detected, and how can it be avoided?

Overfitting occurs when a model contains too much information about the training data it is based on. It can be detected by training the model based on a randomly selected subset of the available data and comparing the model's performance on the training data to its performance on the remaining test data. Overfitting can be avoided by simplifying a model, reducing the amount of information it contains.

Poker: Fully / partial observable deterministic / stochastic episodic / sequential static / dynamic discrete / continuous

Partial Stochastic Sequential Static Discrete

You are a test-taking agent for AI exams. Describe the PEAS for yourself and describe your environment.

Performance - 100% of the points
Environment - exam hall, faculty, other students
Actuators - writing the answer script (paper, pen)
Sensors - eyes, holding the pen

Explain the K-Nearest Neighbour Algorithm and give one example

The K-nearest neighbors (KNN) model uses proximity as its classification metric. KNN works by finding the distances between a query and all the examples in the data, selecting the specified number of examples (K) closest to the query, and then voting for the most frequent label (in the case of classification) or averaging the labels (in the case of regression). Example: Should the bank give a loan to an individual? Would an individual default on his or her loan? Is that person closer in characteristics to people who defaulted or to people who did not default on their loans?
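The classification case can be sketched as below, using the loan example. The feature choice (income and debt, in arbitrary units), the data points, and the function name `knn_classify` are all made up for illustration; Euclidean distance is an assumed choice of proximity measure.

```python
import math
from collections import Counter

def knn_classify(query, examples, k):
    """Return the most frequent label among the k examples closest to the query."""
    by_distance = sorted(examples, key=lambda example: math.dist(query, example[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical past borrowers: ((income, debt), outcome).
borrowers = [
    ((50, 5), "repaid"),
    ((60, 10), "repaid"),
    ((55, 8), "repaid"),
    ((20, 30), "defaulted"),
    ((25, 35), "defaulted"),
    ((30, 28), "defaulted"),
]

print(knn_classify((28, 30), borrowers, k=3))  # defaulted
print(knn_classify((52, 7), borrowers, k=3))   # repaid
```

Using an odd K with two classes avoids tied votes. For regression, the vote would be replaced by the average of the K nearest labels.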

a. What makes a convolutional neural network different from a standard fully connected neural network?

A convolutional neural network has convolutional layers, each of which applies a set of learnable filters. These filters are used to detect the presence of features and patterns in the original input.

The 2 "area" vacuum world problem we looked at had 8 total states. Imagine a vacuum world with n areas. How many states exist and why?

The state is determined by both the agent location and the dirt locations. The agent is in one of two locations, each of which might or might not contain dirt. Thus, there are 2 × 2^2 = 8 possible world states. A larger environment with n locations has n · 2^n states.
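The count n · 2^n can be confirmed by enumerating the states directly. The representation below (agent location as an index, dirt as a tuple of booleans) is just one possible encoding, chosen for this sketch.

```python
from itertools import product

def vacuum_states(n):
    """All (agent_location, dirt_pattern) pairs for a vacuum world with n areas."""
    return [
        (location, dirt)
        for location in range(n)                    # n choices for the agent
        for dirt in product([False, True], repeat=n)  # 2^n dirt patterns
    ]

for n in range(1, 5):
    print(n, len(vacuum_states(n)))
```

For n = 2 this lists 8 states, matching the two-area problem.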

What is the difference between weak and strong AI? What is the purpose of the Chinese Room thought experiment?

Weak AI is defined as AI that makes intelligent decisions. Strong AI is AI that is capable of consciousness and of truly "thinking." The Chinese Room is a thought experiment in which one imagines a person locked in a room, being fed instructions and the tools to carry them out: the person is given one set of symbols and returns another set of symbols. This person could be translating Chinese using this system and have no knowledge of Chinese themselves, yet produce perfect Chinese. The thought experiment was created to show that an AI could, without actually "thinking" or "knowing," appear from an outside perspective to be "thinking" or "knowing."

1. For the following trees, number the nodes in the order in which they are explored for breadth-first search and for depth-first search. The searches always start from the leftmost node. [Tree diagram: levels of 1, 2, 4, and 8 nodes]

BFS (numbers by level): 1 / 2 3 4 5 / 6 7 8 9 10 11 12 13
DFS (numbers by level): 1 / 2 9 / 3 6 10 13 / 4 5 7 8 11 12 14 15
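The depth-first numbering can be reproduced with a short sketch. It assumes the depth-first tree is a complete binary tree with levels of 1, 2, 4, and 8 nodes, since the diagram itself is not available; nodes are identified by made-up indices 0 to 14, where node i has children 2i + 1 and 2i + 2.

```python
from collections import deque

def bfs_order(children, root):
    """Visit nodes level by level using a FIFO queue."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children.get(node, []))
    return order

def dfs_order(children, root):
    """Visit nodes depth first (leftmost child first) using a LIFO stack."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(children.get(node, [])))  # so the left child is popped first
    return order

# Assumed tree: complete binary tree with 15 nodes.
children = {i: [2 * i + 1, 2 * i + 2] for i in range(7)}
levels = [[0], [1, 2], [3, 4, 5, 6], list(range(7, 15))]

def visit_numbers(order):
    """Map each node to the step (starting at 1) at which it was explored, grouped by level."""
    step = {node: position + 1 for position, node in enumerate(order)}
    return [[step[node] for node in level] for level in levels]

print(visit_numbers(dfs_order(children, 0)))
print(visit_numbers(bfs_order(children, 0)))
```

The only difference between the two functions is the data structure holding the frontier: a queue gives breadth-first order and a stack gives depth-first order.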

What are the five components of a search problem?

initial state - the starting state
actions - all possible actions available to the agent
transition model - a description of what each action does; a successor is any state reachable from a given state by applying a single action
goal test - a function applied to a given state to see if it is a goal state
path cost - a function that assigns costs to paths, since some paths cost more than others

Show the results of the minimax tree. Levels from the root: max, min, max, min. Leaves: 2 19 -15 26 16 -30 48 13 -18 -29 -10 -31 16 -49 -11 -2

max: 2
min: 2 -29
max: 2 13 -29 -11
min: 2 -15 -30 13 -29 -31 -49 -11
leaves: 2 19 -15 26 16 -30 48 13 -18 -29 -10 -31 16 -49 -11 -2

1. In the diagram below a two perceptron network is shown. The first layer is the input layer. The second layer is the output layer. The two inputs are 1.0 and 0.5. Weights are assigned as follow: W1.1.= 0.9 W1.2= 0.2 W2.1= 0.3 W2.2= 0.8 The activation function for the output is the sigmoid. Calculate the output of the layer 2 neurons. input 1 - 1 input 2 - 0.5

Output one: 1(0.9) + 0.5(0.3) = 1.05, and 1/(1 + e^-1.05) ≈ 0.7408
Output two: 1(0.2) + 0.5(0.8) = 0.6, and 1/(1 + e^-0.6) ≈ 0.6457
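The same calculation can be written as a short sketch. The layout of `weights` (one row per output neuron, one entry per input) is a convention chosen for this example, matching the reading that W1.1 and W2.1 feed output one and W1.2 and W2.2 feed output two.

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

inputs = [1.0, 0.5]
weights = [
    [0.9, 0.3],  # output one: W1.1 (from input 1), W2.1 (from input 2)
    [0.2, 0.8],  # output two: W1.2 (from input 1), W2.2 (from input 2)
]

weighted_sums = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
outputs = [sigmoid(z) for z in weighted_sums]

# Rounded to four decimal places.
print([round(o, 4) for o in outputs])  # [0.7408, 0.6457]
```

Note the parentheses in the denominator: the sigmoid is 1/(1 + e^-z), not 1/1 + e^-z.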

For machine learning, what are the consequences of a small amount of training examples combined with many inputs? What are the ways to improve learning in such situations?

Consequence: overfitting. The model learns the training examples quickly but does not give good results on new data.
Improvements: dropout and regularization.

Consider the graph below of cities in Australia. Using breadth first search, what is the final path taken from where you are (p) to Sydney (SYD)? Does anything change when using depth first search?

p->adl->mfl->cbp->syd For depth first, the final path would be the same, but the order that the nodes are visited would change

Imagine you are a sanitation worker at a hotel who is in charge of cleaning rooms as guests check out and others check in. Your goal is always to clean the rooms, using your cleaning equipment as quickly and efficiently as possible. Provide a (possible) description of the task environment for your job, outlining the performance measure, environment, actuators, and sensors.

performance measure - fast service, sanitary, discrete environment - hotel rooms, supplies rooms, laundry room actuators - broom, vacuum, rag, spray bottle sensors - eyes, hands, washing machine

What does the Acronym PEAS stand for? Give an example of each for an AI agent playing virtual chess.

performance measure - winning environment - board positions, opponent actuators - moving the pieces sensors - eyes

What is the difference between Supervised learning and unsupervised learning giving examples for each

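In supervised learning, you train the machine using data that is well labeled, so the model learns to produce an output from previous experience. Example: classifying email as spam or not spam. Unsupervised learning is a machine learning technique in which you do not need to supervise the model; it finds structure in unlabeled data on its own. Example: clustering similar items into groups.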

In which four categories do the authors of the book classify the views of artificial intelligence?Which view do the authors follow? why?

Thinking humanly, acting humanly, thinking rationally, acting rationally. The authors have chosen acting rationally (the rational agent approach) because it is more general than the laws of thought approach, since correct inference is only one of several possible mechanisms for achieving rationality. It is also more amenable to scientific development.

Find a path from Lugoj to Bucharest using Greedy Search and A*. Use straight-line distance as the heuristic. Compare and discuss the results in terms of optimality, space and time complexity and completeness.

write this out

1. Define the following in your own words (1 sentence each): ● Rational Agent ● Autonomy ● Stochastic ● Sequential ● Dynamic ● Continuous

● Rational Agent - For each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has.
● Autonomy - The extent to which an agent relies on its own percepts rather than on the prior knowledge of its designer.
● Stochastic - The next state of the environment is not completely determined by the current state and the agent's action.
● Sequential - In sequential environments, the current decision can affect all future decisions.
● Dynamic - If the environment can change while an agent is deliberating, then we say the environment is dynamic for that agent.
● Continuous - Taxi driving is a continuous-state and continuous-time problem: the speed and location of the taxi and of the other vehicles sweep through a range of continuous values and do so smoothly over time.

