AI Final Quiz Questions
Convert to Conjunctive Normal Form: (P -> (Q <-> R))
(-P V -Q V R) ^ (-P V Q V -R)
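One way to double-check a CNF conversion like this is to compare truth tables. The sketch below is only an illustration; the function names `original` and `cnf` are made up for this check.

```python
from itertools import product

def original(p, q, r):
    # P -> (Q <-> R), with the biconditional written as equality of truth values
    return (not p) or (q == r)

def cnf(p, q, r):
    # (-P V -Q V R) ^ (-P V Q V -R)
    return ((not p) or (not q) or r) and ((not p) or q or (not r))

# The two sentences are equivalent if they agree on all 8 assignments.
equivalent = all(original(*v) == cnf(*v)
                 for v in product([False, True], repeat=3))
print(equivalent)
```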
Suppose a doctor knows that there is a 1% chance that any given person in the US has a fever at this time of year. And suppose that that doctor knows that about 1 in a million people in the US have COVID-19, i.e., the coronavirus. And that doctor knows that if a patient has COVID-19, there's an 80% chance that they will have a fever. If the doctor gets a patient with a fever (and doesn't have any other information about that patient), what probability should she calculate for that patient having COVID-19?
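0.008% (= 1/12,500). By Bayes' rule: P(COVID | fever) = P(fever | COVID) P(COVID) / P(fever) = 0.8 x 0.000001 / 0.01 = 0.00008.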
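The arithmetic behind this answer is a direct application of Bayes' rule. A minimal sketch, with variable names chosen here for illustration:

```python
# Numbers taken from the question.
p_fever = 0.01              # P(fever)
p_covid = 1e-6              # P(COVID), about 1 in a million
p_fever_given_covid = 0.8   # P(fever | COVID)

# Bayes' rule: P(COVID | fever) = P(fever | COVID) * P(COVID) / P(fever)
p_covid_given_fever = p_fever_given_covid * p_covid / p_fever

print(f"{p_covid_given_fever:.6%}")             # as a percentage
print(f"about 1 in {1 / p_covid_given_fever:,.0f}")
```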
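Using expectiminimax, calculate the value for node A. Tree: A (root) has children B and C. B leads to D with probability 0.5 and to E with probability 0.5; C leads to F with probability 0.2 and to G with probability 0.8. Leaf values: D: 2, 3; E: 3, 0; F: -2, 1; G: 4, -2.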
1
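The question does not label the node types, so the sketch below assumes A is a max node, B and C are chance nodes, and D, E, F, G are min nodes. That reading is an assumption, chosen because it reproduces the stated answer. The tuple encoding of the tree is also just for illustration.

```python
def expectiminimax(node):
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "max":
        return max(expectiminimax(child) for child in payload)
    if kind == "min":
        return min(expectiminimax(child) for child in payload)
    # "chance": payload is a list of (probability, child) pairs
    return sum(p * expectiminimax(child) for p, child in payload)

def leaf(value):
    return ("leaf", value)

# Assumed node types: D..G are min nodes, B and C are chance nodes, A is max.
D = ("min", [leaf(2), leaf(3)])
E = ("min", [leaf(3), leaf(0)])
F = ("min", [leaf(-2), leaf(1)])
G = ("min", [leaf(4), leaf(-2)])
B = ("chance", [(0.5, D), (0.5, E)])
C = ("chance", [(0.2, F), (0.8, G)])
A = ("max", [B, C])

value_A = expectiminimax(A)
print(value_A)
```

Under these assumptions B averages 2 and 0, C averages -2 and -2, and A takes the larger of the two.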
What Should the value for node C be? Max A /. \ Min B C /. \. /. \ Max D. E. F. G Min /. \. /. \ /. \ 2. 3 4. 1. 0. 2
1
What Should the value for node B be? Max A /. \ Min B C /. \. /. \ Max D. E. F. G Min /. \. /. \ /. \ 2. 3 4. 1. 0. 2
3
What Should the value for node D be? Max A /. \ Min B C /. \. /. \ Max D. E. F. G Min /. \. /. \ /. \ 2. 3 4. 1. 0. 2
3
Are A and B knights or knaves, or is there not enough information? A says "We are both knaves." B says nothing.
A is a knave and B is a knight
Are A and B knights or knaves, or is there not enough information? A says "I am a knave or B is a knight." B says nothing.
A is a knight and B is a knight
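Knights-and-knaves puzzles like the two above can be checked by brute force: try every assignment of types and keep those where each speaker's statement is true exactly when the speaker is a knight. The helper `solve` and the lambda encodings below are illustrative, not part of the course material.

```python
from itertools import product

def solve(statements):
    """statements maps a speaker ("A" or "B") to a function (a, b) -> bool
    giving the truth value of what they said; a, b are True for a knight."""
    solutions = []
    for a, b in product([True, False], repeat=2):
        is_knight = {"A": a, "B": b}
        # A knight's statement must be true, a knave's must be false.
        if all(is_knight[s] == claim(a, b) for s, claim in statements.items()):
            solutions.append((a, b))
    return solutions

# A says "We are both knaves."
puzzle1 = {"A": lambda a, b: (not a) and (not b)}
# A says "I am a knave or B is a knight."
puzzle2 = {"A": lambda a, b: (not a) or b}

print(solve(puzzle1))  # each tuple is (A is knight, B is knight)
print(solve(puzzle2))
```

A puzzle with more than one surviving assignment is a "not enough information" case.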
What is the order of nodes expanded by depth-first search to find node G? Tree: A (g = 0, h = 12) has children B (g = 4, h = 5) and C (g = 3, h = 7); B has children D (g = 10, h = 6) and E (g = 6, h = 2); C has children F (g = 4, h = 6) and G (g = 9, h = 0).
ABDECFG
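A small sketch of depth-first expansion order on this tree, assuming children are tried left to right. The `children` dictionary is a hand-written encoding of the tree in the question.

```python
children = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}

def dfs_order(start, goal):
    stack, order = [start], []
    while stack:
        node = stack.pop()          # last in, first out
        order.append(node)
        if node == goal:
            break
        # Push children reversed so the leftmost child is popped first.
        stack.extend(reversed(children.get(node, [])))
    return "".join(order)

print(dfs_order("A", "G"))
```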
In order to pass the Turing test, you must design a machine that
Acts like a human
Which of the following problems can be modeled as CSP? Crossword Puzzle 8-Queens Problem Map coloring Problem Sudoku Finding the shortest route Assembly line scheduling
Crossword puzzle, 8 queens, map coloring, sudoku, assembly line scheduling
Assume that minimax explores nodes from left to right. For example it would explore D and the nodes below it before E. Using alpha beta pruning the search algorithm could have ignored node E and below. Max A /. \ Min B C /. \. /. \ Max D. E. F. G Min /. \. /. \ /. \ 2. 3 4. 1. 0. 2
False
Depth first search always expands at least as many nodes as A* search with an admissible heuristic
False
Popular search algorithms like BFS, DFS and A* will solve CSP's efficiently
False
Simple reflex agents often incorporate search strategies
False
When mapping a real-world problem to a state space the state representations should maintain as much detail as possible.
False
Are A and B knights or knaves, or is there not enough information? Both A and B say "I am a knight."
Not enough information
The strategy of depth first search is to...
Expand the deepest unexpanded node first, using a last-in, first-out (LIFO) stack
The strategy of breadth first search is to ...
Expand the shallowest unexpanded node first, using a first-in, first-out (FIFO) queue
A minimax search on a large tree requires an evaluation function. Which nodes are evaluated using this function? The root The leaves The min-nodes all of the above
The leaves
∀x P(x) <-> ~∃x ~P(x) a. True b. False
True
Which of the following are factors used by the Page Rank algorithm to evaluate the relevance of a page to a query for information retrieval? (Select all that apply.) a. Damping factor b. The quality of outgoing links c. The number of in-linking sites. d. Authority of in-linking sites
a. Damping factor c. The number of in-linking sites. d. Authority of in-linking sites
Actions in STRIPS include ... (mark all that apply) a. Delete-list b. Add-list c. Goal states d. Preconditions
a. Delete-list b. Add-list d. Preconditions
Which of the following are ways of helping make a neural network model generalize better? (select all that apply) a. Dropout b. Regularization c. More hidden layers d. Additional training epochs
a. Dropout b. Regularization
An inference method is sound if it .... a. Produces only entailed sentences b. Is able to produce every expression that is entailed by the KB c. is efficient in both time and space d. is a tautology
a. Produces only entailed sentences
P(speeding | ticket) = .63 is an example of... a. Diagnostic inference b. Causal inference c. inter causal inference d. mixed inference
a. diagnostic inference
Which of the following were used by FASTUS to do Information Extraction? (Select all that apply.) a. finite state automata b. a grammar of the language's syntax c. word-level combination mechanisms d. simple processes for grouping characters into words
a. finite state automata c. word-level combination mechanisms d. simple processes for grouping characters into words
The STRIPS language requires... (mark all that apply) a. initial State b. Actions c. Goal state d. Heuristics
a. initial State b. Actions c. Goal state
A Constraint Satisfaction problem consists of these components (mark all that apply) a. A search algorithm b. A set of constraints c. A set of path costs d. A set of variables e. A set of domains for each variable
b. A set of constraints d. A set of variables e. A set of domains for each variable
Which of the following are significant components of the BM25 Information Retrieval algorithm of Robertson and Sparck Jones? a. the number of terms in a query b. document length c. how often the query terms occur in the document d. how often the query terms appear in the whole corpus e. length of query terms
b. document length c. how often the query terms occur in the document d. how often the query terms appear in the whole corpus
An inference procedure is.... a. is a declarative knowledge representation b. provides rules for deriving new facts from existing facts c. is a proof. d. is a type of inheritance
b. provides rules for deriving new facts from existing facts
P(speeding | ticket /\ runRedLight) = 0.0027 is an example of..... a. Diagnostic inference b. Causal inference c. inter causal inference d. mixed inference
c. inter causal inference
What does the book say happens when two words have very similar meanings? a. You have to rely on n-grams to differentiate the words b. Natural language processing mechanisms are completely ineffective c. The words can be used interchangeably in applications d. People speaking the language change the meaning to make them different
d. People speaking the language change the meaning to make them different
What move should max make? Max A /. \ Min B C /. \. /. \ Max D. E. F. G Min /. \. /. \ /. \ 2. 3 4. 1. 0. 2
to the left (B)
What Should the value for node E be? Max A /. \ Min B C /. \. /. \ Max D. E. F. G Min /. \. /. \ /. \ 2. 3 4. 1. 0. 2
4
Are A and B knights or knaves, or is there not enough information? A says "B is a knight." B says "The two of us are opposite types."
A is a knave B is a knave
What is the order of nodes expanded by A* search to find node G? Tree: A (g = 0, h = 12) has children B (g = 4, h = 5) and C (g = 3, h = 7); B has children D (g = 10, h = 6) and E (g = 6, h = 2); C has children F (g = 4, h = 6) and G (g = 9, h = 0).
ABECG
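The expansion order can be reproduced with a priority queue keyed on f(n) = g(n) + h(n). This is a sketch on the tree from the question; the dictionaries and the function name `astar_order` are illustrative.

```python
import heapq

children = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
g = {"A": 0, "B": 4, "C": 3, "D": 10, "E": 6, "F": 4, "G": 9}
h = {"A": 12, "B": 5, "C": 7, "D": 6, "E": 2, "F": 6, "G": 0}

def astar_order(start, goal):
    frontier = [(g[start] + h[start], start)]
    order = []
    while frontier:
        _, node = heapq.heappop(frontier)   # lowest f(n) first
        order.append(node)
        if node == goal:
            break
        for child in children.get(node, []):
            heapq.heappush(frontier, (g[child] + h[child], child))
    return "".join(order)

print(astar_order("A", "G"))
```

The f-values are A = 12, B = 9, C = 10, D = 16, E = 8, F = 10, G = 9, so E is popped before C, and G is popped before D or F are ever expanded.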
An intelligent agent can search through the states of a CSP by answering which of the following questions: a. Which variable should be assigned next? b. In what order should its values be tried? c. Can we detect inevitable failure early? d. Can we take advantage of the problem structure? e. All of the above
All of the above
The term _________ is used for a depth first search that chooses values for one variable at a time and returns when a variable has no legal values left to assign
Backtracking
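A minimal sketch of backtracking search on a small map-coloring CSP. The four regions and their adjacency are a hypothetical fragment chosen for illustration, and no variable-ordering or value-ordering heuristics are used.

```python
def backtrack(assignment, variables, domains, neighbors):
    if len(assignment) == len(variables):
        return assignment                      # every variable has a value
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        # A value is legal if no already-assigned neighbor uses it.
        if all(assignment.get(n) != value for n in neighbors[var]):
            result = backtrack({**assignment, var: value},
                               variables, domains, neighbors)
            if result is not None:
                return result
    return None                                # no legal value left: back up

# Hypothetical example: four mutually bordering regions, three colors.
variables = ["WA", "NT", "SA", "Q"]
neighbors = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q"],
    "Q": ["NT", "SA"],
}
domains = {v: ["red", "green", "blue"] for v in variables}

solution = backtrack({}, variables, domains, neighbors)
print(solution)
```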
The strategy of uniform cost is to ...
Expand node with lowest g(n) = distance from start to n
The strategy of A* search is to
Expand nodes with the lowest f(n) = g(n) + h(n)
The strategy of best first search is to..
Expand the node with the lowest h(n) (this is greedy best-first search)
Given the knowledge base KB = {P, P V Q}, is the query Q entailed by KB?
False
Non-monotonic logics provide the means of reasoning with uncertainty
False
One drawback of neural networks is that they only work on linearly separable spaces
False
Since the conditional probability tables of Bayesian Networks are often much smaller than the joint probability space, it is sometimes impossible to reproduce this space.
False
The minimum remaining values heuristic will immediately detect failure in a branch because it chooses the variable with the greatest number of legal moves
False
∀x ∃y P(x, y) <-> ∃y ∀x P(x, y) a. True b. False
False
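A single counterexample is enough to show the two sides are not equivalent. The sketch below uses a two-element domain and takes P(x, y) to mean x = y; both choices are assumptions made for illustration.

```python
domain = [0, 1]

def P(x, y):
    return x == y

# For every x there is some y with P(x, y): pick y = x.
forall_exists = all(any(P(x, y) for y in domain) for x in domain)

# There is a single y that works for every x: no such y in this domain.
exists_forall = any(all(P(x, y) for x in domain) for y in domain)

print(forall_exists, exists_forall)
```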
Assume that a rook can move on a chessboard any number of squares in a straight line, vertically or horizontally. but cannot jump over other pieces; then Manhattan distance is an admissible heuristic for the problem of moving the rook from square A to square B in the smallest number of moves.
False. Manhattan distance counts the number of squares travelled, so if the rook is 5 squares vertically away from square B, h(n) = 5, when in reality the rook needs only 1 move. The heuristic overestimates the true cost, which makes it inadmissible.
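The overestimate is easy to see numerically. The sketch below assumes an empty board (no blocking pieces), where a rook needs at most two moves; the function names are illustrative.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def rook_moves_empty_board(a, b):
    # Assumption: no other pieces, so nothing blocks the rook.
    if a == b:
        return 0
    return 1 if a[0] == b[0] or a[1] == b[1] else 2

start, goal = (0, 0), (0, 5)     # 5 squares apart in a straight line
print(manhattan(start, goal), rook_moves_empty_board(start, goal))
```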
Which of the following problems would be good targets for an intelligent agent incorporating A* search? a. Solving a Rubik's cube b. Writing a poem c. Finding a path from Stormwind to Orgrimmar d. Playing chess
Solving a Rubik's cube and finding a path from Stormwind to Orgrimmar. Writing a poem doesn't really have a heuristic, and playing chess is done using Monte Carlo methods or minimax.
Some cave dwellers use fire. all who use fire have intelligence. So.. a. All who have intelligence use fire b. Some cave dwellers have intelligence c. All cave dwellers d. None of these validly follows
Some cave dwellers have intelligence
Match the type of learning approach with a suitable application: 1. Supervised 2. Unsupervised 3. Reinforcement 4. Semi-supervised ____ Improve a chess-playing agent's performance _____ Classifying images of digits _____ infer sets of words used in similar ways in wikipedia _____ grade a few hundred essays
Supervised: classifying images of digits. Unsupervised: infer sets of words used in similar ways in Wikipedia. Reinforcement: improve a chess-playing agent's performance. Semi-supervised: grade a few hundred essays.
Which games would be appropriate targets for minimax? Tic-Tac-Toe Backgammon Poker Go Solitaire Chess
Tic-tac-toe, Go, Chess
Artificial Intelligence is commonly taught through the use of intelligent agents. Agents (humans robots, etc) observe the world through percepts and act on the world through actuators. Internal to the agent is a function that maps percepts to actions.
True
Breadth-first search is a special case of uniform-cost search
True
Classical logic can only help in representing knowledge that can only be false or true
True
Consider an Intelligent Agent designed to solve cross word puzzles. Is this environment fully observable?
True
Constraint propagation, of which arc-consistency is an example, enforces constraints locally
True
Forward checking can identify an unproductive branch of the search tree by keeping track of the remaining legal values for unassigned variables
True
Given the knowledge base KB = {A <-> B}, the query ~A V B is entailed by the KB
True
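Entailment in propositional logic can be checked by model enumeration: the query must hold in every model where the KB holds. The sketch below applies this to the question above and to the earlier KB of P and P V Q; the function `entails` is an illustrative helper, not a library call.

```python
from itertools import product

def entails(kb, query, n_vars):
    # KB entails the query if the query is true in every model of the KB.
    return all(query(*m)
               for m in product([False, True], repeat=n_vars)
               if kb(*m))

# KB: A <-> B, query: ~A V B
kb1 = lambda a, b: a == b
query1 = lambda a, b: (not a) or b

# KB: P and (P V Q), query: Q
kb2 = lambda p, q: p and (p or q)
query2 = lambda p, q: q

print(entails(kb1, query1, 2), entails(kb2, query2, 2))
```

The second check fails because the model with P true and Q false satisfies the KB but not the query.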
Goal-based agents often incorporate search strategies
True
Non-monotonic logics provide the means of retracting some of the conclusions we believed at an earlier stage.
True
One typical drawback of neural networks is that the models are difficult to interpret
True
The drawback of breadth-first search is its space complexity, O(b^d); the drawback of depth-first search is that it is not guaranteed to find an optimal solution (or any solution at all)
True
The learning routine of neural networks is primarily focused on updating the weights of edges connecting perceptrons in order to better approximate the training examples
True
The learning routine of neural networks is primarily focused on updating the weights of edges connecting simple nodes in order to better approximate the training examples
True
Using alpha beta pruning, the search algorithm could have ignored G and below Max A /. \ Min B C /. \. /. \ Max D. E. F. G Min /. \. /. \ /. \ 2. 3 4. 1. 0. 2
True
When mapping a real-world problem to a state space, the successor function links states together through abstractions of the agent's actuators.
True
N-fold cross validation - a common technique for evaluating machine learning and artificial intelligence models -- builds n models, evaluates each instance exactly once and uses each instance as a training example exactly n-1 times.
True. Example of N-fold cross validation: 1. Split the dataset into folds, say 10. 2. Build a model on 9 folds and test it on the remaining 1. 3. Repeat, holding out a different fold each time, until every fold has been the test fold once. 4. Average results, identify key variables. 5. Using these 'optimal' variables, calculate your overall final model.
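The counting claim in the question (each instance tested exactly once and used for training exactly n-1 times) can be checked with a small index-splitting sketch. The generator `k_fold_splits` is a hand-rolled illustration, not a library function, and it does not shuffle the data.

```python
from collections import Counter

def k_fold_splits(n_items, k):
    indices = list(range(n_items))
    folds = [indices[i::k] for i in range(k)]   # k disjoint folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

train_counts, test_counts = Counter(), Counter()
for train, test in k_fold_splits(n_items=20, k=5):
    # A real run would fit a model on `train` and score it on `test` here.
    train_counts.update(train)
    test_counts.update(test)

print(set(test_counts.values()), set(train_counts.values()))
```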
What is the order of nodes expanded by uniform-cost search to find node G? Tree: A (g = 0, h = 12) has children B (g = 4, h = 5) and C (g = 3, h = 7); B has children D (g = 10, h = 6) and E (g = 6, h = 2); C has children F (g = 4, h = 6) and G (g = 9, h = 0).
ACBFEG. Uniform-cost search pops the node with the lowest path cost g(n) from a priority queue.
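A sketch of the same expansion using Python's `heapq`, keyed on g(n) only. B and F tie at g = 4; this sketch breaks the tie alphabetically (a side effect of comparing tuples), which is an assumption that happens to match the stated answer.

```python
import heapq

children = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
g = {"A": 0, "B": 4, "C": 3, "D": 10, "E": 6, "F": 4, "G": 9}

def ucs_order(start, goal):
    frontier = [(g[start], start)]
    order = []
    while frontier:
        _, node = heapq.heappop(frontier)   # lowest g(n) first
        order.append(node)
        if node == goal:
            break
        for child in children.get(node, []):
            heapq.heappush(frontier, (g[child], child))
    return "".join(order)

print(ucs_order("A", "G"))
```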
What is the problem with n-gram language models that smoothing solves? a. If an n-gram is in a test text but not in the training set, then that whole text would be assigned a probability of 0 b. Each time you observe an n-gram, that changes the results of the experiment c. Every time they are used, the statistical calculations need to be recomputed d. The likelihood of most of the words are very small, so they must be estimated
a. If an n-gram is in a test text but not in the training set, then that whole text would be assigned a probability of 0
A Goal state .. (mark all that apply) a. Is represented as a conjunction of propositions b. May contain or's. c. Contains the necessary variables d. Is achieved when all sub goals are achieved
a. Is represented as a conjunction of propositions d. Is achieved when all sub goals are achieved
Which of the following are significant advantages of using shared features and biases in convolutional layers for image recognition? a. It reduces training time b. They allow the detection of the same feature wherever it is in the image c. It allows you to use a different cost function which provides a more accurate estimate of accuracy d. There are significantly fewer weights to adjust than for fully connected layers.
a. It reduces training time b. They allow the detection of the same feature wherever it is in the image d. There are significantly fewer weights to adjust than for fully connected layers.
Which of the following are results of using pooling after convolutional layers for image recognition? (select all that apply) a. It results in the condensation of information found in the convolutional layer b. Training time is dramatically decreased c. The representation of the image becomes uniform over the affected areas d. You lose information about the exact location of the feature
a. It results in the condensation of information found in the convolutional layer d. You lose information about the exact location of the feature
Which of the following best describes the overall task that the Socher et al paper is trying to accomplish? a. Predict the relative frequency of each sentiment tag for a sentence. b. Show how the recursive auto encoder procedure is similar to what humans do c. Classify a sentence into one of five sentiments d. Infer a new grammar for the sentences in the corpus
a. Predict the relative frequency of each sentiment tag for a sentence.
Which of the following are significant justifications for applying Ockham's razor? a. Simpler hypotheses are more likely to generalize better b. A simpler solution is easier to remember c. Bayes rule gives simpler solutions an advantage due to higher prior probabilities d. Solutions with fewer parameters give more accurate results
a. Simpler hypotheses are more likely to generalize better c. Bayes rule gives simpler solutions an advantage due to higher prior probabilities
What happens when a learned decision tree has seen no training examples with the same attribute values of a particular test case? a. The tree might make some mistakes b. The tree gives the plurality choice answer c. More training examples might correct mistakes d. The tree requests additional information
a. The tree might make some mistakes b. The tree gives the plurality choice answer c. More training examples might correct mistakes
Which of the following are significant similarities between natural languages (human languages) and artificial languages (programming languages) a. Their words can be ambiguous b. They have precisely defined models which indicate how to interpret a given utterance c. They can evolve over time d. There can be different dialects
a. Their words can be ambiguous c. They can evolve over time d. There can be different dialects
Mycin... (select all that apply) a. Was designed as an expert system for diagnosis and treatment of bacterial infections. It performs as well as the best medical experts in the field. b. Relies on sentences that are labeled not with probabilities, but with certainty factors ranging from -1 to 1 c. Uses a special (ad-hoc) form of Modus Ponens. d. Labels instances based on probability theory e. Is able to combine evidence.
a. Was designed as an expert system for diagnosis and treatment of bacterial infections. It performs as well as the best medical experts in the field. b. Relies on sentences that are labeled not with probabilities, but with certainty factors ranging from -1 to 1 c. Uses a special (ad-hoc) form of Modus Ponens. e. Is able to combine evidence.
The arc consistency approach (mark all that apply): a. requires a considerable amount of overhead b. requires almost no overhead c. can eliminate large parts of the state space d. rarely offers any improvement
a. requires a considerable amount of overhead, c. can eliminate large parts of the state space
P(vomiting | flu) = .72 is an example of... a. Diagnostic inference b. Causal inference c. inter causal inference d. mixed inference
b. Causal inference
An inference method is complete if it .... a. Produces only entailed sentences b. Is able to produce every expression that is entailed by the KB c. is efficient in both time and space d. is a tautology
b. Is able to produce every expression that is entailed by the KB
Which of the following are significant problems of the boolean keyword approach to Information retrieval? a. They generally return more matches than they should b. It does not support ranking of the results that are returned c. Users might not know how to create good queries d. The results are not efficiently computable
b. It does not support ranking of the results that are returned c. Users might not know how to create good queries
Why is saturation a problem for neural networks (especially deep ones)? a. It results in overfitting b. It slows down learning c. It makes it difficult for gradient to avoid local minima d. The hidden layers in the network become especially difficult to interpret
b. It slows down learning
Anyone who has just lost a lot of blood is likely to faint. No one who is likely to faint is a safe pilot. So... a. Everyone who has just lost a lot of blood is a safe pilot b. No one who has just lost a lot of blood is a safe pilot c. All safe pilots have just lost a lot of blood d. None of these validly follows
b. No one who has just lost a lot of blood is a safe pilot
Characterize P -> ((Q V R) -> P) a. Unsatisfiable b. Tautology c. Satisfiable but not tautology d. Not a propositional logic sentence e. None of the above
b. Tautology
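A truth table settles questions like this one mechanically. The sketch below classifies a sentence as a tautology, satisfiable but not a tautology, or unsatisfiable; `classify` and `implies` are illustrative helpers.

```python
from itertools import product

def implies(x, y):
    return (not x) or y

def classify(formula, n_vars):
    results = [formula(*v) for v in product([False, True], repeat=n_vars)]
    if all(results):
        return "tautology"
    if any(results):
        return "satisfiable but not a tautology"
    return "unsatisfiable"

# P -> ((Q V R) -> P)
sentence = lambda p, q, r: implies(p, implies(q or r, p))
label = classify(sentence, 3)
print(label)
```

The result makes sense by inspection: if P is false the outer implication holds, and if P is true the inner one does.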
Which of the following are attributes that recurrent neural networks have which non-recurrent networks don't? a. Lower computational complexity b. The ability to make connections over time / space c. The ability to handle an unspecified number of inputs d. They can learn what to forget
b. The ability to make connections over time / space c. The ability to handle an unspecified number of inputs d. They can learn what to forget
What is saturation in neural networks? a. When the sum of all the weights exceeds some predetermined threshold b. When the activation of some output neurons are very high or very low c. When there are so many different weights in the network that it takes too long to calculate gradient descent d. When some of the network's weights reach floating point overflow
b. When the activation of some output neurons are very high or very low
What is the overall (approximate) goal of the attribute selection mechanism in the decision tree learning algorithm? a. Produce the most accurate classification of new items b. minimize the tree's depth c. minimize the tree's branching factor d. create a balanced tree
b. minimize the tree's depth
Imagine an agent designed to sell hot dogs at Wrigley Field. Its current goal state is to give(hotdog, Joe). Which planning technique would be most appropriate? a. Forward Planning. The stadium holds thousands of individuals; we do not want the agent to evaluate give(hotdog, x) for every x. b. Forward Planning. There are only a small number of actions our agent might perform relative to the number of people in the stadium. c. Backward Planning. The stadium holds thousands of individuals; we do not want the agent to evaluate give(hotdog, x) for every x. d. Backward Planning. There are only a small number of actions our agent might perform relative to the number of people in the stadium.
c. Backward Planning. The stadium holds thousands of individuals; we do not want the agent to evaluate give(hotdog, x) for every x.
In neural networks, hidden layers .. a. Can be seen as newly constructed features that make the target concept linearly separable in the transformed space b. Can be interpreted as representing meaningful features such as vowel detectors or edge detectors c. Both d. Neither
c. Both
What is the main advantage of deep learning networks over NNs which have just one hidden layer? a. The longer training time makes the network more deliberate about its choices b. More weights and biases allows them to handle more complex functions c. Hierarchical representations d. Training time is significantly decreased
c. Hierarchical representations
Which of the following best describes how Socher et al's system "parses" a sentence into components? a. It works backwards through the sentence, combining the last two words, then the one before that, etc. b. Like FASTUS, it uses simple automata-based phrase constructors c. It recursively combines neighboring elements that allow for the smallest reconstruction error. d. It uses syntactic grammar that was learned from a large corpus of annotated texts
c. It recursively combines neighboring elements that allow for the smallest reconstruction error.
Which of the following would be a reasonable specification in propositional logic of the Wumpus World state where the agent is in cell (1,3), does perceive a stench, does not perceive a breeze, knows that there is a pit in cell (1,4), and does not know where the wumpus is? a.∀xIn(A,1,3)⇒Stench(A)∧¬Breeze(A)∧Pit(1,4)∧¬Wumpus(1,3) b. A(1,3)⇒¬W(1,3)∧¬S(1,3)∧B(1,3)∧P(1,4) c. P1,4∧S1,3∧A1,3∧¬B1,3 (numbers are subscript) d. R=1∧C=3⇒A∧S∧¬B∧P
c. P1,4∧S1,3∧A1,3∧¬B1,3
Which of the following is the best explanation why it's better to use 10 outputs than 4 for classifying 10 digits? a. Having 10 output units allows you to break down the input image into more subregions and take them into account separately b. Simpler isn't always better, Occam's razor notwithstanding c. There isn't an obvious mapping between the parts of an image and the relative numerical value of the digit d. More output units give more weights and that allows for higher accuracy.
c. There isn't an obvious mapping between the parts of an image and the relative numerical value of the digit
What approach does the relatively modern question answering system AskMSR share with the classic ELIZA system discussed on the first day of class? a. Their performance depends directly on the size of their vocabularies b. They improve their accuracy with experience c. They both transform their inputs using fairly simple templates d. They are both based on propositional logic
c. They both transform their inputs using fairly simple templates
What is the general goal of regularization? a. To minimize the overall cost of making a prediction from a learned model b. To ensure that each test example has an appropriate classification c. To balance the complexity of a model with its accuracy d. To ensure that a system gives similar outputs for similar inputs
c. To balance the complexity of a model with its accuracy
Given a crossword puzzle, which of the following would be the set of variables in a CSP? a. [1,2,3,4] b. the set of words from the word bank c. [1 across, 3 across, 4 across, 1 down, 2 down] d. The solution to the crossword puzzle
c. [1 across, 3 across, 4 across, 1 down, 2 down]
The blank space '1 across' in a crossword puzzle has 3 spaces for characters. What is the domain for 1 across? a.[ant, ape, big, bus, bard, book, ginger, symbol, syntax] b.[ant, ape] c.[ant, ape, big, bus, car , has] d.[big, bus, bard, book, buys, brown]
c.[ant, ape, big, bus, car , has]
Why is it a bad idea to use the overall accuracy (i.e. the percent of correct classifications) as a cost function? a. It doesn't tell you how well the network does on different digits b. The combination of weights that gives the lowest error may not be the same as those that give the highest accuracy. c. The calculation of accuracy is more difficult d. Small changes in the weights can lead to big changes in accuracy or no changes at all
d. Small changes in the weights can lead to big changes in accuracy or no changes at all
No one who is held for murder is given bail. Smith isn't held for murder. So.. a. Smith is given bail b. Smith isn't given bail c. Smith is innocent d. none of these validly follows
d. none of these validly follows
Which one is the translation of "John has exactly one brother"? a. ∃x,y brother(John, x) ^ brother(John, y) ^ x = y b. ∃x brother(John, x) -> ∀y(brother(John, y) ^ x = y) c. ∃x brother(John, x) -> ∀y(brother(John, y) -> x = y) d. ∃x brother(John, x) ^ ∀y(brother(John, y) -> x = y) e. ∀x brother(John, x) -> ∃y(brother(John, y) ^ x = y)
d. ∃x brother(John, x) ^ ∀y(brother(John, y) -> x = y)
How do you check if a heuristic is admissible?
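Check that it never overestimates the true cost of reaching the goal, i.e. h(n) <= h*(n) for every node n. A heuristic that overestimates is inadmissible.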
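As an illustration, the check can be run on the search tree used in the earlier DFS, A*, and uniform-cost questions, assuming the g values there are cumulative path costs from A and that G is the only goal. Under that assumption the true remaining cost from an ancestor of G is g(G) - g(n), and nodes that cannot reach G have infinite remaining cost. The `parent` dictionary is a hand-written encoding of that tree.

```python
import math

g = {"A": 0, "B": 4, "C": 3, "D": 10, "E": 6, "F": 4, "G": 9}
h = {"A": 12, "B": 5, "C": 7, "D": 6, "E": 2, "F": 6, "G": 0}
parent = {"B": "A", "C": "A", "D": "B", "E": "B", "F": "C", "G": "C"}
goal = "G"

def true_cost_to_goal(node):
    # Walk up from the goal; only ancestors of the goal can reach it.
    current = goal
    while current is not None:
        if current == node:
            return g[goal] - g[node]
        current = parent.get(current)
    return math.inf

# Nodes where h(n) exceeds the true remaining cost h*(n).
violations = {n: (h[n], true_cost_to_goal(n))
              for n in g if h[n] > true_cost_to_goal(n)}
print(violations)
```

Any non-empty result means the heuristic is not admissible on that tree, even if A* still happens to reach the goal.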