Artificial Intelligence

Transition Model

A description of what state results from performing any applicable action in any state. More precisely, the transition model can be defined as a function. Upon receiving state s and action a as input, Results(s, a) returns the state resulting from performing action a in state s . For example, given a certain configuration of a 15 puzzle (state s ), moving a square in any direction (action a ) will bring to a new configuration of the puzzle (the new state).

depth-first search

A depth-first search algorithm exhausts one direction before trying another direction. In these cases, the frontier is managed as a stack data structure. The catchphrase you need to remember here is "last-in first-out." After nodes are added to the frontier, the first node to remove and consider is the last one to be added. This results in a search algorithm that goes as deep as possible in the first direction that gets in its way while leaving all other directions for later. (An example from outside lecture: take a situation where you are looking for your keys. In a depth-first search approach, if you choose to start by searching in your pants, you'd first go through every single pocket, emptying each pocket and going through the contents carefully. You will stop searching in your pants and start searching elsewhere only once you have completely exhausted the search in every single pocket of your pants.)
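
The stack-managed frontier can be sketched in a few lines; the adjacency-dict graph format and node names here are illustrative assumptions, not from lecture:

```python
# Depth-first search: the frontier is a stack, so the last node added
# is the first node removed and considered.
def depth_first_search(graph, start, goal):
    frontier = [start]              # Python list used as a stack
    explored = set()
    while frontier:
        node = frontier.pop()       # last-in first-out
        if node == goal:
            return True
        explored.add(node)
        for neighbor in graph.get(node, []):
            if neighbor not in explored and neighbor not in frontier:
                frontier.append(neighbor)
    return False
```

For example, depth_first_search({"A": ["B", "C"], "B": ["D"]}, "A", "D") goes as deep as it can along one branch before backtracking to the others.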

depth-first search Cons

1. It is possible that the found solution is not optimal.
2. At worst, this algorithm will explore every possible path before finding the solution, thus taking the longest possible time before reaching the solution.

For A* search to be optimal, the heuristic function, h(n), should be:

1. Admissible, or never overestimating the true cost, and
2. Consistent, which means that the estimated path cost to the goal of a new node, plus the cost of transitioning to it from the previous node, is greater than or equal to the estimated path cost to the goal of the previous node. To put it in equation form, h(n) is consistent if for every node n and successor node n' with step cost c, h(n) ≤ h(n') + c.
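
The consistency condition can be checked mechanically on a small graph; the heuristic values and edge costs below are hypothetical illustration values:

```python
# Consistency check: h(n) <= h(n') + c must hold for every edge
# (n, n', c), where c is the step cost from n to its successor n'.
def is_consistent(h, edges):
    return all(h[n] <= h[n_prime] + c for n, n_prime, c in edges)
```

For instance, is_consistent({"A": 2, "B": 1, "C": 0}, [("A", "B", 1), ("B", "C", 1)]) holds, while an overestimating h such as {"A": 5, "B": 1} on the edge ("A", "B", 1) does not.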

Bayesian Networks

A Bayesian network is a data structure that represents the dependencies among random variables. Bayesian networks have the following properties:
• They are directed graphs.
• Each node on the graph represents a random variable.
• An arrow from X to Y represents that X is a parent of Y. That is, the probability distribution of Y depends on the value of X.
• Each node X has probability distribution P(X | Parents(X)).

Markov Chain

A Markov chain is a sequence of random variables where the distribution of each variable follows the Markov assumption. That is, each event in the chain occurs based on the probability of the event before it. To start constructing a Markov chain, we need a transition model that will specify the probability distributions of the next event based on the possible values of the current event.
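
Sampling from such a transition model can be sketched as follows; the two weather states and their probabilities are made-up illustration values:

```python
import random

# Transition model: the distribution of the next state given only the
# current one (the Markov assumption).
transition = {
    "sun":  {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def sample_chain(start, length):
    state, chain = start, [start]
    for _ in range(length - 1):
        options = transition[state]
        # Pick the next state according to the current state's distribution.
        state = random.choices(list(options), weights=list(options.values()))[0]
        chain.append(state)
    return chain
```

Each call to sample_chain("sun", 5) produces one possible five-day weather sequence.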

Biconditional Elimination

A biconditional proposition is equivalent to an implication and its converse joined with an And connective. For example, "It is raining if and only if Harry is inside" is equivalent to ("If it is raining, Harry is inside" And "If Harry is inside, it is raining").

State

A configuration of an agent in its environment. For example, in a 15 puzzle (https://en.wikipedia.org/wiki/15_puzzle), a state is any one way that all the numbers are arranged on the board.

A* search

A development of the greedy best-first algorithm, A* search considers not only h(n), the estimated cost from the current location to the goal, but also g(n), the cost that was accrued until the current location. By combining both these values, the algorithm has a more accurate way of determining the cost of the solution and optimizing its choices on the go. The algorithm keeps track of (cost of path until now + estimated cost to the goal), and once it exceeds the estimated cost of some previous option, the algorithm will ditch the current path and go back to the previous option, thus preventing itself from going down a long, inefficient path that h(n) erroneously marked as best. Yet again, since this algorithm, too, relies on a heuristic, it is as good as the heuristic that it employs. It is possible that in some situations it will be less efficient than greedy best-first search or even the uninformed algorithms.
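
A compact sketch of the bookkeeping described above; the adjacency function, node names, and the zero heuristic in the example are illustrative assumptions, not the lecture's exact implementation:

```python
import heapq

# A* keeps the frontier ordered by g(n) + h(n): the cost accrued so far
# plus the estimated remaining cost to the goal.
def a_star(start, goal, neighbors, h):
    frontier = [(h(start), 0, start, [start])]   # (g + h, g, node, path)
    best_g = {start: 0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for nxt, cost in neighbors(node):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt, path + [nxt]))
    return None
```

For example, with edges = {"A": [("B", 1), ("C", 4)], "B": [("C", 1)], "C": []} and the trivially admissible heuristic h(n) = 0, a_star("A", "C", lambda n: edges[n], lambda n: 0) returns the cheaper path ["A", "B", "C"].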

A hidden Markov model

A hidden Markov model is a type of Markov model for a system with hidden states that generate some observed event. This means that sometimes the AI has some measurement of the world but no access to the precise state of the world. In these cases, the state of the world is called the hidden state, and whatever data the AI has access to are the observations. Here are a few examples of this:
• For a robot exploring uncharted territory, the hidden state is its position, and the observation is the data recorded by the robot's sensors.
• In speech recognition, the hidden state is the words that were spoken, and the observation is the audio waveforms.
• When measuring user engagement on websites, the hidden state is how engaged the user is, and the observation is the website or app analytics.

Path Cost

A numerical cost associated with a given path. For example, a navigator app does not simply bring you to your goal; it does so while minimizing the path cost, finding the fastest way possible for you to get to your goal state.

Neural Networks

A program structure inspired by the human brain: a network of simple connected units that can learn from data to perform tasks effectively.

Double Negation Elimination

If a proposition is negated twice, the two negations cancel each other, leaving the original proposition. For example, consider the proposition "It is not true that Harry did not pass the test". We can parse it the following way: "It is not true that (Harry did not pass the test)", or "¬(Harry did not pass the test)", and, finally, "¬(¬(Harry passed the test))." The two negations cancel each other, marking the proposition "Harry passed the test" as true.

Distributive Property

A proposition with two elements that are grouped with And or Or connectives can be distributed, or broken down into, smaller units consisting of And and Or. For example, (P ∧ (Q ∨ R)) is equivalent to ((P ∧ Q) ∨ (P ∧ R)).

Random Variables

A random variable is a variable in probability theory with a domain of possible values that it can take on. For example, to represent possible outcomes when rolling a die, we can define a random variable Roll, that can take on the values {1, 2, 3, 4, 5, 6}. To represent the status of a flight, we can define a variable Flight that takes on the values {on time, delayed, canceled}. Often, we are interested in the probability with which each value occurs. We represent this using a probability distribution.

Sentence

A sentence is an assertion about the world in a knowledge representation language. A sentence is how AI stores knowledge and uses it to infer new information.

Solution

A sequence of actions that leads from the initial state to the goal state.

Optimal Solution

A solution that has the lowest path cost among all solutions.

In a search process, data is often stored in a node, a data structure that contains the following data:

• A state
• Its parent node, through which the current node was generated
• The action that was applied to the state of the parent to get to the current node
• The path cost from the initial state to this node

Minimax

A type of algorithm in adversarial search, Minimax represents winning conditions as (-1) for one side and (+1) for the other side. Further actions will be driven by these conditions, with the minimizing side trying to get the lowest score, and the maximizer trying to get the highest score.
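
The alternation between the two sides can be sketched over a toy game tree; the representation here (nested lists for move choices, integer leaves for terminal utilities) is an assumption for illustration:

```python
# Minimax over a nested-list game tree: inner lists are available moves,
# integer leaves are terminal utilities (e.g. +1, 0, -1).
def minimax(node, maximizing):
    if isinstance(node, int):        # terminal state: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)
```

For example, minimax([[1, -1], [0, 1]], True) evaluates each subtree from the minimizer's point of view (giving -1 and 0) and lets the maximizer pick the larger, 0.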

Alpha-Beta Pruning

A way to optimize Minimax, Alpha-Beta Pruning skips some of the recursive computations that are decidedly unfavorable. After establishing the value of one action, if there is initial evidence that the following action can bring the opponent to a better score than the already established action, there is no need to investigate this action further, because it will decidedly be less favorable than the previously established one. This is most easily shown with an example: a maximizing player knows that, at the next step, the minimizing player will try to achieve the lowest score. Suppose the maximizing player has three possible actions, and the first one is valued at 4. Then the player starts generating the value for the next action. To do this, the player generates the values of the minimizer's actions if the current player makes this action, knowing that the minimizer will choose the lowest one. However, before finishing the computation for all the possible actions of the minimizer, the player sees that one of the options has a value of 3. This means that there is no reason to keep exploring the other possible actions for the minimizing player. The value of the not-yet-valued action doesn't matter, be it 10 or (-10). If the value is 10, the minimizer will choose the lowest option, 3, which is already worse than the pre-established 4. If the not-yet-valued action turned out to be (-10), the minimizer would choose this option, (-10), which is even more unfavorable to the maximizer. Therefore, computing additional possible actions for the minimizer at this point is irrelevant to the maximizer, because the maximizing player already has an unequivocally better choice whose value is 4.
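
The same pruning logic can be sketched with alpha and beta bounds over the nested-list tree representation assumed above (a sketch, not the lecture's exact implementation):

```python
# Alpha-beta pruning: alpha is the best value the maximizer can
# guarantee so far, beta the best the minimizer can guarantee.
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, int):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                # prune remaining children
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break                    # prune remaining children
    return value
```

Mirroring the example in the text, alphabeta([[4, 5], [3, 10]], True) returns 4: once the second minimizer branch reveals a 3, its remaining child (the 10) is never evaluated.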

Agent

An entity that perceives its environment and acts upon that environment. In a navigator app, for example, the agent would be a representation of a car that needs to decide on which actions to take to arrive at the destination.

Implication Elimination

An implication is equivalent to an Or relation between the negated antecedent and the consequent. As an example, the proposition "If it is raining, Harry is inside" is equivalent to the proposition "(it is not raining) or (Harry is inside)."

Artificial Intelligence

Artificial Intelligence (AI) covers a range of techniques through which computers appear to behave intelligently.

Local and Global Minima and Maxima

As mentioned above, a hill climbing algorithm can get stuck in local maxima or minima.

depth-first search Pros

At best, this algorithm is the fastest. If it "lucks out" and always chooses the right path to the solution (by chance), then depth-first search takes the least possible time to get to a solution.

factoring

At this point, we can run an inference algorithm on the conjunctive normal form. Occasionally, through the process of inference by resolution, we might end up in cases where a clause contains the same literal twice. In these cases, a process called factoring is used, where the duplicate literal is removed. For example, (P ∨ Q ∨ S) ∧ (¬P ∨ R ∨ S) allows us to infer by resolution that (Q ∨ S ∨ R ∨ S). The duplicate S can be removed to give us (Q ∨ R ∨ S).
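
Representing clauses as Python sets makes factoring automatic, since a set cannot hold a duplicate literal; the string encoding of literals here is an assumption for illustration:

```python
# Clauses as sets of literal strings ("P", "¬P", ...): resolving two
# clauses unions what remains, and the set drops any duplicate literal.
def negate(literal):
    return literal[1:] if literal.startswith("¬") else "¬" + literal

def resolve(clause1, clause2, literal):
    # Resolve on `literal` (present in clause1) and its negation (in clause2).
    return (set(clause1) - {literal}) | (set(clause2) - {negate(literal)})
```

For the example above, resolve({"P", "Q", "S"}, {"¬P", "R", "S"}, "P") yields {"Q", "R", "S"}: the duplicate S collapses on its own.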

Actions

Choices that can be made in a state. More precisely, actions can be defined as a function. Upon receiving state s as input, Actions(s) returns as output the set of actions that can be executed in state s . For example, in a 15 puzzle, the actions of a given state are the ways you can slide squares in the current configuration (4 if the empty square is in the middle, 3 if next to a side, 2 if in the corner).

Resolution relies on

Complementary Literals, two of the same atomic propositions where one is negated and the other is not, such as P and ¬P

Conditional Probability

Conditional probability is the degree of belief in a proposition given some evidence that has already been revealed. As discussed in the introduction, AI can use partial information to make educated guesses about the future. To use this information, which affects the probability that the event occurs in the future, we rely on conditional probability. Conditional probability is expressed using the following notation: P(a | b), meaning "the probability of event a occurring given that we know event b to have occurred," or, more succinctly, "the probability of a given b." Now we can ask questions like what is the probability of rain today given that it rained yesterday P(rain today | rain yesterday), or what is the probability of the patient having the disease given their test results P(disease | test results).
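
Numerically, conditional probability follows from the joint probability: P(a | b) = P(a ∧ b) / P(b). A one-line sketch, with made-up example numbers:

```python
# P(a | b): the belief in a once b is known to hold, computed from the
# joint probability of a and b and the probability of b.
def conditional(p_a_and_b, p_b):
    return p_a_and_b / p_b
```

For instance, with the illustrative values P(rain ∧ clouds) = 0.08 and P(clouds) = 0.4, conditional(0.08, 0.4) gives 0.2.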

Bayes' Rule

Bayes' rule is commonly used in probability theory to compute conditional probability. In words, Bayes' rule says that the probability of b given a is equal to the probability of a given b, times the probability of b, divided by the probability of a: P(b | a) = P(a | b) P(b) / P(a).
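
The rule translates directly into code (the numbers in the example below are made up for illustration):

```python
# Bayes' rule exactly as stated in words above:
# P(b | a) = P(a | b) * P(b) / P(a)
def bayes(p_a_given_b, p_b, p_a):
    return p_a_given_b * p_b / p_a
```

For instance, if P(a | b) = 0.8, P(b) = 0.1, and P(a) = 0.2, then bayes(0.8, 0.1, 0.2) gives P(b | a) = 0.4.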

Greedy best-first search

Breadth-first and depth-first are both uninformed search algorithms. That is, these algorithms do not utilize any knowledge about the problem that they did not acquire through their own exploration. However, more often than not, some knowledge about the problem is, in fact, available. For example, when a human maze-solver enters a junction, the human can see which way goes in the general direction of the solution and which way does not. AI can do the same. A type of algorithm that considers additional knowledge to try to improve its performance is called an informed search algorithm. Greedy best-first search expands the node that is the closest to the goal, as determined by a heuristic function h(n). As its name suggests, the function estimates how close to the goal the next node is, but it can be mistaken. The efficiency of the greedy best-first algorithm depends on how good the heuristic function is. For example, in a maze, an algorithm can use a heuristic function that relies on the Manhattan distance between the possible nodes and the end of the maze. The Manhattan distance ignores walls and counts how many steps up, down, or to the sides it would take to get from one location to the goal location. This is an easy estimation that can be derived based on the (x, y) coordinates of the current location and the goal location.
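
The Manhattan-distance heuristic on (x, y) coordinates is a two-line sketch:

```python
# Manhattan distance: horizontal plus vertical steps to the goal,
# ignoring any walls in between.
def manhattan(cell, goal):
    (x1, y1), (x2, y2) = cell, goal
    return abs(x1 - x2) + abs(y1 - y2)
```

For example, manhattan((0, 0), (3, 4)) is 7: three steps sideways plus four steps down, regardless of what walls lie between the two cells.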

Uncertainty

Dealing with uncertain events using probability. -Last lecture, we discussed how AI can represent and derive new knowledge. However, often, in reality, the AI has only partial knowledge of the world, leaving space for uncertainty. Still, we would like our AI to make the best possible decision in these situations. For example, when predicting weather, the AI has information about the weather today, but there is no way to predict with 100% accuracy the weather tomorrow. Still, we can do better than chance, and today's lecture is about how we can create AI that makes optimal decisions given limited information and uncertainty.

Hill Climbing Variants

Due to the limitations of Hill Climbing, multiple variants have been thought of to overcome the problem of being stuck in local minima and maxima. What all variations of the algorithm have in common is that, no matter the strategy, each one still has the potential of ending up in local minima and maxima and no means to continue optimizing. The algorithms below are phrased such that a higher value is better, but they also apply to cost functions, where the goal is to minimize cost.

Steps in Conversion of Propositions to Conjunctive Normal Form

1. Eliminate biconditionals
2. Eliminate implications
3. Move negation inwards until only literals are being negated (and not clauses), using De Morgan's Laws
4. Use the distributive law to distribute Or over And wherever possible

Possible Worlds

Every possible situation can be thought of as a world, represented by the lowercase Greek letter omega ω. For example, rolling a die can result in six possible worlds: a world where the die yields a 1, a world where the die yields a 2, and so on. To represent the probability of a certain world, we write P(ω).

To run the Model Checking algorithm, the following information is needed:

• Knowledge Base, which will be used to draw inferences
• A query, or the proposition that we are interested in whether it is entailed by the KB
• Symbols, a list of all the symbols (or atomic propositions) used (in our case, these are rain, hagrid, and dumbledore)
• Model, an assignment of truth and false values to symbols

Knowledge engineering

Knowledge engineering is the process of figuring out how to represent propositions and logic in AI.

Existential Quantification

Existential quantification is an idea parallel to universal quantification. However, while universal quantification was used to create sentences that are true for all x, existential quantification is used to create sentences that are true for at least one x. It is expressed using the symbol ∃. For example, the sentence ∃x. House(x) ∧ BelongsTo(Minerva, x) means that there is at least one symbol that is both a house and that Minerva belongs to it. In other words, this expresses the idea that Minerva belongs to a house. Existential and universal quantification can be used in the same sentence. For example, the sentence ∀x. Person(x) → (∃y. House(y) ∧ BelongsTo(x, y)) expresses the idea that if x is a person, then there is at least one house, y, to which this person belongs. In other words, this sentence means that every person belongs to a house. There are other types of logic as well, and the commonality between them is that they all exist in pursuit of representing information. These are the systems we use to represent knowledge in our AI.

Unconditional Probability

Unconditional probability is the degree of belief in a proposition in the absence of any other evidence. All the questions that we have asked so far were questions of unconditional probability, because the result of rolling a die is not dependent on previous events.

Search

Finding a solution to a problem, like a navigator app that finds the best route from your origin to the destination, or like playing a game and figuring out the next move. -Search problems involve an agent that is given an initial state and a goal state, and it returns a solution of how to get from the former to the latter. A navigator app uses a typical search process, where the agent (the thinking part of the program) receives as input your current location and your desired destination, and, based on a search algorithm, returns a suggested path. However, there are many other forms of search problems, like puzzles or mazes.

Optimization

Finding not only a correct way to solve a problem, but a better—or the best—way to solve it. -Optimization is choosing the best option from a set of possible options. We have already encountered problems where we tried to find the best possible option, such as in the minimax algorithm, and today we will learn about tools that we can use to solve an even broader range of problems.

First order logic

First order logic is another type of logic that allows us to express more complex ideas more succinctly than propositional logic. First order logic uses two types of symbols: Constant Symbols and Predicate Symbols. Constant symbols represent objects, while predicate symbols are like relations or functions that take an argument and return a true or false value.

Artificial Intelligence example

For example, AI is used to recognize faces in photographs on your social media, beat the world champion in chess, and process your speech when you speak to Siri or Alexa on your phone.

Hill Climbing

Hill climbing is one type of local search algorithm. In this algorithm, the neighbor states are compared to the current state, and if any of them is better, we change the current node from the current state to that neighbor state. What qualifies as better is defined by whether we use an objective function, preferring a higher value, or a decreasing function, preferring a lower value. In this algorithm, we start with a current state. In some problems, we will know what the current state is, while, in others, we will have to start with selecting one randomly. Then, we repeat the following actions: we evaluate the neighbors, selecting the one with the best value. Then, we compare this neighbor's value to the current state's value. If the neighbor is better, we switch the current state to the neighbor state, and then repeat the process. The process ends when we compare the best neighbor to the current state, and the current state is better. Then, we return the current state.
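
The loop described above can be sketched generically; the one-dimensional state space, objective function, and neighbor rule in the example are illustrative assumptions:

```python
# Hill climbing: repeatedly move to the best neighbor until no
# neighbor improves on the current state (a local maximum).
def hill_climb(state, objective, neighbors):
    while True:
        best = max(neighbors(state), key=objective, default=state)
        if objective(best) <= objective(state):
            return state             # no neighbor is better: stop here
        state = best
```

For example, with objective lambda x: -(x - 3) ** 2 and neighbors lambda x: [x - 1, x + 1], starting from 0 the algorithm climbs step by step and returns 3, the peak.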

Manhattan Distance

However, it is important to emphasize that, as with any heuristic, it can go wrong and lead the algorithm down a slower path than it would have gone otherwise. It is possible that an uninformed search algorithm will provide a better solution faster, but it is less likely to do so than an informed algorithm.

And Elimination

If an And proposition is true, then any one atomic proposition within it is true as well. For example, if we know that Harry is friends with Ron and Hermione, we can conclude that Harry is friends with Hermione.

Learning

Improving performance based on access to data and experience. For example, your email is able to distinguish spam from non-spam mail based on past experience.

Depth-First Search

In the previous description of the frontier, one thing went unmentioned. At stage 1 in the pseudocode above, which node should be removed? This choice has implications on the quality of the solution and how fast it is achieved. There are multiple ways to go about the question of which nodes should be considered first, two of which can be represented by the data structures of stack (in depth-first search) and queue (in breadth-first search).

Likelihood Weighting

In the sampling example above, we discarded the samples that did not match the evidence that we had. This is inefficient. One way to get around this is with likelihood weighting, using the following steps:
• Start by fixing the values for evidence variables.
• Sample the non-evidence variables using conditional probabilities in the Bayesian network.
• Weight each sample by its likelihood: the probability of all the evidence occurring.

Independence

Independence is the knowledge that the occurrence of one event does not affect the probability of the other event. For example, when rolling two dice, the result of each die is independent from the other. Rolling a 4 with the first die does not influence the value of the second die that we roll. This is opposed to dependent events, like clouds in the morning and rain in the afternoon. If it is cloudy in the morning, it is more likely that it will rain in the afternoon, so these events are dependent. Independence can be defined mathematically: events a and b are independent if and only if the probability of a and b is equal to the probability of a times the probability of b: P(a ∧ b) = P(a)P(b).
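
The defining equation P(a ∧ b) = P(a)P(b) can be checked directly for the two-dice example:

```python
import math

# Two fair dice: the probability that the first shows a 4 and the
# second shows a 2 is 1/36, which factors into (1/6) * (1/6).
p_first_is_4 = 1 / 6
p_second_is_2 = 1 / 6
p_both = 1 / 36

# Independent iff P(a and b) equals P(a) * P(b).
independent = math.isclose(p_both, p_first_is_4 * p_second_is_2)
```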

Inference by Enumeration

Inference by enumeration is a process of finding the probability distribution of variable X given observed evidence e and some hidden variables Y.

Knowledge and Search Problems

Inference can be viewed as a search problem with the following properties:
• Initial state: starting knowledge base
• Actions: inference rules
• Transition model: new knowledge base after inference
• Goal test: checking whether the statement that we are trying to prove is in the KB
• Path cost function: the number of steps in the proof

Inference

Inference is the process of deriving new sentences from old ones. -Although backtracking search is more efficient than simple search, it still takes a lot of computational power. Enforcing arc consistency, on the other hand, is less resource intensive. By interleaving backtracking search with inference (enforcing arc consistency), we can get at a more efficient algorithm. This algorithm is called the Maintaining Arc-Consistency algorithm. This algorithm will enforce arc-consistency after every new assignment of the backtracking search. Specifically, after we make a new assignment to X, we will call the AC-3 algorithm and start it with a queue of all arcs (Y,X) where Y is a neighbor of X (and not a queue of all arcs in the problem). Following is a revised Backtrack algorithm that maintains arc-consistency, with the new additions in bold.

Inference rules

Inference rules allow us to generate new information based on existing knowledge without considering every possible model. Inference rules are usually represented using a horizontal bar that separates the top part, the premise, from the bottom part, the conclusion. The premise is whatever knowledge we have, and the conclusion is what knowledge can be generated based on the premise

De Morgan's Law

It is possible to turn an And connective into an Or connective. Consider the following proposition: "It is not true that both Harry and Ron passed the test." From this, it is possible to conclude that "It is not true that Harry passed the test" Or "It is not true that Ron passed the test." That is, for the negated And proposition to be true, at least one of the negated propositions in the Or proposition must be true.

Joint Probability

Joint probability is the likelihood of multiple events all occurring. Let us consider the following example, concerning the probabilities of clouds in the morning and rain in the afternoon.

Local Search

Local search is a search algorithm that maintains a single node and searches by moving to a neighboring node. This type of algorithm is different from previous types of search that we saw. Whereas in maze solving, for example, we wanted to find the quickest way to the goal, local search is interested in finding the best answer to a question. Often, local search will bring us to an answer that is not optimal but "good enough," conserving computational power. Consider the following example of a local search problem: we have four houses in set locations. We want to build two hospitals, such that we minimize the distance from each house to a hospital.

Logical Connectives

Logical connectives are logical symbols that connect propositional symbols in order to reason in a more complex way about the world.

There are multiple ways to infer new knowledge based on existing knowledge. First, we will consider the Model Checking algorithm.

Probability Rules

• Negation
• Inclusion-Exclusion
• Marginalization
• Conditioning

Nodes

Nodes contain information that makes them very useful for the purposes of search algorithms. They contain a state, which can be checked using the goal test to see if it is the final state. If it is, the node's path cost can be compared to other nodes' path costs, which allows choosing the optimal solution. Once the node is chosen, by virtue of storing the parent node and the action that led from the parent to the current node, it is possible to trace back every step of the way from the initial state to this node, and this sequence of actions is the solution. However, nodes are simply a data structure — they don't search, they hold information. To actually search, we use the frontier, the mechanism that "manages" the nodes. The frontier starts by containing an initial state and an empty set of explored items, and then repeats the following actions until a solution is reached:

Language

Processing natural language, which is produced and understood by humans.

Propositional logic

Propositional logic is based on propositions, statements about the world that can be either true or false, as in sentences 1-5 above.

Propositional Symbols

Propositional symbols are most often letters (P, Q, R) that are used to represent a proposition.

Universal Quantification

Quantification is a tool that can be used in first order logic to represent sentences without using a specific constant symbol. Universal quantification uses the symbol ∀ to express "for all." So, for example, the sentence ∀x. BelongsTo(x, Gryffindor) → ¬BelongsTo(x, Hufflepuff) expresses the idea that it is true for every symbol that if this symbol belongs to Gryffindor, it does not belong to Hufflepuff.

Inference has multiple properties.

• Query X: the variable for which we want to compute the probability distribution.
• Evidence variables E: one or more variables that have been observed for event e. For example, we might have observed that there is light rain, and this observation helps us compute the probability that the train is delayed.
• Hidden variables Y: variables that aren't the query and also haven't been observed. For example, standing at the train station, we can observe whether there is rain, but we can't know if there is maintenance on the track further down the road. Thus, Maintenance would be a hidden variable in this situation.
• The goal: calculate P(X | e). For example, compute the probability distribution of the Train variable (the query) based on the evidence e that we know there is light rain.

Knowledge

Representing information and drawing inferences from it. -Humans reason based on existing knowledge and draw conclusions. The concept of representing knowledge and drawing conclusions from it is also used in AI, and in this lecture we will explore how we can achieve this behavior.

Resolution

Resolution is a powerful inference rule that states that if one of two atomic propositions in an Or proposition is false, the other has to be true. For example, given the proposition "Ron is in the Great Hall" Or "Hermione is in the library", in addition to the proposition "Ron is not in the Great Hall," we can conclude that "Hermione is in the library."

Sampling

Sampling is one technique of approximate inference. In sampling, each variable is sampled for a value according to its probability distribution. We will start with an example from outside lecture, and then cover the example from lecture. To generate a distribution using sampling with a die, we can roll the die multiple times and record what value we got each time. Suppose we rolled the die 600 times. We count how many times we got 1, which is supposed to be roughly 100, and then repeat for the rest of the values, 2-6. Then, we divide each count by the total number of rolls. This will generate an approximate distribution of the values of rolling a die: on one hand, it is unlikely that we get the result that each value has a probability of 1/6 of occurring (which is the exact probability), but we will get a value that's close to it.
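
The die-rolling procedure described above translates directly into code:

```python
import random
from collections import Counter

# Roll a fair die `rolls` times, count each outcome, and divide by the
# number of rolls to approximate the true distribution (1/6 per value).
def approximate_distribution(rolls=600):
    counts = Counter(random.randint(1, 6) for _ in range(rolls))
    return {value: counts[value] / rolls for value in range(1, 7)}
```

Each value's estimated probability will land close to, but rarely exactly at, the exact probability of 1/6; increasing `rolls` tightens the approximation.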

some of the ideas that make AI possible:

• Search
• Knowledge
• Uncertainty
• Optimization
• Learning
• Neural Networks
• Language

The Markov Assumption

The Markov assumption is an assumption that the current state depends on only a finite fixed number of previous states. This is important to us. Think of the task of predicting weather. In theory, we could use all the data from the past year to predict tomorrow's weather. However, it is infeasible, both because of the computational power this would require and because there is probably no information about the conditional probability of tomorrow's weather based on the weather 365 days ago. Using the Markov assumption, we restrict our previous states (e.g. how many previous days we are going to consider when predicting tomorrow's weather), thereby making the task manageable. This means that we might get a rougher approximation of the probabilities of interest, but this is often good enough for our needs. Moreover, we can use a Markov model based on the information of just the last event (e.g. predicting tomorrow's weather based on today's weather).

Sensor Markov Assumption

The assumption that the evidence variable depends only on the corresponding state. For example, for our models, we assume that whether people bring umbrellas to the office depends only on the weather. This is not necessarily reflective of the complete truth, because, for example, more conscientious, rain-averse people might take an umbrella with them everywhere even when it is sunny, and if we knew everyone's personalities it would add more data to the model. However, the sensor Markov assumption ignores these data, assuming that only the hidden state affects the observation. A hidden Markov model can be represented in a Markov chain with two layers. The top layer, variable X, stands for the hidden state. The bottom layer, variable E, stands for the evidence, the observations that we have.

Goal Test

The condition that determines whether a given state is a goal state. For example, in a navigator app, the goal test would be whether the current location of the agent (the representation of the car) is at the destination. If it is — problem solved. If it's not — we continue searching.

Knowledge Base (KB)

The knowledge base is a set of sentences known by a knowledge-based agent. This is knowledge that the AI is provided about the world in the form of propositional logic sentences that can be used to make additional inferences about the world.

Model

The model is an assignment of a truth value to every proposition. To reiterate, propositions are statements about the world that can be either true or false, and knowledge about the world is represented in the truth values of these propositions. The model is the truth-value assignment that provides information about the world. For example, if P: "It is raining." and Q: "It is Tuesday.", a model could be the following truth-value assignment: {P = True, Q = False}. This model means that it is raining, but it is not Tuesday. However, there are more possible models in this situation (for example, {P = True, Q = True}, where it is both raining and a Tuesday). In fact, the number of possible models is 2 to the power of the number of propositions. In this case, we had 2 propositions, so 2² = 4 possible models.
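The claim that n propositions yield 2ⁿ possible models can be checked by enumerating them (a sketch; `all_models` is a hypothetical helper name):

```python
from itertools import product

def all_models(symbols):
    """Enumerate every truth-value assignment over the given symbols."""
    for values in product([True, False], repeat=len(symbols)):
        yield dict(zip(symbols, values))

models = list(all_models(["P", "Q"]))
# 2 propositions → 2² = 4 possible models, including {P: True, Q: False}.
```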

breadth-first search (BFS)

The opposite of depth-first search is breadth-first search (BFS). A breadth-first search algorithm follows multiple directions at the same time, taking one step in each possible direction before taking a second step in any direction. In this case, the frontier is managed as a queue data structure. The catchphrase you need to remember here is "first-in first-out." All the new nodes line up, and nodes are considered based on which one was added first (first come, first served!). (An example from outside lecture: suppose you are in a situation where you are looking for your keys. In this case, if you start with your pants, you will look in your right pocket. After this, instead of looking in your left pocket, you will take a look in one drawer. Then on the table. And so on, in every location you can think of. Only after you have exhausted all the locations will you go back to your pants and search in the next pocket.)
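The queue-based frontier described above can be sketched as follows (illustrative only; the toy graph and function names are invented for the example, not from lecture):

```python
from collections import deque

def bfs(start, goal, neighbors):
    """Breadth-first search: the frontier is a FIFO queue."""
    frontier = deque([[start]])    # each entry is a path from start
    explored = {start}
    while frontier:
        path = frontier.popleft()  # first in, first out
        node = path[-1]
        if node == goal:
            return path            # shallowest goal => fewest steps
        for nxt in neighbors(node):
            if nxt not in explored:
                explored.add(nxt)
                frontier.append(path + [nxt])
    return None                    # goal unreachable

# Toy graph: BFS finds the shortest path A → C → E.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
print(bfs("A", "E", graph.get))  # prints ['A', 'C', 'E']
```

Swapping the `deque` for a stack (pop from the same end you push) would turn this into depth-first search.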

State Space

The set of all states reachable from the initial state by any sequence of actions. For example, in a 15 puzzle, the state space consists of all the 16!/2 configurations of the board that can be reached from any initial state. The state space can be visualized as a directed graph with states, represented as nodes, and actions, represented as arrows between nodes.

Initial State

The state from which the search algorithm starts. In a navigator app, that would be the current location of the agent.

Modus Ponens

Modus Ponens is an inference rule, which is a fancy way of saying that if we know an implication and its antecedent to be true, then the consequent is true as well. For example, from "If it is raining, then Harry is inside" and "It is raining," we can infer "Harry is inside."

Depth-limited Minimax

There is a total of 255,168 possible Tic Tac Toe games, and about 10²⁹⁰⁰⁰ possible games in chess. The minimax algorithm, as presented so far, requires generating all hypothetical games from a certain point to the terminal condition. While computing all the Tic-Tac-Toe games doesn't pose a challenge for a modern computer, doing so with chess is currently impossible. Depth-limited Minimax considers only a pre-defined number of moves before it stops, without ever getting to a terminal state. However, this doesn't allow for getting a precise value for each action, since the end of the hypothetical games has not been reached. To deal with this problem, Depth-limited Minimax relies on an evaluation function that estimates the expected utility of the game from a given state, or, in other words, assigns values to states. For example, in a chess game, an evaluation function would take as input a current configuration of the board, try to assess its expected utility (based on what pieces each player has and their locations on the board), and then return a positive or a negative value that represents how favorable the board is for one player versus the other. These values can be used to decide on the right action, and the better the evaluation function, the better the Minimax algorithm that relies on it.
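A minimal sketch of Depth-limited Minimax, assuming game-specific helpers `actions`, `result`, and `evaluate` (all placeholder names, not code from lecture):

```python
def depth_limited_minimax(state, depth, maximizing, actions, result, evaluate):
    """Minimax cut off at a fixed depth; states at the cutoff are scored
    by an evaluation function instead of true terminal utilities."""
    moves = actions(state)
    if depth == 0 or not moves:
        return evaluate(state)  # estimated utility, not an exact value
    values = (depth_limited_minimax(result(state, a), depth - 1,
                                    not maximizing, actions, result, evaluate)
              for a in moves)
    return max(values) if maximizing else min(values)
```

A better `evaluate` narrows the gap between this cutoff estimate and the value full minimax would compute.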

breadth-first search (BFS) Cons:

This algorithm is almost guaranteed to take longer than the minimal time needed to run. At worst, it takes the longest possible time to run.

breadth-first search (BFS) Pros:

This algorithm is guaranteed to find the optimal solution (when every step has the same cost, the shallowest solution is the optimal one).

Probability

Uncertainty can be represented as a number of events and the likelihood, or probability, of each of them happening.

Stochastic:

choose randomly from among the higher-valued neighbors. Doing this, we allow a move in any direction that improves over our current value. This makes sense if, for example, the highest-valued neighbor leads to a local maximum while another neighbor leads to a global maximum.

First-choice:

choose the first higher-valued neighbor

Steepest-ascent:

choose the highest-valued neighbor. This is the standard variation that we discussed above.
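The three variants differ only in how they pick among improving neighbors. A sketch, with `value` and `neighbors` as problem-specific placeholders:

```python
import random

def hill_climb(state, value, neighbors, variant="steepest-ascent"):
    """One hill-climbing run; stops when no neighbor improves on the
    current state. `value` and `neighbors` are supplied by the problem."""
    while True:
        better = [n for n in neighbors(state) if value(n) > value(state)]
        if not better:
            return state                    # local (possibly global) maximum
        if variant == "steepest-ascent":
            state = max(better, key=value)  # best improving neighbor
        elif variant == "stochastic":
            state = random.choice(better)   # any improving neighbor
        else:                               # "first-choice"
            state = better[0]               # first improving neighbor found
```

Note that all three variants can still get stuck at a local maximum; techniques like random-restart address that.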

Note that the way local search algorithms work is by

considering one node in a current state, and then moving the node to one of the current state's neighbors. This is unlike the minimax algorithm, for example, where every single state in the state space was considered recursively.

A Clause is a

disjunction of literals (a propositional symbol or a negation of a propositional symbol, such as P, ¬P). A disjunction consists of propositions that are connected with an Or logical connective (P ∨ Q ∨ R). A conjunction, on the other hand, consists of propositions that are connected with an And logical connective (P ∧ Q ∧ R). Clauses allow us to convert any logical statement into a Conjunctive Normal Form (CNF), which is a conjunction of clauses, for example: (A ∨ B ∨ C) ∧ (D ∨ ¬E) ∧ (F ∨ G).

An Objective Function is a

function that we use to maximize the value of the solution.

A Cost Function is a

function that we use to minimize the cost of the solution (this is the function that we would use in our example with houses and hospitals. We want to minimize the distance from houses to hospitals).

Complementary literals allow us to

generate new sentences through inferences by resolution. Thus, inference algorithms locate complementary literals to generate new knowledge.
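One resolution step can be sketched with clauses represented as frozensets of literal strings (an illustrative encoding invented for this example):

```python
def resolve(clause_a, clause_b):
    """Resolve two clauses (frozensets of literals, '¬P' for negation)
    on each pair of complementary literals, yielding new clauses."""
    resolvents = []
    for literal in clause_a:
        complement = literal[1:] if literal.startswith("¬") else "¬" + literal
        if complement in clause_b:
            # Drop the complementary pair, keep everything else.
            resolvents.append((clause_a - {literal}) | (clause_b - {complement}))
    return resolvents

# (P ∨ Q) and (¬P ∨ R) resolve on P/¬P to give (Q ∨ R).
print(resolve(frozenset({"P", "Q"}), frozenset({"¬P", "R"})))
```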

A local maximum (plural: maxima)

is a state that has a higher value than its neighboring states. As opposed to that, a global maximum is a state that has the highest value of all states in the state-space.

a local minimum (plural: minima)

is a state that has a lower value than its neighboring states. As opposed to that, a global minimum is a state that has the lowest value of all states in the state-space.

In a search process, data is often stored in a

node, a data structure that keeps track of a state, a parent (the node that generated this node), an action (the action applied to the parent to get to this node), and a path cost (from the initial state to this node).

A Current State is the

state that is currently being considered by the function.

A Neighbor State is a

state that the current state can transition to. In a one-dimensional state-space landscape, a neighbor state is the state to either side of the current state. In our example, a neighbor state could be the state resulting from moving one of the hospitals in any direction by one step. Neighbor states are usually similar to the current state, and, therefore, their values are close to the value of the current state.

adversarial search

the algorithm faces an opponent that tries to achieve the opposite goal. Often, AI that uses adversarial search is encountered in games, such as tic tac toe.

Markov Models

we have looked at questions of probability given some information that we observed. In this kind of paradigm, the dimension of time is not represented in any way. However, many tasks do rely on the dimension of time, such as prediction. To represent the variable of time we will create a new variable, X, and change it based on the event of interest, such that Xₜ is the current event, Xₜ₊₁ is the next event, and so on. To be able to predict events in the future, we will use Markov Models.
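A first-order Markov model like the weather example can be sampled as follows; the transition probabilities here are illustrative assumptions, not lecture values:

```python
import random

def sample_markov_chain(start, transitions, steps):
    """Sample a sequence from a first-order Markov chain: each state
    depends only on the state immediately before it."""
    sequence = [start]
    for _ in range(steps):
        nxt, probs = zip(*transitions[sequence[-1]].items())
        sequence.append(random.choices(nxt, weights=probs)[0])
    return sequence

# Hypothetical weather transition model: P(X_{t+1} | X_t).
transitions = {
    "sun":  {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.3, "rain": 0.7},
}
forecast = sample_markov_chain("sun", transitions, 5)  # e.g. a 6-day sequence
```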

Based on hidden Markov models, multiple tasks can be achieved:

• Filtering: given observations from the start until now, calculate the probability distribution for the current state. For example, given information on when people bring umbrellas from the start of time until today, we generate a probability distribution for whether it is raining today or not.

• Prediction: given observations from the start until now, calculate the probability distribution for a future state.

• Smoothing: given observations from the start until now, calculate the probability distribution for a past state. For example, calculating the probability of rain yesterday given that people brought umbrellas today.

• Most likely explanation: given observations from the start until now, calculate the most likely sequence of events.

The most likely explanation task can be used in processes such as voice recognition, where, based on multiple waveforms, the AI infers the most likely sequence of words or syllables that produced these waveforms. Next is a Python implementation of a hidden Markov model that we will use for a most likely explanation task:
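The lecture's own implementation is not reproduced here; the sketch below solves the most likely explanation task with the Viterbi algorithm, using an illustrative umbrella model (all probability values are assumptions for the example):

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the hidden-state sequence that best explains the
    observations (the most likely explanation)."""
    # best[t][s] = probability of the best path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        column, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            column[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(column)
        back.append(pointers)
    # Walk the back-pointers from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))

# Hypothetical umbrella model: hidden weather, observed umbrella use.
states = ["sun", "rain"]
start_p = {"sun": 0.5, "rain": 0.5}
trans_p = {"sun": {"sun": 0.8, "rain": 0.2},
           "rain": {"sun": 0.3, "rain": 0.7}}
emit_p = {"sun": {"umbrella": 0.2, "no umbrella": 0.8},
          "rain": {"umbrella": 0.9, "no umbrella": 0.1}}
print(viterbi(["umbrella", "umbrella", "no umbrella"],
              states, start_p, trans_p, emit_p))
```

Given two days of umbrellas followed by a day without, the model above infers rain on the first two days and sun on the third.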

