TDDC17
Turing Test
The Turing Test is a test of a machine's ability to exhibit intelligent behaviour, such that the behaviour is indistinguishable from, or equivalent to, that of a human. The test consists of three actors: a machine, a human and an interrogator. The machine and the human reply, in some typewritten form (so that voice, handwriting style, etc. do not give the machine away), to questions asked by the interrogator, and the interrogator tries, using the replies, to distinguish the machine from the human. If the interrogator fails to tell the machine from the human, the machine is considered to have passed the Turing Test.
Physical symbol system (PSS)
A physical symbol system takes physical patterns (symbols), combines them into structures (expressions), and manipulates them (using processes) to produce new expressions.
Difference between supervised and reinforcement learning
Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected. Instead the focus is on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge) while trying to maximize the reward.
goal based agent
A goal-based agent uses goals to describe desirable situations to be in. These goals are combined with the current state description to select actions that achieve a goal. It is similar to the model-based agent, but additionally uses goals.
Nonmonotonic logic:
Assumes that the world stays the same until proven otherwise. Draws tentative conclusions, enabling the reasoner to retract its conclusion(s) in light of further evidence.
Explain the backpropagation algorithm
It is used for training neural networks. The motivation for backpropagation is to train a multi-layered neural network such that it can learn the appropriate internal representations, allowing it to learn an arbitrary mapping from input to output. Basically, backpropagation calculates the error contribution of each neuron after a batch of data (e.g., in image recognition, multiple images) has been processed.
Explain one way in which landmarks can be used to define a heuristic function h(s) in a planning problem P
LAMA (counts landmarks): identifies the set of landmarks that still need to be achieved after reaching state s through path (action sequence) π: L(s,π) = (L \ Accepted(s,π)) ∪ ReqAgain(s,π), i.e. all discovered landmarks, minus those that are accepted as achieved (a landmark is accepted once it has become true after its predecessors were accepted), plus those that we can show will have to be re-achieved. The heuristic h(s) is then the number of landmarks in L(s,π).
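A minimal Python sketch of the landmark-counting idea, assuming the sets of discovered, accepted and required-again landmarks have already been computed by the landmark machinery (the function and parameter names below are hypothetical):

def landmark_count_heuristic(discovered, accepted, required_again):
    # L(s, pi) = (L \ Accepted(s, pi)) U ReqAgain(s, pi)
    remaining = (set(discovered) - set(accepted)) | set(required_again)
    # LAMA-style estimate: number of landmarks still to be achieved
    return len(remaining)

# Example: 5 discovered landmarks, 2 accepted, 1 must be re-achieved -> h(s) = 4
print(landmark_count_heuristic({"a", "b", "c", "d", "e"}, {"a", "b"}, {"b"}))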
The physical symbol system hypothesis
"A physical symbol system has the necessary and sufficient means for general intelligent action". * necessary - any system exhibiting intelligence will prove upon analysis to be a physical symbol system * sufficient - any physical symbol system of sufficient size can be organized further to exhibit general intelligence
How does the deep learning algorithm work?
* Uses a cascade of multiple layers of nonlinear processing units for feature extraction and transformation; each successive layer uses the output from the previous layer as input.
* Learns in supervised (e.g., classification) and/or unsupervised (e.g., pattern analysis) manners.
* Learns multiple levels of representations that correspond to different levels of abstraction; the levels form a hierarchy of concepts.
* Uses some form of gradient descent for training via backpropagation.
Important concepts in PSS
- Designation: an expression designates an object if, given the expression, the PSS can affect the object or behave in ways depending on the object.
- Interpretation: a PSS can interpret an expression if, given the expression, it can perform the corresponding process. (Expressions can thus describe processes, not only objects.)
Provide a definition of unit clauses in DPLL
A unit clause is a clause containing exactly one literal. In DPLL, a clause is also treated as a unit clause when all of its literals but one have already been assigned false, since the remaining literal must then be made true for the clause to be satisfied.
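A small Python sketch of detecting unit clauses under a partial assignment (the representation of literals as signed integers, e.g. -2 for ¬x2, is an assumption for illustration, not part of DPLL itself):

def find_unit_literal(clause, assignment):
    """Return the single unassigned literal if the clause is (effectively) unit,
    None otherwise. `assignment` maps variable numbers to True/False."""
    unassigned = []
    for lit in clause:
        var, value = abs(lit), lit > 0
        if var not in assignment:
            unassigned.append(lit)
        elif assignment[var] == value:
            return None  # clause already satisfied, so not a unit clause
    return unassigned[0] if len(unassigned) == 1 else None

# (x1 v ~x2 v x3) with x1=False and x2=True forces x3
print(find_unit_literal([1, -2, 3], {1: False, 2: True}))  # -> 3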
What is a landmark for state s in planning problem P?
A landmark is something you must pass by/through in every solution to a specific planning problem. (An action landmark could be viewed as an action that must be executed in order to make a complete plan.) A fact landmark for state s in planning problem P is a fact (a literal) that is not true in s but must be true at some point in every solution to P starting in state s. (Action landmark = action, fact landmark = literal)
Explain how a loss function is used during training
A loss function is used to measure the accuracy of our hypothesis h(x) as an approximation of the actual true function f(x) over a training set. The loss function expresses the amount of utility lost by predicting h(x) when the correct answer is f(x). An example of a loss function is the sum of the squared differences between our predictions and the true outputs (over all training examples). We usually want to minimize the loss function in order to get predictions as close to the true function as possible.
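A minimal sketch of the squared-error loss mentioned above (plain Python, no particular library assumed; the hypothesis h(x) = 2x is only an illustration):

def squared_error_loss(h, examples):
    """Sum of squared differences between predictions h(x) and targets y."""
    return sum((h(x) - y) ** 2 for x, y in examples)

# Example: hypothesis h(x) = 2x evaluated on three labelled examples
examples = [(1, 2.1), (2, 3.9), (3, 6.2)]
print(squared_error_loss(lambda x: 2 * x, examples))  # -> 0.06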
model based agent
A model-based agent keeps a stored model of the world that describes how the world works. The agent thus bases its actions on both current and previous percepts.
Explain the underlying pattern database heuristics by explaining patterns, how they are used to define subproblems and how h(s) is defined and computed
A pattern is a partial specification of a permutation (or of a state); in the puzzle example, the tiles occupying certain locations. Ignoring the variables/facts that are "less important" (those outside the pattern) relaxes the problem and defines a subproblem P'(s). To compute the admissible heuristic h(s) we solve P'(s) optimally, and its cost gives the heuristic value. In a pattern database these optimal costs are precomputed and stored for all patterns, so h(s) can be obtained by a lookup.
Reflex agent
A reflex agent is an agent that selects actions based only on the current percept, ignoring the percept history.
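A minimal sketch of a simple reflex agent program for the classic vacuum-cleaner world (the percept format and action names are assumptions for illustration):

def reflex_vacuum_agent(percept):
    """Chooses an action from the current percept only; no stored history."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    elif location == "A":
        return "Right"
    else:
        return "Left"

print(reflex_vacuum_agent(("A", "Dirty")))  # -> Suck
print(reflex_vacuum_agent(("B", "Clean")))  # -> Left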
Information gain:
A selection strategy in decision tree learning. It is used to choose the sequence of attributes that most rapidly narrows down the value of the goal variable X. Information gain is defined in terms of entropy. Entropy (denoted H) is the measure of uncertainty of a random variable. A random variable with only one value - a coin that always comes up heads - has no uncertainty and hence its entropy is defined as zero. Information gain can be viewed as the reduction in entropy we get by observing the value of a random variable. In the case of the coin, since it has zero bits of entropy, we gain no information by observing its value. Mathematically, the information gain from the test on an attribute A is the entropy of the goal attribute minus the expected entropy remaining after testing attribute A.
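A small Python sketch of entropy and information gain over a labelled data set (the data layout, a list of (attribute_value, goal_value) pairs, is an assumption for illustration):

import math
from collections import Counter

def entropy(labels):
    """H(X) = -sum p(x) * log2 p(x) over the observed label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples):
    """Gain(A) = H(goal) - expected entropy remaining after splitting on A.
    `examples` is a list of (attribute_value, goal_value) pairs."""
    goal = [g for _, g in examples]
    remainder = 0.0
    for value in set(a for a, _ in examples):
        subset = [g for a, g in examples if a == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(goal) - remainder

# Splitting on an attribute that perfectly separates the goal gives maximal gain
data = [("sunny", "yes"), ("sunny", "yes"), ("rainy", "no"), ("rainy", "no")]
print(information_gain(data))  # -> 1.0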
Agent / Agent function / Agent program
An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators. An agent function maps any given percept sequence to an action in mathematical terms. While the agent function is an abstract mathematical description, the agent program is a physical implementation of the agent function running within some physical system.
Minimum remaining value heuristic
Attempts to fail early so that parts of the search tree can be pruned. This is done by choosing the variable with the fewest remaining legal values. Large parts of the search tree can thereby be knocked out, contributing to a faster search.
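A minimal sketch of the MRV choice in Python (assuming `domains` maps each variable to its set of remaining legal values and `assignment` holds the variables assigned so far):

def select_unassigned_variable(domains, assignment):
    """Minimum remaining values: pick the unassigned variable with the
    smallest current domain, i.e. the one most likely to fail early."""
    unassigned = [v for v in domains if v not in assignment]
    return min(unassigned, key=lambda v: len(domains[v]))

domains = {"X": {1, 2, 3}, "Y": {1}, "Z": {1, 2}}
print(select_unassigned_variable(domains, assignment={}))  # -> Y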
Autonomy
Autonomy is when the agent relies on its own percepts rather than prior knowledge of the designer.
Least constraining value heuristic
Chooses the value that rules out the fewest choices of values for the neighbouring variables (maximizing their options). A "fail-last" heuristic. Only suitable when searching for a single solution.
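A corresponding Python sketch of least-constraining-value ordering, here counting conflicts only for simple "neighbouring variables must differ" constraints (the data layout is an assumption for illustration):

def order_domain_values(var, domains, neighbors):
    """Least constraining value: try first the value that rules out the
    fewest values in the domains of the neighbouring variables."""
    def ruled_out(value):
        return sum(value in domains[n] for n in neighbors[var])
    return sorted(domains[var], key=ruled_out)

domains = {"X": {1, 2}, "Y": {1, 2}, "Z": {2, 3}}
neighbors = {"X": ["Y", "Z"]}
print(order_domain_values("X", domains, neighbors))  # -> [1, 2]: value 1 constrains less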
What is deep learning?
Deep learning is a machine learning method based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, partially supervised (semi-supervised) or unsupervised.
What can deep learning be used for? Give examples of networks that are commonly used
Deep learning, in particular CNNs (Convolutional Neural Networks) and RNNs (Recurrent Neural Networks), often results in very large networks trained on very large ("Big Data") training sets - millions of examples. Used for speech recognition, image recognition, natural language processing, etc.
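A minimal sketch of a small CNN classifier in PyTorch; the layer sizes, the 28x28 single-channel input and the 10 output classes are arbitrary illustration choices, not taken from the course:

import torch
from torch import nn

# A tiny CNN: convolutional feature extraction followed by a linear classifier
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # 1 input channel -> 8 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                            # 28x28 -> 14x14
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                 # 10 class scores
)

x = torch.randn(1, 1, 28, 28)   # one dummy image
print(model(x).shape)           # -> torch.Size([1, 10])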
Describe two ways in which relaxation can change the state space of a problem in order to preserve all solutions but also introduce new solutions
Example problem: the Tower of Hanoi.
1. New arcs, i.e. connections in the form of actions, can be added between states in the state space. This preserves all existing solutions and adds new solutions at the same time. Ex: add an action buildTower(C,B,A).
2. The goal condition can be modified so that a larger number of states are accepted as goals while the old solutions remain valid. Ex: old goal (and (on B A) (on C B)), new goal (and (on B A) (or (on C B) (on A C))).
Explain why and when exploration is needed and outline an example algorithm
Exploration means that the agent sometimes chooses an action which does not have the highest estimated utility. This is done to avoid getting stuck in local maxima/minima and to gather information about unvisited parts of the state space. An example strategy (epsilon-greedy) is to take a completely random action in, say, 1% of the moves instead of the one with maximum estimated utility. An example algorithm that uses exploration is Q-learning.
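A minimal sketch of such an epsilon-greedy action selection (the dict-based Q-table layout, mapping (state, action) pairs to values, is an assumption for illustration):

import random

def epsilon_greedy(Q, state, actions, epsilon=0.01):
    """With probability epsilon explore (random action),
    otherwise exploit the current Q-value estimates."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

Q = {("s0", "left"): 0.2, ("s0", "right"): 0.7}
print(epsilon_greedy(Q, "s0", ["left", "right"]))  # usually -> right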
Does a PSS fulfil the necessary and sufficient conditions for intelligent behaviour
I think that the PSS hypothesis is very strong. Characteristics of intelligence are:
- Adapting to the environment, which is ensured but also requires designation.
- Acting, which is of course possible only if the system is physical.
- Learning: the capacity to learn requires the ability to modify the representations (modify expressions using processes) and even the processes themselves, which is possible as long as processes are derived from interpreted expressions.
Explain overfitting and a principled way to detect it
If a statistical model is overfit, it describes random error or noise instead of the actual underlying relationship in the data. Overfitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations ("over-trained"). (Conversely, trying to fit a linear model to non-linear data leads to underfitting.) Overfitting can be detected by determining whether the model fits new data as well as it fit the data used to estimate it. A simple approach is to use a separate validation set with examples that are only used for evaluating models of different complexity. This is called a hold-out validation set, as we keep that data away from the training phase.
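A minimal sketch of hold-out validation in Python (the `train` and `error` functions are hypothetical placeholders, and the 80/20 split ratio is only an illustration choice):

import random

def holdout_validation(examples, train, error, holdout_fraction=0.2):
    """Keep a fraction of the data away from training and use it only
    to estimate how well the learnt model generalises."""
    data = examples[:]
    random.shuffle(data)
    cut = int(len(data) * (1 - holdout_fraction))
    train_set, validation_set = data[:cut], data[cut:]
    model = train(train_set)
    # A validation error much larger than the training error indicates overfitting
    return error(model, train_set), error(model, validation_set)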
What happens if gamma is reduced? (Q-learning)
If we were to reduce the discount factor *gamma*, the agent would become more greedy and short-sighted.
Relate curse of dimensionality to Q-learning
In the context of Q-learning, the curse of dimensionality refers to the fact that the algorithm needs to discretize continuous state and action spaces. As these spaces grow, the Q-table grows exponentially with their dimension.
Do you believe the Turing Test is an adequate test?
Motivation for no: the test only determines whether a machine behaves like a human or not, which fails to measure intelligence in two ways: 1. Some human behaviour is not intelligent; in the Turing Test it could, for example, be beneficial for the machine to make spelling errors, since that makes it appear more human. 2. Some intelligent behaviour is not human; e.g., it is not possible to evaluate systems that are more intelligent than humans.
Explain Neural networks
Neural networks are computing systems (a class of machine learning models) inspired by the biological neural networks that constitute animal brains. They learn by considering examples and usually do not require task-specific programming.
Assume you are training a classic fully-connected feed-forward neural network with p parameters. Using the backpropagation algorithm to compute loss gradients, what is the computational complexity per example?
O(p)
Learning agent
The performance element corresponds to the agent in the other cases. A learning agent also has a critic, a learning element and a problem generator, which enable the agent to learn from previous actions and from a performance standard given to the agent.
Performance measure
Performance measure is the notion of desirability that evaluates any given sequence of environment states. In other words, it provides a measurement of how well the agent performs.
Explain the terms in the Q-learning algorithm: Q(), R(), a, alpha, gamma
Q(s,a) is the expected utility associated with executing action a in a given state s. R(s) is the reward for reaching state s. a is the action (the output). alpha is the learning rate (between 0 and 1); if it equals 1, the agent completely overwrites its previously learnt value with the new information (which is only appropriate in deterministic environments). gamma is a number between 0 and 1 called the discount factor; it trades off the importance of sooner versus later rewards.
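A minimal sketch of the tabular Q-learning update in which these terms appear, Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a)); the dict-based Q-table and the example values are assumptions for illustration:

def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update step."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (reward + gamma * best_next - old)

Q = {}
q_update(Q, s="s0", a="right", reward=1.0, s_next="s1", actions=["left", "right"])
print(Q)  # -> {('s0', 'right'): 0.1}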
Rationality
Rationality can be expressed as making the best choice given the percept sequence of the environment. A rational agent selects the action that maximizes its expected performance, given its percept sequence, its possible actions and its prior knowledge of the environment.
Reinforcement learning
Reinforcement learning - The agent knows the current state and reward at each step. The performance metric is the sum of rewards over time. Has to plan a sequence of actions for good performance. The agent learns by itself, no one needs to provide the agent with correct examples.
Satisficing planning. Does it require admissible heuristics? Why/why not?
Satisficing planning means finding a plan that is sufficiently good, sufficiently quickly. It can be viewed as using a plan that will get us to the goal but is not guaranteed to be optimal, since generating the perfect plan might be too computationally expensive to be worth it. Since the algorithm will be fine just finding a solution (any solution will do), the heuristic function does not need to be admissible.
What is the main idea behind h1 that allows it to avoid combinatorial explosions and to be computed far more quickly than h+? What is the difference between the two very similar heuristic functions h1 and hadd?
The main idea behind h1 is that it never actually solves the relaxed problem optimally (as h+ must, which leads to combinatorial explosion); it only estimates the cost of the single most expensive atom. h1 and hadd work in very similar ways. Both consider a state in the planning graph and use all applicable actions to make every atom in that state true; the cost of the actions required to make the atoms true is the basis of the heuristic value they produce. For hadd, the heuristic for a given state is the combined cost of achieving every atom in that state. For h1, the heuristic for a given state is the cost of the most expensive atom in that state. Note that neither algorithm actually solves the relaxed problem; they just compute an estimate of how difficult a given state would be to achieve, relative to the current state. h1 is admissible, whereas hadd is not.
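A minimal sketch of the difference, assuming per-atom cost estimates have already been computed (the `atom_costs` dict and its values are hypothetical):

def h1(goal_atoms, atom_costs):
    """h1 / hmax: cost of the most expensive goal atom (admissible)."""
    return max(atom_costs[a] for a in goal_atoms)

def h_add(goal_atoms, atom_costs):
    """hadd: sum of the atoms' costs (not admissible, but more informative)."""
    return sum(atom_costs[a] for a in goal_atoms)

atom_costs = {"on(A,B)": 2, "clear(C)": 1}
print(h1({"on(A,B)", "clear(C)"}, atom_costs))     # -> 2
print(h_add({"on(A,B)", "clear(C)"}, atom_costs))  # -> 3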
State three distinct and important reasons why planning should be automated
Some reasons for automated planning (need more examples here...):
- Manual planning can be boring and inefficient.
- Automated planning may create higher-quality plans.
- Humans will not be able to deduce solutions as viable and good as computers can.
- It can be applied in domains where the agent is located (and where humans may not be).
- Humans make mistakes; computers work faster than humans.
What is a suitable loss function given a supervised learning problem from examples (x,y) where the output y belongs to the real numbers?
Something that gives us an idea of the deviation of the prediction from the true value, for example the least-squares (squared error) loss.
Supervised learning
Supervised learning is the task of inferring a function from labeled training data, where the training data consist of a set of training examples. The agent learns from examples of correct behaviour: it learns an unknown function f(x)=y given examples of (x,y). The performance metric (loss) is the difference between the learnt function and the correct examples.
Supervised learning (formal explanation)
Supervised learning is the task of inferring a function from labeled training data. Given a training set of N example input-output pairs (x1, y1), ..., (xN, yN), where each output was generated by an unknown function yi = f(xi), supervised learning tries to discover a function h that approximates this unknown function f. The function h can be viewed as a hypothesis of what we think f looks like. The performance metric of supervised learning is the difference between f and h on the pairs (xj, yj) that are part of the test set.
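A minimal sketch of supervised learning in this sense: fitting a straight-line hypothesis h(x) = w*x + b to examples of an unknown f by least squares (plain Python, closed-form solution for one input variable; the example data are made up):

def fit_line(examples):
    """Least-squares fit of h(x) = w*x + b to a list of (x, y) pairs."""
    n = len(examples)
    sx = sum(x for x, _ in examples)
    sy = sum(y for _, y in examples)
    sxx = sum(x * x for x, _ in examples)
    sxy = sum(x * y for x, y in examples)
    w = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - w * sx) / n
    return lambda x: w * x + b

h = fit_line([(0, 1), (1, 3), (2, 5)])  # examples generated by f(x) = 2x + 1
print(h(3))  # -> 7.0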
Explain the curse of dimensionality in the context of reinforcement learning
The common theme is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. This sparsity is problematic for any method that requires statistical significance. In order to obtain a statistically sound and reliable result, the amount of data needed to support the result often grows exponentially with the dimensionality. Also, organizing and searching data often relies on detecting areas where objects form groups with similar properties; in high-dimensional data, however, all objects appear sparse and dissimilar in many ways, which prevents common data organization strategies from being efficient.
What is the Q-learning formula used for? What does gamma do?
The formula is the Q-learning update rule, used to update the estimated utility Q(s,a) of executing action a in state s based on observed rewards. gamma is a number between 0 and 1 called the discount factor; it trades off the importance of sooner versus later rewards. Gamma may also be interpreted as the likelihood of succeeding (or surviving) at every step. Gamma can be said to express how much we care about future utility values (max 1): a gamma of 0 makes the agent very short-sighted, whereas a value close to one means that it will strive for long-term goals.
What is the purpose of alpha (Q-learning)
The purpose of *alpha* is to provide a compromise between newly observed information and prior knowledge of the utility of a certain state-action pair. If the agent is operating in a previously unknown environment, alpha should ideally decrease over time as the agent explores the environment.
The Heuristic Search Hypothesis
The solutions to problems are represented as symbol structures. A physical symbol system exercises its intelligence in problem solving by search--that is, by generating and progressively modifying symbol structures until it produces a solution structure.
When resolving flaws in a plan, one typically distinguishes between two specific types of flaws. Which are these and how can they be resolved?
There are two types of flaws:
i) Open goals to be achieved. An open goal is a precondition of the goal or of an action that does not have an incoming causal link. Open goals are resolved either by adding a causal link from an action that already achieves the open goal (e.g. an action that already achieves clear(A)), or by adding a new action that has the open goal as an effect (e.g. adding stack(A,B), which achieves on(A,B)).
ii) Threats to be resolved. A threat is a possible action that would invalidate the precondition of another planned action. A threat can be resolved in two ways. Alternative 1: the action that disturbs the precondition is placed after the action that has the precondition. Alternative 2: the action that disturbs the precondition is placed before the action that supports the precondition. (Both options are only possible if the resulting partial order is consistent.)
Degree Heuristic
This heuristic, on the other hand, starts with the variable that is involved in the most constraints on other unassigned variables ("hard cases first"). It attempts to reduce the branching factor in the search tree.
Forward checking technique
Used to ensure local consistency. It looks at how other variables are affected by a binding, by removing values from their value domains. For example, for a variable Y connected to a variable X through an arc (constraint): when X is assigned a value from its domain, the values of Y (and of the other variables connected to X) that are inconsistent with that assignment are eliminated from their domains. If some value domain becomes empty, backtrack and try other values.
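A minimal Python sketch of forward checking for the simple case where neighbouring variables must take different values, as in map colouring (the example domains and variable names are made up for illustration):

import copy

def forward_check(var, value, domains, neighbors):
    """Prune neighbouring domains after assigning `value` to `var`, for
    'neighbouring variables must differ' constraints.
    Returns the pruned domains, or None if some domain becomes empty."""
    new_domains = copy.deepcopy(domains)
    new_domains[var] = {value}
    for n in neighbors[var]:
        new_domains[n].discard(value)
        if not new_domains[n]:
            return None  # dead end: backtrack and try another value
    return new_domains

domains = {"WA": {"red"}, "NT": {"red", "green"}, "SA": {"red", "green", "blue"}}
neighbors = {"WA": ["NT", "SA"]}
print(forward_check("WA", "red", domains, neighbors))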
Relaxation:
We know that P' is a relaxation of P iff (if and only if) any solution to P is also a valid solution to P'. It then follows that the optimal costs satisfy C*(P') <= C*(P).
Explain the significance of the backpropagation algorithm for training neural networks
When calculating the error between desired output and actual output in a neural network, it is "easy" to do so for the output layer, since the training set tells us what we want and we can observe what we get. For a hidden layer, the error seems mysterious, since the training data do not say what values the hidden nodes should have. Backpropagation is the process by which we propagate the error from the output layer back to the hidden layers, using the overall error gradient. We start by calculating an "error" value for each output unit, using the observed error. Then, starting from the output layer, we propagate these "error" values back to the previous layer, with the idea that node j is "responsible" for some fraction of the error value in each of the output nodes to which it connects. The "error" values are divided among the nodes in the previous layer according to the strength of the connection between each hidden node and the output node, giving that layer's "error" values, followed by an update of the weights between the two layers based on this propagation. This process then repeats for the previous layer, and so on, until we have reached the earliest hidden layer.
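A minimal numerical sketch of one backpropagation step for a tiny one-hidden-layer network with sigmoid units and squared-error loss (all sizes, the learning rate and the input values are illustration choices, not from the course):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, W2, lr=0.1):
    """One gradient step on a one-hidden-layer net with squared-error loss."""
    # Forward pass
    h = sigmoid(W1 @ x)            # hidden activations
    out = sigmoid(W2 @ h)          # output activations
    # Backward pass: output "error" values, then propagate to the hidden layer
    delta_out = (out - y) * out * (1 - out)
    delta_hidden = (W2.T @ delta_out) * h * (1 - h)
    # Update the weights between the layers based on the propagated errors
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hidden, x)
    return out

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
print(backprop_step(np.array([0.5, -1.0]), np.array([1.0]), W1, W2))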
The frame problem
Will the world change when actions are taken? Most features of the world do not change when an action is taken, but some do, so how can this rule be represented logically without explicitly stating, for every action, everything that stays the same? E.g., is the gold still at (2,3) after the agent moves to (1,2)?
The qualification problem:
All actions have exceptions, so how can actions be represented so that they work most of the time while exceptions can still occur? Can we add this to the theory without having to change the action rules all the time? Real example: the Wumpus breaks the rules and goes on a rampage, killing the poor adventurer.
Why can deep layer networks perform better than shallow ones?
If you have a single layer, the number of neurons you need to match a multi-layer network is combinatorially large. With multiple layers, each layer can create more and more abstract features/concepts. In our brain, for example, the output from our eyes enters the brain at the back and passes through very low-level feature detectors (lines and such). As the signal moves forward through the layers, the features become more abstract, from simple line detectors to detectors for various types of moving objects, and so on. So it is with deep nets: depth gives the possibility of building relatively abstract concepts in the upper layers, which massively improves the ability of the network to classify, as long as there is sufficient data.
Crucial difference between h_1(n) and h_add(n). Are they inadmissible or not, why? Which provides more info, and why?
h1(n) - the maximum of the estimated costs of the individual subgoals (atoms). It is admissible, as it underestimates the actual cost, and therefore guarantees an optimal solution when used with A*. h_add(n) - the sum of the same estimated costs. It is not admissible, since it may overestimate the cost to reach the goal (shared work is counted several times), and therefore it does not guarantee an optimal solution. h_add provides more information than h1, as it accumulates the costs of all subgoals rather than keeping only the largest one.
Difference between nonmonotonic and classic (monotonic) logic
Nonmonotonic logic can revise its conclusions when new information arrives (the penguin case: birds fly, penguins are birds, yet penguins do not fly), whereas in classic (monotonic) logic adding new facts can never invalidate earlier conclusions.
Ramification problem:
There are causal dependencies which become true when an action is executed; how can these be represented without making action specifications overly detailed? Basically it concerns the indirect consequences of an action.