Introduction to advanced learning algorithms in artificial neural networks


What are some features of convolutional neural networks?

• 'Convolution': uses back-propagation, but makes all patches of weights identical (weight changes are averaged across corresponding connections in each patch), so each sheet of neurons codes for the same "feature" independent of location in the image.
• 'Pooling': takes the max or average output of local units (smoothing/shrinking the representation).
• 'Dropout': sets each neuron to 0 at random in 50% of trials during training, then uses all neurons with outputs x0.5 during testing. This learns more robust sets of connections and reduces 'overfitting'.
• Threshold linear transfer function: f(h) = h if h > 0, f(h) = 0 if h ≤ 0. Speeds up learning.
• 'Augmented dataset': 2048 translations and reflections; random noise in pixel values on each cycle. Improves 'generalisation'.
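A minimal Python sketch of three of the features above may help fix the ideas. It is one-dimensional rather than image-based, and the signal, the kernel values and all function names are assumptions made for illustration only.

```python
import random

def threshold_linear(h):
    """Threshold linear transfer function: f(h) = h if h > 0, else 0."""
    return h if h > 0 else 0.0

def convolve_1d(signal, kernel):
    """Slide one shared patch of weights (the kernel) along the input."""
    k = len(kernel)
    return [
        threshold_linear(sum(w * x for w, x in zip(kernel, signal[i:i + k])))
        for i in range(len(signal) - k + 1)
    ]

def max_pool(values, size=2):
    """Keep the maximum of each non-overlapping window of `size` units."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

def dropout(values, p_drop=0.5, training=True, rng=random):
    """Training: zero each unit with probability p_drop. Testing: scale all units."""
    if training:
        return [0.0 if rng.random() < p_drop else v for v in values]
    return [v * (1.0 - p_drop) for v in values]

signal = [0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 2.0]   # hypothetical 1-D "image"
edge_kernel = [-1.0, 1.0]                      # hypothetical "rising edge" feature
feature_map = convolve_1d(signal, edge_kernel)
pooled = max_pool(feature_map)
print(feature_map)                      # [1.0, 2.0, 0.0, 0.0, 2.0, 0.0]
print(pooled)                           # [2.0, 0.0, 2.0]
print(dropout(pooled, training=False))  # [1.0, 0.0, 1.0]
```

Because the same kernel is applied at every position, the feature map responds to a rising edge wherever it occurs, which is the location independence described above.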

What is a general problem of reinforcement learning?

To develop a 'critic' network that learns to provide feedback at each time-step even though the environment provides feedback rarely (e.g. when the tree is reached), i.e. how do you know which outputs were good or bad: the problem of 'temporal credit assignment'. This involves developing an idea of the total expected future reward, and how this changes after making an action (i.e. if it changes for the better or for the worse).

What are some problems with the 'generalised delta rule' or 'error back-propagation'?

• Local minima in error: you can't be sure it will find the best set of weights; it may just find those that can't be improved by any small change.
• Learning and generalisation depend on the choice of architecture.
• Not biologically plausible, since the connection weights are changed using non-local information.
However, with enough neurons and enough training data (and recent increases in processing power), these networks outperform all other methods in some tasks.

What is the activation function?

In computational networks, the activation function of a node defines the output of that node given an input or set of inputs. A standard computer chip circuit can be seen as a digital network of activation functions that can be "ON" (1) or "OFF" (0), depending on input. This is similar to the behavior of the linear perceptron in neural networks. However, it is the nonlinear activation function that allows such networks to compute nontrivial problems using only a small number of nodes. In artificial neural networks this function is also called transfer function (not to be confused with a linear system's transfer function).

What is the solution to the temporal credit assignment problem?

'Temporal difference' or 'sequential reinforcement' solution.
If reinforcement r(t) is intermittent (e.g. only when the goal is reached), a 'critic' learns an 'evaluation function': the value v of each state (or input) u is the expected future reward from that state (given how actions are usually made). The change in value v(t+1) - v(t) plus any reward r(t) is used to evaluate any action o(t), so output weights can be modified as in Arp (using δ(t) = v(t+1) - v(t) + r(t): if δ(t) > 0, action o(t) was good).
But how to learn v? A simple learning rule creates a set of connection weights w so that v(t) = w.u(t). This is: w → w + εδ(t)u(t), i.e. you can also use δ to learn weights for the critic!
----------------
Use δ(t) = v(t+1) - v(t) + r(t) to evaluate actions.
Use v(t) = w.u(t), where w → w + εδ(t)u(t), to learn the value function.
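The critic's learning rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a hypothetical five-state corridor, a fixed policy that always steps toward the goal, one-hot inputs u, a reward of 1 on the final step and a value of 0 once the episode has ended. None of these details come from the card itself.

```python
def dot(w, u):
    return sum(wi * ui for wi, ui in zip(w, u))

def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]

def train_critic(n_states=5, episodes=200, eps=0.1):
    """Learn v(t) = w.u(t) using delta(t) = v(t+1) - v(t) + r(t)."""
    w = [0.0] * n_states
    for _ in range(episodes):
        for s in range(n_states):
            u = one_hot(s, n_states)
            last_step = (s == n_states - 1)    # this step reaches the goal
            r = 1.0 if last_step else 0.0      # intermittent reinforcement
            v_next = 0.0 if last_step else dot(w, one_hot(s + 1, n_states))
            delta = v_next - dot(w, u) + r
            # w -> w + eps * delta(t) * u(t)
            w = [wi + eps * delta * ui for wi, ui in zip(w, u)]
    return w

values = train_critic()
print([round(v, 3) for v in values])
```

Although reward arrives only on the last step, the learned value of every state approaches 1, the total future reward expected from it. That spread of value back along the corridor is what lets δ(t) score actions taken long before the goal.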

What type of discrimination do perceptrons perform and what problems can't they solve?

- Linear discrimination.
- Can't solve 'non-linearly separable' problems.
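As a hedged illustration, the sketch below trains a single threshold unit with the classic perceptron learning rule on AND (linearly separable) and on XOR (not linearly separable). The function name, learning rate and epoch limit are assumptions chosen for the example.

```python
def train_perceptron(samples, epochs=50, rate=1.0):
    """Return (weights, bias, converged) for a two-input threshold unit."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - out
            if err != 0:
                # Move the decision line toward the misclassified point.
                w[0] += rate * err * x1
                w[1] += rate * err * x2
                b += rate * err
                errors += 1
        if errors == 0:          # every pattern classified correctly
            return w, b, True
    return w, b, False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, _, and_ok = train_perceptron(AND)
_, _, xor_ok = train_perceptron(XOR)
print(and_ok, xor_ok)   # True False
```

No single straight line separates the XOR classes, so no error-free pass can ever occur, however many epochs are allowed.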

What is the motivation behind the development of the backpropagation algorithm?

The goal of any supervised learning algorithm is to find a function that best maps a set of inputs to its correct output. An example would be a classification task, where the input is an image of an animal, and the correct output would be the name of the animal. The goal and motivation for developing the backpropagation algorithm was to find a way to train a multi-layered neural network such that it can learn the appropriate internal representations to allow it to learn any arbitrary mapping of input to output.

What is a multilayer perceptron (MLP)?

A feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. It consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (or processing element) with a nonlinear activation function. MLP utilizes a supervised learning technique called back propagation for training the network. It is a modification of the standard linear perceptron and can distinguish data that are not linearly separable.
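To make the last point concrete, here is a small Python sketch of a two-layer network that computes XOR, which no single linear perceptron can. The weights are hand-chosen for illustration rather than learned by back propagation, and the output unit is left linear for simplicity; both are assumptions of this example.

```python
def threshold_linear(h):
    return h if h > 0 else 0.0

def mlp_xor(x1, x2):
    # Hidden layer: two nonlinear units, each connected to both inputs.
    h1 = threshold_linear(1.0 * x1 + 1.0 * x2 + 0.0)
    h2 = threshold_linear(1.0 * x1 + 1.0 * x2 - 1.0)
    # Output layer: weighted sum of the hidden units.
    return 1.0 * h1 - 2.0 * h2

outputs = {(x1, x2): mlp_xor(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
print(outputs)   # {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}
```

The second hidden unit only switches on when both inputs are active, and its negative output weight cancels the first unit's response in exactly that case.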

What is temporal difference learning based on?

A slow 'trial and error' based method. Initial actions will be random (takes a long time to get to goal), actions improve as a result of learning in the Arp action network (but a random element is required to enable ongoing improvement: i.e. finding new shorter routes).

What is an autoencoder?

An autoencoder, autoassociator or Diabolo network is an artificial neural network used for unsupervised learning of efficient codings. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction. Architecturally, the simplest form of an autoencoder is a feedforward, non-recurrent neural net which is very similar to the multilayer perceptron (MLP), with an input layer, an output layer and one or more hidden layers connecting them. The differences between autoencoders and MLPs, though, are that in an autoencoder, the output layer has the same number of nodes as the input layer, and that, instead of being trained to predict the target value Y given inputs X, autoencoders are trained to reconstruct their own inputs X. Therefore, autoencoders are unsupervised learning models.

What is back propagation?

Backpropagation, an abbreviation for "backward propagation of errors", is a common method of training artificial neural networks used in conjunction with an optimization method such as gradient descent. The method calculates the gradient of a loss function with respect to all the weights in the network, so that the gradient is fed to the optimization method which in turn uses it to update the weights, in an attempt to minimize the loss function. Backpropagation requires a known, desired output for each input value in order to calculate the loss function gradient. It is therefore usually considered to be a supervised learning method, although it is also used in some unsupervised networks such as autoencoders. It is a generalization of the delta rule to multi-layered feedforward networks, made possible by using the chain rule to iteratively compute gradients for each layer. Backpropagation requires that the activation function used by the artificial neurons (or "nodes") be differentiable.
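The chain-rule computation can be sketched on the smallest possible network: one input, one sigmoid hidden unit and one linear output, with a squared-error loss. The network shape, the starting weights and the learning rate are all assumptions made for this example; the finite-difference check is included only to show that the back-propagated gradients agree with a direct numerical estimate.

```python
import math

def sigmoid(h):
    return 1.0 / (1.0 + math.exp(-h))

def forward(w1, w2, x):
    hidden = sigmoid(w1 * x)
    output = w2 * hidden
    return hidden, output

def loss(w1, w2, x, target):
    _, output = forward(w1, w2, x)
    return 0.5 * (output - target) ** 2

def backprop(w1, w2, x, target):
    hidden, output = forward(w1, w2, x)
    d_output = output - target                  # dL/d(output)
    grad_w2 = d_output * hidden                 # output = w2 * hidden
    d_hidden = d_output * w2                    # error propagated backward
    grad_w1 = d_hidden * hidden * (1.0 - hidden) * x   # sigmoid derivative
    return grad_w1, grad_w2

def numeric_grad(f, value, step=1e-6):
    return (f(value + step) - f(value - step)) / (2 * step)

w1, w2, x, target = 0.5, -0.3, 1.5, 1.0
grad_w1, grad_w2 = backprop(w1, w2, x, target)
check_w1 = numeric_grad(lambda v: loss(v, w2, x, target), w1)
check_w2 = numeric_grad(lambda v: loss(w1, v, x, target), w2)

# One gradient-descent step using the back-propagated gradients.
rate = 0.1
new_w1, new_w2 = w1 - rate * grad_w1, w2 - rate * grad_w2
print(loss(w1, w2, x, target), loss(new_w1, new_w2, x, target))
```

The sigmoid is differentiable everywhere, which is why the `hidden * (1.0 - hidden)` term is always available.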

What does the critic do?

The critic provides a single reward/penalty signal for each output, i.e. it provides feedback.

What is the problem of temporal credit assignment?

For example, a network designed to play chess would receive a reinforcement signal (win or lose) only after a long sequence of moves. The question that arises is: how do we assign credit or blame individually to each move in a sequence that leads to an eventual victory or loss? This is called the temporal credit assignment problem, in contrast with the structural credit assignment problem, where we must attribute network error to different weights.

What are evolutionary algorithms?

In artificial intelligence, an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA uses mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions (see also loss function). Evolution of the population then takes place after the repeated application of the above operators.
Simplest model:
• 'Genetic' code determines connection weights.
• Simulate the behaviour of several networks with random weights over a long period.
• 'Reproduce' the most successful networks with small random changes to weights.
• Repeat for many 'generations'.
More complex models:
• Include numbers and locations of neurons, and rules for development, in the genetic code.
• 'Sexual' reproduction of codes.
• Allow for non-transferable weight modification (learning) during life.
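The 'simplest model' above can be sketched as follows. The task (fitting a two-weight linear unit to a few hypothetical data points), the population size, the mutation size and the choice to keep the parents unchanged in the next generation are all assumptions for illustration, not part of the card.

```python
import random

def fitness(weights, data):
    """Higher is better: negative squared error of a linear unit."""
    return -sum(
        (sum(w * x for w, x in zip(weights, xs)) - t) ** 2 for xs, t in data
    )

def evolve(data, n_weights=2, pop_size=20, n_parents=5,
           generations=100, sigma=0.1, seed=0):
    rng = random.Random(seed)
    # The 'genetic code' of each individual is simply its list of weights.
    population = [[rng.uniform(-1, 1) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, data), reverse=True)
        history.append(fitness(population[0], data))
        parents = population[:n_parents]            # most successful networks
        children = [
            [w + rng.gauss(0, sigma) for w in rng.choice(parents)]
            for _ in range(pop_size - n_parents)    # small random changes
        ]
        population = parents + children
    best = max(population, key=lambda w: fitness(w, data))
    return best, history

# Hypothetical task: targets generated by t = 2*x1 - 1*x2.
data = [((1, 0), 2.0), ((0, 1), -1.0), ((1, 1), 1.0), ((2, 1), 3.0)]
best, history = evolve(data)
print(best, history[0], history[-1])
```

Keeping the parents means the best fitness can never get worse from one generation to the next. No gradient is computed anywhere; selection plus mutation does all the work.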

What is a convolutional neural network (CNN)?

In machine learning, a convolutional neural network is a type of feed-forward artificial neural network in which the connectivity pattern between its neurons is inspired by the organization of the animal visual cortex. Individual cortical neurons respond to stimuli in a restricted region of space known as the receptive field. The receptive fields of different neurons partially overlap such that they tile the visual field. The response of an individual neuron to stimuli within its receptive field can be approximated mathematically by a convolution operation. Convolutional networks were inspired by biological processes and are variations of multilayer perceptrons designed to use minimal amounts of preprocessing. They have wide applications in image and video recognition, recommender systems and natural language processing.

What is the delta rule?

In machine learning, the delta rule is a gradient descent learning rule for updating the weights of the inputs to artificial neurons in a single-layer neural network.
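For a single linear unit the rule is commonly written w → w + ε(t - o)x, where t is the target and o the unit's output. A minimal Python sketch follows; the training patterns, the learning rate and the use of a constant third input as a bias are assumptions made for the example.

```python
def train_delta_rule(samples, n_inputs, rate=0.1, epochs=500):
    """Gradient descent on squared error for one linear output unit."""
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for x, target in samples:
            output = sum(wi * xi for wi, xi in zip(w, x))
            delta = target - output                  # the 'delta' in the name
            w = [wi + rate * delta * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical targets generated by t = 2*x1 - 1*x2 + 0.5 (third input is a bias).
samples = [
    ((0.0, 0.0, 1.0), 0.5),
    ((0.0, 1.0, 1.0), -0.5),
    ((1.0, 0.0, 1.0), 2.5),
    ((1.0, 1.0, 1.0), 1.5),
]
weights = train_delta_rule(samples, n_inputs=3)
print([round(w, 3) for w in weights])
```

Each update changes a weight in proportion to the error and to that weight's own input, so only local information is needed in the single-layer case.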

What is a perceptron?

In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers (functions that can decide whether an input, represented by a vector of numbers, belongs to some specific class or not).

What is reinforcement learning?

Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. In machine learning, the environment is typically formulated as a Markov decision process (MDP) as many reinforcement learning algorithms for this context utilize dynamic programming techniques. The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need knowledge about the MDP and they target large MDPs where exact methods become infeasible. Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off in reinforcement learning has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs.
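The exploration vs. exploitation trade-off can be illustrated with a small multi-armed bandit. The sketch below uses an epsilon-greedy strategy, which is one common choice and not one named by the card; the reward probabilities, epsilon and step count are likewise assumptions.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    estimates = [0.0] * n_arms     # running estimate of each arm's reward
    counts = [0] * n_arms
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental sample average of the rewards seen on this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, counts, total

estimates, counts, total = run_bandit([0.2, 0.5, 0.8])
best_arm = max(range(3), key=lambda a: estimates[a])
print(best_arm, counts)
```

The agent is never told which arm is correct; it only sees the rewards of the arms it tries. With epsilon set to 0 it can lock onto the first arm that happens to pay out, which is why some random exploration is kept.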

