Neural Network


Adaptive Learning Rates on Backpropagation - modification - advantage

- Backpropagation learning rate = constant throughout training
  o can lead to slow training
- modification
  o start with a high learning rate and gradually reduce it (as in the Boltzmann machine)
  o this creates momentum that pushes the weights faster towards the minimum in a short amount of time
    § the weights oscillate less and eventually settle down
- advantage
  o improves BP training time and the probability of convergence
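A minimal sketch of the "start high, reduce gradually" idea above; the exponential schedule and the `eta0`/`decay` values are illustrative assumptions, not from the notes.

```python
# Minimal sketch of an adaptive (decaying) learning rate for backpropagation.
def learning_rate(epoch, eta0=0.5, decay=0.95):
    """Start with a high rate, then reduce it gradually as training progresses."""
    return eta0 * (decay ** epoch)

for epoch in range(5):
    eta = learning_rate(epoch)
    print(f"epoch {epoch}: eta = {eta:.4f}")
    # ... perform one epoch of backpropagation weight updates using eta ...
```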

Local Optimal vs Global Optimal - final set of weights dependent on (3)

Final set of weights dependent on (3)
- choice of initial weights
- learning rate
- number of iterations and epochs
- also
  o limited by the variety of training examples
- but
  o the final set of weights can be a local optimal solution = not universal = the best weights at that point in time
- ANN = able to find a local optimum
  o but hard to find the global optimum
    § because if the learning rate is large, the weights may skip over the global optimal point

Grossberg Layer - definition - application

Grossberg (output layer)
- often omitted, but sometimes contains linear units that transform the output of the Kohonen layer into something more useful
  o units are often omitted
  o they serve only in a signal-conditioning role, enabling the network to output more than just a zero or a one, and are commonly determined manually
  o it often plays a minor role
- application
  o used to output the possible set of inputs with known output
  o can be modified to allow inputs from the Kohonen layer to self-organise a map
  o it is common to see that a Kohonen-Grossberg network is a Self-Organising Map (SOM)

Kohonen Layer - definition - Winner takes all concept - Nearest Class Problem - Normalisation o why we normalise o implication (why we don't normalise?) - Steps to train the network - issues with kohonen algorithm - strategies to overcome issues - train kohonen layer (summary) - Algorithm training comparison with Rojas R - Clustering o other types issues with clustering

Kohonen (hidden layer)
- units represent the classes
- their role is to migrate to the centre of those classes
- they are trained using a winner-takes-all algorithm, with the chosen unit moving towards the current example

Winner-takes-all concept
- threshold activation
  o au = 1 if netu > netv for all other v, else 0
    § winner takes all: whichever unit has the highest net activates (= 1), the others do not
  o if there is a tie
    § just one of the winning units will have output a = 1
  o object x belongs to class u if unit u outputs 1 when the network is presented with x

Nearest Class Problem
- saying x belongs to class u is the same as saying x is closest to unit u
- it is difficult to classify an object into either group when it is near on one scale but different on another scale

Normalisation
- why we normalise?
  o distance computation can be costly depending on the scale and dimension
  o more efficient to normalise the vectors, as this allows the inner product to be used as the distance
- why we don't normalise?
  o it can affect the results
    § e.g. (3,4) and (0.6,0.8) normalise to the same vector, so the classes found before and after normalisation may differ

4 steps to train the network (see the sketch after this card)
1. normalise the training set
2. choose an initial set of classes that lie within the space the objects come from; treat these classes as vectors of weights, which in turn are the units of the network
3. normalise the initial classes (the classes are the weights in a Kohonen layer)
4. until 'done enough', repeat:
  o choose a training example x, randomly or systematically
  o find the class wc closest to the chosen example (largest net)
  o update wc by adding n(x - wc) to it and then normalising the result

Simplified notes for training a Kohonen layer
1. normalise the data set
2. normalise the classes
3. compute the nets
4. select the unit with the highest net
5. update and get the new weight: w + n(x - w)
6. compute the length of the new weight vector
7. normalise the new weight (divide the current weight by its length)
8. show the class table with the newly updated weights

Issues with the Kohonen algorithm
- unsupervised issues (do not know)
  o choice of initial classes
  o number of classes
  o termination condition

Kohonen vs Rojas R
Kohonen
1. Choice of weight vectors: random
2. Selection of training vector: sequential
3. Computation of net: the selected training vector is computed against all weight vectors
4. Selection of weight vector to update: update the nearest vector
5. Weight update method: w' = w + n(x - w) (slower convergence)
Rojas R
1. Choice of weight vectors: random
2. Selection of training vector: random - some vectors may only be trained after many iterations
3. Computation of net: the selected training vector is computed against all weight vectors
4. Selection of weight vector to update: update the nearest vector
5. Weight update method: w' = w + x (pulls the neuron closer to the input)

Clustering
- Kohonen layer
  o *define Kohonen layer
  o a form of clustering where the weights are the cluster leaders
  o all points within a certain distance of a 'leader' are considered part of that group
- 2 other clustering methods
  A. K-means
    1. initialise random points as centroids
    2. assignment: assign each point to the nearest centroid
    3. update: recompute each centroid as the mean of all points assigned to that cluster (the sum of the points divided by their number)
    4. repeat 2 and 3 until the centroids no longer change
      § same problem as Kohonen with the choice of initial centroids
  B. DBSCAN
    1. define the radius and the minimum number of neighbours within that radius for a point to count as a core point
    2. for all data points, determine whether each is a core point, a border point or noise
    3. stop when all points have been visited
    4. link all core points and border points within the radius and group them as one cluster
      § no need to choose initial centroids, but need to define the radius and minimum neighbours

Issues with clustering
- advantage
  § unsupervised learning = no need to label data
  § more realistic, as most real-world data are unlabelled
- disadvantage
  § parameters to define = need to consider initial centroids or cluster properties
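A minimal sketch of the winner-takes-all training loop summarised above: normalise the data and classes, pick the winner by largest net (inner product), update it with w + n(x - w), then re-normalise. The data points and the learning rate `n` are illustrative assumptions.

```python
import numpy as np

def normalise(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

X = normalise(np.array([[3.0, 4.0], [1.0, 0.2], [0.9, 0.1], [2.5, 4.1]]))
W = normalise(np.random.rand(2, 2))   # 2 classes = 2 weight vectors
n = 0.3                               # learning rate

for epoch in range(20):
    for x in X:                       # systematic (sequential) selection
        nets = W @ x                  # inner products act as the distance measure
        c = np.argmax(nets)           # winner takes all
        W[c] = normalise(W[c] + n * (x - W[c]))
print(W)
```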

Adaptive method (Learning) Vs Guessing

Learning algorithm
- an adaptive method in which the network self-organises to implement the desired behaviour
- done by presenting the ANN with examples of the desired input-to-output mapping
- correction step
  o executed iteratively until the network learns to produce the desired response
- adapts from previous experience
Guessing
- picking random numbers in the hope of producing the desired output (not learning, but guessing)

MLFFN - limitations - Applications MLFFNT vs MLFFNS

MLFFN (Multi-layered feed-forward network)
- a network where units are combined in layers and there is no recurrence
limitations
- difficult to model problems with more than one class
- no feedback
applications
- boolean functions (logic gates)
- distinguishing 2 different classes
MLFFNT - an MLFFN used with threshold activation
MLFFNS - an MLFFN used with sigmoid activation

Momentum - define - application - disadvantage

- can speed up training
- can be added to the weight-change formula
  o by adding a multiple of the previous change
  o the weight change becomes
    § △w(t) = 𝜂e𝑥 + α△w(t-1)
    § where α = the momentum coefficient
- suppose
  o after training on the 1st example
    § w' = w + weightchange
  o after training on the 2nd example
    § w'' = w' + weightchange'
    § w'' = (w + weightchange) + weightchange'
  o continuing the substitution shows that previous weight changes have an impact on the final weights (see the sketch below)
- disadvantage
  o momentum does not guarantee convergence
    § restricted by the ANN chosen
    § can speed up an ANN only if the ANN already guarantees convergence
  o momentum does not guarantee reaching a global/local minimum
    § momentum only pushes the weights
  o difficult to choose the right momentum
    § BUT the right momentum can speed up reaching a global/local minimum
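A minimal sketch of the momentum-augmented update △w(t) = 𝜂e𝑥 + α△w(t-1); the learning rate, momentum coefficient and (error, input) pairs are illustrative assumptions.

```python
eta, alpha = 0.1, 0.9   # learning rate and momentum coefficient
w, prev_dw = 0.0, 0.0

for e, x in [(0.5, 1.0), (0.4, 1.0), (0.3, 1.0)]:   # (error, input) pairs
    dw = eta * e * x + alpha * prev_dw   # previous changes still push the weight
    w += dw
    prev_dw = dw
    print(f"dw = {dw:.4f}, w = {w:.4f}")
```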

ANN as search problems

- the goal of training a neural network is to produce a network that matches the outputs given for the training inputs and produces 'correct' outputs for unseen inputs
- this means choosing suitable weights, either manually or with a learning algorithm
- AND gate
  o can be solved using a single Perceptron
  o the search problem
    § finding a set of weights that linearly separates the input whose components are all 1s from the others
- XOR gate
  o the inputs cannot be linearly separated, so more than 1 perceptron unit is needed
  o the search problem
    § finding the appropriate weights
    § and the number of units needed to separate the inputs

Overfitting

- occurs when the ANN has only seen a subset of the samples likely to occur in actual use
  o risk
    § bias towards the training sample
    § does not react well to unseen samples
- cannot avoid overfitting entirely, but can reduce bias towards the training examples (see the sketch below)
  o split the data into 3 sets
    § Training set
      · for training the ANN (used during the learning process)
    § Selection/Validation/Interim set
      · for validation/interim testing to monitor the progress of the ANN
    § Test set
      · for actual/final testing to evaluate the ANN
  o process
    § first train the ANN on the training set
      · until performance on the selection set starts getting worse and worse
    § at this stage, choose the network parameters that work best on the validation/interim set as the final parameters and use these in the final test (using the test set)
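A minimal sketch of the three-way split described above; the 60/20/20 proportions and the stand-in data are illustrative assumptions.

```python
import random

data = list(range(100))          # stand-in for labelled examples
random.shuffle(data)

train      = data[:60]           # used during the learning process
validation = data[60:80]         # monitors progress; stop when this worsens
test       = data[80:]           # touched only once, for the final evaluation
print(len(train), len(validation), len(test))
```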

Network Paralysis - 9 strategies to avoid

- occurs when the weights, and hence the nets fed to a sigmoid activation, are too large
  o the result is little to no significant change in the output (the sigmoid saturates; see the sketch below)
9 strategies to avoid network paralysis
1. careful choice of initial weights
2. alternative network architectures
3. growing and pruning networks
4. alternative learning algorithms
5. include momentum in the weight-update formula
6. different learning rates for different weights
7. adaptive learning rates that depend on the stage of learning
8. use lookup tables instead of function evaluation
9. use special hardware designed for running ANNs
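A small demonstration of why large nets paralyse a sigmoid unit: the gradient a(1-a), which scales every backpropagation weight change, collapses towards 0. The sample net values are illustrative.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

for net in [0.5, 2.0, 5.0, 10.0, 20.0]:
    a = sigmoid(net)
    print(f"net = {net:5.1f}  a = {a:.6f}  gradient a(1-a) = {a * (1 - a):.6f}")
# As net grows, a(1-a) -> 0, so the weight updates become negligibly small.
```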

Using Different learning rates - How - Advantage - Compare to Backpropagation learning rate

- suppose, for Backpropagation
  o a different learning rate is used on the output layer and the hidden layer
    § to observe whether a higher or lower learning rate on the hidden units can improve the network
- compared to the standard Backpropagation learning rate
  o Backpropagation learning rate = constant throughout training
    § assumption = hidden and input weights are equally responsible for the error on the outputs
    § possible = one set of weights is more responsible for the error on the outputs
      · try a higher/lower learning rate on the input/hidden weights and see if the error is reduced
      · different learning rates = can reduce training time

Show a diagram - feedforward, not layered - not feedforward, not layered - not feedforward, layered - feedforward, layered


{draw units, show truth table} AND gate NAND gate OR gate NOR gate NOT gate XOR gate


Boltzmann machine - Boltzmann Machine architecture vs Restricted Boltzmann Machine - properties o ping-pong problem - annealing vs Simulated Annealing (function) - Energy - Temperature - random search with replacement o disadvantage o improve/ alternative strategies - summary of Boltzmann

Properties
- weights are symmetric: wuv = wvu
  § an energy-based network in which the transfer of energy goes both ways
  § if asymmetric = ping-pong problem
    · unstable network
- weights are determined by the problem
- unsupervised learning: the weights have to be determined by observation of the training set; however, there is a limit to how comprehensive that observation can be

Annealing vs Simulated Annealing
- Annealing
  o letting hot metal cool slowly improves its properties
    § the microscopic structure of the object orders itself
      · because this minimises stresses and energy
- Simulated Annealing
  o statistically aided learning, as it uses probability
  o intentional annealing
    § the temperature T in the ANN is slowly reduced to 0 as training progresses
    § the system is essentially left to settle down and reach thermal equilibrium at the current temperature
  o stochastic activation
    § a unit's activation is set to 1 with probability p (standard form: p = 1 / (1 + e^(-net/T)))
    § as T is reduced, the behaviour approaches that of a deterministic threshold unit

Energy
- metal annealing
  o energy is higher when the temperature is high, but reduces as the metal cools down
- Boltzmann machine
  o an energy-based network that attempts to minimise an energy function
    § Energy function
      · E = (-0.5) Σu,v wuv au av
    § a candidate solution is an assignment of 0 or 1 to each activation au (the activations are the network's response to the inputs)
    § we look for a set of au, each either 0 or 1, which minimises the energy
    § the weights wuv are determined by the problem being solved

Temperature
- need a strategy to reduce the possibility of E getting stuck in a local minimum where no adjustment of the weights seems to work
- accept a weight change with probability p = e^(-ΔE/T)
  o instead of always accepting a lowering of the evaluation function and rejecting any increase
  o p shrinks as the increase ΔE grows and as T falls; a risk of E increasing is accepted, then T is reduced towards 0 and the system is left to reach thermal equilibrium

Random search with Replacement
- choose new weights and recalculate E
  o evaluation function E
    § if E = 0: all activations reach their target values
    § if E' < E then accept the improved E' and repeat the process
    § the idea of the search
      · continue looking for other changes until no further improvement can be found
- if a new proposal decreases the value of the performance error, accept the proposal; else, reject it
- eventually leads to a minimum performance error
- disadvantage
  o the same rejected weights can reappear and be rejected again
    § does not make use of any information gained while searching
- improve/alternative strategies
  o restrict the number of weight changes at any one time
    § searches for a local minimum
  o restrict the size of the change to the weights
    § searches for a local minimum
  o accept a proposed change even if the performance error becomes higher
    § to improve the ANN and move away from a local minimum of the performance error
    § uses probability, reducing the temperature towards 0

Summary of Boltzmann (see the sketch below)
- the network has a training set t and current weights w
  o evaluation function E
    § if E = 0: all activations reach their target values
    § if E' < E then accept the improved E' and repeat the process
    § even when E stops improving, it might be stuck in a local minimum
      · adjustments to the weights do not move E out of the local minimum
    § solution to reduce this possibility
      · instead of always accepting a lowering of the evaluation function and rejecting any increase, accept a weight change with probability p = e^(-ΔE/T), where T is a chosen parameter
- as in the annealing of metal, T is slowly reduced to 0 as training progresses
- the system is left to settle down and reach thermal equilibrium at the current temperature
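A minimal sketch of the simulated-annealing-style search over weights summarised above: always accept improvements, accept a worse E with probability e^(-ΔE/T), and cool T towards 0. The toy evaluation function, proposal step and cooling schedule are illustrative assumptions.

```python
import math, random

def evaluate(w):
    return (w - 3.0) ** 2          # stand-in for the network's error E

w, E, T = 0.0, evaluate(0.0), 2.0
while T > 1e-3:
    w_new = w + random.uniform(-0.5, 0.5)   # propose a weight change
    E_new = evaluate(w_new)
    dE = E_new - E
    if dE < 0 or random.random() < math.exp(-dE / T):
        w, E = w_new, E_new        # accept: an improvement, or a lucky uphill move
    T *= 0.99                      # slowly reduce the temperature towards 0
print(f"w = {w:.3f}, E = {E:.5f}")
```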

3 Reasons for AI and ANNs

1. Engineer's rationale
- make better computers
- during WW2, processors were used to decipher intercepted messages
- Turing wanted to create machines that are useful
2. Psychologist's rationale
- to model human intelligence
- Turing wanted machines to display human intelligence
3. Philosopher's rationale
- to expand computers' intelligence
- incorporate different kinds of intelligence into machines, such as animal intelligence

11 Problems with ANN

1. limitations of memory and time
  o complex problems = too large for implementation
  o search problems take too long
    § weight convergence never settling down
  o only a small range of numbers can be represented = problems with large weights
2. no correct set of weights exists
3. correct weights too big for the software used
4. final activation too sensitive to the accuracy with which the calculations are done
5. learning algorithms get stuck in a local minimum
6. ping-pong problem
  o weights oscillate without converging
7. training too slow, e.g. backpropagation = slow gradient descent to a solution
8. overtraining/overfitting = learns well on the training set but not on unseen, new examples
9. network paralysis
10. number of units is a problem
  o explored through growing and pruning experimentation
    § number of input and output units = usually fixed
      · fixed by the given problem, e.g. one class per output unit
  o growing an ANN = creates its own set of problems
  o pruning an ANN = creates its own set of problems
11. hidden units can use a pruning or growing strategy, using a performance test to see whether the network
  o performs better = less error on the output
    § risk of overfitting
  o shows not much difference = no significant change in error
    § good for a simpler network, if there is no loss of functionality
  o performs much worse = more error on the output
    § the removed units were essential to the ANN
    § an essentially difficult concept for systems where performance is distributed

3 Hopfield Application

1. N Rooks Problem
2. N Queens Problem
3. Travelling Salesman Problem (TSP)

Batch mode or Batch Learning VS Online mode or Online Learning

Batch mode (batch/offline learning)
- weight update
  o happens once a set number of examples has been seen by the ANN
  o happens at specific times
- advantage
  o less update time = updates only take place once in a while
- disadvantage
  o may not reach the optimal solution, e.g. may miss an important example
Online mode (unbatched/online learning)
- weight update
  o every time the ANN sees an example
  o can happen at any time
- advantage
  o every example is considered
- disadvantage
  o takes longer to update
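A minimal sketch contrasting the two modes for a single weight with the rule △w = 𝜂e𝑥. The data are illustrative, and the errors are taken as fixed numbers for the illustration (in a real network each error depends on the current weights).

```python
eta = 0.1
examples = [(1.0, 0.4), (1.0, -0.2), (1.0, 0.3)]   # (input x, error e) pairs

# Online mode: update after every example.
w = 0.0
for x, e in examples:
    w += eta * e * x
print("online:", w)

# Batch mode: accumulate the changes, apply them once per batch.
w = 0.0
w += sum(eta * e * x for x, e in examples)
print("batch :", w)
```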

Hopfield network - Hebb's Rule vs Widrow-Hoff's Rule Similarities and differences

3 similarities
- used to train Hopfield networks
- used to learn the required sinks
- use reinforcement learning
Differences
Hebb's rule (see the sketch below)
- changes are based on making correlations between units that reflect correlations in the training set
- for threshold units, the learning rule is:
  o take a random pattern from the set of required sink states and impose it on the network
  o for each pair of units u, v in the network, if their activations for this chosen pattern are the same (au = av)
    § add a small positive amount n to wuv
    § otherwise subtract n from wuv
- no single fixed formula
  o other literature uses different types of units
  o there is a lot of scope for alternatives in this subject
Widrow-Hoff's rule
- if a pattern is not a stable state
  o for some units, the state the pattern represents differs from the state that would be generated by looking at the activation produced by the unit's net
  o the weights are modified to correct that particular unit's net
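A minimal sketch of the Hebb-style rule described above for a Hopfield net: for each required sink pattern and each pair (u, v), nudge wuv up when the activations agree and down when they differ. The patterns and the amount `n` are illustrative assumptions.

```python
import numpy as np

patterns = [np.array([1, 0, 1, 0]), np.array([0, 1, 0, 1])]  # required sinks
n_units, n = 4, 0.1
W = np.zeros((n_units, n_units))

for p in patterns:
    for u in range(n_units):
        for v in range(n_units):
            if u != v:                       # no self-connections: w_uu = 0
                W[u, v] += n if p[u] == p[v] else -n
# Symmetry w_uv = w_vu holds because each pair is treated the same way.
print(W)
```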

Backpropagation vs Perceptron (SALAD FELO)

Backpropagation
- Learning algo = Widrow-Hoff rule: △w = 𝜂e𝑥a(1-a)
- Activation function = Sigmoid function
- Outputs = continuous real values
- Ease of use = no guaranteed convergence
- Differentiable = yes
- Layered = yes
- Feed-forward = yes
- Supervised/Unsupervised = supervised
- Applications = richer applications with continuous values
Perceptron
- Learning algo = △w = 𝜂e𝑥
- Activation function = Threshold T(<0,0,1)
- Outputs = discrete 1,0 or -1,1
- Ease of use = guaranteed convergence (easy to use)
- Differentiable = no
- Layered = yes
- Feed-forward = yes
- Supervised/Unsupervised = supervised
- Applications = limited to discrete outputs

Backward pass vs recurrent

Backward pass
- not recurrent
- feeds back the errors so that the hidden units can update their weights
Recurrent
- feeds back the activation of one unit as input to another unit or to itself

Define - bias - why use bias - net - activation function - weight - network - threshold - fires - ANN architecture and in depth definition and methods to determine ANN architecture

Bias
- the weight associated with an input clamped at 1
why use bias
- presence = affects the value of the net
- absence = the net is strictly the sum of the weighted inputs
net
- the sum of all the weighted inputs fed into a unit
activation function
- a function that produces an output from an input, e.g. T(<0,0,1): if the net is less than 0 the output is 0, else if 0 or more the output is 1
weight
- a real number that magnifies the input
network
- a group of units that take in an input and create an output
threshold
- determines when the unit fires
fire
- when a unit performs its calculation and produces an output (see the sketch below)
ANN architecture
- (CANT CALA) a network design covering the number of units, calculation algorithm, learning algorithm, type of units, connection of units and arrangement of units
- (INOE)
  o I = set of inputs
  o N = set of computing units
    § including their types and the parameters needed, both those to be learnt and those not; the learning algorithm to be used is included
  o O = set of output sites
  o E (edges) = connections between units, with their weights, although some are learned
- methods to determine ANN architecture
  o (I and O) set of inputs and outputs
    § determined by the problem at hand
    § with scope for pre-processing and post-processing
  o (N) number of units
    § determined by the number of classes needed in a classification problem
    § type of units
      · depends on the input and output value types
      · if a discrete output is desired, a threshold function is used; if a continuous output is desired, a sigmoid is used
  o (E) connections between units
    § depend on the number of inputs and outputs
    § often arrived at via trial and error/experimentation
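A minimal sketch of the definitions above: the net is the sum of weighted inputs plus a bias weight on an input clamped at 1, and the threshold activation T(<0,0,1) decides whether the unit fires. The weights and inputs are illustrative numbers.

```python
def net(weights, inputs, bias):
    """Sum of weighted inputs; the bias is a weight on an input clamped at 1."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias * 1.0

def threshold(n):
    """T(<0,0,1): output 0 if net < 0, else 1."""
    return 0 if n < 0 else 1

n = net(weights=[0.5, -0.3], inputs=[1.0, 1.0], bias=-0.1)
print(n, threshold(n))   # the unit fires output 1 because net >= 0
```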

Classification Problem Unsupervised problem

Classification Problem
- classification
  o 2 phases can be attempted with an ANN
    § first, a set of objects is used as a training set
      · the training set is used to determine a set of classes
    § second, a process assigns any new object to the classes found during training
- however, it is not always clear that such techniques are in fact ANNs
  o although almost any algorithm can be rewritten as an ANN

Unsupervised Learning Problem
- if the number of classes is unknown when training, the ANN does not know the classes and thus needs to find suitable classes

Perceptron vs Boltzmann

MLL FUSS
Perceptron
1. Motivation: biological neuron
  - a unit fires when calculation of the net occurs and activates after reaching a threshold
2. Learning algorithm: change in weight, △w = 𝜂e𝑥
3. Learning style
  - uses information gained after each training example
  - prevents rejected weights from reappearing
4. Feed-forward: yes
5. Unsupervised/Supervised: supervised
  - inputs are given an expected output
  - takes account of the error on the output
6. Start values: randomised, or chosen based on the problem
7. Stop values: minimise the error
Boltzmann
1. Motivation: annealing of metals
  - inspired by metal annealing; an energy-based network
  - makes use of a reducing temperature as training progresses
2. Learning algorithm: random search with replacement
3. Learning style
  - uses temperature reduction and probability
  - does not use information gained; rejected weights may reappear
4. Feed-forward: no
5. Unsupervised/Supervised: supervised/statistically aided
  - inputs are given an expected output
  - takes account of the error on the output
  - simulated annealing allows activation with a probability
6. Start values: randomised, or chosen based on the problem
7. Stop values: minimise the evaluation function

Characteristics of - Markov chain (MC) - Hopfield - Recurrent Neural Network

Markov chain
- not layered
  o all units are hidden cells
  o no input or output layer
- not feed-forward
  o signals can flow both ways
  o i.e. unit 1 feeds unit 2, but unit 2 can also feed unit 1
Hopfield
- not layered
  o all units serve as inputs before training, hidden units during training and outputs when used
- not feed-forward
  o signals can flow both ways
  o i.e. unit 1 feeds unit 2, but unit 2 can also feed unit 1
Recurrent Neural Network
- not feed-forward
  o there are loops
- layered
  o there are still input, hidden and output cells

3 Hopfield Application 1. N Rooks Problem 2. N Queens Problem Hopfield for Rooks and Queens Problem

N Rooks Problem
o a rook can travel horizontally and vertically on an NxN chessboard
o place N rooks such that no two rooks threaten one another
  § e.g. on an 8x8 chessboard, number of rooks = 8

N Queens Problem
o a queen can travel horizontally, vertically and diagonally on an NxN chessboard
o place N queens such that no two queens threaten one another
  § e.g. on an 8x8 chessboard, number of queens = 8

Hopfield for the Rooks and Queens Problems (see the sketch below)
o represent each position on the chessboard as a unit
  § weight -2 between units on the same row or column (and diagonal for queens)
    · minimises energy (step 1: training, for the actions below)
  § all other weights set to 0
  § set threshold T(<0,0,1)
o selecting a unit and setting it to 1 = represents the placement of a queen or rook
  § inhibits other units in the same row or column (diagonal for queens) from being 1
    · step 2: testing, for the actions below
o actions for pattern recognition
  § first, pick a set of weights that minimises energy (training)
  § second, pick random initial inputs and allow the network to minimise energy (testing)
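A minimal sketch of the N-rooks setup above: one unit per square, weight -2 between units sharing a row or column, threshold T(<0,0,1), random asynchronous updates. The +1 bias per unit (encouraging empty rows/columns to fill) is an assumption added to make the toy run, and N is illustrative.

```python
import numpy as np

N = 4
units = [(r, c) for r in range(N) for c in range(N)]
W = np.zeros((N * N, N * N))
for i, (r1, c1) in enumerate(units):
    for j, (r2, c2) in enumerate(units):
        if i != j and (r1 == r2 or c1 == c2):
            W[i, j] = -2.0                 # same row/column: mutual inhibition

a = np.random.randint(0, 2, N * N)         # random initial state
for _ in range(200):                       # random asynchronous updates
    k = np.random.randint(N * N)
    net = W[k] @ a + 1.0                   # +1 bias (assumption)
    a[k] = 0 if net < 0 else 1             # T(<0,0,1)
print(a.reshape(N, N))                     # settles with one rook per row/column
```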

Natural vs Artificial Neuron

Natural
- fires when a threshold is reached
- unlimited number of neurons
- inborn, found in nervous systems
- asynchronous firing
Artificial
- fires when there is a calculation
- limited number of neurons
- man-made
- synchronous firing

Pocket algorithm - implication

Pocket Algorithm (see the sketch below)
- based on perceptron learning
- stores a set of weights ('in the pocket') and compares it with the new weights at each iteration
  o if the new weights produce less error
    § the new weights replace the stored weights
    § continue until there are no more errors
Implication
- a possibly better set of weights can get replaced by an inferior one
  o local vs global optimum (the global optimum may be skipped)
  o but if the training set is finite and the weights rational, the algorithm can be expected to settle on good weights
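A minimal sketch of the pocket algorithm: run perceptron updates, and keep the best-so-far weights 'in the pocket' whenever the current weights make fewer errors on the whole training set. The noisy 2D data, learning rate and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[rng.uniform(-1, 1, (40, 2)), np.ones(40)]   # inputs + bias input of 1
t = (X[:, 0] + X[:, 1] > 0).astype(int)               # targets

def errors(w):
    return int(np.sum((X @ w >= 0).astype(int) != t))

w = rng.uniform(-1, 1, 3)
pocket, pocket_err = w.copy(), errors(w)
eta = 0.1
for _ in range(200):
    i = rng.integers(len(X))
    a = 1 if X[i] @ w >= 0 else 0
    w += eta * (t[i] - a) * X[i]          # standard perceptron update
    if errors(w) < pocket_err:            # better on the whole set: pocket it
        pocket, pocket_err = w.copy(), errors(w)
print(pocket, pocket_err)
```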

Backpropagation (BP) network - properties - type of learning (2) - 2 phases (steps) - limitations - 2 learning algorithm

Properties
- feed-forward (no loops)
- layered, with 2 or more layers of units (each unit feeds the next immediate layer)
Type of learning
- supervised learning (presented with a training data set with expected outputs)
  o corrective learning (error and input are used to update the weights)
- online learning (weights updated for every error)
2 phases (see the sketch below)
1. Forward Pass
- calculate the output of the network by forward-propagating the outputs of one layer to the inputs of the next
2. Backward Pass
- given the actual output of the network, calculate the error and update the weights based on the Widrow-Hoff rule
- error on an output unit
  o e = (t - a) a (1 - a)
- change of weight from hidden unit h (at the output layer)
  o △w = 𝜂 e ah
  o ah = activation of the previous layer's unit h
- error for hidden unit h
  o eh = e wh ah (1 - ah)
- change of weight from input i to hidden unit h
  o △wih = 𝜂 eh xi
  o xi = the input value
Limitations
- convergence is not guaranteed, as the weights may go out of the range the computer can represent
- it can take very long to compute the final set of weights
- may end up with a locally optimal solution rather than the global optimum
2 learning algorithm
1. Widrow-Hoff rule
- an error-correction learning rule in which the amount of learning is proportional to the difference between the activation achieved and the target activation
  o e = t - a
  o △w = 𝜂 e a (1 - a) 𝑥
  o w0' = w0 + △w
2. Sigmoid activation
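A minimal sketch of one forward/backward pass using the formulas above: e = (t-a)a(1-a) at the output, eh = e·w·ah·(1-ah) at a hidden unit, and weight changes 𝜂·e·ah and 𝜂·eh·xi. The network size, weights, input and target are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eta = 0.5
x = np.array([1.0, 0.0])        # inputs
Wih = np.array([[0.3, -0.2],    # input -> hidden weights (2 hidden units)
                [0.1,  0.4]])
who = np.array([0.5, -0.3])     # hidden -> output weights (1 output unit)
t = 1.0                         # target

# Forward pass: propagate layer by layer.
ah = sigmoid(Wih @ x)
a = sigmoid(who @ ah)

# Backward pass: output error, then hidden errors, then the weight updates.
e = (t - a) * a * (1 - a)
eh = e * who * ah * (1 - ah)
who += eta * e * ah                 # delta_w  = eta * e * ah
Wih += eta * np.outer(eh, x)        # delta_w_ih = eta * eh * xi
print(a, e)
```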

Hopfield network - draw architecture - 4 properties

Property 1: no learning algorithm (trial and error)
§ sink states are chosen manually using trial and error
§ motivation = the net eventually ends up in a sink state
§ Hopfield training is usually trial and error
Property 2: Net & Energy (see the sketch below)
§ a set of N units, each connected to all the others
§ no bias unit, but the possibility of a bias within each unit is included
§ binary threshold units (activation functions)
  · T(<0,0,1)
  · T(<0,-1,1)
  · T(>0,1,0)
  · T(>0,1,-1)
§ over all units, S = Σi Σj wij ai aj; S can only increase when a unit changes
  · if unit k changes from 0 to 1 (which only happens when netk ≥ 0), S increases by 2·netk
  · if unit k changes from 1 to 0 (which only happens when netk < 0), S increases by -2·netk
§ Energy E = (-0.5)S
  · as transitions in a Hopfield network lead to lower energy, constraints are encoded by making their breaking contribute an increase in energy
  · solutions = minima of the energy (sinks)
  · wrong solutions = higher energy
  · given wij = wji
    o E = (-0.5) Σi Σj wij ai aj
  · the energy never increases; it decreases whenever a unit changes state
Property 3: Restrictions
§ units are connected to all others, but a unit is not connected directly to itself
  · wuu = 0
§ connections between each pair have symmetric weights
  · wvu = wuv
Property 4: Recurrent Network
§ all units have predecessors
  · initial values = inputs
§ all units have successors
  · no unit is just an output
  · the same unit is both input and output
  · the units we monitor = outputs
§ the output of a unit depends on itself
  · its output determines the output of its successors
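A minimal sketch of the energy E = -0.5 Σij wij ai aj and the claim that an asynchronous T(<0,0,1) update never increases E. The random symmetric weights and the number of units are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
W = rng.normal(size=(N, N))
W = (W + W.T) / 2.0                 # symmetric: w_ij = w_ji
np.fill_diagonal(W, 0.0)            # no self-connections: w_ii = 0

def energy(a):
    return -0.5 * a @ W @ a

a = rng.integers(0, 2, N).astype(float)
for _ in range(30):
    k = rng.integers(N)
    old_E = energy(a)
    a[k] = 0.0 if W[k] @ a < 0 else 1.0   # T(<0,0,1) firing rule
    assert energy(a) <= old_E + 1e-12     # energy never increases
print(a, energy(a))
```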

Formula of - Sigmoid activation - Bipolar activation - Linear activation - Tanh - Threshold activation Threshold vs Sigmoid

Standard forms (a = output, net = input; the bipolar form given is the bipolar sigmoid)
- Sigmoid: a = 1 / (1 + e^(-net))
- Bipolar: a = 2 / (1 + e^(-net)) - 1, giving outputs in (-1, 1)
- Linear: a = net
- Tanh: a = tanh(net) = (e^net - e^(-net)) / (e^net + e^(-net))
- Threshold: T(<0,0,1), i.e. a = 0 if net < 0, else 1
Threshold vs Sigmoid
Sigmoid function
- a well-behaved function that can be understood mathematically
- continuous (smooth) = allows real values
- differentiable = computationally convenient for learning
Threshold functions
- not continuous = do not allow real values
- not differentiable = not computationally convenient for learning
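The standard formulas above as plain functions; the bipolar form is again taken to be the bipolar sigmoid, an assumption since the notes only name it.

```python
import math

def sigmoid(net):   return 1.0 / (1.0 + math.exp(-net))
def bipolar(net):   return 2.0 / (1.0 + math.exp(-net)) - 1.0
def linear(net):    return net
def tanh_act(net):  return math.tanh(net)
def threshold(net): return 0 if net < 0 else 1   # T(<0,0,1)

for f in (sigmoid, bipolar, linear, tanh_act, threshold):
    print(f.__name__, f(0.5))
```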

Hopfield network - state of a network > state > source state > 3 importance of state > state table > 3 properties of state transition table > State transition Diagram = directed edge

State of a network
- a list of the units and their corresponding activations
- state
  o a unit and its corresponding activation
  o state of a unit = activation 1 or 0
  o sink or stable state
    § a state from which there is no escape
    § no successors
    § useful energy wells
      · units with value 1 have non-negative net
      · units with value 0 have negative net
      · minimise energy
  o source state
    § a state with no predecessor
3 points on the importance of state
1. important for systems with feedback = not important for feed-forward networks
2. in a recurrent network = some sense of memory, as future outputs depend on previous ones = state is important
3. in a feed-forward network = each output depends only on the training examples seen and the initial inputs = state is less important
State table
§ the first N columns, labelled 1 to N, represent the current state of the corresponding unit
§ the next N columns, labelled 1 to N, represent the state of the corresponding unit if it should fire
§ no need to record any changes to non-firing units
State transition table
§ Before
  · gives the state of each unit
  · initial binary input values
§ After
  · shows what happens if the corresponding unit fires
  · 1 = activates, 0 = does not activate
§ States After
  · shows the possible state of the whole network after each firing
  · the next state if a particular unit fires
State transition diagram
§ a graph whose nodes are states and whose directed edges are the possible transitions
§ (state changes run from left to right)
§ directed edge
  · drawn to every other state that can be obtained from a given state
  · i.e. to the successors of that state

Hopfield network - 2 strategies to get net and activation

Strategy for getting net and activation (see the sketch below)
- for a unit, calculate the net and activation before moving on to the next unit; to update all values there is a choice:
  § use the new value immediately
  § or wait until all units have been calculated
1. Synchronous strategy
  o wait and change all values together
  o the activations of all units are calculated at each step
  o an ordered, systematic strategy
2. Asynchronous strategy
  o the calculation of one activation completes a step
  o the new value is used in subsequent calculations
  o more easily dealt with in theory and practical investigation
  o must choose which unit to update
    § 2 ways to choose
      (a) systematic update
        o each unit is updated the same number of times
        o each unit is updated at least once
      (b) random update (more commonly used)
        o update a random unit
        o even if that unit has recently been updated
- complication
  § with no feedback in a network, the behaviour of a unit depends only upon its inputs
    · easy to trace a unit's behaviour by observing it while changing the input
  § with asynchronous firing, behaviour is difficult to trace
    · for Hopfield networks, the order of firing has to be taken into account
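A minimal sketch contrasting the two strategies on the same weights: the synchronous step recomputes every activation from the old state at once; the asynchronous steps update one randomly chosen unit at a time, using the new value immediately. The weights and initial state are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
W = np.array([[ 0.0, 1.0, -1.0],
              [ 1.0, 0.0,  1.0],
              [-1.0, 1.0,  0.0]])
step = lambda n: 0 if n < 0 else 1        # T(<0,0,1)

a = np.array([1, 0, 1])
a_sync = np.array([step(n) for n in W @ a])   # all units, from the old values
print("synchronous :", a_sync)

a_async = np.array([1, 0, 1])
for _ in range(6):                             # random asynchronous updates
    k = rng.integers(3)
    a_async[k] = step(W[k] @ a_async)          # new value used immediately
print("asynchronous:", a_async)
```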

Supervised vs unsupervised learning 2 Variation of supervised learning

Supervised learning
- data labelled with their expected output (controlled environment)
- the ANN is expected to
  o predict the appropriate correct response to unseen data
  o this requires the computer to remember the response to each stimulus
- advantage
  o good for predicting future data trends
- 2 variations
  o Reinforcement learning (reward)
    § weights are updated based on the input vectors only
  o Corrective learning
    § weights are updated based on the magnitude of the error and the input vectors
Unsupervised learning
- data are unlabelled
- the ANN is expected to
  o discover the hidden structure within the problem at hand
  o achieve some sort of improvement rather than a correct response
- advantage
  o the resulting model is good for data exploration and visualisation
- disadvantage
  o not good at predicting data

Hopfield Network - time - network going from a state to itself - pattern recognition

Time
- the sequence of states through which the network travels as units fire
Network going from a state to itself
- a unit may fire without a change in state
Pattern Recognition
- need to specify the sinks for pattern recognition
- 3 methods to specify/learn sinks
  o trial and error
  o Hebb rule
  o Widrow-Hoff rule

3 Hopfield Application 3. Travelling Salesman Problem (TSP) Hopfield for Travelling Salesman

Travelling Salesman Problem (TSP)
o find a path through cities S1 to Sn
  § such that each city is visited once
  § with minimal round-trip length
o representation of the TSP as a matrix
  § draw an NxN binary matrix
    · rows = represent cities S1 to Sn
    · columns = represent the sequence of visits, 1 to N
  § each 1 = represents a visit to a city at a certain point in the sequence
    · only a single 1 is allowed in each row and each column
    · otherwise the salesman would visit a city twice, or two cities simultaneously

Hopfield for the Travelling Salesman
o each cell in the matrix is represented by a unit
  § each weight represents the distance from one city to another
o arranging the matrix so that each row and column has only a single 1 is solved using the rooks/queens analogy
o the problem left = find the minimal round-trip distance (length)
o choosing the matrix with minimal length = minimising energy
  § built on top of the rooks/queens problem
    · modify the energy function: add a distance term on top of the energy function for the rooks/queens problem

Meaning of feedforward

each layer feeds to the subsequent layers without feedback (no recurrence)

Recurrent

feedback occurs between units in the network (output from the first pass is used as input in the second pass)

Perceptron - a line in a plane - type of learning (2) - algorithm - limitations

definition
- a single-unit ANN
a line in a plane
- step activation (the line separates the outputs into two sides)
Type of learning
- supervised learning
  o corrective learning
    § uses the error and the input to update the weights
- online learning
  o weights updated
    § after each example
    § weights depend on the inputs and their sequence
Algorithm (see the sketch below)
- error
  o expected output minus activation
  o e = t - a
- after each example
  o if an error is present = update
    § the sooner the error is detected, the sooner the weights are corrected
  o else stop
- change in weight
  o △w = 𝜂e𝑥
- sum squared weight change
  o Σ△w²
- new weight
  o w0' = w0 + △w = w0 + 𝜂e𝑥0
Limitations
- the final set of weights is whatever holds after an epoch with no error; the OR and AND gates show there can be more than 1 possible set of weights (resulting in a locally rather than globally optimal solution)
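A minimal sketch of the perceptron algorithm above on the AND gate: e = t - a, △w = 𝜂e𝑥, repeated over epochs until an epoch produces no error. The initial weights and 𝜂 are illustrative assumptions.

```python
eta = 0.2
w = [0.0, 0.0, 0.0]                      # weights for x1, x2 and the bias input
data = [((0, 0, 1), 0), ((0, 1, 1), 0),  # (x1, x2, bias=1) -> AND target
        ((1, 0, 1), 0), ((1, 1, 1), 1)]

while True:
    total_error = 0
    for x, t in data:
        net = sum(wi * xi for wi, xi in zip(w, x))
        a = 0 if net < 0 else 1          # threshold T(<0,0,1)
        e = t - a
        total_error += e * e
        w = [wi + eta * e * xi for wi, xi in zip(w, x)]   # delta_w = eta*e*x
    if total_error == 0:                 # epoch with no error: stop
        break
print(w)
```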

