neural network collection 1


Engineers

Signal processing and automatic control

Why is the making of silicon neurons not applicable?

Silicon chips and circuits are two-dimensional, whereas communication between biological neurons in a brain occurs in three dimensions; therefore the communication between silicon neurons won't work.

multilayer networks

Such networks are multi-layered as we can give each unit a layer number. Starting at the input layer, which we call layer zero (but remember that in this course, we never draw these as units), we give each unit a layer number which is one greater than the maximum of the layer numbers of all those units that feed it.
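
The layer-numbering rule above can be sketched in a few lines of Python. This is only an illustration: the function name `layer_numbers`, the dictionary layout and the example network are assumptions made here, not part of the course material. It also assumes the network is feed-forward (no loops), since otherwise the recursion would never end.

```python
def layer_numbers(feeds, input_units):
    """Assign each unit a layer number.

    feeds maps every non-input unit to the list of units that feed it.
    Input units are layer 0; every other unit gets one more than the
    maximum layer number among the units that feed it.
    Assumes there are no loops in the network.
    """
    layers = {unit: 0 for unit in input_units}

    def layer_of(unit):
        if unit not in layers:
            layers[unit] = 1 + max(layer_of(src) for src in feeds[unit])
        return layers[unit]

    for unit in feeds:
        layer_of(unit)
    return layers


# Hypothetical network: x1 and x2 are inputs, h is fed by both inputs,
# and o is fed by h and also directly by x1.
example = layer_numbers({"h": ["x1", "x2"], "o": ["h", "x1"]}, ["x1", "x2"])
print(example)
```

Note that `o` ends up in layer 2 even though one of its inputs comes straight from layer 0, because the rule takes the maximum over all feeding units.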

definition of perceptron

define the terms: threshold units, step units, step activation, extended truth table, clamped, threshold, bipolar activation and recurrent

Computer scientists:

information processing and learning, image classification, object detection and recognition

What is a perceptron?

The perceptron is an algorithm for supervised learning of binary classifiers.

Auto associative network

Achieved by a complex search of the stored patterns

Know which scientific fields ANNs are used in.

- Computer scientists: information processing and learning, image classification, object detection and recognition
- Statisticians: classification
- Engineers: signal processing and automatic control
- Physicists: statistical mechanics
- Biologists: predicting protein shape from mRNA sequences, disorder diagnostic, personalized medicine
- Philosophers: minds and machines
- Cognitive scientists: models of thinking, learning, and cognition
- Neuro-physiologists: understanding sensory systems and memory
  - ANNs can be used to understand how visual info is represented in V2 and V4 and higher levels of the visual hierarchy; in some studies, humans and ANNs can solve the same task
  - Showed mice black and white movies while recording regions in visual cortex
- Used in lots of different fields

Know how ANNs compare to human brains.

- Composed of many units (similar to neurons)
- Units are interconnected (similar to synapses)
- Units occupy separate connected layers (similar to multiple brain regions in sensory pathways)
- Units in deeper layers respond to more and more abstract information (similar to more complex receptive fields in "higher" cortical areas)
- Require learning to perform tasks efficiently (similar to neural plasticity)
- Through experience, ANNs learn to recognize patterns

What can ANNs do?

- Facebook face recognition at 98% accuracy
- Self-driving cars: detect important objects on the road, tell moving cars apart from cyclists and pedestrians, predict what objects will do, choose a path
- Navigation: choosing the best route given current traffic conditions, finding landmark locations
- Voice recognition (Siri, Skype, Android). Why were they struggling before? Each person's sounds are unique, humans speak in a continuous flowing manner, and "ice cream" v. "I scream", "I" and "eye", and other language ambiguities make voice processing hard
- Language: they can describe pictures (but they pay more attention to details)

How do we find a good learning rate?

1) Start with large learning rate. 2) Divide learning rate by 2 (or 5). 3) Retrain the ANN and validate the performance. 4) Continue with 2) if performance increased.
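
The four steps above can be sketched as a short loop. The callable `train_and_validate` is hypothetical: it stands in for whatever code retrains the ANN with a given learning rate and returns a validation score (higher is better). The toy scoring function is only there so the sketch runs on its own.

```python
def find_learning_rate(train_and_validate, start=1.0, factor=2.0, max_steps=20):
    """Shrink the learning rate while validation performance keeps improving.

    train_and_validate is a hypothetical callable: it retrains the network
    with the given learning rate and returns a validation score.
    """
    best_rate = start                      # 1) start with a large learning rate
    best_score = train_and_validate(best_rate)
    for _ in range(max_steps):
        candidate = best_rate / factor     # 2) divide the learning rate by 2 (or 5)
        score = train_and_validate(candidate)  # 3) retrain and validate
        if score <= best_score:
            break                          # performance stopped increasing
        best_rate, best_score = candidate, score  # 4) continue with step 2
    return best_rate, best_score


def toy_score(rate):
    # Stand-in for real training: the score peaks at a rate of 0.125.
    return -abs(rate - 0.125)


rate, score = find_learning_rate(toy_score, start=1.0, factor=2.0)
print(rate, score)
```

The `max_steps` cap is an added safeguard, not part of the original recipe; it keeps the loop finite if the score happens to improve at every step.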

How much of the energy consumption is used by action potentials and synaptic transmission? And what is the rest used for?

50 to 80%. The rest is used for manufacturing and maintenance

An extended truth table.

?a | Net Σ | Activation a = A(Σ) = T(<0,0,1)(Σ)
0  | 0     | 1
1  | w     | if w < 0 then 0 else 1

Recurrent Neural net

A complex form of artificial neural network in which there is one layer, all the units are interconnected, and the units serve as both input and output. This form is able to store patterns.

Is an initialization of 0 for all the weights a good idea?

Although such an initialization is clearly unbiased, it would cause all the outputs, hence all gradients and therefore all weight updates to be identical.
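
A minimal sketch of this symmetry problem, assuming a tiny network with two inputs, three sigmoid hidden units, one sigmoid output, no biases and a squared-error loss (all of these choices are illustrative assumptions). With every weight set to 0, each hidden unit produces the same output and receives the same gradient.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradients(x, target, w_hidden, w_out):
    """One backpropagation pass for a tiny two-layer sigmoid network.

    w_hidden[j][i] connects input i to hidden unit j; w_out[j] connects
    hidden unit j to the single output. The loss is squared error.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    delta_out = (out - target) * out * (1.0 - out)
    grad_out = [delta_out * h for h in hidden]
    grad_hidden = [
        [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j]) * xi for xi in x]
        for j in range(len(w_hidden))
    ]
    return grad_out, grad_hidden


zero_hidden = [[0.0, 0.0] for _ in range(3)]
zero_out = [0.0, 0.0, 0.0]
grad_out, grad_hidden = gradients([1.0, 2.0], 1.0, zero_hidden, zero_out)
print(grad_out)     # the same value for every hidden unit
print(grad_hidden)  # the same row for every hidden unit
```

Because the updates are identical, the hidden units stay copies of each other after every step, which is why a small random initialization is normally preferred.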

a very simple network

Although this is a very simple network, it allows us to introduce the concept of an 'extended' truth table. An extended truth table is a truth table that has entries that evaluate to 0 or 1 but these entries could be variables or even expressions. Because all the inputs are binary, we can use an extended truth table to show how inputs map to outputs.

What kind of techniques do analogue calculations provide?

Basic calculus techniques

Why do we use efficient coding?

Because spike coding is too energetically expensive

net and bias

Before we move on, let us look at what we can make with units of the type where $net = bias + w \sum_{i=1}^{N} ?_i$

How do silicon retinas work?

By connecting two silicon neurons that extract information such as angles and lines

How do we find the right depth and width of an ANN? What are the limitations?

By (1) expanding, testing and comparing performance, and by (2) finding balance between over- and underfitting. The limitations are (1) the information in the data and (2) the computational power.

Regarding the distribution of the data, how can we speed up learning?

By centering the data around 0 by subtracting the mean of each dimension of the training data. This would place the data around the undecided state of the activation function and hence speed up the learning.
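
As a small illustration, centering can be written as follows. The helper names `column_means` and `center` and the toy data are assumptions made for this sketch. The means are computed on the training data only, as the answer above says, and the same means would then be reused for any other data fed to the network.

```python
def column_means(rows):
    """Mean of each dimension (column) of the training data."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def center(rows, means):
    """Subtract the per-dimension training mean from every sample."""
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
means = column_means(train)
centered = center(train, means)
print(means)
print(centered)
```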

How is an associative memory formed?

By modifying the strength of the connections between layers, so that when an input pattern is presented, the stored pattern associated with it is retrieved

How do networks learn?

By modifying the strength of their connections.

What does a silicon retina do?

It captures light and changes its output depending on changes in the light

Statisticians

Classification

Analogue

Compound with similar molecule structure to another compound

How do we recognize a good learning rate in terms of cross entropy and accuracy?

Cross entropy decreases steadily and accuracy increases steadily.

What general differences are there between shallow and deep ANNs?

Deeper models tend to perform better as they add more parameters, while shallow models start to overfit.

How do we tackle class imbalances?

If the differences are not very large: draw balanced mini-batches. Sparse classes are then shown more often. If the differences are grouped in different classes: draw balanced mini-batches from different subgroups. It then shows larger groups more often. For very unbalanced classes: weight the loss. The loss for misclassified samples of small classes is then increased.
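
The last option, weighting the loss, can be sketched like this. The inverse-frequency weighting used here is one common choice and is an assumption of this example, not something the answer above prescribes; the function names are also made up for illustration.

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency class weights (an assumed weighting scheme).

    Normalised so that a perfectly balanced data set gives weight 1 per class.
    """
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Mean cross entropy where each sample is scaled by its class weight.

    probs[i][c] is the predicted probability of class c for sample i.
    """
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


labels = [0, 0, 0, 1]                      # class 1 is the small class
weights = class_weights(labels)
probs = [[0.5, 0.5] for _ in labels]       # an undecided classifier
print(weights)
print(weighted_cross_entropy(probs, labels, weights))
```

With these weights, a misclassified sample of the small class contributes three times as much to the loss as one from the large class.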

MLFFN

If we combine units in a way that ensures that there is no recurrence, that is there are no 'loops' (no output of a unit can affect its input), even indirectly, then we obtain what we call a multi-layered feed-forward network or MLFFN for short

value of bias

In this case the value of net is the value of bias plus w times the sum of the other inputs. We can write this as: $net = bias + w \sum_{i=1}^{N} ?_i$

In the case of the rectified linear unit (ReLU) function, what happens if the backpropagation step maps all possible inputs to a negative drive?

In this case, the ReLU gate never fires: the output and derivative are 0. The error cannot flow through the ReLU. Its weights won't be updated anymore. The ReLU is "dead". This can be avoided by using a small learning rate and slightly positively initialized biases.

On what parameter(s) does the stability of the gradients depend?

It depends on (1) the derivative of the activation function and (2) the values of the weights.

On what parameter(s) does the network's update rule depend?

It depends on the gradient.

On what parameter(s) does the network's "training speed" depend?

It depends on the size of the gradient (large gradient = high training speed).

'feed forward'

It is 'feed forward' because we can think of each layer as feeding forward to subsequent layers - there is no feedback.

How is information stored in an artificial neural network?

It is stored in the weights of the connections, where each neuron is dumb and only responds to its weighted input.

Perceptrons and parallel processing

Learning can only be implemented by modifying the connection pattern of the network and the thresholds of the units, but this is necessarily more complex than just adjusting numerical parameters

How is using efficient coding helpful?

It maximizes the amount of information we observe through the patterns of spikes by reducing redundancy.

What are the advantages/disadvantages of ELU?

It saturates for negative values only, which makes it noise robust with no penalty for highly positive values, and its mean activation is close to 0 (the gradients for the biases are stable). It is however expensive to compute.
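
For reference, here is a sketch of the ELU under its usual definition: the identity for positive inputs and `alpha * (exp(x) - 1)` otherwise. The default `alpha = 1.0` is an assumption of this example. The call to `exp` on the negative side is where the extra computational cost comes from.

```python
import math


def elu(x, alpha=1.0):
    """Exponential linear unit: linear for x > 0, saturating towards -alpha below."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def elu_derivative(x, alpha=1.0):
    """Derivative of the ELU: 1 for x > 0, alpha * exp(x) otherwise."""
    return 1.0 if x > 0 else alpha * math.exp(x)


for drive in (-50.0, -1.0, 0.0, 2.0):
    print(drive, elu(drive), elu_derivative(drive))
```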

What can ANNs not do?

- Learning from small numbers of examples and less practice (ANN: 38 days vs. human: 2 hours)
- Solving multiple tasks simultaneously
- Holding conversations
- Active learning (humans seek new information to gain knowledge)
- Scene understanding
- Language acquisition
- Common sense
- Feelings
- Consciousness
- Theory of mind (understanding the thoughts and intentions of others)
- Learning to learn (getting better at learning new tasks)
- Creativity

Philosophers

Minds and machines

Sparse coding

Neural coding based on the pattern of activity in small groups of neurons. (energy efficient)

Integrate and fire neurons

Neurons built from transistors that add up all the inputs they receive, coded as voltages, and fire an action potential if the voltage reaches the threshold

ANNs Step 1

Neurons in an ANN are called units. They receive information from other units (as dendrites do from other neurons) and integrate their inputs, similar to the way real neurons integrate IPSPs and EPSPs. Each unit has a preferred threshold, and if the summed signals are greater than the threshold, the unit passes information forward in the network

the 'input' at the bias

Notice that we have set the 'input' at the bias to zero. This means that the bias has no effect on the calculations as whatever its value the result of multiplying by zero will be zero. We say 'no bias' when the bias has no effect. Also notice that there is just one input, ?a.

bias

Now we can see that -bias is acting as a (variable) threshold that the rest of the sum must equal or exceed before the output can become 1. When you read around the subject you will see threshold being mentioned, and you now know that this is just minus the bias, which in turn is the name of a weight connected to an input that is always 1

MLFFNS

Of course, during the training process, all weights may affect each other. When we wish to distinguish between MLFFN whose units are threshold units from those whose units have a sigmoid activation function we use MLFFNT for the former and MLFFNS for the latter

If our nerve impulses are slower than computers, how do our cells connect to each other?

Our brain has parallel networks of folded cells that connect to each other.

Multiplexed

Process where several messages can be carried through the same wire

Content addressable storage

Processed knowledge residing in network itself, no separate system

Physicists

Statistical mechanics

Drawing a line in a plane. How about the inverse of this: given a straight line graph, can we build a unit that separates the plane along the line?

Suppose that we have a line given by y = mx + c. We can see that this can be written as mx - y + c = 0 so that setting bias = c, v = m and w = -1 will provide the required weights. Not all straight lines, however, can be written as y = mx + c: for example, a vertical line cannot be so written. However, it can be written in the form represented by units. The vertical line which goes through x = c and can be written as c - x + 0y = 0, that is bias = c, v = -1 and w = 0.
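
The two weight settings above can be checked with a small sketch. The helper `make_line_unit` and the particular lines chosen (y = 2x + 1 and the vertical line x = 3) are illustrative assumptions. The unit computes net = bias + v*x + w*y and outputs 1 when net is greater than or equal to zero.

```python
def make_line_unit(bias, v, w):
    """Threshold unit over a point (x, y): outputs 1 when bias + v*x + w*y >= 0."""
    def unit(x, y):
        return 1 if bias + v * x + w * y >= 0 else 0
    return unit


# y = 2x + 1, so bias = c = 1, v = m = 2 and w = -1.
sloped = make_line_unit(bias=1, v=2, w=-1)
# Vertical line through x = 3, so bias = 3, v = -1 and w = 0.
vertical = make_line_unit(bias=3, v=-1, w=0)

print(sloped(0, 0), sloped(0, 5))        # one point on each side of y = 2x + 1
print(vertical(2, 100), vertical(4, 0))  # one point on each side of x = 3
```

Points exactly on the line give net = 0 and therefore output 1, so the line itself belongs to the "1" side.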

percentage of a computational task

The answer to this problem in some ways resembles the speedup problem in parallel processing, in which we ask what percentage of a computational task can be parallelized and what percentage is inherently sequential.

function of classical perceptron?

The classical perceptron is in fact a whole network for the solution of certain pattern recognition problems

Describe the "vanishing gradients" problem.

The gradients decrease exponentially (and eventually vanish) as the error is backpropagated through the layers. As a result, the first layer does not learn and keeps its random initial values: the first layer "kills" all of the signal. This is especially a problem for very deep networks.
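
A rough numerical sketch of why this happens with a logistic activation: its derivative never exceeds 0.25, so each layer the error passes through can scale it down by a factor of four or more. The simplified model below, a chain with one unit per layer, a shared weight and a fixed drive, is an assumption made purely for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def backpropagated_factor(n_layers, weight=1.0, drive=0.0):
    """Factor by which the error is scaled after passing back through n_layers.

    Simplified chain of single units: each layer multiplies the error by
    weight * sigmoid'(drive). drive = 0 is the best case for the derivative.
    """
    factor = 1.0
    for _ in range(n_layers):
        factor *= weight * sigmoid_derivative(drive)
    return factor


for depth in (1, 5, 10, 20):
    print(depth, backpropagated_factor(depth))
```

Even in this best case the factor shrinks geometrically with depth, which matches the exponential decrease described above.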

idea to train the system

The idea is to train the system to recognize certain input patterns in the connection region, which in turn leads to the appropriate path through the connections to the reaction layer

Feed forward associates

The simplest form of artificial neural networks that contains layers of interconnected input and output units

when the system fires

The system fires only when $\sum_{i=1}^{n} w_i P_i \ge \theta$, where θ is the threshold of the computing unit at the output.
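
The firing condition, a weighted sum of the inputs reaching the threshold θ, can be written as a one-line check. The function name `fires` and the example weights are assumptions of this sketch.

```python
def fires(weights, inputs, theta):
    """Return True when the weighted sum of the inputs reaches the threshold theta."""
    return sum(w * p for w, p in zip(weights, inputs)) >= theta


# Hypothetical unit with two inputs, both weighted 1, and threshold 2.
print(fires([1, 1], [1, 1], theta=2))
print(fires([1, 1], [1, 0], theta=2))
```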

Where do integrate-and-fire neurons act, and what are they capable of doing?

They act in their sub-threshold region, even though they are capable of switching their voltages to go past the threshold

What are artificial neural networks used for and how are they observed?

They are used to study learning and memory, they are observed on computers, and they consist of simple units connected in networks

What happens when silicon neurons are multiplexed?

They imitate the connectivity of the brain

What can additional transistors provide?

They provide conductances to imitate the voltage- and time-dependent current flow of ion channels

What have neural engineers done to focus on speed rather than strength for silicon neurons?

They use the strategy of analogue and not digital coding

How do Analogue circuits work?

They work by changing their voltages continuously like in the rising phase of an action potential.

How to make it smarter?

Things that can and cannot be improved:
- Ability to connect neurons = smarter, so increase the connections (units)
- ANNs only learn what you program them to learn (trials are practice, not training; they can't teach themselves or learn on their own)
- We can study for a quiz in a way that helps us with a test, but ANNs can't generalize to other situations
- They can't understand social cues

Why is initializing the weights with larger value not a good way around the vanishing gradients problem?

This would just cause the derivative of the logistic function to shrink alongside the gradient. One good alternative would be to pre-train the network.

perceptron: three or more inputs

Three or more inputs. It is important that you have some facility with working out the outputs, given inputs and weights (examination questions often ask you to do this), so let us look at some more examples. The diagram below represents a unit with three or more inputs. The 'dotted' input '...' represents 0 or more edges, allowing for an unspecified number of extra inputs

One or two inputs

To make our calculations easier we will for the time being restrict ourselves to looking at a single unit whose inputs are all binary and whose activation function is the threshold (T(<0,0,1)) given above, so that the outputs are also either 0 or 1. We call units with threshold activations threshold units. You may also see them called step units, as a step is another way of describing a threshold.

Where do neurons (biological or silicon) send their messages and information?

To targeted areas/neurons

ANNs Step 3

Training/learning: ANNs need to learn and change over time by establishing appropriate connections between units; they increase the strength of appropriate connections and prune away inefficient ones (the more you train, the better it gets). Example: classification

Sparse coding provides information for engineers building neural networks (true or false)?

True

net

We have been able to do this because the inputs are either 0 or 1 and all the weights are the same, so the sum of the inputs is just M, the number of inputs that are 1. Remember that the activation is 1 if net is greater than or equal to zero, so we can write the condition for our unit to have activation 1 as: net = bias + wM ≥ 0, or as bias ≥ -wM, or (when w > 0) as M ≥ -bias/w

What do we understand by "appropriate" biases? When and in what order should they be learned? What effect does it have on learning?

We need the biases to shift the data into an undecided state (regarding the activation function). They need to be learned before the main learning starts, and sequentially, starting from the second layer. This should slow down early learning.

How do we prevent the dying ReLU problem from happening?

We prevent it from happening either by using a small learning rate and slightly positively initialized biases, or by using a leaky ReLU instead of ReLU.
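
A short sketch of the difference between the two activations. The negative-side slope of 0.01 for the leaky ReLU is a commonly used value and is an assumption here. For a negative drive the ReLU derivative is exactly 0, so no error flows back, whereas the leaky ReLU still passes a small gradient.

```python
def relu(x):
    return max(0.0, x)


def relu_derivative(x):
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, slope=0.01):
    """Leaky ReLU: identity for x > 0, a small linear slope otherwise."""
    return x if x > 0 else slope * x


def leaky_relu_derivative(x, slope=0.01):
    return 1.0 if x > 0 else slope


drive = -3.0  # a negative drive, as in the "dead" ReLU case
print(relu(drive), relu_derivative(drive))
print(leaky_relu(drive), leaky_relu_derivative(drive))
```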

We use capital N for the number of inputs

We use capital N for the number of inputs. When we introduced the notation of the Figures, we said that it is often useful to add to the external inputs a special 'input' which is fixed at the value 1. We called the weight associated with this input the bias of the unit and now we are including it in the discussion. Previously we set the input to 0 but now it is clamped, that is fixed at 1. Notice that the bias does not count in the N inputs - one often finds it thought of as input 0 but, unfortunately, it is also called input N+1 by some authors

Are artificial neural networks able to generalize to input patterns they have never seen before, and are they capable of retrieving stored patterns even when the input patterns are noisy and messed up?

Yes. When they see new input patterns they are able to notice regularities, they are noise-tolerant, and they are capable of retrieving stored patterns even when the input patterns are noisy and messed up

What are the advantages/disadvantages of ReLU compared to tanh in very deep networks?

While tanh is subject to the vanishing/exploding gradients problem (if the network has not been pre-trained) and its activation and derivative are expensive to compute, ReLU makes the network trainable without pre-training and has a very easy activation and derivative. However, the mean activation of tanh is 0, while that of ReLU is > 0: for ReLU the biases need to be adjusted first. Also, if the backpropagation step maps all possible inputs to a negative drive, then the ReLU gate (unlike the tanh one) never fires, as the output and the derivative are 0. The error then cannot flow through the ReLU and the weights won't be updated anymore: the ReLU is "dead".

How do we adjust the biases for ReLU?

With batch normalization.

Are learning rate and mini-batch size dependent?

Yes they are. Smaller mini-batches: smaller learning rate. As a result of this, cross entropy decreases (almost) linearly and accuracy reaches an early plateau. Larger mini-batches: larger learning rate. As a result of this, cross entropy explodes (possibly falls and zigzags) and accuracy is zigzagging heavily.

These formulae give us a means of designing units which, in order to output a 1 on firing:

a. require all inputs to be a 1 (that is an AND gate) by setting w = 1 and bias = -N (the number of inputs)
b. require at least one input to be 1, by setting w = 1 and bias = -1 (this is an inclusive OR gate)
c. require at least a certain number of inputs to be 1, again setting w = 1 and bias = - the number of inputs that we want to be on. This represents a sort of 'voting' circuit
d. require at most M inputs to be 1 by setting w = -1 and bias = M
e. require at least one of the inputs not to be 1 by setting w = -1 and bias = N - 1 (this is the NAND gate)
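
These five designs can be tried out with a single threshold unit that shares one weight across all inputs. The function names and the choice of three inputs for the demonstration are assumptions of this sketch; the weight and bias values are the ones listed above.

```python
from itertools import product


def unit(inputs, w, bias):
    """Threshold unit with one shared weight: fires when bias + w * sum(inputs) >= 0."""
    return 1 if bias + w * sum(inputs) >= 0 else 0


def and_gate(bits):
    return unit(bits, w=1, bias=-len(bits))       # a. all inputs must be 1


def or_gate(bits):
    return unit(bits, w=1, bias=-1)               # b. at least one input is 1


def at_least(bits, k):
    return unit(bits, w=1, bias=-k)               # c. 'voting': at least k inputs are 1


def at_most(bits, m):
    return unit(bits, w=-1, bias=m)               # d. at most m inputs are 1


def nand_gate(bits):
    return unit(bits, w=-1, bias=len(bits) - 1)   # e. at least one input is not 1


# Print every three-input combination with the output of each design.
for bits in product([0, 1], repeat=3):
    print(bits, and_gate(bits), or_gate(bits), at_least(bits, 2),
          at_most(bits, 1), nand_gate(bits))
```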

ANNs Step 2

The brain is made up of billions of neurons and as many as a quadrillion synapses; more powerful brains have more neurons and more connections between them, and similarly ANNs with more units, connections, and layers are "smarter". Hierarchy of visual processing: in the retina, neurons are receptive to points of light and darkness; in the primary visual cortex, neurons respond to lines and edges; in higher visual areas, neurons respond to faces, hands, and all sorts of complex objects, both natural and manmade. ANNs use a similar hierarchy of layers: info becomes more and more abstract at higher levels

Perceptrons as weighted threshold elements

Rosenblatt's model is called the classical perceptron, and the model analyzed by Minsky and Papert is called the perceptron

multilayer networks

- describe the architecture of the ANN called a multi-layered feedforward network (MLFFN)
- explain the difference between MLFFNS and MLFFNT
- carry out simple hand simulations of MLFFNTs
- build an MLFFNT corresponding to a given arbitrary truth table
- discuss the limitations and possible applications of MLFFNTs

Perceptrons

explain how simple Perceptrons (such as those implementing a NOT, AND, NAND and OR gates) can be designed

Cognitive scientists

models of thinking, learning, and cognition

Biologists

predicting protein shape from mRNA sequences, disorder diagnostic, personalized medicine

Neuromorphic engineering

translation of neurobiology into technology

Neuro-physiologists

Understanding sensory systems and memory. ANNs can be used to understand how visual info is represented in V2 and V4 and higher levels of the visual hierarchy; in some studies, humans and ANNs can solve the same task. Showed mice black and white movies while recording regions in visual cortex

