Week 8 Reading

Kohonen learning rule (Kohonen, 1982).

A rule that is the core of the self-organizing map (SOM), an unsupervised neural network model
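
A minimal sketch of one Kohonen update step, with made-up data and an assumed learning rate; a full SOM would also update the winner's neighbors and shrink the rate over time:

```python
import numpy as np

# Hypothetical illustration of one Kohonen (SOM) update step:
# the best-matching unit (BMU) is pulled toward the input vector.
rng = np.random.default_rng(0)
weights = rng.random((4, 2))   # 4 map nodes, 2-dimensional inputs
x = np.array([0.3, 0.8])       # one input vector
lr = 0.5                       # assumed learning rate

bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # closest node wins
weights[bmu] += lr * (x - weights[bmu])               # move the BMU toward x
```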

What are some common termination conditions for training NN?

1. For all weights wij, the difference between the old and new values |wijNew - wijOld| is less than some specified threshold. 2. The error (e.g., misclassification rate) is less than some specified threshold. 3. The pre-specified number of iterations has been completed
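
A sketch of how these three conditions might appear in a training loop; `train_one_epoch`, the threshold values, and the error measure are assumptions for illustration:

```python
import numpy as np

def train(weights, train_one_epoch, max_iters=100,
          weight_tol=1e-4, error_tol=0.05):
    """Stop on tiny weight changes, low error, or the iteration cap."""
    for _ in range(max_iters):                          # condition 3
        old = weights.copy()
        weights, error = train_one_epoch(weights)       # assumed helper
        if np.max(np.abs(weights - old)) < weight_tol:  # condition 1
            break
        if error < error_tol:                           # condition 2
            break
    return weights
```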

what are types of interlayer connections

1. Fully Connected 2. Partially Connected 3. Bi-Directional 4. Hierarchical 5. Resonance

What are some design considerations for RBF

1. Instead of weights, the connections between the input and hidden layers use width and height parameters. 2. The transfer function in the output layer is usually linear with trainable weights

Softmax Activation Function

1. It is used when we want to represent a probability distribution over n discrete classes, and is most often used as the output of a classifier. 2. It can, rarely, be used in a hidden layer, if the node is designed to choose between one of n different options for some internal input. 3. It can be viewed as a generalization of the sigmoid function, which represents a probability distribution over a binary class variable. 4. It works well when maximizing a log-likelihood objective function, but less well with many other objective functions, especially those that do not use a log.

Limitations of ReLU Functions

1. Non-differentiable at zero - values close to zero may give inconsistent or intractable results. 2. Non-zero centered - being non-zero centered creates asymmetry around the data (only positive values are handled), leading to uneven handling of the data. 3. Unbounded - the output value has no limit and can lead to computational issues with large values being passed through. 4. Dying ReLU problem - when the learning rate is too high, ReLU neurons can become inactive and "die."

What are data preparation considerations for training a NN model?

1. Sample the Input Data Source 2. Create Partitioned Data Sets 3. Perform Group Processing 4. Use Only the Important Variables 5. Data Transformations and Filtering Outliers 6. Imputing Missing Values 7. Use Other Modeling Nodes

Radial basis function

1. Interpretation relies more on geometry than on biology. 2. The training method is different because, in addition to optimizing the weights used to combine the outputs of the nodes, the nodes themselves have parameters that can be optimized.

The advantages of the ReLU function are

1. Allows faster and more effective training of deep neural architectures on large and complex datasets 2. Sparse activation of only about 50% of the units in a neural network (as negative units are eliminated) 3. More plausible, one-sided behavior, compared to the antisymmetry of tanh 4. Efficient gradient propagation, which means no vanishing or exploding gradient problems 5. Efficient computation, using only comparison, addition, and multiplication 6. Scales well
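
A small sketch of ReLU, plus the leaky variant often used to mitigate the dying-ReLU problem listed earlier; the 0.01 slope is a common but assumed choice:

```python
import numpy as np

def relu(x):
    # Negative units are zeroed, giving the sparse activation noted above.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # A small slope for x < 0 keeps gradients alive so units cannot "die".
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # [0.  0.  0.  1.5]
print(leaky_relu(x))  # [-0.02  -0.005  0.  1.5]
```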

What is a Neural Network?

A class of powerful, flexible, general-purpose techniques readily applied to prediction, estimation, and classification problems.

What is a loss function used for?

A network is trained by minimizing a loss function

How does a network with high momentum respond to new training examples that want to reverse the weights?

A network with high momentum responds slowly to such examples.

Generalized Learning Rule (Rumelhart et al., 1988).

A rule designed to work for networks with non-linear activation functions and hidden layers. The weights are adjusted using backpropagation.

Delta rule (Widrow-Hoff), also called the Least Mean Square (LMS) method.

A rule that says, for a given input vector, the output vector is compared to the correct or desired answer. The rule updates the weights to minimize the error between the actual output and the desired output. It works best for NNs with linear activation functions and no hidden layers; it does not work well with hidden layers.
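
A minimal sketch of one delta-rule (LMS) update for a single linear unit; the input, weights, target, and learning rate are all invented:

```python
import numpy as np

x = np.array([1.0, 2.0])    # input vector
w = np.array([0.1, -0.2])   # current weights
d, lr = 1.0, 0.1            # desired output and assumed learning rate

y = w @ x                   # linear activation: actual output
w = w + lr * (d - y) * x    # adjust weights to shrink the error (d - y)
```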

What is a gradient

A vector that stores the partial derivatives of a multivariable function. It helps us calculate the slope at a specific point on a curve for functions with multiple independent variables. To calculate this more complex slope, we isolate each variable to determine how it impacts the output on its own: we iterate through the variables and calculate the derivative of the function after holding all the others constant. Each iteration produces a partial derivative, which we store in the gradient.
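
A sketch of this hold-all-others-constant procedure using finite differences on an assumed function f(x, y) = x^2 + 3y:

```python
def f(x, y):
    return x**2 + 3*y

def gradient(f, x, y, h=1e-6):
    # Partial w.r.t. x: vary x while holding y constant, and vice versa.
    df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return [df_dx, df_dy]

print(gradient(f, 2.0, 1.0))  # approximately [4.0, 3.0]
```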

what is a combination function

All the output values from predecessor nodes are combined into a single value using this type of function.

What are different learning rules for adjusting weights?

Hebbian, Delta, Generalized Learning, Kohonen

Are neural networks the same or different than deep learning?

They are related, but deep learning has a deeper, more complex network structure.

Feed forward

Does not use a feedback loop or connections within the same layer; signals move forward only

Fully Connected interlayer connection

Each node of layer k is connected to all nodes of layer (k+1), where k = 1 is the input layer

Partially Connected interlayer connection

Each node of layer k is connected to some but not all nodes of layer (k+1)

Hierarchical interlayer connection

Feed Forward connections between nodes of adjacent levels only

Resonance interlayer connection

For a given pair of layers with a bi-directional connection, messages are sent across the connection until a certain condition is realized.

Chain rule refresher

Forward propagation can be viewed as a long series of nested equations. If you think of feedforward this way, then backpropagation is merely an application of the chain rule to find the derivatives of the cost with respect to any variable in the nested equation.

Error in Perceptron

In the Perceptron Learning Rule, the predicted output is compared with the known output. If it does not match, the error is propagated backward to allow weight adjustment to happen.
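
A sketch of one perceptron update with invented numbers; when the prediction and the known output disagree, the error adjusts the weights:

```python
import numpy as np

x = np.array([1.0, 0.5, -1.0])   # input vector
w = np.zeros(3)                  # initial weights
target, lr = -1, 0.1             # known output in {+1, -1}, assumed rate

pred = 1 if w @ x >= 0 else -1   # step activation
if pred != target:               # mismatch: error drives a weight update
    w = w + lr * (target - pred) * x
```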

What are loss functions

MAE, MSE, MAPE, and MSLE; Hinge and its variants; Cross-entropy and its variants; Logcosh; Cosine proximity; Poisson
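
A sketch computing two of these losses, MSE and binary cross-entropy, on made-up predictions; the epsilon clip is an assumed guard against log(0):

```python
import numpy as np

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.6])

mse = np.mean((y_true - y_pred) ** 2)   # mean squared error

eps = 1e-12                             # assumed guard against log(0)
p = np.clip(y_pred, eps, 1 - eps)
cross_entropy = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
```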

Bi-Directional interlayer connection

Nodes of layer (k+r) may receive input from nodes of layer k and nodes of layer k may receive input from any node of layer (k+r)

What enables you to distinguish between the two linearly separable classes +1 and -1?

The Perceptron algorithm draws a linear decision boundary.

What are the two types of Perceptrons?

Single layer and Multilayer.

what is an output layer

The active nodes of this layer combine and modify the data to produce the one or more output values that the network passes to the environment.

What is the chain rule?

The chain rule is a formula for calculating the derivatives of composite functions. Composite functions are functions composed of functions inside other function(s).
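
A worked example with an assumed composite, f(u) = u^2 and g(x) = 3x + 1, checking the chain-rule answer f'(g(x)) * g'(x) numerically:

```python
def g(x): return 3 * x + 1
def f(u): return u ** 2
def composite(x): return f(g(x))

x, h = 2.0, 1e-6
numeric = (composite(x + h) - composite(x - h)) / (2 * h)
analytic = 2 * g(x) * 3          # f'(g(x)) * g'(x) by the chain rule
print(numeric, analytic)         # both approximately 42.0
```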

what is an input layer

The nodes of this layer are passive, meaning they do not modify the data. They receive a single value on their input, and duplicate the value to their multiple outputs.

A rectifier or ReLU (Rectified Linear Unit) is a commonly used activation function.

This function allows one to eliminate negative units in an ANN. This is the most popular activation function used in deep neural networks.

Feedback NN

Uses a feedback loop. It is very successful in modeling problems with a time sequence.

What are two combination functions introduced in the reading?

linear combination functions and radial combination functions.

what is a link

A link (i, j) from node i to node j has a weight Wji, and node j has a bias weight W0j

In Mathematics, the Softmax or normalized exponential function is

a generalization of the logistic function that squashes a K-dimensional vector of arbitrary real values to a K-dimensional vector of real values in the range (0, 1) that add up to 1.
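
A sketch of the normalized exponential; subtracting the max is a standard numerical-stability trick, not part of the definition:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # stability shift; leaves the result unchanged
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p, p.sum())   # each value in (0, 1); the values add up to 1
```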

What is a neuron?

A mathematical function conceived as a model of biological neurons. Artificial neurons are the elementary units in an artificial neural network.

Quickprop algorithm

A more statistical approach to training a neural network: it tests a few different sets of weights and then guesses where the optimum is.

In probability theory, the output of Softmax function represents

a probability distribution over K different outcomes.

Hebbian rule

a rule that indicates the connections between two neurons might be strengthened if the neurons fire at the same time, and might be weakened if they fire at different times.
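
A one-line sketch of a Hebbian update under assumed neuron activities: the weight grows when the activities share a sign ("fire together") and shrinks when they differ:

```python
lr = 0.1              # assumed learning rate
a_i, a_j = 1.0, 1.0   # activities of the two neurons (made up)
w = 0.5               # current connection weight

w += lr * a_i * a_j   # same sign -> strengthened; opposite signs -> weakened
```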

A Perceptron is

an algorithm for supervised learning of binary classifiers.

Single layer Perceptrons can learn only

linearly separable patterns.

Radial combination functions

compute the squared Euclidean distance between the vector of weights and the vector of values feeding into the node and then multiply by the squared bias value (the bias acts as a scale factor or inverse width).

What is a linear combination function

computes a "straight line" combination of the weights and the values feeding into the node and then adds the bias value (the bias W0j acts like an intercept).
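
A sketch of both combination functions exactly as defined above, with invented weights, bias, and inputs:

```python
import numpy as np

x = np.array([1.0, 2.0])    # values feeding into the node
w = np.array([0.5, -0.3])   # weights
b = 0.2                     # bias

linear = w @ x + b                     # straight-line combination + intercept
radial = b**2 * np.sum((w - x) ** 2)   # squared distance scaled by squared bias
```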

What is an activation (or transfer) function

decides whether the values produced by an upstream layer will activate an output connection, i.e., whether a neuron is "fired" (activated) or not. It maps input values to a desired range (e.g., between 0 and 1 or -1 and 1)

The Perceptron algorithm learns the weights for the input signals in order to

draw a linear decision boundary.

Multilayer Perceptrons, or feedforward neural networks with two or more layers,

have greater processing power.

Types of activation functions include

include the sign, step, and sigmoid functions.
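
Sketches of the three listed functions; treating zero as positive in sign and step is an assumed convention:

```python
import numpy as np

def sign(x):    return np.where(x >= 0, 1.0, -1.0)  # maps to {-1, +1}
def step(x):    return np.where(x >= 0, 1.0, 0.0)   # maps to {0, 1}
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))     # maps to (0, 1)
```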

simulated annealing

injects randomness into hill climbing.
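
A sketch of hill climbing with injected randomness; the objective, temperature schedule, and step size are all made up:

```python
import math, random

random.seed(0)
def score(x): return -(x - 3) ** 2     # assumed objective to maximize

x, T = 0.0, 1.0                        # start point and temperature
for _ in range(100):
    candidate = x + random.uniform(-0.5, 0.5)
    delta = score(candidate) - score(x)
    if delta > 0 or random.random() < math.exp(delta / T):
        x = candidate                  # sometimes accept a worse move
    T *= 0.95                          # cool down: less randomness over time
```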

How does back-propagation work

Inputs are propagated forward through the network, layer by layer, until they reach the output layer. An error value for each neuron in the output layer is calculated from the loss function and then propagated backward until each neuron has an associated error value
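
A minimal sketch of one forward and backward pass for a single sigmoid output neuron with squared-error loss; the data, weights, and learning rate are invented:

```python
import numpy as np

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0])   # input
w = np.array([0.3, 0.8])    # weights of the output neuron
y, lr = 1.0, 0.5            # actual label and assumed learning rate

out = sigmoid(w @ x)        # forward pass to the output layer

loss_grad = out - y                  # error value from 0.5*(out - y)^2
delta = loss_grad * out * (1 - out)  # chain back through the sigmoid
w -= lr * delta * x                  # propagate the error to the weights
```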

In a NN, a network is trained by

minimizing a loss function

hill climbing

one approach to finding optima

What is the goal of back-propagation algorithm

propagate backward until each neuron has an associated error value

what does radial refer to in RBF

refers to the fact that all inputs that are the same distance from a node's position produce the same output

training rate

controls how quickly the weights change

What are learning rules

rules used to update the weights between the interconnected neurons.

neural networks work best when their inputs are?

small, standardized numbers

the best approach to learning rate is to what

start big and decrease it slowly
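
One common way to start big and decrease slowly is exponential decay; the initial rate and decay factor here are assumptions:

```python
lr0, decay = 0.5, 0.95          # assumed initial rate and per-epoch decay
for epoch in range(10):
    lr = lr0 * decay ** epoch   # the rate shrinks geometrically each epoch
    print(epoch, lr)
```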

generalized delta rule

technique for adjusting weights

sensitivity analysis

addresses how opaque a network is by revealing the relative importance of the inputs to the results.

Perceptron Learning Rule states that

the algorithm would automatically learn the optimal weight coefficients.

what is a bias

The bias acts like an intercept in a linear combination function and as a scale factor or inverse width in a radial combination function

For iterative algorithms (Topic models, NN), An iterative method is called convergent if

the corresponding sequence converges for given initial approximations (as the iterations proceed, the output gets closer and closer to a specific value).

momentum

the tendency for the weights inside each unit to keep changing in the direction they are headed. Each weight remembers whether it has been getting bigger or smaller, and momentum tries to keep it headed in that direction.
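
A sketch of a momentum update for one weight with made-up gradients; note how the final, reversed gradient is resisted, matching the "responds slowly" card above:

```python
mu, lr = 0.9, 0.1       # assumed momentum coefficient and learning rate
w, v = 0.5, 0.0         # weight and its remembered velocity

for grad in [1.0, 1.0, -1.0]:    # the last gradient tries to reverse course
    v = mu * v - lr * grad       # velocity remembers the previous direction
    w = w + v                    # high momentum resists the reversal
```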

what is a hidden layer

There may be one or more hidden layers. Each hidden-layer node receives input values (say xi) from input or hidden nodes and communicates its output to hidden or output nodes

What is an activation function used for?

to decide whether the values produced by an upstream layer will activate an output connection, i.e., whether a neuron is "fired" (activated) or not. It maps input values to a desired range (e.g., between 0 and 1 or -1 and 1)

What is a loss function used for?

used to measure the inconsistency between the predicted value (ŷ) and the actual label (y). It is a non-negative value, and the robustness of the model increases as the value of the loss function decreases.

