Intro to Neural Networks

to get a better model, what CE are you looking for?

a lower CE

how is a perceptron similar to a brain nerve cell?

dendrites are inputs, axon is output, outputs if electrical charge is strong enough

what 2 conditions should be met in order to apply gradient descent? (Check all that apply.)

error function should be: - differentiable - continuous

calculate cross entropy in python

import numpy as np
def cross_entropy(Y, P):
    # Y holds the 0/1 labels, P the predicted probabilities of label 1
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    return -np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P))

how can you go from 'and' to 'or' perceptron?

increase weights or decrease magnitude of bias

in a nutshell, what does backpropagation consist of:

- Doing a feedforward operation.
- Comparing the output of the model with the desired output.
- Calculating the error.
- Running the feedforward operation backwards (backpropagation) to spread the error to each of the weights.
- Using this to update the weights and get a better model.
- Continuing this until we have a model that is good.

The sigmoid function is defined as sigmoid(x) = 1/(1 + e^{-x}). If the score is defined by 4x1 + 5x2 - 9 = score, then which of the following points has exactly a 50% probability of being blue or red? (Choose all that are correct.) (1,1), (2,4), (5,-5), (-4,5)

(1,1) and (-4,5): both give a score of 0, and sigmoid(0) = 0.5
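The answer can be checked by evaluating the score and the sigmoid at each candidate point. This is a minimal sketch; the helper names `sigmoid` and `score` are chosen here for illustration and do not come from the course.

```python
import math

def sigmoid(x):
    # Logistic function: maps any real score to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def score(x1, x2):
    # Linear score from the question: 4*x1 + 5*x2 - 9
    return 4 * x1 + 5 * x2 - 9

points = [(1, 1), (2, 4), (5, -5), (-4, 5)]
probabilities = {p: sigmoid(score(*p)) for p in points}

for p, prob in probabilities.items():
    print(p, "score:", score(*p), "probability:", round(prob, 4))
```

Only the points whose score is exactly 0 land on the 50% line.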

what is gradient descent algorithm?

1. start with random weights w1, ..., wn and bias b
2. for every point (x1, ..., xn) with label y and prediction y^:
   for i = 1, ..., n: update wi' <- wi - alpha(y^ - y)xi
   update b' <- b - alpha(y^ - y)
3. repeat until the error is small
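The update rule above could be sketched in NumPy roughly as follows. The function name `gradient_descent`, the learning rate, the epoch count, the random seed, and the AND-style toy data are all assumptions made for this illustration, and a fixed number of epochs stands in for "repeat until small error".

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, epochs=1000, seed=0):
    # Start with small random weights (assumed initialization) and zero bias
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            y_hat = sigmoid(np.dot(w, x_i) + b)
            # w_i' <- w_i - alpha * (y_hat - y) * x_i
            w = w - alpha * (y_hat - y_i) * x_i
            # b' <- b - alpha * (y_hat - y)
            b = b - alpha * (y_hat - y_i)
    return w, b

# Toy data: the AND function, which is linearly separable
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 0, 0, 1])

w, b = gradient_descent(X, y)
predictions = (sigmoid(X @ w + b) >= 0.5).astype(int)
print(predictions)
```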

Now that you know the equation for the line (2x1 + x2 - 18=0), and similarly the "score" (2x1 + x2 - 18), what is the score of the student who got 7 in the test and 6 for grades?

2

what is the error formula, used in gradient descent?

$E = -\frac{1}{m} \sum_{i=1}^{m} \left( y_i \ln(\hat{y}_i) + (1 - y_i) \ln(1 - \hat{y}_i) \right)$

how does the perceptron algorithm work?

Recall that the perceptron step works as follows. For a point with coordinates $(p, q)$, label $y$, and prediction given by the equation $\hat{y} = \mathrm{step}(w_1 x_1 + w_2 x_2 + b)$:
- If the point is correctly classified, do nothing.
- If the point is classified positive, but it has a negative label, subtract $\alpha p$, $\alpha q$, and $\alpha$ from $w_1$, $w_2$, and $b$ respectively.
- If the point is classified negative, but it has a positive label, add $\alpha p$, $\alpha q$, and $\alpha$ to $w_1$, $w_2$, and $b$ respectively.
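One pass of the perceptron step might look like the sketch below. The function name `perceptron_step`, the convention that the step function returns 1 for a score of exactly 0, and the example point are assumptions for illustration.

```python
import numpy as np

def step(t):
    # Assumed convention: a score of exactly 0 counts as positive
    return 1 if t >= 0 else 0

def perceptron_step(X, y, w, b, alpha=0.01):
    # Visit every point once and nudge the line toward misclassified points
    for x_i, y_i in zip(X, y):
        y_hat = step(np.dot(w, x_i) + b)
        if y_hat == 1 and y_i == 0:
            # Classified positive, label negative: subtract alpha * coordinates
            w = w - alpha * x_i
            b = b - alpha
        elif y_hat == 0 and y_i == 1:
            # Classified negative, label positive: add alpha * coordinates
            w = w + alpha * x_i
            b = b + alpha
    return w, b

# A single point (p, q) = (1, 2) with label 1 that starts out misclassified
X = np.array([[1.0, 2.0]])
y = np.array([1])
w_new, b_new = perceptron_step(X, y, np.array([-1.0, -1.0]), 0.0, alpha=0.1)
print(w_new, b_new)
```

Running the step again on a point that is already correctly classified leaves the weights and bias unchanged.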

what is the basic flow for logistic regression?

1. Take your data
2. Pick a random model
3. Calculate the error
4. Minimize the error, and obtain a better model

after all the math is done, what does the gradient turn out to be?

The gradient is actually a scalar times the coordinates of the point! And what is the scalar? Nothing less than a multiple of the difference between the label and the prediction.

Given the table in the video above, what would the dimensions be for input features (x), the weights (W), and the bias (b) to satisfy (Wx + b)?

W: (1xn), x: (nx1), b: (1x1)

a high Cross entropy indicates?

a worse model

how is the bias calculated in a perceptron?

as an extra input node whose value is always 1; it is multiplied by its own weight (the bias b) and added to the perceptron score

what is the formula for multi-class cross entropy?

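$CE = -\sum_{j=1}^{n} \sum_{i=1}^{m} y_{ij} \ln(p_{ij})$, where $n$ is the number of doors, $m$ is the number of animals, $p_{ij}$ is the probability of animal $i$ being behind door $j$, and $y_{ij}$ is 1 if animal $i$ is behind door $j$ and 0 otherwise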
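A small NumPy sketch of this double sum follows. The probability table and the placement of the animals are made-up numbers for illustration only; rows are animals and columns are doors, matching the index convention in the formula.

```python
import numpy as np

def multiclass_cross_entropy(Y, P):
    # Y[i, j] is 1 if animal i is behind door j, else 0
    # P[i, j] is the predicted probability of animal i behind door j
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    return -np.sum(Y * np.log(P))

# Hypothetical probabilities: rows = duck, beaver, dog; columns = doors 1..3
P = np.array([[0.7, 0.3, 0.1],
              [0.2, 0.4, 0.5],
              [0.1, 0.3, 0.4]])

# Hypothetical outcome: duck behind door 1, dog behind door 2, beaver behind door 3
Y = np.array([[1, 0, 0],
              [0, 0, 1],
              [0, 1, 0]])

ce = multiclass_cross_entropy(Y, P)
print(ce)
```

Only the entries where `Y` is 1 contribute, so the result equals the negative log of the product 0.7 * 0.3 * 0.5.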

if a point is well classified, will we get a small or large gradient? and if it's poorly classified, will we get a large or small gradient?

the closer the prediction is to the label, the smaller the gradient; the farther the prediction is from the label, the larger the gradient. A well-classified point gives a small gradient, and a poorly classified point gives a large one.

what does a network for XOR look like?

a two-layer combination of AND, NOT, and OR perceptrons, for example XOR(x1, x2) = AND(OR(x1, x2), NOT(AND(x1, x2)))
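One possible way to wire AND, OR, and NOT perceptrons into XOR is sketched below. The specific weights and biases, the helper `make_perceptron`, and the convention that the step function returns 1 at a score of 0 are assumptions chosen for this example, not values from the course.

```python
def step(t):
    # Assumed convention: a score of exactly 0 counts as positive
    return 1 if t >= 0 else 0

def make_perceptron(w1, w2, b):
    # Returns a two-input perceptron with fixed weights and bias
    return lambda x1, x2: step(w1 * x1 + w2 * x2 + b)

AND = make_perceptron(1, 1, -1.5)
OR = make_perceptron(1, 1, -0.5)

def NOT(x):
    # Single-input perceptron with weight -1 and bias 0.5
    return step(-x + 0.5)

def xor(x1, x2):
    # First layer: OR and NAND of the inputs; second layer: AND of the two
    return AND(OR(x1, x2), NOT(AND(x1, x2)))

truth_table = [xor(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(truth_table)
```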

how is cross-entropy related to the total probability of an outcome?

cross-entropy is inversely related to the total probability of an outcome: it is the negative log of that probability, so a higher probability gives a lower cross-entropy.

let's define the combination of two new perceptrons as w1*0.4 + w2*0.6 + b. Which of the following values for the weights and the bias would result in the final probability of the point to be 0.88?

don't forget to apply the sigmoid, σ(x) = e^x/(e^x + 1). With w1 = 3, w2 = 5, b = -2.2: 3*0.4 + 5*0.6 - 2.2 = 2, and σ(2) = e^2/(e^2 + 1) ≈ 0.88
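The arithmetic can be reproduced in a few lines. This is only a check of the numbers given in the answer; the helper name `sigmoid` is chosen here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w1, w2, b = 3, 5, -2.2

# Combine the two perceptron outputs 0.4 and 0.6 linearly, then apply the sigmoid
combined_score = w1 * 0.4 + w2 * 0.6 + b
probability = sigmoid(combined_score)

print(round(combined_score, 6), round(probability, 2))
```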

what function turns every number into a positive number?

exp

how are events and probabilities related to cross-entropy?

given events and probabilities, cross-entropy measures how likely those events are under the probabilities. If they are likely, the cross-entropy is small; if they are unlikely, the cross-entropy is large.

what is the cross-entropy for a good and a bad model?

good = low cross entropy bad = high cross entropy
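To make the comparison concrete, the sketch below scores two sets of predictions against the same labels. The labels and both probability lists are invented for illustration, and the two-class `cross_entropy` function is restated so the example runs on its own.

```python
import numpy as np

def cross_entropy(Y, P):
    # Two-class cross-entropy: Y are 0/1 labels, P are predicted probabilities of label 1
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    return -np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P))

labels = [1, 1, 0]
good_model = [0.9, 0.8, 0.1]  # confident and right (hypothetical predictions)
bad_model = [0.2, 0.3, 0.9]   # confident and wrong (hypothetical predictions)

good_ce = cross_entropy(labels, good_model)
bad_ce = cross_entropy(labels, bad_model)
print(good_ce, bad_ce)
```

The predictions that assign high probability to the true labels end up with the lower cross-entropy.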

which of the following is true: a) higher CE => lower probability of event b) higher CE => higher probability of event c) no relation CE=> probability of event

a) higher CE => lower probability of event. Cross-entropy is the negative log of the total probability of an outcome, so the two move in opposite directions.

how does learning rate work?

it scales the amount that is added to or subtracted from the coefficients of the line equation at each update, so the line moves in small steps

how does the step function work for perceptron?

it returns 1 or 0 based on the value of the node: 1 if the score is non-negative, 0 otherwise. (this is an activation function)

how is the activation function applied to the equation for line?

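it is applied to the output of the line equation (the score), i.e. y^ = f(Wx + b); it is a function composition, not a multiplication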

what function turns products into sums?

log: log(ab) = log(a) + log(b)

is a lower cross entropy better or worse model?

lower CE is better model

what do you do to go from a bad model to get to a good one?

minimize the cross-entropy

how are the error function and probability related?

minimizing the error function maximizes the probability

what is a deep neural network?

a multi-layer perceptron (MLP) with many layers, which produces a highly non-linear boundary

what's maximum likelihood?

pick the model that gives the existing labels the highest probability.

how do you go from discrete to continuous functions?

replace step function with a continuous function like sigmoid

how do you construct 'and' perceptron?

set weights and biases, so plotted line is above/below correctly w1=.25 w2=.25 b=-.5
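The weights above can be checked against the AND truth table, and shrinking the magnitude of the bias turns the same perceptron into OR. The OR bias of -0.25 and the convention that the step function returns 1 at a score of exactly 0 are assumptions for this sketch.

```python
def step(t):
    # Assumed convention: a score of exactly 0 counts as positive
    return 1 if t >= 0 else 0

def perceptron(x1, x2, w1, w2, b):
    return step(w1 * x1 + w2 * x2 + b)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# AND with the weights from the card: w1 = 0.25, w2 = 0.25, b = -0.5
and_outputs = [perceptron(x1, x2, 0.25, 0.25, -0.5) for x1, x2 in inputs]

# OR: same weights, smaller bias magnitude (b = -0.25 is a hypothetical choice)
or_outputs = [perceptron(x1, x2, 0.25, 0.25, -0.25) for x1, x2 in inputs]

print(and_outputs, or_outputs)
```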

what happens in the feed forward step?

simply calculates the probability of the y value being correct, and also how far the point is from the line

what function do you use if you have 3 or more classes?

softmax

what is the formula used in minimizing error function? what are steps?

start with random weights, compute each prediction with the sigmoid, and measure the error of each point with cross-entropy; correctly classified points give a smaller error

what is cross-entropy?

the sum of the negative logs of the probabilities

how is the gradient descent algorithm different from perceptron algorithm?

they are essentially the same, but:
- GD: y^ can take continuous values; PA: y^ is only 0 or 1
- once a point is classified correctly, PA stops moving the line, while GD keeps pushing the line away

what are the fundamental calculations for softmax classification?

use exp (since no negative numbers), and divide by sum of exponents, giving a probability
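A minimal softmax along these lines could be written as follows. Subtracting the maximum score before exponentiating is a common numerical-stability trick added here as an assumption; it does not change the result.

```python
import numpy as np

def softmax(scores):
    scores = np.asarray(scores, dtype=float)
    # exp makes every score positive; shifting by the max avoids overflow
    exps = np.exp(scores - np.max(scores))
    # Dividing by the sum of the exponentials gives probabilities that add up to 1
    return exps / np.sum(exps)

probs = softmax([2.0, 1.0, -1.0])
print(probs, probs.sum())
```

Negative scores are handled without trouble because the exponential is always positive.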

how to get multi-class classification?

use multiple outputs and softmax

how would you input different values for different categories (such as duck,beaver,dog)?

use one-hot encoding
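One-hot encoding for the three example categories might be sketched like this. The helper name `one_hot` and the ordering of the categories are choices made for the illustration.

```python
import numpy as np

categories = ["duck", "beaver", "dog"]

def one_hot(label, categories):
    # One position per category; only the position of the given label is 1
    vec = np.zeros(len(categories), dtype=int)
    vec[categories.index(label)] = 1
    return vec

for name in categories:
    print(name, one_hot(name, categories))
```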

since the product of many probabilities (each in [0,1]) is a very small number, how do you calculate the probabilities in a better way?

use the log, and since it returns a negative, use - log (cross entropy)
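The sketch below shows why the sum of negative logs is easier to work with than the raw product. The probability values are arbitrary; 2000 events of probability 0.5 were picked so that the product underflows in double precision.

```python
import math

probabilities = [0.5] * 2000

# The product of many numbers in [0, 1] shrinks until it underflows to 0.0
product = math.prod(probabilities)

# The sum of negative logs stays a manageable positive number
cross_entropy = -sum(math.log(p) for p in probabilities)

print(product, cross_entropy)
```

For a short list the two views agree, since the negative log of a product equals the sum of the negative logs.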

how does back prop work with 2 inputs , one hidden layer?

uses error of each point, and tells model to move line away/toward the point, the weights are then adjusted

chain rule usage?

using the chain rule, it comes down to multiplying a bunch of partial derivatives

how do you calculate the gradient of E at point x?

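using partial derivatives: $\nabla E = \left( \frac{\partial E}{\partial w_1}, \cdots, \frac{\partial E}{\partial w_n}, \frac{\partial E}{\partial b} \right)$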

how do you set the weights for 'not' perceptron

w1=-.33 w2=-.99 bias = .66

calculation for a perceptron with inputs x_i, weights w_i, and bias b?

$Wx + b = \sum_{i=1}^{n} w_i x_i + b$

when would the sigmoid function yield 50%?

when the score(line equation @ points) evaluates to 0

whats the formula for feedforward using matrices?

$\hat{y} = \sigma \circ W^{(2)} \circ \sigma \circ W^{(1)}(x)$
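In NumPy, the composition of two layers could be sketched as below. The explicit bias vectors `b1` and `b2`, the layer sizes, and the example values are assumptions for illustration; the formula on the card folds everything into the weight matrices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, W1, b1, W2, b2):
    # First layer: linear map W1 followed by the sigmoid
    hidden = sigmoid(W1 @ x + b1)
    # Second layer: linear map W2 followed by the sigmoid
    return sigmoid(W2 @ hidden + b2)

# Hypothetical network: 2 inputs, 2 hidden units, 1 output
x = np.array([1.0, 2.0])
W1 = np.array([[0.5, -0.5], [0.25, 0.75]])
b1 = np.zeros(2)
W2 = np.array([[1.0, -1.0]])
b2 = np.zeros(1)

y_hat = feedforward(x, W1, b1, W2, b2)
print(y_hat)
```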

what is formula for sigmoid?

σ(x) = 1/(1 + e^{-x}); its derivative is σ′(x) = σ(x)(1 − σ(x))
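The derivative identity σ′(x) = σ(x)(1 − σ(x)) can be sanity-checked against a finite-difference estimate. The helper names and the step size `h` are choices made for this sketch.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # Closed form: sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def numerical_derivative(f, x, h=1e-6):
    # Central difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

for x in [-2.0, 0.0, 3.0]:
    print(x, sigmoid_prime(x), numerical_derivative(sigmoid, x))
```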

what is the final formula used for calculating the error gradient?

$\nabla E = -(y - \hat{y})(x_1, \ldots, x_n, 1)$
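As a sanity check, this formula can be compared with a numerical gradient of the cross-entropy error for a single point. The function names, the example weights, and the example point are assumptions; the sketch covers one point only, without the averaging over all points.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def point_error(w, b, x, y):
    # Cross-entropy error of a single point with label y
    y_hat = sigmoid(np.dot(w, x) + b)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def analytic_gradient(w, b, x, y):
    # Formula from the card: -(y - y_hat) * (x_1, ..., x_n, 1)
    y_hat = sigmoid(np.dot(w, x) + b)
    return -(y - y_hat) * np.append(x, 1.0)

def numerical_gradient(w, b, x, y, h=1e-6):
    # Central differences over the parameters (w_1, ..., w_n, b)
    params = np.append(w, b)
    grad = np.zeros_like(params)
    for k in range(len(params)):
        shift = np.zeros_like(params)
        shift[k] = h
        plus, minus = params + shift, params - shift
        grad[k] = (point_error(plus[:-1], plus[-1], x, y)
                   - point_error(minus[:-1], minus[-1], x, y)) / (2 * h)
    return grad

w = np.array([0.5, -0.3])
b = 0.1
x = np.array([1.0, 2.0])
y = 1

print(analytic_gradient(w, b, x, y))
print(numerical_gradient(w, b, x, y))
```

The two printed vectors should agree up to the finite-difference error, which illustrates that the gradient is a scalar multiple of the point's coordinates.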

