Neural Network Test Review

What are three examples of activation functions?

- Linear function
- Exponential function
- Sigmoidal (logistic) function

When using a logistic activation function, neural networks perform best when the predictors and outcome variable are on what scale?

0 to 1

When does the weight updating stop?

1. When the new weights are only incrementally different from those of the preceding iteration
2. When the misclassification rate reaches a required threshold
3. When the limit on the number of runs is reached

How can lowering the learning rate of your neural network help with not overfitting the model?

Adjusting the learning rate avoids overfitting by down-weighting new information. This helps to tone down the effect of outliers on the weights and avoids getting stuck in local optima (local minima).
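A minimal numeric sketch of this effect (the rule w_new = w_old + l * err is a simplified stand-in for the textbook's update formula, and the numbers are made up):

```python
# Sketch: effect of the learning rate l on one weight update.
# w_new = w_old + l * err is a simplified stand-in for the textbook's rule.
w_old = 0.40
err = 2.5  # a large error, e.g. caused by an outlier record

for l in (0.9, 0.1):
    w_new = w_old + l * err
    print(f"l = {l}: w moves from {w_old} to {w_new:.2f}")
# l = 0.9: w moves from 0.4 to 2.65  (the outlier dominates the weight)
# l = 0.1: w moves from 0.4 to 0.65  (new information is down-weighted)
```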

What is the (popular) method for using model errors to update weights?

Backpropagation (FORMULA ON 278)

What is one way we can learn about the relationships that the neural network captures?

By conducting a sensitivity analysis on the validation set. This is done by setting all predictor values to their mean and obtaining the network's prediction. Then, the process is repeated by setting each predictor sequentially to its minimum, and then maximum, value. By comparing the predictions from different levels of the predictors, we can get a sense of which predictors affect predictions more and in what way.
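A sketch of this procedure in Python (the `predict` function and the validation array `X_valid` are assumed to exist; the names are illustrative):

```python
import numpy as np

def sensitivity_analysis(predict, X_valid):
    """Vary one predictor at a time around a mean baseline profile."""
    baseline = X_valid.mean(axis=0)          # all predictors at their mean
    base_pred = predict(baseline)
    effects = {}
    for j in range(X_valid.shape[1]):
        lo, hi = baseline.copy(), baseline.copy()
        lo[j] = X_valid[:, j].min()          # predictor j at its minimum
        hi[j] = X_valid[:, j].max()          # predictor j at its maximum
        effects[j] = (predict(lo) - base_pred, predict(hi) - base_pred)
    return effects  # larger swings => more influential predictor
```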

What are the two methods used for updating weights?

Case updating and Batch updating

How do we rescale our numerical variables for neural networks?

For a numerical variable X that takes values in the range [a, b] where a < b, we normalize the measurements by subtracting a and dividing by b − a.
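In code, this is the standard min-max normalization, X_norm = (X − a) / (b − a); a small sketch with made-up values:

```python
import numpy as np

def minmax_normalize(x):
    """Rescale to [0, 1]: subtract the minimum a, divide by the range b - a."""
    a, b = x.min(), x.max()
    return (x - a) / (b - a)

x = np.array([2.0, 5.0, 8.0, 11.0])   # made-up measurements
print(minmax_normalize(x))            # [0.  0.33...  0.67...  1.]
```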

How do we determine the amount of nodes that are in the input layer?

By the number of predictors being used. (For classification, the number of classes determines the number of nodes in the output layer, not the input layer; prediction uses a single output node.)

What values do the hidden layers take?

Hidden layer nodes take as input the output values from the input layer (or, when there are multiple hidden layers, from the preceding hidden layer)

What is another way we can normalize our numerical predictors?

If their distribution is highly skewed, we can transform the predictor by taking its natural log
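For example (the values are made up), a log transform compresses a long right tail:

```python
import numpy as np

income = np.array([20_000, 35_000, 50_000, 1_200_000])  # highly right-skewed
log_income = np.log(income)   # natural log pulls in the extreme value
print(log_income.round(2))    # [ 9.9  10.46 10.82 14.  ]
```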

What is case updating?

In case updating, the weights are updated after each record is run through the network. These weights are updated again after the second record is run through the network, then the third, and so on, until all records are used. This is called one epoch, sweep, or iteration through the data. Typically, there are many epochs, which makes case updating very computationally intensive.
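A minimal sketch of case updating on a toy one-weight model y ≈ w·x (the gradient-style update rule and the data are illustrative, not the textbook's exact formulas):

```python
# Toy data: (x, y) pairs roughly following y = 2x.
records = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
w, l = 0.0, 0.1   # one weight, learning rate l

for epoch in range(20):       # one epoch = one sweep through all records
    for x, y in records:      # case updating: w changes after EACH record
        err = y - w * x
        w += l * err * x
print(round(w, 3))            # w ends up near 2.0
```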

What is the most used application of Neural Networks?

Multilayer feedforward networks

What is the structure of a Neural Network?

Neural Networks contain an input layer consisting of nodes (sometimes called neurons) that simply accept the predictor values, and successive layers of nodes that receive input from the previous layers. The outputs of nodes in each layer are inputs to nodes in the next layer. The last layer is called the output layer. Layers between the input and output layers are known as hidden layers.
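A sketch of this layered flow (the layer sizes, random weights, and logistic activation are illustrative assumptions):

```python
import numpy as np

def g(s):                              # logistic activation
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(1)
x = rng.random(4)                      # input layer: 4 predictor values
W1 = rng.random((3, 4)) * 0.1          # weights, input layer -> hidden layer
W2 = rng.random((1, 3)) * 0.1          # weights, hidden layer -> output layer

hidden = g(W1 @ x)                     # hidden nodes receive input-layer outputs
output = g(W2 @ hidden)                # output node receives hidden-layer outputs
print(output)
```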

Why would overfitting occur in a neural network?

Overfitting can easily occur in a neural network when the network becomes too complex. Network complexity increases as you add additional hidden layers and hidden nodes to the network.

When computing classification in a neural network, what do the output nodes produce?

Propensities (estimated probabilities of belonging to each class)

What is batch updating?

TEXTBOOK DEF: The entire training set is run through the network before each updating of weights takes place. In that case, the error errk in the updating equation is the sum of the errors from all records. ZAHRA DEF: A batch (not the entire dataset, but say a handful of observations) is run through the network before each updating of weights takes place. In that case, the error in the updating equation is the sum of the errors from all records included in the batch.
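The same toy one-weight model as in the case-updating sketch, now with batch updating: the errors are summed over all records before the weight changes once.

```python
records = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]   # same toy data, y = 2x-ish
w, l = 0.0, 0.01

for epoch in range(200):
    total_err = sum((y - w * x) * x for x, y in records)  # sum of errors
    w += l * total_err                                    # ONE update per pass
print(round(w, 3))   # again converges near 2.0
```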

What is the key ingredient by which the net evolves to produce a more accurate prediction?

The neural network estimation process is the key ingredient by which the network evolves to produce more accurate predictions. The estimation process uses the error, iteratively, to update the estimated weights (coefficients). After the weights are updated at each iteration, the error of the output node is obtained by comparing the predicted and actual outcomes. The output node error is then distributed across all the hidden nodes, so that each node is responsible for part of the error. Each node-specific error is then used to update that node's weights, producing a more accurate prediction. Weights are updated in one of two ways: case updating or batch updating. The former looks at the error associated with each individual observation and updates the weights observation by observation, while the latter looks at a 'batch' of observations and updates the weights based on that batch before moving on to the next batch. Neural networks are known for high predictive performance because of this estimation process and weight updating.

What does the bias value mean?

The bias value is a constant that controls the level of contribution of node j.

What is included in the formula for calculating error associated with output node k?

The correction factor ^yk(1 − ^yk) and the original definition of an error (yk − ^yk). Combined, the error associated with output node k is: errk = ^yk(1 − ^yk)(yk − ^yk).
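A quick worked example with made-up values (actual outcome yk = 1, predicted ^yk = 0.7):

```python
y_k, y_hat_k = 1.0, 0.7
err_k = y_hat_k * (1 - y_hat_k) * (y_k - y_hat_k)   # correction factor * error
print(round(err_k, 3))   # 0.7 * 0.3 * 0.3 = 0.063
```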

When computing the output of your hidden node and output node, what does function g refer to?

The function g is called an activation function; one simple choice is the identity function, g(s) = s

What is the idea behind Neural Networks?

The idea behind neural networks is to combine the predictor information in a very flexible way that captures complicated relationships among these variables and between them and the outcome variable.

The neural network estimation method is different from least squares and maximum likelihood because it uses what function to determine the optimal minimum error?

The loss function

How are the weights and bias values initialized?

The values of θj and wij are initialized to small, usually random, numbers (typically, but not always, in the range 0.00 ± 0.05). Such values represent a state of no knowledge by the network, similar to a model with no predictors. The initial weights are used in the first round of training.
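A sketch of that initialization (the array sizes are arbitrary; a uniform draw over ±0.05 matches the range described above):

```python
import numpy as np

rng = np.random.default_rng(0)
p, hidden_nodes = 4, 3                                   # illustrative sizes
w = rng.uniform(-0.05, 0.05, size=(p, hidden_nodes))     # weights w_ij
theta = rng.uniform(-0.05, 0.05, size=hidden_nodes)      # bias values theta_j
print(w.round(3))   # small random numbers: a state of "no knowledge"
```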

What are the values on the arcs that connect nodes called?

The weights. They can be denoted as wij, meaning the weight from node i to node j.

How is overfitting avoided in neural networks?

To avoid overfitting, it is important to limit the number of training iterations and not to over-train on the data.

How do we compute the output of a hidden layer node?

To compute the output of a hidden layer node, we compute a weighted sum of the inputs and apply a certain function to it. More formally, for a set of input values (x1, x2, . . . , xp) we compute the output of node j by taking the weighted sum θj + ∑ wij·xi, where θj, w1j, . . . , wpj are weights that are initially set randomly, then adjusted as the network "learns." In the next step, we take a function g of this sum. (PAGE 275)
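A sketch of this computation for a single node j, using the logistic function as g (the input values and weights are made up):

```python
import numpy as np

def node_output(x, w_j, theta_j):
    """g(theta_j + sum_i w_ij * x_i), with g = logistic function."""
    s = theta_j + np.dot(w_j, x)        # weighted sum of the inputs
    return 1.0 / (1.0 + np.exp(-s))     # apply g

x = np.array([0.2, 0.8, 0.5])           # input values
w_j = np.array([0.01, -0.03, 0.02])     # small initial weights
print(round(node_output(x, w_j, 0.05), 4))   # ~0.5095: near 0.5 at the start
```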

Consider a neural network with a single output node (prediction) and no hidden layers. For a dataset with p predictors, the output node receives x1, x2, . . . , xp, takes a weighted sum of these, and applies the g function. If g is the identity function [g(s) = s], what is the output equivalent to?

To the formulation of multiple linear regression! (PAGE 277)
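A tiny demonstration (the intercept, weights, and inputs are arbitrary): with g(s) = s, the output node computes exactly the linear regression form θ + ∑ wi·xi.

```python
import numpy as np

theta, w = 1.5, np.array([0.4, -0.2])   # illustrative intercept and weights
x = np.array([2.0, 3.0])
output = theta + np.dot(w, x)           # identity activation: g(s) = s
print(round(output, 2))                 # 1.5 + 0.8 - 0.6 = 1.7
```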

What is difference between how least squares and maximum likelihood use the errors from the model compared to how neural network uses them?

Unlike least squares or maximum likelihood, where a global function of the errors (e.g., sum of squared errors) is used for estimating the coefficients, in neural networks, the estimation process uses the errors iteratively to update the estimated weights.

What do you do after you obtain the propensity outputs from the output layer? (Classification context)

You normalize the two values (or however many classes you are predicting) so that they add up to 1
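For example (the two raw propensities below are made up):

```python
raw = [0.8, 0.6]                      # raw propensities from two output nodes
total = sum(raw)
probs = [p / total for p in raw]
print(probs)   # [0.571..., 0.428...] -- now they sum to 1
```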

When updating the weights, what does "l" stand for in our equation?

l refers to the learning rate, a constant typically ranging between 0 and 1 that controls the amount of change in the weights from one iteration to the next

What are the key disadvantages of neural networks?

• Considered a "black box" prediction machine, with no insight into relationships between predictors and outcome
• No variable-selection mechanism, so you have to exercise care in selecting variables
• Heavy computational requirements if there are many variables (additional variables dramatically increase the number of weights to calculate)

