Deep Learning Illustrated, Part II, Chapter 6


The Neuron Type used simply defines the ....?

... the activation function used to calculate the output from the inputs. The most widely used neuron types (i.e., activation functions) are:
1. Sigmoid: S-shaped curve, with output in the range 0 < a < 1
2. Tanh: similar S-shaped curve, but with output in the range -1 < a < 1
3. ReLU: 0 when z < 0, otherwise linear (a = max(0, z))

Which main Neuron Types (aka Activation Functions) exist?

1. Sigmoid: S-shaped curve, with output in the range 0 < a < 1
2. Tanh: similar S-shaped curve, but with output in the range -1 < a < 1
3. ReLU: 0 when z < 0, otherwise linear (a = max(0, z))
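A minimal NumPy sketch (not from the book) of the three activation functions listed above:

```python
import numpy as np

def sigmoid(z):
    # S-shaped curve: output a is always strictly between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Like the sigmoid but 0-centered: output a ranges between -1 and 1
    return np.tanh(z)

def relu(z):
    # 0 for negative z, linear (a = z) for positive z
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # ~[0.119, 0.5, 0.881]
print(tanh(z))      # ~[-0.964, 0.0, 0.964]
print(relu(z))      # [0.0, 0.0, 2.0]
```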

What is the "Tranh Neuron"? How does it differ from the SIGMOID? (Page 224)

1. The TANH NEURON (pronounced "tanch") looks similar in shape to the sigmoid (see image) and is defined by tanh(z) = (e^z - e^-z) / (e^z + e^-z).
2. The main DIFFERENCE from the sigmoid function is that the tanh neuron's output has the range [-1, 1] (whereas the sigmoid outputs in the range [0, 1]).
3. This difference is more than cosmetic. With negative z inputs corresponding to negative a activations, z = 0 corresponding to a = 0, and positive z corresponding to positive a activations, the output from tanh neurons tends to be centered near 0 (whereas sigmoid output tends to center around 0.5).
4. These 0-centered a outputs usually serve as the inputs x to other artificial neurons in the network, and such 0-centered inputs make (the dreaded!) neuron saturation less likely, thereby enabling the entire network to learn more efficiently.
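A small NumPy sketch (not from the book) illustrating the 0-centering point: for z values spread symmetrically around 0, tanh activations average near 0 while sigmoid activations average near 0.5.

```python
import numpy as np

z = np.linspace(-3, 3, 1001)     # hypothetical z values spread evenly around 0

sigmoid_a = 1.0 / (1.0 + np.exp(-z))
tanh_a = (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))   # same as np.tanh(z)

print(round(float(sigmoid_a.mean()), 3))   # ~0.5 -> sigmoid activations center around 0.5
print(round(float(tanh_a.mean()), 3))      # ~0.0 -> tanh activations center around 0
```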

Why are Perceptrons not useful for Deep Learning, and what other types of neurons would be better suited? (Page 219)

1. The most obvious restriction of the perceptron is that it receives only binary inputs and provides only a binary output. In many cases, we'd like to make predictions from inputs that are continuous variables.
2. Introducing z: w · x + b = z. LEARNING is challenging because a slight adjustment to w or b can cause z to cross from negative to positive (or vice versa), leading to a whopping, drastic swing in output from 0 all the way to 1 (or vice versa). Essentially, the perceptron has no finesse: it's either yelling or it's silent. Fine-tuning a model becomes very hard like this (see image and the sketch below) (Page 221).
3. Better suited to avoid saturation and allow for more "finesse" would be the Sigmoid, Tanh, or ReLU neurons.
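A small sketch of the "no finesse" problem (values are made up, not from the book): a tiny nudge in z near the threshold flips the perceptron's output completely, while a sigmoid barely moves.

```python
import numpy as np

def perceptron_output(z):
    # Binary step: the output jumps straight from 0 to 1 as z crosses 0
    return 1 if z > 0 else 0

def sigmoid_output(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in (-0.01, 0.01):          # a tiny nudge in z around the threshold
    print(z, perceptron_output(z), round(float(sigmoid_output(z)), 3))
# perceptron: 0 -> 1 (a drastic swing); sigmoid: ~0.497 -> ~0.502 (barely moves)
```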

Rank the basic Neuron Types (activation functions) by usefulness (from least to most practical neuron type)

1. The perceptron, with its binary inputs and the aggressive step of its binary output, is not a practical consideration for deep learning models.
2. The sigmoid neuron is an acceptable option, but it tends to lead to neural networks that train less rapidly than those composed of, say, tanh or ReLU neurons. Thus, we recommend limiting your use of sigmoid neurons to situations where it would be helpful to have a neuron provide output within the range [0, 1].
3. The tanh neuron is a solid choice. As covered earlier, its 0-centered output helps deep learning networks learn rapidly.
4. Our preferred neuron is the ReLU because of how efficiently these neurons enable learning algorithms to perform computations. In our experience they tend to lead to well-calibrated artificial neural networks in the shortest period of training time.
In addition to the neurons covered in this chapter, there is a veritable zoo of activation functions available (Page 228): keras.io/layers/advanced-activations
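A minimal Keras sketch, assuming a TensorFlow-based Keras install, showing how the activation function is chosen per layer; the layer sizes and input shape here are arbitrary, and further options live under the advanced activations referenced above.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Dense

model = Sequential([
    Input(shape=(784,)),              # e.g., a flattened 28x28 input (arbitrary choice)
    Dense(64, activation='relu'),     # ReLU: preferred default for hidden layers
    Dense(64, activation='tanh'),     # tanh: solid, 0-centered alternative
    Dense(1, activation='sigmoid'),   # sigmoid: output constrained to the range [0, 1]
])
model.summary()
```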

What is a Neural Network "Activation Function" ? (Page 222)

A function (for example, sigmoid, tanh, or ReLU) that takes the weighted sum of all of the inputs from the previous layer and then generates and passes an output value (typically nonlinear, to cover all possible scenarios) to the next layer.
SHORT: A function that takes all of a neuron's inputs and calculates that neuron's output.
INPUT: z = w · x + b
OUTPUT: a = f(z)
The output from any given neuron's activation function is referred to simply as its ACTIVATION, and throughout this book we use the variable a for it (see image).
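A minimal sketch of a single neuron's forward pass, with made-up inputs, weights, and bias, using the sigmoid as f:

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0])   # inputs coming from the previous layer (made up)
w = np.array([0.8, 0.1, -0.4])   # this neuron's weights (made up)
b = 0.2                          # this neuron's bias (made up)

z = np.dot(w, x) + b             # INPUT:  z = w . x + b   (= -0.72 here)
a = 1.0 / (1.0 + np.exp(-z))     # OUTPUT: a = f(z), using the sigmoid as f (~0.33)
print(z, a)
```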

What is Neuron Saturation? (Page 224)

Each neuron type has value ranges where subtle updates to the weights and biases during training will have little to no effect on the output, and thus learning will stall. This situation is called neuron saturation, and it can occur with most activation functions. Thankfully, there are tricks to avoid saturation, as you'll see in Chapter 9.
Perceptron: saturated everywhere except right at the 0/1 threshold
Sigmoid: saturates near outputs 0 and 1 (most responsive around 0.5, i.e., z near 0)
Tanh: saturates near outputs -1 and 1 (most responsive around 0)
ReLU: saturates at output 0 for negative z (responsive for all positive z)
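A small numerical sketch (not from the book) of sigmoid saturation: the slope σ'(z) = σ(z)(1 - σ(z)) is largest around z = 0 and nearly vanishes for large |z|, which is why weight updates stall there.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The slope tells us how much a small change in z (via weights/bias) moves the output
for z in (-10.0, 0.0, 10.0):
    a = sigmoid(z)
    print(f"z = {z:+5.1f}   a = {a:.5f}   slope = {a * (1 - a):.5f}")
# At z = -10 or +10 the neuron is saturated: the slope is ~0.00005, so updates
# barely change the output. Around z = 0 (a = 0.5) the slope peaks at 0.25.
```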

Based on biological neurons, Frank Rosenblatt conceptualized the...

Perceptron:
1. Receive input from multiple other neurons
2. Aggregate those inputs via a simple arithmetic operation called the weighted sum
3. Generate an output if this weighted sum crosses a threshold level, which can then be sent on to many other neurons within a network (Page 209)
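A minimal sketch of those three steps (the function name and example values are my own, not from the book):

```python
import numpy as np

def perceptron(inputs, weights, threshold):
    # 1) receive inputs, 2) aggregate them as a weighted sum,
    # 3) output 1 only if the sum crosses the threshold
    weighted_sum = np.dot(weights, inputs)
    return 1 if weighted_sum > threshold else 0

# Made-up example with three inputs
print(perceptron(inputs=[1, 0, 1], weights=[2.0, 1.0, 0.5], threshold=2.0))  # 2.5 > 2.0 -> 1
```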

What is a "Sigmoid Neuron" ? (Page 221)

SIGMOID NEURONS provide an alternative to the erratic behavior of the perceptron: a gentle curve from 0 to 1. This particular curve shape is called the sigmoid function and is defined by σ(z) = 1 / (1 + e^(-z)), where z is equivalent to w · x + b (see image). e is the mathematical constant beginning 2.718..., perhaps best known for its starring role in the natural exponential function. σ is the Greek letter sigma, the root word for "sigmoid." The sigmoid function is our first example of an artificial neuron activation function. It was used in our "Shallow Neural Network in Keras" example.

What is the "ReLu Neuron" ? (Page 226)

The activation function shape of the Rectified Linear Unit, or ReLU neuron, diverges glaringly from those of the sigmoid and tanh; it was inspired by properties of biological neurons. The action potentials of biological neurons have only a "positive" firing mode; they have no "negative" firing mode.
a = max(0, z) (a = z when z is above 0; otherwise a = 0)
The ReLU function is one of the simplest functions to imagine that is nonlinear. That is, like the sigmoid and tanh functions, its output a does not vary uniformly linearly across all values of z: the ReLU is in essence two distinct linear functions combined.

Why is it so important that Neurons are non-linear?

The non-linear nature is a critical property of all activation functions used within deep learning architectures. These non-linearities permit deep learning models to approximate any continuous function. This universal ability to approximate some output y given some input x is one of the hallmarks of deep learning, the characteristic that makes the approach so effective across such a breadth of applications. This is demonstrated via a series of captivating interactive applets in Chapter 4 of Michael Nielsen's Neural Networks and Deep Learning e-book: http://neuralnetworksanddeeplearning.com/chap4.html
In other words, if we used linear neurons (e.g., a = z), we would not be able to learn or predict many common problems, as the sketch below illustrates.
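A small sketch (not from the book) of why linearity isn't enough: stacking purely linear layers collapses into a single linear layer, whereas inserting a ReLU between them does not. The matrices and input are made up and biases are omitted for brevity.

```python
import numpy as np

# Two made-up "layers" of purely linear neurons (a = z), i.e., just weight matrices
W1 = np.array([[1.0, -2.0],
               [0.5,  1.0]])
W2 = np.array([[2.0,  1.0]])
x = np.array([1.0, 1.0])

deep_linear   = W2 @ (W1 @ x)      # two stacked linear layers...
single_linear = (W2 @ W1) @ x      # ...collapse to one linear layer
print(deep_linear, single_linear)  # both [-0.5]: no extra expressive power gained

# Insert a nonlinearity (ReLU) between the layers and the collapse no longer holds
deep_nonlinear = W2 @ np.maximum(0.0, W1 @ x)
print(deep_nonlinear)              # [1.5]: a genuinely different function
```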

Among Sigmoid, Tanh, and ReLU Neurons, which is the most widely used and why?

The relatively simple shape of the ReLU function's particular brand of nonlinearity works to its advantage. As you'll see in Chapter 8, learning appropriate values for the parameters w and b within deep learning networks involves partial-derivative calculus, and these calculus operations are more computationally efficient on the linear portions of the ReLU function than on the curves of, say, the sigmoid and tanh functions. As a testament to its utility, the incorporation of ReLU neurons into AlexNet (Figure 1.17) was one of the factors behind its trampling of existing machine vision benchmarks in 2012 and shepherding in the era of deep learning. Today, ReLU units are the most widely used neurons within the hidden layers of deep artificial neural networks (Page 227).
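A small illustrative sketch (not from the book) of the efficiency point: the ReLU's derivative is just a comparison, whereas the sigmoid's derivative requires evaluating an exponential first (and becomes tiny for large |z|).

```python
import numpy as np

z = np.array([-3.0, -0.5, 0.5, 3.0])

# ReLU's derivative with respect to z: just a comparison (1 where z > 0, else 0)
relu_grad = (z > 0).astype(float)

# The sigmoid's derivative requires an exponential before the multiply
sig = 1.0 / (1.0 + np.exp(-z))
sigmoid_grad = sig * (1.0 - sig)

print(relu_grad)      # [0. 0. 1. 1.]
print(sigmoid_grad)   # smooth but costlier to compute, and tiny for large |z|
```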

Describe how a biological Neuron works

A BIOLOGICAL NEURON will:
1. Receive information from many other neurons
2. Aggregate this information via changes in cell voltage at the cell body
3. Transmit a signal if the cell voltage crosses a threshold level (the analogue of the BIAS in artificial neurons), a signal that can be received by many other neurons in the network (Page 209)

Why is it important to use float values instead of integers (..., -2, -1, 0, 1, 2, ...) for a model's parameters and inputs?

In the hot dog-detecting examples, the parameter values (the perceptron's weights as well as its bias) were positive integers only to keep the arithmetic as undemanding as possible. In practice, parameters are FLOAT VALUES (and can be negative) rather than integers: floats allow the fine-grained, continuous adjustments that training requires, keep values comparable, and are less clunky.

Describe a Perceptron that identifies a HOT DOG (or not) based on the presence of the inputs Ketchup, Mustard, and Bun.

To make a prediction as to whether the object is a hot dog or not, the perceptron independently WEIGHTS each of these three inputs. Let's determine the weighted sum of the inputs. One input at a time (i.e., elementwise), we multiply the input by its weight (here arbitrarily selected; in practice the weights would be learned) and then sum the individual results:
1. For the ketchup input: 3 × 1 = 3 (ketchup present)
2. For mustard: 2 × 1 = 2 (mustard present)
3. For bun: 6 × 0 = 0 (bun not present)
With those three products, we can compute that the weighted sum of the inputs is 5: 3 + 2 + 0. To generalize from this example, the weighted sum of the inputs is ∑_{i=1}^{n} w_i x_i. The maximum possible score is 11 (3 + 2 + 6), and we arbitrarily decided that a score > 4 is necessary to confirm a hot dog (Page 213).
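The same arithmetic as a minimal NumPy sketch (weights, inputs, and threshold are the book's made-up example values):

```python
import numpy as np

weights = np.array([3, 2, 6])   # ketchup, mustard, bun (arbitrary example weights)
inputs  = np.array([1, 1, 0])   # ketchup present, mustard present, bun absent

weighted_sum = np.dot(weights, inputs)   # 3*1 + 2*1 + 6*0 = 5
print(weighted_sum, weighted_sum > 4)    # 5 True -> score > 4, so: hot dog
```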

What Parameters describe an artificial Neuron? (Page 216).

Together, a neuron's BIAS b and its WEIGHTS w_i constitute all of its parameters. The OUTPUT a is determined by the activation function (perceptron's binary step, sigmoid, tanh, or ReLU) applied to the weighted sum of the inputs; that weighted sum has to overcome the threshold (the negative of the bias b) in order to confirm the presence of whatever the neuron is built to detect (in biological neurons we talk of a voltage threshold). With the concept of a neuron's bias now available to us, we arrive at the most widely used perceptron equation (see image).

The Most Important Equation in This Book : To achieve the formulation of a simplified and universal perceptron equation, we must introduce a term called...... (Page 216).

the BIAS, which we annotate as "b" and which is equivalent to the negative of an artificial neuron's (Perceptron's) threshold value: b ≡ −threshold Together, a neuron's bias and its weights constitute all of its parameters: the changeable variables that prescribe what the neuron does. (Page 216).

What is the "General Equation for Artificial Neurons" that we will return to time and again. It is the most important equation in this book. (Page 218).

∑ w_i x_i + b > 0
The weighted sum of the inputs plus the bias has to be greater than zero to confirm the presence of what we are trying to detect.
NOTICE: To keep the arithmetic as undemanding as possible in our hot dog-detecting perceptron examples, all of the parameter values we made up (the perceptron's weights as well as its bias) were positive integers. These parameters could, however, be negative values, and, in practice, they would rarely be integers. Instead, PARAMETERS are configured as FLOAT VALUES, which are less clunky. Finally, while all of the parameters in these examples were fabricated by us, they would usually be learned through the training of artificial neurons on data. In Chapter 8, we cover how this training of neuron parameters is accomplished in practice. (Page 218)
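The hot dog example restated in bias form, as a minimal sketch (b = -threshold = -4, per the example above):

```python
import numpy as np

w = np.array([3.0, 2.0, 6.0])   # weights for ketchup, mustard, bun (book's example values)
x = np.array([1.0, 1.0, 0.0])   # ketchup and mustard present, bun absent
b = -4.0                        # bias = -threshold

z = np.dot(w, x) + b            # w . x + b = 5 - 4 = 1
output = 1 if z > 0 else 0
print(output)                   # 1 -> hot dog confirmed
```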

