Deep learning
What is deep learning?
Deep learning is an area of machine learning that focuses on deep artificial neural networks, which are loosely inspired by the brain. Applications include computer vision, speech recognition, and natural language processing.
Deep learning is a class of machine learning algorithms that:[10](pp199-200)
- use a cascade of multiple layers of nonlinear processing units for feature extraction and transformation, where each successive layer uses the output of the previous layer as its input;
- learn in supervised (e.g., classification) and/or unsupervised (e.g., pattern analysis) manners;
- learn multiple levels of representation that correspond to different levels of abstraction; the levels form a hierarchy of concepts.
What is a cost function?
A cost function measures the error of the neural network with respect to a given training sample and its expected output. It is a single scalar value, not a vector, because it rates the performance of the network as a whole. One common choice is the mean squared error (MSE):
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2$$
where $\hat{Y}_i$ is the network's prediction for sample $i$, $Y_i$ is the desired value, and $n$ is the number of samples. Training aims to minimize this value.
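As a small illustration of the mean squared error, here is a minimal sketch in plain Python. The function name `mean_squared_error` and the sample values are hypothetical, chosen only for this example.

```python
def mean_squared_error(predicted, desired):
    """Return the mean of the squared differences between two equal-length sequences."""
    if len(predicted) != len(desired):
        raise ValueError("predicted and desired must have the same length")
    n = len(predicted)
    return sum((p - d) ** 2 for p, d in zip(predicted, desired)) / n


# Hypothetical network outputs and the targets they should have matched.
predicted = [2.5, 0.0, 2.0, 8.0]
desired = [3.0, -0.5, 2.0, 7.0]

cost = mean_squared_error(predicted, desired)
print(cost)  # a single scalar summarizing the error over all four samples
```

The result is one number for the whole batch, which is what makes it usable as a quantity to minimize.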
What is model capacity?
Model capacity is the ability of a model to approximate a wide variety of functions. The higher the capacity, the more information the network can store.
What is an auto-encoder?
An autoencoder is an unsupervised machine learning algorithm trained with backpropagation, where the target values are set equal to the inputs. Internally, it has a hidden layer that describes a code used to represent the input. Some key facts about the autoencoder:
•It is an unsupervised ML algorithm similar to Principal Component Analysis (PCA).
•With linear activations and a squared-error loss, it minimizes the same objective function as PCA.
•It is a neural network.
•The neural network's target output is its input.
What are the benefits of mini-batch gradient descent?
Benefits:
•It is more computationally efficient than stochastic gradient descent, because each update processes a batch of examples at once.
•It tends to improve generalization by finding flat minima.
•Each mini-batch gives a noisy approximation of the gradient over the entire training set, and that noise can help the optimizer escape local minima.
What is a Boltzmann Machine?
A Boltzmann machine is used to optimize the solution of a problem. Its job is to optimize the weights and the quantities related to the given problem. Some important points about Boltzmann machines:
•It uses a recurrent structure.
•It consists of stochastic neurons, each of which is in one of two possible states, either 1 or 0.
•The neurons are either adaptive (free state) or clamped (frozen state).
•Applying simulated annealing to a discrete Hopfield network yields a Boltzmann machine.
What is data normalization and why do we need it?
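Data normalization is a preprocessing step applied to the input data before training. Here we rescale values to fit into a specific range so that all features have a comparable scale. The main motive is better convergence during backpropagation: when features differ widely in scale, the gradients are dominated by the largest ones and training slows down.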
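Two common ways of rescaling a feature are min-max scaling and standardization. The sketch below is illustrative only; the function names and the `ages` column are hypothetical and not taken from any particular library.

```python
def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


ages = [20.0, 30.0, 40.0, 60.0]  # hypothetical feature column
scaled = min_max_scale(ages)     # every value now lies in [0, 1]
z_scores = standardize(ages)     # mean 0, standard deviation 1
```

Which of the two is preferable depends on the data and the model; neither is prescribed by the text above.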
What is dropout?
Dropout is a regularization technique for reducing overfitting in neural networks. At each training step we randomly drop out (set to zero) a set of nodes, so we effectively train a different model for each training case, and all of these models share weights. It is a form of model averaging.
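A minimal sketch of dropout applied to one layer's activations follows. It uses the "inverted dropout" variant, in which surviving activations are scaled up during training so that nothing needs to change at inference time; that variant, the function name `dropout`, and the `drop_prob` parameter are assumptions made for this example.

```python
import random


def dropout(activations, drop_prob, training=True, rng=random):
    """Apply inverted dropout to a list of activations."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)  # at inference time every unit is kept
    keep_prob = 1.0 - drop_prob
    # Each unit is zeroed with probability drop_prob; survivors are scaled
    # by 1 / keep_prob so the expected activation stays the same.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)  # seeded generator so the run is repeatable
hidden = [1.0] * 10     # hypothetical hidden-layer activations
dropped = dropout(hidden, drop_prob=0.5, rng=rng)
unchanged = dropout(hidden, drop_prob=0.5, training=False)
```

Each call during training draws a fresh random mask, which is what makes every training step see a different sub-network.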
Why are deep networks better than shallow ones?
Studies show that both shallow and deep networks can fit any function, but because deep networks have several hidden layers, often of different types, they are able to build or extract better features than shallow models, often with fewer parameters.
What is weight initialization in neural networks?
Weight initialization is a very important step. A bad weight initialization can prevent a network from learning, while a good one leads to quicker convergence and a lower overall error. Biases can generally be initialized to zero. The rule for the weights is to set them close to zero without being too small.
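One way to read "close to zero without being too small" is to draw the weights from a zero-mean Gaussian whose standard deviation shrinks with the number of inputs. The sketch below uses a standard deviation of 1/sqrt(fan_in), which is a common heuristic assumed here rather than something stated in the text; `init_layer`, `fan_in`, and `fan_out` are hypothetical names.

```python
import math
import random


def init_layer(fan_in, fan_out, rng):
    """Return (weights, biases) for a dense layer with fan_in inputs and fan_out outputs."""
    std = 1.0 / math.sqrt(fan_in)  # assumed heuristic: small, but not vanishingly small
    weights = [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
    biases = [0.0] * fan_out       # biases start at zero, as described above
    return weights, biases


rng = random.Random(42)
weights, biases = init_layer(fan_in=4, fan_out=3, rng=rng)
```

Drawing the weights at random also means that no two units in the layer start out identical.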
Is it OK to connect from a Layer 4 output back to a Layer 2 input?
Yes, this can be done if the layer 4 output comes from the previous time step, as in an RNN. We also need to assume that the previous input batch is sometimes correlated with the current batch.
What is gradient descent?
Gradient descent is an optimization algorithm used to learn the parameter values that minimize the cost function.
- Optimization means finding the alternative that is most cost-effective or has the highest achievable performance under the given constraints, by maximizing desired factors and minimizing undesired ones. In comparison, maximization means trying to attain the highest result or outcome without regard to cost or expense.
- Optimization is the most essential ingredient in the recipe of machine learning algorithms. It starts with defining some kind of loss (cost) function and ends with minimizing it using one optimization routine or another.
Gradient descent is an iterative algorithm that moves in the direction of steepest descent, given by the negative of the gradient. We compute the gradient of the cost function with respect to the parameters and update them with the formula:
$$\Theta := \Theta - \alpha \frac{\partial}{\partial \Theta} J(\Theta)$$
where $\Theta$ is the parameter vector, $\alpha$ is the learning rate, and $J(\Theta)$ is the cost function.
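The update rule can be tried out on a one-parameter toy problem. In this sketch the cost J(θ) = (θ - 3)², the starting point, and the learning rate are all assumptions picked for illustration.

```python
def gradient_descent(grad, theta, learning_rate, steps):
    """Repeatedly apply theta := theta - learning_rate * grad(theta)."""
    for _ in range(steps):
        theta = theta - learning_rate * grad(theta)
    return theta


def cost(theta):
    return (theta - 3.0) ** 2       # toy cost with its minimum at theta = 3


def cost_gradient(theta):
    return 2.0 * (theta - 3.0)      # derivative of the toy cost


theta_final = gradient_descent(cost_gradient, theta=0.0, learning_rate=0.1, steps=100)
print(theta_final)  # ends up very close to the minimizer, 3
```

With this cost, each step shrinks the distance to the minimum by a constant factor, so a much larger learning rate (above 1.0 here) would make the iterates diverge instead.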
What are hyperparameters? Provide some examples.
Hyperparameters, as opposed to model parameters, can't be learned from the data; they are set before the training phase.
- Learning rate: determines how fast we update the weights during optimization. If the learning rate is too small, gradient descent can be slow to find the minimum; if it is too large, gradient descent may not converge (it can overshoot the minimum). It is often considered the most important hyperparameter.
- Number of epochs: an epoch is one forward pass and one backward pass over all of the training data.
- Batch size: the number of training examples in one forward/backward pass.
Explain the following three variants of gradient descent: batch, stochastic and mini-batch.
- Stochastic gradient descent: we use only a single training example to calculate the gradient and update the parameters.
- Batch gradient descent: we calculate the gradient over the whole dataset and perform one update per iteration.
- Mini-batch gradient descent: one of the most popular optimization algorithms. It is a variant of stochastic gradient descent in which a mini-batch of samples is used instead of a single training example.
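The three variants differ only in how many examples feed each update, so a single loop with a `batch_size` argument can express all of them. The sketch below fits the slope of a toy line; `fit_slope`, the data, and the default learning rate are hypothetical choices for this example.

```python
import random


def fit_slope(xs, ys, batch_size, learning_rate=0.05, epochs=200, seed=0):
    """Fit y = w * x by gradient descent on the mean squared error."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of mean((w * x - y)^2) over the batch with respect to w.
            grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= learning_rate * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # toy data generated with a true slope of 2

w_batch = fit_slope(xs, ys, batch_size=len(xs))  # batch: whole dataset per update
w_sgd = fit_slope(xs, ys, batch_size=1)          # stochastic: one example per update
w_mini = fit_slope(xs, ys, batch_size=2)         # mini-batch: two examples per update
```

On this noiseless toy data all three settings recover the same slope; on real data they trade off update cost against gradient noise.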
What is backpropagation?
- Backpropagation is an important tool for improving the accuracy of predictions in data mining and machine learning.
- The name is short for "the backward propagation of errors," since an error is computed at the output and distributed backwards through the network's layers.
- It is commonly used to train deep neural networks.
- Neural networks use backpropagation as a learning algorithm to compute the gradient of the loss function with respect to the weights; gradient descent then uses that gradient to update the weights.
- In this method, we move the error from the output end of the network to all the weights inside the network, which allows efficient computation of the gradient.
It can be divided into the following steps:
- Forward-propagate the training data to generate the output.
- Using the target value and the output value, compute the error derivative with respect to the output activation.
- Backpropagate to compute the derivative of the error with respect to the activations of the previous layer, and continue this for all the hidden layers.
- Using the previously calculated derivatives for the output and all hidden layers, calculate the error derivatives with respect to the weights.
- Update the weights.
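The steps above can be sketched on the smallest possible network: one input, one sigmoid hidden unit, and one linear output, trained on a squared-error loss. The architecture, the parameter values, and all function names here are assumptions chosen to keep the example short.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Return the hidden activation h and the output y_hat of a 1-1-1 network."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    y_hat = w2 * h + b2
    return h, y_hat


def loss(params, x, y):
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2


def backprop(params, x, y):
    """Return the gradient of the loss with respect to (w1, b1, w2, b2)."""
    w1, b1, w2, b2 = params
    h, y_hat = forward(params, x)  # forward pass
    d_y_hat = y_hat - y            # error derivative at the output
    d_w2 = d_y_hat * h             # output-layer weight and bias gradients
    d_b2 = d_y_hat
    d_h = d_y_hat * w2             # error propagated back to the hidden activation
    d_z = d_h * h * (1.0 - h)      # through the sigmoid, whose derivative is h * (1 - h)
    d_w1 = d_z * x                 # hidden-layer weight and bias gradients
    d_b1 = d_z
    return (d_w1, d_b1, d_w2, d_b2)


def sgd_step(params, grads, learning_rate):
    """Update every parameter by one gradient descent step."""
    return tuple(p - learning_rate * g for p, g in zip(params, grads))


params = (0.5, -0.3, 0.8, 0.1)  # hypothetical starting values for w1, b1, w2, b2
x, y = 1.5, 1.0                 # one hypothetical training example
grads = backprop(params, x, y)
new_params = sgd_step(params, grads, learning_rate=0.1)
```

A useful sanity check for any hand-written backward pass is to compare its gradients against finite differences of the loss.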
What is an autoencoder?
- An autoencoder is an artificial neural network able to learn a representation (encoding) for a set of data without any supervision.
- The network learns by copying its input to its output; typically the internal representation has fewer dimensions than the input vector, so the network learns efficient ways of representing the data.
- An autoencoder consists of two parts: an encoder, which maps the inputs to an internal representation, and a decoder, which converts the internal representation to the outputs.
What is the role of the activation function?
The activation function introduces non-linearity into the neural network, helping it to learn more complex functions. Without it, the neural network would only be able to learn a linear function, that is, a linear combination of its input data.
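To see why the non-linearity matters, the sketch below stacks two scalar linear layers and shows that they collapse into a single linear layer, while inserting a ReLU between them does not. The weights and the choice of ReLU are assumptions made for illustration.

```python
def relu(x):
    return max(0.0, x)


# Hypothetical weights and biases for two scalar layers.
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5


def two_linear_layers(x):
    return w2 * (w1 * x + b1) + b2


def one_linear_layer(x):
    # The same function written as a single layer:
    # weight w2 * w1 and bias w2 * b1 + b2.
    return (w2 * w1) * x + (w2 * b1 + b2)


def with_relu(x):
    return w2 * relu(w1 * x + b1) + b2


inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear_outputs = [two_linear_layers(x) for x in inputs]
collapsed_outputs = [one_linear_layer(x) for x in inputs]
relu_outputs = [with_relu(x) for x in inputs]
```

The first two lists are identical, whereas the ReLU version is flat for negative pre-activations and sloped otherwise, so no single linear layer can reproduce it.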