DD2437 - ANN
Artificial neural network architecture
A network graph parameterised by the number of layers, units and often by transfer function
Neighbourhood function in SOM
This function controls how the neighbours of the winning unit in a SOM grid are updated during a learning process
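A minimal sketch of a commonly used Gaussian neighbourhood function on a 1-D output grid (the function name and the width parameter sigma are illustrative):

import numpy as np

def neighbourhood(grid_dist, sigma):
    # Gaussian neighbourhood: 1 at the winner, decaying with distance in the output grid
    return np.exp(-grid_dist ** 2 / (2 * sigma ** 2))

# Update strengths for units 0..9 on a 1-D grid when unit 4 wins and sigma = 2
dists = np.abs(np.arange(10) - 4)
print(neighbourhood(dists, sigma=2.0))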
Recurrent vs feed‐forward networks
Connectivity patterns with loops vs uni‐directional layered architecture
Shallow vs deep learning
Learning using only a few layers vs using many
Bias‐variance dilemma
The problem that the two sources of error (bias and variance) cannot be minimized simultaneously, which limits the generalization capability of a model
Bayesian Regularization
The founding principle of Bayesian regularization is that weights are represented as probability distributions. Key practical advantages are:
- more effective use of training data
- no need for an extra validation subset
- the output is provided as a probability distribution
- a unified framework for holistic model selection
- regularization parameters are adjusted in an automated fashion
- the relevance of data features is scored
Early stopping
A form of regularization to avoid overfitting when training a neural network by controlling the number of learning epochs based on the behaviour of the validation error
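A minimal sketch of the stopping criterion, assuming a precomputed validation-error curve and a simple patience counter (the helper name and toy numbers are illustrative):

import numpy as np

def early_stopping_epoch(val_errors, patience=5):
    # Return the epoch whose weights would be kept: the last one before the
    # validation error stopped improving for `patience` consecutive epochs
    best, best_epoch, wait = np.inf, 0, 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch, wait = err, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_epoch

# Toy validation curve: it decreases, then rises again as overfitting sets in
curve = [1.0, 0.7, 0.5, 0.45, 0.44, 0.46, 0.5, 0.55, 0.6, 0.7, 0.8]
print(early_stopping_epoch(curve, patience=3))     # -> 4 (lowest validation error)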
Discriminative vs generative classifier
A generative classifier models how the data was generated (joint probability over the input and output/label space) while a discriminative approach only deals with predictions that can be made based on the learned conditional probability distribution.
Simulated annealing
A probabilistic algorithm for finding global minima in the presence of local optima; it is used among others to support the stochastic dynamics of Boltzmann machines
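A minimal sketch of simulated annealing on a toy 1-D energy function, using Metropolis acceptance and a geometric cooling schedule (all parameters are illustrative):

import numpy as np

rng = np.random.default_rng(0)

def simulated_annealing(energy, x0, steps=5000, t0=1.0, cooling=0.999):
    # Random neighbour proposals with Metropolis acceptance and a geometric cooling schedule
    x, t = x0, t0
    for _ in range(steps):
        x_new = x + rng.normal(scale=0.5)          # propose a random neighbour
        d_e = energy(x_new) - energy(x)
        if d_e < 0 or rng.random() < np.exp(-d_e / t):
            x = x_new                              # always accept downhill, sometimes uphill
        t *= cooling                               # lower the temperature
    return x

# Quartic energy with two minima; the deeper (global) one lies at negative x
energy = lambda x: 0.1 * x ** 4 - x ** 2 + 0.5 * x
print(simulated_annealing(energy, x0=3.0))         # should escape the local minimum near x ~ 2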
Greedy layer-wise pretraining
An iterative process of building and training a deep network layer by layer, where each new layer is trained on top of the previously trained ones
Hopfield Networks
Assumptions: the weight matrix in a Hopfield network should be symmetric and have zeros on the diagonal. Learning rule: Hebbian learning is the classical approach to training Hopfield networks. Storage capacity: sparseness and orthogonality of the input patterns promote large storage capacity.
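A minimal sketch of Hebbian weight construction and recall, assuming bipolar (+1/-1) patterns; the helper names and toy patterns are illustrative:

import numpy as np

def hopfield_weights(patterns):
    # Hebbian learning: sum of outer products, symmetric, zero diagonal
    n = patterns.shape[1]
    w = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(w, 0.0)                       # no self-connections
    return w

def recall(w, state, steps=10):
    # Synchronous recall: repeatedly apply sign(W x)
    for _ in range(steps):
        state = np.sign(w @ state)
    return state

# Two bipolar patterns; recall the first one from a noisy cue
p1 = np.array([1, -1, 1, -1, 1, -1, 1, -1])
p2 = np.array([1, 1, 1, 1, -1, -1, -1, -1])
w = hopfield_weights(np.stack([p1, p2]))
noisy = p1.copy(); noisy[0] *= -1                  # flip one bit
print(recall(w, noisy))                            # converges back to p1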
Ensemble Techniques
Averaging helps in reducing the variance in the statistical learning sense (one could say less dependency on the particular selection of training data). Ensemble techniques often train different models on different subsets of the data; examples include random forests, bagging, boosting and mixtures of experts. A committee of networks produces models with greater generalization capability due to an improved bias-variance balance (for this reason we tend to choose individual networks with low bias, as the variance is taken care of by the ensemble). Interpretability can be enhanced for mixtures of experts, where the input decides which networks are selected to determine the output.
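A minimal sketch of averaging over bootstrap resamples (bagging), using polynomial regressors as stand-ins for low-bias, high-variance networks (all data and parameters are illustrative):

import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data: a noisy sine curve
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# Bagging: fit low-bias, high-variance models (degree-9 polynomials) on bootstrap resamples
preds = []
for _ in range(25):
    idx = rng.integers(0, x.size, x.size)          # bootstrap sample (with replacement)
    coeffs = np.polyfit(x[idx], y[idx], 9)
    preds.append(np.polyval(coeffs, x))

ensemble = np.mean(preds, axis=0)                  # averaging reduces the variance
print(np.mean((ensemble - np.sin(x)) ** 2))        # ensemble error against the true function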
Learning algorithm
A data-driven algorithm that adjusts the network's weights to fit the data
RBF network vs multi‐layer perceptron (MLP)
Differences in activation functions, architecture, weight interpretation, learning algorithm, sparseness of the hidden layer activations etc.
Activation function
Function that defines the mapping between the input and output for individual units (determines their fundamental behaviour)
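For illustration, a few common choices (sigmoid, tanh, ReLU) sketched in NumPy:

import numpy as np

# A few common activation functions
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))   # squashes input to (0, 1)
tanh = np.tanh                                  # squashes input to (-1, 1)
relu = lambda x: np.maximum(0.0, x)             # passes positive input, zero otherwise

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x))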
Spurious attractor
In the context of Hopfield networks, it corresponds to a local energy minimum (fixed point) created unintentionally during training, besides the intended target patterns
Transfer learning
In the context of neural networks (though it is broader in machine learning), it describes the capability of (deep) networks to use knowledge gained while solving one problem to address a different, related problem, so that the knowledge can be shared
Supervised vs unsupervised learning
Learning with vs without labels
Autoassociative memory
Networks storing memory patterns that can be retrieved by presenting a stimulus of the same kind as the stored pattern, typically the pattern itself or a noisy/incomplete version of it
Vanishing vs exploding gradients
One of the biggest problems with backprop training: as the error is propagated back through many layers, the gradients either shrink towards zero (vanish) or grow uncontrollably (explode)
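A back-of-the-envelope illustration: the backpropagated gradient is roughly a product of per-layer factors, so it shrinks or blows up exponentially with depth (the weight magnitudes below are assumptions for illustration):

# Backpropagated gradients are (roughly) products of per-layer factors |w| * f'(a).
# The sigmoid derivative is at most 0.25, so with moderate weights the product
# shrinks exponentially with depth (vanishing); with large weights it explodes.
for depth in (5, 20, 50):
    vanishing = (1.0 * 0.25) ** depth    # weight magnitude ~1, sigmoid derivative 0.25
    exploding = (8.0 * 0.25) ** depth    # weight magnitude ~8 gives a factor of 2 per layer
    print(depth, vanishing, exploding)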
Overfitting
Poor generalization, i.e. lack of capability to perform a classification or regression task well on unseen data; an inadequate fit to the data where accounting for noise manifests itself in high variance of the estimator. It is an undesirable effect of an off bias-variance balance and usually shows up as a large error on test (unseen) data, especially when compared to the training error. Techniques against overfitting include:
- Early stopping
- Regularization, e.g. weight decay
- Model selection with a generalization error estimate obtained using cross-validation
- Network growing or pruning
- Dropout, randomly deactivating hidden units during training (sketched below)
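A minimal sketch of (inverted) dropout applied to a layer of activations; the helper name and drop rate are illustrative:

import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_drop=0.5, training=True):
    # Inverted dropout: zero random units during training, rescale to keep the expected value
    if not training:
        return activations                          # no units dropped at test time
    mask = rng.random(activations.shape) >= p_drop  # keep each unit with probability 1 - p_drop
    return activations * mask / (1.0 - p_drop)

h = np.ones(10)                                     # toy hidden-layer activations
print(dropout(h, p_drop=0.5))                       # roughly half the units are zeroed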
Hopfield net vs Boltzmann machine
Similar architecture, different learning rules, stochastic nature of Boltzmann machines
Self-Organizing Maps
An iterative process of updating the input weight vectors of the best matching unit and its neighbours, with a shrinking neighbourhood determining which other nodes besides the best matching unit get updated (competition combined with cooperation). The shrinking neighbourhood, often accompanied by a decreasing learning rate, provides an explorative phase followed by a fine-tuning/convergence phase. Topographic mapping: similarity relationships are preserved when mapping from the input space (similarity of input vectors) to the output space (proximity of the corresponding best matching units in the output grid); this is possible thanks to the neighbourhood function. Kohonen maps are useful for clustering and visualization.
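A minimal sketch of the SOM training loop with a shrinking neighbourhood and decreasing learning rate; the grid size, decay schedules and data are illustrative:

import numpy as np

rng = np.random.default_rng(0)

# Toy SOM: 2-D inputs mapped onto a 1-D grid of 20 units
data = rng.random((500, 2))
weights = rng.random((20, 2))
grid = np.arange(20)

epochs, eta0, sigma0 = 30, 0.2, 5.0
for epoch in range(epochs):
    # Shrinking neighbourhood and learning rate: exploration first, fine-tuning later
    frac = epoch / (epochs - 1)
    sigma = sigma0 * (0.05 / sigma0) ** frac
    eta = eta0 * (0.01 / eta0) ** frac
    for x in data:
        winner = np.argmin(np.linalg.norm(weights - x, axis=1))   # competition
        h = np.exp(-(grid - winner) ** 2 / (2 * sigma ** 2))      # cooperation with neighbours
        weights += eta * h[:, None] * (x - weights)               # move winner and neighbours

print(weights[:5])   # neighbouring units end up with similar weight vectors (topographic order)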
Synchronous vs asynchronous update in Hopfield nets
Synchronous update implies that the states of all units in a Hopfield net are changed/updated simultaneously in one step whereas asynchronous update involves only a single unit update per step
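A minimal sketch contrasting the two update modes on a toy Hopfield net (the stored pattern and helper names are illustrative):

import numpy as np

rng = np.random.default_rng(0)

def sync_update(w, state):
    # Synchronous: all units change at once based on the same old state
    return np.sign(w @ state)

def async_update(w, state):
    # Asynchronous: a single randomly chosen unit is updated per step
    i = rng.integers(state.size)
    state = state.copy()
    state[i] = np.sign(w[i] @ state)
    return state

# Tiny net storing the pattern [1, -1, 1, -1]
p = np.array([1, -1, 1, -1])
w = np.outer(p, p) / 4.0
np.fill_diagonal(w, 0.0)
start = np.array([1, -1, 1, 1])                    # last bit flipped
print(sync_update(w, start))                       # recovers p in one synchronous sweep
print(async_update(w, start))                      # changes at most one unit per step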
Undercomplete vs overcomplete autoencoders
The difference is in the size of the hidden layer relative to the input dimensionality, and hence in how they learn; overcomplete autoencoders have a hidden layer larger than the dimensionality of the input and their training is often supported by regularization
Sequential (on‐line) vs batch learning
The difference lies in how and when the weights are updated in the learning process: either following the error and gradient calculations for each sample (sequential) or collectively using the accumulated update once every epoch (batch)
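A minimal sketch contrasting per-sample and per-epoch updates of the delta rule on a toy linear problem (data, learning rates and epoch counts are illustrative):

import numpy as np

rng = np.random.default_rng(0)

# Toy linear problem: y = 2*x + 1 plus noise
x = rng.random(100)
y = 2 * x + 1 + rng.normal(scale=0.05, size=x.size)
X = np.column_stack([x, np.ones_like(x)])          # add a bias column

def sequential(X, y, eta=0.1, epochs=500):
    w = np.zeros(2)
    for _ in range(epochs):
        for xi, yi in zip(X, y):                   # weight update after every sample
            w += eta * (yi - xi @ w) * xi
    return w

def batch(X, y, eta=0.1, epochs=500):
    w = np.zeros(2)
    for _ in range(epochs):
        grad = (y - X @ w) @ X / len(y)            # accumulate over the whole epoch
        w += eta * grad                            # single update per epoch
    return w

print(sequential(X, y), batch(X, y))               # both end up close to [2, 1]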
Backprop
The most common learning algorithm for multi-layer perceptrons; the essential step consists in propagating error from output to input and iteratively adjusting weights
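A minimal sketch of backprop for a one-hidden-layer MLP on XOR, assuming a tanh hidden layer, a linear output and plain gradient descent (layer sizes, learning rate and epoch count are illustrative):

import numpy as np

rng = np.random.default_rng(0)

# XOR data: two inputs, one target output per pattern
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# One hidden layer of 8 tanh units, linear output unit
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)
eta = 0.5

for epoch in range(10000):
    h = np.tanh(X @ W1 + b1)                # forward pass, hidden layer
    out = h @ W2 + b2                       # forward pass, output layer
    d_out = (out - y) / len(X)              # output error (MSE gradient)
    d_h = (d_out @ W2.T) * (1 - h ** 2)     # error propagated back through tanh
    W2 -= eta * h.T @ d_out                 # gradient-descent weight updates
    b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X.T @ d_h
    b1 -= eta * d_h.sum(axis=0)

print(np.round(out, 2))   # predictions typically approach [0, 1, 1, 0] as the error decreases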
Dead units in competitive learning
Undesirable units that never get updated in the learning process with a given dataset
Popular Neural Network Techniques
· Stacked autoencoders, often built by greedy training of the component autoencoders with backprop.
· Deep belief networks, based on stacked RBMs (trained with contrastive divergence).
· Convolutional neural networks, trained by backprop (in essence) and composed of multiple layers with a convolutional layer, pooling layer and fully-connected layer as a basic building block.