DD2437 - ANN

Artificial neural network architecture

A network graph parameterised by the number of layers, units and often by transfer function

Neighbourhood function in SOM

This function controls how the neighbours of the winning unit in a SOM grid are updated during a learning process
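A minimal sketch of a Gaussian neighbourhood function on a SOM grid; the function name and arguments are illustrative, not taken from any particular library:

```python
import numpy as np

def gaussian_neighbourhood(grid_positions, winner_idx, sigma):
    """Strength with which each grid node is updated, given the winning unit.

    grid_positions: (n_nodes, 2) array of node coordinates on the SOM output grid.
    winner_idx:     index of the best-matching unit.
    sigma:          neighbourhood radius (typically shrunk during training).
    """
    # Squared distance of every node to the winning node on the output grid
    d2 = np.sum((grid_positions - grid_positions[winner_idx]) ** 2, axis=1)
    # Gaussian kernel: nodes close to the winner get update strengths near 1,
    # distant nodes get strengths near 0
    return np.exp(-d2 / (2 * sigma ** 2))
```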

Recurrent vs feed‐forward networks

Connectivity patterns with loops vs uni‐directional layered architecture

Shallow vs deep learning

Learning using only a few layers vs using many

Bias‐variance dilemma

The problem that two sources of error, bias and variance, cannot be minimized simultaneously: reducing one typically increases the other, which limits the generalization capability of the model.
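For reference, the standard decomposition of the expected squared error of an estimator \(\hat{f}\), assuming \(y = f(x) + \varepsilon\) with noise variance \(\sigma^2\):

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```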

Bayesian Regularization

The founding principle of Bayesian regularization is that weights are represented as probability distributions. Key practical advantages:
- more effective use of training data (no need for an extra validation subset)
- the output is provided as a probability distribution
- a unified framework for holistic model selection
- regularization parameters are adjusted in an automated fashion (see the objective below)
- the relevance of data features is scored
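One common formulation of the regularized objective behind this (MacKay-style evidence framework), in which the hyperparameters \(\alpha\) and \(\beta\) are re-estimated from the training data rather than tuned on a validation set:

```latex
F(\mathbf{w}) = \beta E_D(\mathbf{w}) + \alpha E_W(\mathbf{w}),
\qquad
E_D(\mathbf{w}) = \sum_n \big(t_n - y_n(\mathbf{w})\big)^2,
\qquad
E_W(\mathbf{w}) = \tfrac{1}{2}\sum_i w_i^2
```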

Early stopping

A form of regularization to avoid overfitting when training a neural network by controlling the number of learning epochs based on the behaviour of the validation error
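A minimal, self-contained sketch of early stopping on a toy linear-regression task; the model, data split and patience value are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic regression task split into training and validation parts
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.3 * rng.normal(size=200)
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

w = np.zeros(5)                          # linear model trained by gradient descent
best_w, best_val, patience, wait = w.copy(), np.inf, 10, 0

for epoch in range(1000):
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= 0.05 * grad                     # one training step per "epoch"
    val_err = np.mean((X_val @ w - y_val) ** 2)
    if val_err < best_val:               # validation error still improving
        best_val, best_w, wait = val_err, w.copy(), 0
    else:                                # no improvement: count towards patience
        wait += 1
        if wait >= patience:
            break                        # stop early
w = best_w                               # roll back to the best epoch's weights
```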

Discriminative vs generative classifier

A generative classifier models how the data was generated (joint probability over the input and output/label space) while a discriminative approach only deals with predictions that can be made based on the learned conditional probability distribution.

Simulated annealing

A probabilistic algorithm for finding global minima in the presence of local optima; it is used, among other applications, to support the stochastic dynamics of Boltzmann machines.
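A minimal sketch of simulated annealing on an assumed 1-D objective with several local minima; the proposal scale, initial temperature and cooling rate are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # A 1-D objective with several local minima
    return x ** 2 + 10 * np.sin(3 * x)

x, T = 5.0, 2.0                        # initial state and temperature
for step in range(5000):
    x_new = x + rng.normal(scale=0.5)  # propose a random neighbouring state
    dE = f(x_new) - f(x)
    # Always accept improvements; accept worse states with probability exp(-dE/T),
    # which lets the search escape local minima while the temperature is high
    if dE < 0 or rng.random() < np.exp(-dE / T):
        x = x_new
    T *= 0.999                         # gradually cool down
print(x, f(x))
```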

Greedy layer-wise pretraining

An iterative process of training and building layer by layer a deep network

Hopfield Networks

Assumptions: the weight matrix of a Hopfield network should be symmetric and should have zeros on the diagonal. Learning rule: Hebbian learning is the classical approach to training Hopfield networks. Storage capacity: sparseness and orthogonality of the stored patterns promote a large storage capacity.
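A minimal sketch of building a Hopfield weight matrix with the Hebbian (outer-product) rule under the assumptions above (bipolar patterns, symmetric weights, zero diagonal); the 1/N scaling is one common convention:

```python
import numpy as np

def hebbian_weights(patterns):
    """Store bipolar (+1/-1) patterns in a Hopfield weight matrix.

    patterns: (n_patterns, n_units) array with entries in {+1, -1}.
    """
    P, N = patterns.shape
    W = patterns.T @ patterns / N   # sum of outer products over stored patterns
    np.fill_diagonal(W, 0)          # zero diagonal: no self-connections
    return W                        # symmetric by construction

patterns = np.array([[1, -1, 1, -1, 1, 1, -1, -1],
                     [1, 1, -1, -1, 1, -1, 1, -1]])
W = hebbian_weights(patterns)
```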

Ensemble Techniques

Averaging helps reduce variance in the statistical learning sense (less dependency on the particular selection of training data). Ensemble techniques often train different models on different subsets of the data; examples include random forests, bagging, boosting and mixtures of experts. A committee of networks produces models with greater generalization capability due to the improved bias-variance balance (for this reason we tend to choose individual networks with low bias, as the variance is taken care of by the ensemble). Interpretability can be enhanced for a mixture of experts, where the input decides which networks are selected to determine the output.
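A minimal sketch of the averaging idea: high-variance individual models (here, assumed ninth-degree polynomial fits) are trained on bootstrap resamples of the data, bagging-style, and their predictions are averaged, which reduces the variance of the combined estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy sine data; each committee member is a flexible (low-bias, high-variance)
# polynomial fit trained on a bootstrap resample of the data
x = np.linspace(0, 1, 60)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=x.size)

predictions = []
for _ in range(25):
    idx = rng.integers(0, x.size, x.size)        # bootstrap sample
    coeffs = np.polyfit(x[idx], y[idx], deg=9)   # flexible individual model
    predictions.append(np.polyval(coeffs, x))

ensemble = np.mean(predictions, axis=0)          # averaging cancels much of the
                                                 # member-to-member variance
```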

Learning algorithm

A data-driven algorithm that adjusts the network's weights to fit the data.

RBF network vs multi‐layer perceptron (MLP)

Differences in activation functions, architecture, weight interpretation, learning algorithm, sparseness of the hidden layer activations etc.

Activation function

Function that defines the mapping between the input and output for individual units (determines their fundamental behaviour)
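A few common examples, sketched in NumPy (the particular selection of functions is illustrative):

```python
import numpy as np

# Common unit activation (transfer) functions
def sigmoid(a):            # smooth, squashes the input to (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):               # squashes the input to (-1, 1), zero-centred
    return np.tanh(a)

def relu(a):               # piecewise linear, common in deep networks
    return np.maximum(0.0, a)

a = np.array([-2.0, 0.0, 2.0])
print(sigmoid(a), tanh(a), relu(a))
```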

Spurious attractor

In the context of Hopfield networks, it corresponds to a local energy minimum (fixed point) created unintentionally during training, besides the intended target patterns

Transfer learning

In the context of neural networks (though the notion is broader in machine learning), it describes the capability of (deep) networks to use knowledge gained while solving one problem to address a different, somehow related problem, so that the knowledge can be shared.

Supervised vs unsupervised learning

Learning with vs without labels

Autoassociative memory

Networks storing memory patterns that can be retrieved by presenting a stimulus of the same kind as the stored pattern, typically a noisy or incomplete version of the pattern itself.

Vanishing vs exploding gradients

One of the biggest problems with backprop training: as the error is propagated back through many layers, gradients either shrink towards zero (vanish) or grow uncontrollably (explode).
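A toy illustration of the mechanism: the backpropagated gradient is roughly a product of per-layer factors, so a factor consistently below 1 drives it towards zero while a factor above 1 blows it up (the factor values and depth are arbitrary):

```python
# Repeatedly multiplying per-layer factors mimics the gradient flowing back
# through a deep network
for factor in (0.5, 1.5):
    grad = 1.0
    for layer in range(50):
        grad *= factor
    print(f"per-layer factor {factor}: gradient after 50 layers = {grad:.3e}")
```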

Overfitting

Overfitting is an undesirable effect amounting to poor generalisation and an off-balance bias-variance trade-off. It usually manifests itself as a large test error (on unseen data), especially when compared with the training error. Techniques against overfitting include:
- early stopping
- regularization, e.g. weight decay
- model selection with a generalization-error estimate obtained using cross-validation
- network growing or pruning
- dropout, i.e. randomly deactivating units during training (sketched below)
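A minimal sketch of inverted dropout on a layer's activations, referenced in the last item above; the drop probability and rescaling convention are the usual ones, but the function itself is illustrative rather than tied to any library:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_drop=0.5, training=True):
    """Inverted dropout applied to one layer's activations (a sketch)."""
    if not training:
        return activations                            # all units active at test time
    mask = rng.random(activations.shape) >= p_drop    # randomly keep each unit
    return activations * mask / (1.0 - p_drop)        # rescale to keep the expectation

h = np.array([0.2, 1.5, -0.7, 0.9])
print(dropout(h))
```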

Overfitting

Poor generalization, i.e. a lack of capability to perform a classification or regression task well on unseen data; a fit that accounts for noise in the training data, which manifests itself in high variance of the estimator.

Hopfield net vs Boltzmann machine

Similar architecture, different learning rules, stochastic nature of Boltzmann machines

Self-Organizing Maps

An iterative process of updating the input weight vectors with consideration to a shrinking neighbourhood that determines which nodes, besides the best-matching unit, should also be updated (competition with collaboration). The shrinking neighbourhood, often accompanied by a decreasing learning rate, supports an explorative phase followed by a fine-tuning/convergence phase. Topographic mapping: the similarity relationship is preserved when mapping from the input space (similarity of input vectors) to the output space (proximity of the corresponding best-matching units in the output grid), which is possible thanks to the neighbourhood. Kohonen maps are useful for clustering and visualization.
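A minimal sketch of a SOM training loop with a 1-D output grid, shrinking neighbourhood and decreasing learning rate (all sizes and schedules are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-D output grid of 10 nodes mapping 2-D inputs
data = rng.random((500, 2))
n_nodes = 10
grid = np.arange(n_nodes)                        # node positions on the output grid
weights = rng.random((n_nodes, 2))               # input-space weight vectors

epochs, sigma0, eta0 = 20, 3.0, 0.2
for epoch in range(epochs):
    sigma = sigma0 * (1 - epoch / epochs) + 0.1  # shrinking neighbourhood
    eta = eta0 * (1 - epoch / epochs) + 0.01     # decreasing learning rate
    for x in data:
        winner = np.argmin(np.sum((weights - x) ** 2, axis=1))  # competition
        h = np.exp(-(grid - winner) ** 2 / (2 * sigma ** 2))    # collaboration
        weights += eta * h[:, None] * (x - weights)             # move towards x
```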

Synchronous vs asynchronous update in Hopfield nets

Synchronous update implies that the states of all units in a Hopfield net are changed/updated simultaneously in one step whereas asynchronous update involves only a single unit update per step
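A minimal sketch of the two update modes for a state vector s with entries in {+1, -1}; the sign convention for zero net input is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def synchronous_step(W, s):
    # All units recompute their state from the same previous state vector
    s_new = np.sign(W @ s)
    s_new[s_new == 0] = 1          # assumed convention for zero net input
    return s_new

def asynchronous_step(W, s):
    # A single randomly chosen unit is updated; the rest keep their state
    i = rng.integers(len(s))
    s = s.copy()
    s[i] = 1 if W[i] @ s >= 0 else -1
    return s
```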

Undercomplete vs overcomplete autoencoders

The difference is in the size of the hidden layer and hence learning; overcomplete autoencoders have a hidden layer larger than the dimensionality of the input and their training is often supported by regularization

Sequential (on‐line) vs batch learning

The difference lies in how and when the weights are updated in the learning process: either following the error and gradient calculations for each sample (sequential) or collectively using the accumulated update once every epoch (batch)
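A minimal sketch contrasting the two modes with the delta rule on a single linear unit (learning rate and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
t = X @ np.array([1.0, -2.0, 0.5])    # targets from a known linear mapping
eta = 0.01

# Sequential (on-line): the weights change after every sample
w_seq = np.zeros(3)
for x, target in zip(X, t):
    w_seq += eta * (target - w_seq @ x) * x

# Batch: updates are accumulated over the epoch and applied once
w_batch = np.zeros(3)
delta = np.zeros(3)
for x, target in zip(X, t):
    delta += eta * (target - w_batch @ x) * x
w_batch += delta
```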

Backprop

The most common learning algorithm for multi-layer perceptrons; the essential step consists in propagating error from output to input and iteratively adjusting weights
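A minimal sketch of one backprop step for a one-hidden-layer network with tanh hidden units, a linear output and squared error, on a single sample (sizes and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=3)                 # input
t = np.array([1.0])                    # target
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))
eta = 0.1

# Forward pass
h = np.tanh(W1 @ x)                    # hidden activations
y = W2 @ h                             # linear output

# Backward pass: propagate the error from the output towards the input
delta_out = y - t                      # output-layer error signal
delta_hid = (W2.T @ delta_out) * (1 - h ** 2)   # chain rule through tanh

# Gradient-descent weight updates
W2 -= eta * np.outer(delta_out, h)
W1 -= eta * np.outer(delta_hid, x)
```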

Dead units in competitive learning

Undesirable units that never get updated in the learning process with a given dataset

Popular Neural Network Techniques

· Stacked autoencoders, often built by greedy training of component autoencoders with backprop.
· Deep belief networks, based on stacked RBMs (trained with contrastive divergence).
· Convolutional neural networks, trained by backprop (in essence), composed of multiple layers with a convolutional layer, pooling layer and fully‐connected layer as a basic building block.

