Module 7
Does it matter much whether we have a large or small number of hidden neurons?
Yes. The larger the number of hidden neurons, the more connections there are and thus the more capable the network is of learning. If one has a simple problem but uses a neural network with a large number of hidden neurons, the network will overlearn: it starts to associate noise with the output during training, and in turn the trained network remembers too much detail and cannot generalize when applied to a new data set. If the network is too simple (too few hidden neurons), it will only remember general relationships that are too coarse for the network to be useful in the real world.
The training methods
For a given network, the magnitude of the error is determined by the combination of weights. There exists an error surface that can be described as a function of the weights. The objective of network training is to find a set of weights that minimizes the error function. In other words, network training can be considered a process of finding the global minimum on this error surface and its associated weight configuration for the network. There are a number of network learning algorithms for minimizing the network error function, such as the conjugate gradient method.
What is ANN good for? What are the limitations of ANN?
Despite these advantages, applications of ANN in geography have limitations. First, a large set of representative samples is often expensive to collect. Second, the heterogeneity of geographic phenomena (over space and time) makes the application domain not as stable as one would hope. Third, a major goal of many geographic analyses is to understand the relationships between the inputs and the outputs. Neural networks cannot provide those relationships in a meaningful form.
Network Operation
How an ANN works can be understood in two ways: 1. how it operates and 2. how to use it.
input pattern.
A given set of inputs to the neurons on the input layer is often referred to as an input pattern. Using soil type mapping as an example, the environmental conditions at a location (i,j) for the set of environmental variables (such as elevation, slope gradient, etc.) constitute an input pattern, referred to as I(i,j), where I(i,j) is a vector containing the environmental conditions at that location. The activation levels from the output neurons (the neurons on the output layer) can be taken as the output pattern, indicating how likely each soil type is to occur at this location given the set of input environmental conditions. Since each soil type is tied to an output neuron, the output pattern is also a vector, referred to as O(i,j), with each element in the vector corresponding to the activation level for one soil type.
The training process
Most network training employs a supervised approach in which the network is presented with a set of input patterns and a set of corresponding desired output patterns (the two parts of the training data). For a given number of hidden neurons, the training process starts by initializing all weights to small non-zero values. Training samples are then presented to the network one or a few at a time to produce corresponding results. After that, a measure of the differences between the network outputs based on the existing weights and the desired outputs is computed. This sequence, from sample inputs to outputs to computing the differences, is referred to as a forward pass. The differences are then propagated backward to update the weights of the links between different layers. This backward propagation is referred to as the backward pass, and it is through this backward pass that the weights are updated. One forward pass and one backward pass constitute one iteration. It may take several iterations to go through a given set of samples completely. For example, if you have 100 samples and each iteration takes 5 samples, you will need 20 iterations. A complete pass of all samples through the training process is referred to as an epoch. Many epochs may be required before a network reaches a given level of accuracy.
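The iteration and epoch bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the function name `iterations_per_epoch` and the rounding up for a final partial group of samples are assumptions, not part of the lesson.

```python
import math


def iterations_per_epoch(num_samples: int, samples_per_iteration: int) -> int:
    """Number of forward/backward passes needed to present every sample once.

    Rounding up (an assumption) covers the case where the last iteration
    receives fewer samples than the others.
    """
    if num_samples <= 0 or samples_per_iteration <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(num_samples / samples_per_iteration)


# The example from the text: 100 samples, 5 samples per iteration.
print(iterations_per_epoch(100, 5))  # 20
```

One epoch then consists of that many iterations, and training repeats epochs until the desired accuracy is reached.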
By training an ANN, what do we try to accomplish?
Network training (learning) in most cases is used to determine a set of weights that will produce the best possible input/output mapping. Under these circumstances the number of hidden neurons is given (fixed), so the network structure is known beforehand. In some cases, people also use training to determine the size of the network (Zhu, 2000). Under these latter circumstances the practitioner has to determine the number of hidden neurons and then the weights once the structure is determined. In this discussion of network training we will try to determine both the structure and the weights for a given structure.
Training data
Neural network training is actually a process of model building in which we develop the association between a set of inputs (an input pattern) and a set of outputs (an output pattern) through samples (training samples). In other words, most network training employs a supervised approach in which the network is presented with input patterns and the corresponding desired output patterns, and the training develops an association between the inputs and the outputs. The data containing these inputs as well as the expected corresponding outputs is referred to as training data. Thus, for the training data, we know what the inputs are and what the outputs should be. In short, the training data are the training samples used to develop the association between the input patterns and the output patterns.
Network Size
Refers to the number of inputs, the number of outputs, the number of hidden layers, and the number of neurons on each hidden layer. In other words, network size is actually the network structure. The reason that I did not use the term "structure" is that the number of inputs and the number of outputs are often fixed for a given problem. What are not set are the number of hidden layers and the number of neurons on each hidden layer. Researchers have suggested that for the vast majority of applications one hidden layer is sufficient (Masters, 1993). Thus, the network size for a given application really means the number of hidden neurons on the hidden layer.
multilayer feedforward network
Its structure refers to the number of layers and the number of neurons in each layer, and is problem specific. The number of input neurons should equal the number of inputs, and the number of output neurons should equal the number of attributes that define the output. For example, if there are four environmental variables (environmental covariates), there will be four input neurons on the input layer. The number of hidden neurons depends on the complexity of the problem. For most problems, a three-layer model should be sufficient unless the problem domain is highly discontinuous.
What should you be worrying about in applying ANN?
The first issue is the determination of network structure: the number of hidden neurons in the hidden layer. Learning capacity increases with the number of hidden neurons, but larger networks require more training samples and are much more readily over-fitted, at which point the trained network describes random error instead of the underlying relationship. If the size is too small (too few hidden neurons), the network is too simple and under-fitting results: the trained network cannot model the underlying complicated relationships.
The second issue is the quality of the samples used for training and validation. High-quality samples are representative of the population. A biased sample set, particularly biased training samples, leads to a biased network. Both sample sets should also be sufficient in size; otherwise, the network becomes undertrained.
The third issue concerns the applicability and portability of trained neural networks. Trained neural networks, very much like regression models, are only applicable in the domain in which they were trained and cannot be generalized to other domains. It is extremely rare to port a neural network trained in one area to another area unless the user is sure that the underlying relationships in the two areas are the same and that the training samples used are also representative of the area the network is to be ported to.
The fourth issue is the interpretability of the relationships discovered by the trained network. Unlike regression models and decision trees, the relationship, or mapping, between the input patterns and the output patterns extracted by a neural network is in the form of a weight configuration. These weights have no physical meaning in the application domain. They are simply used to match the input patterns with the expected output patterns during training.
What is the basic idea behind Artificial Neural Network method?
The network consists of nodes (neurons) and links (nerves). The nodes are organized in layers and connected by the links. A multilayer feedforward network is made of many processing elements (neurons). These neurons are usually arranged in layers: an input layer, an output layer, and one or more layers in between called hidden layers. The neurons in one layer are connected to the neurons in the next layer with different strengths of connection, which are referred to as weights. Links support information flow (signals) at different levels of strength, as controlled by the weights. The term feedforward means that the data flow from inputs to outputs and there is no looping back.
What are the basic structural elements in an ANN?: Neurons
The neurons in the input layer are used only to receive external inputs, but the neurons on the hidden layer and on the output layer have information processing capabilities (Figure 2). The neurons on the hidden layer are also referred to as the hidden neurons. The processing neurons first perform a weighted sum of the inputs from different neurons in the previous layer and transfer this sum into the range [-1,1] through a transfer function of sigmoidal shape.
There are basically two unknowns: the number of hidden neurons and the weights. How are the appropriate settings for these two determined during the training process?
The number of samples needed for training a neural network is typically very high. As we all know, it is expensive to collect samples, particularly geographic samples. People often try to get a sense of how many samples are sufficient. Masters (1993) suggested two numbers: a minimum and a reasonable number. The minimum number of training samples (training cases) is 2(m+1)n, where m and n are the number of input neurons and the number of output neurons, respectively. The reasonable number of training samples is 4(m+1)n.
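As a quick arithmetic check, the two sample-size guidelines above can be computed directly. The function name and the example values of m and n below are hypothetical and chosen only for illustration.

```python
def training_sample_guidelines(m: int, n: int) -> tuple[int, int]:
    """Return (minimum, reasonable) numbers of training samples.

    m is the number of input neurons and n the number of output neurons,
    following the 2(m+1)n and 4(m+1)n guidelines quoted in the text.
    """
    minimum = 2 * (m + 1) * n
    reasonable = 4 * (m + 1) * n
    return minimum, reasonable


# Hypothetical case: 4 environmental variables, 3 soil types.
minimum, reasonable = training_sample_guidelines(m=4, n=3)
print(minimum, reasonable)  # 30 60
```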
1. How it operates
The operation of a feedforward neural network is rather simple. The inputs from the neurons in the input layer are respectively fed to each hidden neuron in the hidden layer through the weighted sum approach. Each hidden neuron then transfers the sum into a range [-1,1] as the output of this hidden neuron. This output is then used as the input to each of the output neurons on the output layer. The final value from the output neuron is referred to as the activation level.
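A minimal sketch of this operation for a network with one hidden layer is given below. It assumes the hyperbolic tangent as the sigmoidal transfer function (the text only says the function is sigmoidal and maps into [-1,1]); the weights and the input pattern are made-up illustrative values, and the function names are hypothetical.

```python
import math


def layer_output(inputs, weights):
    """Weighted sum per neuron, squashed by tanh (assumed transfer function).

    weights[j][i] is the weight on the link from input i to neuron j.
    """
    return [
        math.tanh(sum(w * x for w, x in zip(neuron_weights, inputs)))
        for neuron_weights in weights
    ]


def forward(input_pattern, w_hidden, w_output):
    """Feed an input pattern through the hidden layer, then the output layer."""
    hidden = layer_output(input_pattern, w_hidden)
    return layer_output(hidden, w_output)


# Illustrative values only: 3 inputs, 2 hidden neurons, 2 output neurons.
w_hidden = [[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]]
w_output = [[0.7, -0.6], [-0.3, 0.8]]
activation_levels = forward([0.5, 1.0, -1.5], w_hidden, w_output)
print(activation_levels)
```

Each value in `activation_levels` lies strictly between -1 and 1 and corresponds to one output neuron.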
Determining structure and weight configuration through training
The process starts with a small number of hidden neurons, which defines a network (given that the inputs and the outputs are fixed and there is only one hidden layer). We first initialize its weights using the simulated annealing approach. After initialization, the conjugate gradient algorithm is used to locate the nearest minimum (Figure 6). Once a minimum is found, simulated annealing is used to attempt to break out of what might be a local minimum. If annealing succeeds in escaping from the local minimum, which means it succeeds in reducing the error, the conjugate gradient method is used again to find a minimum in this new region of the error surface. This alternation between conjugate gradient minimization and annealing escape continues until several iterations in a row produce only trivial improvement or no improvement at all, and the minimum is then marked as a candidate for the global minimum. The network is then reinitialized with an entirely new set of starting weights to look for another candidate. This process can be repeated a prescribed (user-specified) number of times (Zhu, 2000), with each repetition producing a candidate for the global minimum. All of these candidates are then compared and the best is chosen as the global minimum, that is, the best weight configuration for the given network. We then increase the number of hidden neurons to form another neural network and repeat the entire process above to find the best set of weights for this new network. We continue up to the number of hidden neurons we think is sufficient for our application (this is tricky to determine, so a very large number is often used). The process gives us a set of networks with their respective optimal weight configurations.
The next step is to select the best network for the application. This is done first by computing the test accuracy for each network: the test samples are fed to the trained network and the accuracy is computed by comparing the outputs from the network with the observed values in the test sample set. In addition, a training error is reported for each network by the training process. We can plot these two measures against the number of hidden neurons, as in Figure 7. The network with high test accuracy and low training error is the one to use for the application. In Figure 7, the network with 6 or 10 hidden neurons would be chosen. Since the chosen network already has its weight configuration determined, this process determines the network structure as well as the weight configuration for the network.
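The selection step described above can be sketched as follows. The numbers in `results` are invented purely for illustration (they are not the values behind Figure 7), and the rule of ranking by test accuracy first and breaking ties by lower training error is one possible reading of "high test accuracy and low training error", not a rule stated in the lesson.

```python
def select_network(results):
    """Pick the number of hidden neurons with the best test accuracy.

    results maps number of hidden neurons -> dict with 'training_error'
    and 'test_accuracy'. Ties on accuracy go to the lower training error
    (an assumed tie-breaking rule).
    """
    return max(
        results,
        key=lambda h: (results[h]["test_accuracy"], -results[h]["training_error"]),
    )


# Hypothetical outcomes of training networks of increasing size.
results = {
    2: {"training_error": 0.30, "test_accuracy": 0.62},
    4: {"training_error": 0.18, "test_accuracy": 0.74},
    8: {"training_error": 0.09, "test_accuracy": 0.81},
    16: {"training_error": 0.03, "test_accuracy": 0.70},  # over-fitted
}
best = select_network(results)
print(best)  # 8
```

The last entry illustrates why training error alone is not enough: the largest network has the lowest training error but a lower test accuracy.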
test data
The test data also contain both the inputs and the outputs, just as the training samples do (as shown in Table 1). The function of the test data is the same as that of the validation data discussed in the lesson on prediction validation: to assess the accuracy of the predictions from the trained neural network. That is, once the network is trained using the training samples, it is evaluated using the test data so that the evaluation is independent of the training. The evaluation is done by feeding the trained network the inputs of each test sample and comparing the output (prediction) from the network with the value observed for that sample. In short, the test data are the validation samples used to examine the accuracy of the trained network.
how many hidden neurons should there be?
A general approach exists for estimating the number of hidden neurons in situations where the complexity of the problem is somewhat related to the number of inputs and the number of outputs. In this case, the number of hidden neurons can be estimated using Equation 1: $N_h = \sqrt{m \cdot n}$, where $N_h$ is the number of hidden neurons, and m and n are the number of inputs and the number of outputs, respectively.
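A small sketch of this estimate, taking the number of hidden neurons as the square root of the product of m and n. Rounding to the nearest whole neuron and enforcing at least one hidden neuron are assumptions added here, since the formula itself generally yields a non-integer value.

```python
import math


def estimate_hidden_neurons(m: int, n: int) -> int:
    """Estimate the hidden-neuron count as sqrt(m * n).

    Rounding to the nearest integer and the lower bound of 1 are
    assumptions; the formula in the text gives a real number.
    """
    return max(1, round(math.sqrt(m * n)))


# Hypothetical case: 4 inputs and 9 outputs give sqrt(36) = 6.
print(estimate_hidden_neurons(4, 9))  # 6
```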
The error surface of weights for a neural network
can contain a large number of local minima, since a large number of weight permutations will produce similar input/output mappings. These local minima can make network training even more complicated. Avoiding false (local) minima involves two steps. The first is to avoid initializing the weights in the vicinity of a local minimum. The second is to determine whether a minimum that has been found is local and, if it is, to try to escape from it.
application of ANN
consists of two general steps: network training and network application. Network training uses samples, each of which contains an input pattern and the expected output pattern for that sample, to develop associations between a set of input patterns and a set of output patterns. The network application step uses the trained network to predict an output pattern for a given input pattern whose corresponding output pattern is unknown.
The conjugate gradient method
improves upon the conventional back propagation method by using adaptive approaches to determine the momentum (η) and the learning rate (μ), which control the convergence and the speed of network training. The momentum controls the direction in which to search for a minimum, that is, which weights are to be updated. The learning rate determines the rate at which the weights should be updated during each iteration. The conjugate gradient method first employs the Polak-Ribiere algorithm to determine the best direction in which to search for the minimum. It then uses a directional minimizing process to determine the location of the minimum.
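For reference, the two steps above can be written in the standard textbook form of the Polak-Ribiere conjugate gradient update. The symbols here are not taken from the lesson and are introduced only for this sketch: $w_k$ is the weight vector at step $k$, $E$ is the error function, $g_k$ is its gradient, $d_k$ is the search direction (with $d_0 = -g_0$), $\beta_k$ is the Polak-Ribiere coefficient, and $\alpha_k$ is the step length found by the directional (line) minimization.

```latex
\begin{aligned}
g_k &= \nabla E(w_k) \\
w_{k+1} &= w_k + \alpha_k\, d_k, \qquad \alpha_k = \arg\min_{\alpha} E(w_k + \alpha\, d_k) \\
\beta_k &= \frac{g_{k+1}^{\top}\,(g_{k+1} - g_k)}{g_k^{\top} g_k} \\
d_{k+1} &= -g_{k+1} + \beta_k\, d_k
\end{aligned}
```

In this form the search direction plays the role the text assigns to the momentum and the step length plays the role of the learning rate, with both recomputed at every step rather than fixed in advance.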
2. How to use it
network size, network training and network application.
the output from an output neuron depends on two elements given that the structure of the ANN is fixed
the inputs and the weights assigned to the links (nerves). The inputs are often set for a given application, so the value of the output depends on the configuration of the weights. One might ask how to determine the weights. The weights are set by a process referred to as network training (learning). Network training is the key (core) part of applying an ANN and is discussed in the next section.
Under what circumstances is the neural network approach most suited?
the learned relationship is in the form of a network weight configuration that is not meaningful in the application domain. Neural network training often requires a large number of samples and any learned relationship is not transferable to areas (domains) in which the training samples are not representative. Given these properties, ANN is effective for applications in which the provision of samples is not expensive and the problem domain is fixed.
