
IV.4. The artificial neural network

The Artificial Neural Networks (ANN) in Uranie are Multi Layer Perceptrons (MLP) with one or more hidden layers (the layer $l$ containing $n_l$ neurons, where $l$ is used to identify the layer) and one or more output variables.

IV.4.1. Introduction to the formal neuron

The concept of the formal neuron was proposed in 1943, after observing the way biological neurons are intrinsically connected [McCulloch1943]. This model is a simplification of the wide range of functions handled by a biological neuron, the formal one (displayed in Figure IV.6) being required to satisfy only the two following conditions:

  • summing the weighted input values, leading to an output value called the neuron's activity, $a = \sum_{i=1}^{n} w_i x_i$, where the $w_i$ are the synaptic weights of the neuron.

  • emitting a signal $s = f(a + \theta)$ (whether the output level goes beyond a chosen threshold or not), where $f$ and $\theta$ are respectively the transfer function and the bias of the neuron.

Figure IV.6. Schematic description of a formal neuron, as seen in McCulloch and Pitts [McCulloch1943].

One can introduce a shadow input defined as $x_0 = 1$ (or $-1$), which leads to considering the bias as another synaptic weight $w_0 = \theta$. The resulting emitted signal is written as

$$s = f\left(\sum_{i=0}^{n} w_i x_i\right)$$
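
As an illustration of these two operations, a single formal neuron can be sketched in a few lines of Python. This is a purely illustrative example, independent of the Uranie implementation; the hyperbolic tangent is arbitrarily chosen as the transfer function and the weights are invented for the sake of the example.

    import numpy as np

    def formal_neuron(x, weights, bias, transfer=np.tanh):
        """Output of a formal neuron: transfer function applied to the
        weighted sum of the inputs, the bias being handled as the weight
        of a shadow input fixed to 1."""
        x_ext = np.concatenate(([1.0], x))          # shadow input x_0 = 1
        w_ext = np.concatenate(([bias], weights))   # w_0 = theta
        activity = np.dot(w_ext, x_ext)             # the neuron's activity
        return transfer(activity)                   # the emitted signal

    # a neuron with two inputs
    print(formal_neuron(np.array([0.5, -1.2]), weights=np.array([0.8, 0.3]), bias=0.1))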

There is a large variety of possible transfer functions, and a usual starting point is the sigmoid family, defined with three real parameters, $c$, $r$ and $k$, as $\mathrm{sig}_{c,r,k}(x) = c/(1+e^{-kx}) - r$. Setting these parameters to peculiar values leads to well-known functions such as the hyperbolic tangent and the logistical function, shown in Figure IV.7 and defined as

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = \mathrm{sig}_{2,1,2}(x) \quad \text{and} \quad \mathrm{logi}(x) = \frac{1}{1+e^{-x}} = \mathrm{sig}_{1,0,1}(x)$$
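
Assuming the parametrisation of the sigmoid family written above (a reconstruction; the exact convention may differ), the two special cases can be checked numerically with a short, purely illustrative Python snippet:

    import numpy as np

    def sig(x, c, r, k):
        """Sigmoid family: sig_{c,r,k}(x) = c / (1 + exp(-k*x)) - r."""
        return c / (1.0 + np.exp(-k * x)) - r

    x = np.linspace(-3.0, 3.0, 7)
    # c=2, r=1, k=2 recovers the hyperbolic tangent
    print(np.allclose(sig(x, c=2, r=1, k=2), np.tanh(x)))                 # True
    # c=1, r=0, k=1 recovers the logistical function
    print(np.allclose(sig(x, c=1, r=0, k=1), 1.0 / (1.0 + np.exp(-x))))   # True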

Figure IV.7. Example of transfer functions: the hyperbolic tangent (left) and the logistical one (right)

IV.4.2. The working principle

The artificial neural network conception and working flow were first proposed in 1962 [rosenblatt196], under the name of perceptron. The architecture of a neural network is the description of the organisation of the formal neurons and of the way they are connected together. There are two main topologies:

  • complete: every neuron is connected to all the others.
  • by layer: the neurons of a given layer are connected to all the neurons of the previous layer and of the following one.
The general organisation of the neural network is detailed in three steps in the following part, and summarised in Figure IV.8. The first layer, where the vector of inputs is stored, is called the input layer. The last one is called the output layer. In between, one can put from zero (leading to a Rosenblatt perceptron) to as many hidden layers as wanted. With one or more hidden layers, the neural networks are called Multi Layer Perceptrons (MLP) and have been studied heavily since the early 1990s [Hornik1989,Barron93]. In Uranie, the architecture has to be chosen by the user (at least one hidden layer).
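
The "by layer" topology described above can be illustrated by a minimal forward pass, written here in plain Python/NumPy and unrelated to the Uranie internals: the weight matrices and biases are drawn at random for the sake of the example, and a linear output layer is assumed, as is customary for regression.

    import numpy as np

    rng = np.random.default_rng(42)

    def mlp_forward(x, weights, biases, transfer=np.tanh):
        """Propagate an input vector through fully connected layers:
        each layer only feeds the following one."""
        signal = x
        for W, b in zip(weights[:-1], biases[:-1]):
            signal = transfer(W @ signal + b)       # hidden layers
        return weights[-1] @ signal + biases[-1]    # linear output layer

    # 3 inputs, one hidden layer of 4 neurons, 1 output
    shapes = [(4, 3), (1, 4)]
    weights = [rng.normal(size=s) for s in shapes]
    biases = [rng.normal(size=s[0]) for s in shapes]
    print(mlp_forward(np.array([0.2, -0.7, 1.5]), weights, biases))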

Figure IV.8. Schematic description of the working flow of an artificial neural network as used in Uranie


The first step is the definition of the problem: what are the input variables under study, how many neurons will be created in how many hidden layers, and what is the chosen activation function. When choosing the architecture of an artificial neural network, one should keep in mind that the number of points used to perform the training should obviously be greater than the number of parameters to be estimated, i.e. the number of synaptic weights (the usual rule of thumb being to have at least a factor 1.5 between the number of data points and the number of coefficients). From the explanation given previously, the number of coefficients for a single-hidden-layer neural network is $n_H (n_X + 1)$, where $n_H$ is the number of neurons chosen and $n_X$ the number of inputs. This formula can be generalised to multi-layer cases, as

$$N_{\mathrm{coef}} = \sum_{l=1}^{n_L} n_l (n_{l-1} + 1)$$

In this equation, $n_l$ is the number of hidden neurons of layer $l$, for $l = 1, \ldots, n_L$ where $n_L$ is the number of layers, and $n_0$ is the number of inputs ($n_0 = n_X$). Defining an architecture is quite tricky and depends on the problem under consideration. Going multi-layer is a way to reduce the number of coefficients to be estimated when the number of inputs is large: with 9 input variables, a mono-layer network with 8 neurons will have 80 synaptic weights, while a network with 3 layers of 3 neurons each will have 9 neurons in total but only 54 weights to be determined.
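
The counting rule above can be turned into a small helper; the snippet below (illustrative Python, not part of Uranie) reproduces the two numbers quoted in the text. Note that, following these numbers, the connections toward the output layer are not counted.

    def n_coefficients(n_inputs, hidden_layers):
        """Number of synaptic weights of an MLP: each neuron of layer l
        carries n_{l-1} weights plus one bias."""
        total, previous = 0, n_inputs
        for n_neurons in hidden_layers:
            total += n_neurons * (previous + 1)
            previous = n_neurons
        return total

    print(n_coefficients(9, [8]))        # 80
    print(n_coefficients(9, [3, 3, 3]))  # 54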

The second step is the training of the ANN. This step is crucial and many different techniques exist to achieve it but, as this note is not supposed to be exhaustive, only the one considered in the Uranie implementation will be discussed. A learning database should be provided, composed of a set of inputs and the resulting outputs (the ensemble discussed in Section IV.1). From there, two mechanisms are run simultaneously:

  • the learning itself. By varying all the synaptic weights contained in the parameter vector $\boldsymbol{\theta}$, the aim is to produce an estimated output set $\hat{y}$ that would be as close as possible to the output $y$ stored in the learning database, and to keep the best configuration (denoted as $\hat{\boldsymbol{\theta}}$). The difference between the real outputs and the estimated ones is measured through a loss function which could be, in the case of regression, a quadratic loss function such as

$$L(y, \hat{y}) = (y - \hat{y})^2$$

    From there, one can define the risk function (also called cost or energy function), used to transform the search for the optimal parameters into a minimisation problem. The empirical risk function can indeed be written as

$$R(\boldsymbol{\theta}) = \frac{1}{N} \sum_{i=1}^{N} L\left(y_i, \hat{y}_i(\boldsymbol{\theta})\right)$$

    where $N$ is the number of points used for the learning.

  • the regularisation. This step is made to avoid any over-fitting problem, meaning that the neural network would be trained only for the learning ensemble, which might not be representative of the rest of the input space. To avoid this, the learning database is split into two sub-parts: one for the learning itself, as described in the previous item, and one to prevent the over-fitting from happening. This is done by computing, for every newly tested parameter set $\boldsymbol{\theta}$, the generalised error (computed as the average error over the set of points not used in the learning procedure). While the risk function is expected to become smaller as the number of optimisation steps increases, the generalised error also decreases at first, but then it should stabilise and even get worse. This flattening or worsening is used to stop the optimisation (a schematic illustration of this mechanism is sketched just after this list).
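
The interplay between the empirical risk minimisation and the regularisation through the generalised error can be sketched as follows. This is a schematic, purely illustrative Python skeleton (a crude stochastic search on a tiny one-hidden-layer network, on invented data); the actual Uranie training relies on its own minimisation algorithms.

    import numpy as np

    rng = np.random.default_rng(0)

    def empirical_risk(theta, x, y, n_h=5):
        """Mean quadratic loss of a one-hidden-layer MLP parametrised by theta."""
        W1, b1 = theta[:n_h].reshape(n_h, 1), theta[n_h:2 * n_h]
        w2, b2 = theta[2 * n_h:3 * n_h], theta[3 * n_h]
        y_hat = np.tanh(x[:, None] @ W1.T + b1) @ w2 + b2
        return np.mean((y - y_hat) ** 2)

    # learning database, randomly split into a learning part and a control part
    x = np.linspace(-1.0, 1.0, 60)
    y = np.sin(3.0 * x)
    idx = rng.permutation(x.size)
    learn, control = idx[:40], idx[40:]

    theta = rng.normal(scale=0.5, size=16)   # random weight initialisation
    best_theta, best_gen_error, stall = theta.copy(), np.inf, 0
    for step in range(5000):
        # learning: keep a random perturbation if it lowers the empirical risk
        candidate = theta + rng.normal(scale=0.05, size=theta.size)
        if empirical_risk(candidate, x[learn], y[learn]) < empirical_risk(theta, x[learn], y[learn]):
            theta = candidate
        # regularisation: monitor the generalised error on the points left aside
        gen_error = empirical_risk(theta, x[control], y[control])
        if gen_error < best_gen_error:
            best_theta, best_gen_error, stall = theta.copy(), gen_error, 0
        else:
            stall += 1
        if stall > 500:   # generalised error no longer improves: stop
            break

    print("kept configuration, generalised error:", best_gen_error)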

This procedure is stochastic: the splitting of the ensemble is done using a random generator, and so is the initialisation of the synaptic weights of all the formal neurons.

Finally, the constructed neural network can be (and should be) exported: the weight initialisation, but also the way the split is performed between the training and test bases, are randomised, leading to different results every time the training procedure is redone.
