How to program a neural network

Posted on 11 Nov 2015

This tutorial is a detailed guide on how to implement a feedforward neural network, including the backpropagation learning algorithm, in Python 3.4. This is one of many ways to program a neural net, not necessarily the most efficient or the prettiest, but it’s relatively easy to understand and use.

For the general reader, a neural network is a simulation of a group of interconnected neurons, loosely modelled on how a brain works. In principle, a neural network can be trained to solve a very wide range of problems. Nevertheless, complex problems require a lot of neurons, which in turn require a huge amount of computing power and time. To give you an idea, running 100 iterations on a 1000-neuron network (an ant has approx. 250000 neurons) on a regular laptop computer takes around 2.5 minutes with the implementation explained on this page.

A common design for artificial neural networks is to organise the neurons in layers, and connect all the neurons of each layer to the ones of the two neighbouring layers. The first layer contains the input neurons, which receive the input data, and the last layer contains the output neurons, from which the output of the network is collected. The layers in between are called hidden layers. They allow the network to ‘understand’ the processed data more deeply. But what does understanding mean?

By understanding we mean that the network is able to sort the input data into groups. From the mathematical point of view, a neural network is a device that can be trained to approximate a function, even if that function is unknown, provided it has a suitable layout and enough neurons. The following graphs illustrate how a neural network can be used to identify boundaries within a set of data points with different properties (in this case different colour):

Neural networks are used nowadays to solve problems like handwritten text recognition, speech-to-text, face detection, etc. With the neural network presented on this page you can already solve some of these problems, whereas a conventional algorithm could take years and thousands of lines of code to accomplish similar results (the best hand-written digit detectors are NN-based and can reach accuracies of over 98%).

In this guide I will give examples to help visualise and understand the steps, but I will not go deep into the theory of how NNs work. Instead, I will point to the literature I found useful for each part of the process.

Step 1: Define the general structure of the program

Our implementation will consist of a main class called NeuralNetwork. This will contain all the essential functions to create, train and use a neural network.

The main methods provided by the NeuralNetwork class will be:

• __init__: Will create the neural network with the specified layers and neurons per layer.
• forwardPropagate: Will compute the network outputs with a given set of inputs.
• backPropagate: Will update the weights of the network in a way that the output error will be reduced, given the target outputs.
• train: Will make the network forward-propagate and backward-propagate through the specified number of epochs, input sets and iterations, making the output error eventually become small.
• plotResults: Will plot the results from the network, interpreting the results and giving an indication for the accuracy.

Aside from this class and its main methods we will also write some functions that will contain code that otherwise would have to be written repetitively. Moreover, we will implement a stopwatch that will give a prediction of the remaining time for a training to finish.

```python
import numpy as np
import random
import time
import sys

# Skeleton only: the method bodies ("...") are filled in during the following steps.
class StopWatch:
    def __init__(self, totalNumberOfChecks): ...
    def update(self, epoch): ...
    def stop(self): ...

class OutputNeuronError:
    def __init__(self, primitive, derivative): ...

class Neuron:
    def __init__(self, index): ...

class NeuralNetwork:
    def __init__(self, neurons_per_layer): ...
    def forwardPropagate(self, inputs): ...
    def backPropagate(self, inputs, targetOutputs, previousDeltaWeightMap = []): ...
    def train(self, inputs, targetOutputs, iterations, epochs, verbose = False): ...
    def plotResults(self, inputs, targetOutputs, minErrorForValidation, applyBynaryThreshold = False, exhaustive = True): ...
```

Step 2: Create the internal network structure

In this step, we need to define how the different data that defines the neural network is stored. This should be designed in a way that makes data handling easy without impacting efficiency negatively. Again, there are many ways to do this; mine is certainly not the best, but it is easy to understand and play with.

The layered structure is accomplished by filling a list of lists with instances of the Neuron class:

Each neuron object is instantiated with a unique (absolute) index. This will be useful when identifying the weight of a connection between neurons of two different layers. The indexes are assigned starting from first to last neuron within a layer and from first to last layer within the network:

The weights are stored in a square matrix, where the row and column indexes are the absolute neuron indexes. Each entry in the matrix is the weight of the connection between the row and column neurons. The initial weight values range from 0 to 1, and an entry with value 0 means that no connection exists. Index 0 of the matrix is reserved for the bias, which is why the neuron indexes start at 1. This kind of representation of a neural network is called a Hinton diagram. The Hinton diagram for the network from the previous picture would be (assuming a weight of 1 for all connections and defining the bias index as ‘b’):

Usually, when starting a training session, all non-zero weights are set to random values. This produces different results each time we train the network, because the initial conditions end up closer to or further from a minimum of the error function, or next to a different local minimum of this function in different attempts (more on this in Step 4). So there is always a bit of luck involved in the final accuracy.
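The indexing scheme just described can be sketched on its own, without the Neuron class. This is only an illustration under stated assumptions: the function name `build_weight_map` and the `seed` parameter are hypothetical, and row 0 of the matrix is assumed to hold the bias weights.

```python
import random

def build_weight_map(neurons_per_layer, seed=None):
    """Return (layers, weight_map) for a fully connected layered network.

    layers is a list of lists of absolute neuron indexes (starting at 1).
    weight_map is a square list of lists; row 0 is reserved for the bias.
    """
    rng = random.Random(seed)  # hypothetical seed, only to make the sketch repeatable
    layers = []
    next_index = 1
    for count in neurons_per_layer:
        layers.append(list(range(next_index, next_index + count)))
        next_index += count

    size = sum(neurons_per_layer) + 1  # +1 for the bias at index 0
    weight_map = [[0.0] * size for _ in range(size)]
    for layer, next_layer in zip(layers, layers[1:]):
        for target in next_layer:
            weight_map[0][target] = rng.random()           # bias -> target
            for source in layer:
                weight_map[source][target] = rng.random()  # source -> target
    return layers, weight_map

layers, weight_map = build_weight_map([4, 3, 2], seed=1)
print(layers)  # [[1, 2, 3, 4], [5, 6, 7], [8, 9]]
```

Entries between neurons that are not in neighbouring layers (for example `weight_map[1][8]`) stay at 0, which is how the matrix encodes "no connection".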

Try to implement the __init__ method yourself with the ideas just explained, then compare it with my own approach in the following block of code (I also introduce some variables that will be used later):

```python
def __init__(self, neurons_per_layer):  # neurons_per_layer is a list containing the number of neurons per layer.
    self.outputErrors = []
    self.learningRate = 0.5     # Default value. Can be changed, but this seems to give good results.
    self.momentum = 0.8         # Default value. Can be changed, but this seems to give good results.
    self.neuronCount = sum(neurons_per_layer)
    self.INPUT_LAYER = 0                            # Input layer is always at index 0
    self.OUTPUT_LAYER = len(neurons_per_layer) - 1  # Output layer is the last index
    self.layers = []    # This is a list of layers. Each layer is a list of neurons.
    # The weights are stored in a Hinton diagram (a square matrix where each entry is the weight between the row and column index neurons).
    # Index 0 is reserved for the bias, hence the neuronCount+1 rows and columns.
    self.weightMap = [[0 for x in range(self.neuronCount+1)] for x in range(self.neuronCount+1)]

    ### Fill the layer list
    currentIndex = 1
    for neuronsInCurrentLayer in neurons_per_layer:
        newLayer = []
        for neuron in range(neuronsInCurrentLayer):
            newNeuron = Neuron(currentIndex)    # Create a neuron with a unique (absolute) index.
            currentIndex += 1
            newLayer.append(newNeuron)          # Add the neuron to the new layer
        self.layers.append(newLayer)            # Add the new layer to the layer list

    ### Set random weights between neurons
    for i in range(0, len(self.layers)-1):
        for neuron1 in self.layers[i]:
            for neuron2 in self.layers[i+1]:
                ### Weights between neurons
                self.weightMap[neuron1.index][neuron2.index] = random.random()
                ### Weights between the bias (row 0) and each neuron (excluding the input neurons)
                self.weightMap[0][neuron2.index] = random.random()
```

To produce the previous neural network, we would write:

```python
neuralNetLayout = [4, 3, 2]
neuralNet = NeuralNetwork(neuralNetLayout)
```

Step 3: Write the forward propagation method

The forward propagation function receives the inputs to the network, propagates them from layer to layer using the weight map and returns the outputs.

Let’s analyse what propagation does with the following simple network:

At first glance we see that the layout of this network is [3, 2, 2] (3 inputs, a single hidden layer with 2 neurons and 2 outputs), the input vector is (1.0, 0.0, 1.0) and the output vector is (0.798, 0.726). Also notice that the block diagram and the Hinton diagram give exactly the same information about the layout and the weights of the connections.

The output vector is what we want our forward propagation function to compute. It is called propagation function because it propagates the input data through the graph of the network until it reaches the output neurons. To accomplish this, it performs two steps on each neuron:

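1. Compute the weighted sum of the outputs of all the neurons in the previous layer, plus the bias weight. Let’s call this weighted sum S. For example, writing $O_i$ for the output of neuron $i$, $W_{i,j}$ for the weight of the connection from neuron $i$ to neuron $j$ and $W_{b,4}$ for the bias weight, the weighted sum for neuron 4 would be:

$$S_4 = O_1 W_{1,4} + O_2 W_{2,4} + O_3 W_{3,4} + W_{b,4}$$

2. Compute the output using the logistic function. Let’s call the output O. For example, the output for neuron 4 would be:

$$O_4 = \frac{1}{1 + e^{-S_4}}$$

Also bear in mind that the neurons in the input layer are just transmitters of the input data, so the steps described above are not applied to them.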
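The two steps can be tried on a single neuron before writing the full method. The sketch below is only an illustration: the weights and the bias weight are hypothetical values picked so the sum is easy to check by hand, not the ones from the diagram.

```python
import math

def logistic(s):
    """Logistic (sigmoid) function used to turn the weighted sum into an output."""
    return 1.0 / (1.0 + math.exp(-s))

def neuron_output(previous_outputs, weights, bias_weight):
    """Return (S, O) for one neuron: the weighted sum and the logistic output."""
    s = sum(o * w for o, w in zip(previous_outputs, weights)) + bias_weight
    return s, logistic(s)

# Hypothetical weights for the three connections into neuron 4, plus its bias weight.
s4, o4 = neuron_output([1.0, 0.0, 1.0], [0.2, 0.9, 0.3], bias_weight=0.5)
print(s4)  # 0.2 + 0.0 + 0.3 + 0.5, i.e. approximately 1.0
print(o4)  # approximately 0.731
```

Note that the second input is 0.0, so its weight does not contribute to the sum at all.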

My approach to write the forwardPropagate method:

```python
def forwardPropagate(self, inputs):
    ### Check if the number of inputs corresponds to the number of input neurons
    if len(inputs) != len(self.layers[self.INPUT_LAYER]):
        sys.exit("Error: Number of inputs does not correspond with number of input neurons.")

    # Set input neurons' outputs as the inputs
    for i, neuron in enumerate(self.layers[self.INPUT_LAYER]):
        neuron.output = inputs[i]

    for layerNum in range(1, len(self.layers)):
        for currentNeuron in self.layers[layerNum]:
            ### Compute weighted sum of the inputs to the neuron
            weightedInputSum = 0
            for previousNeuron in self.layers[layerNum-1]:
                # Use the output of the previous layer (not the raw network inputs)
                weightedInputSum += previousNeuron.output * self.weightMap[previousNeuron.index][currentNeuron.index]
            # Add the bias weight (stored in row 0 of the weight map)
            weightedInputSum += self.weightMap[0][currentNeuron.index]
            ### Compute the output using the logistic function
            neuronOutput = 1/(1+np.exp(-weightedInputSum))
            currentNeuron.output = neuronOutput # Store the current output in the neuron

    ### Return a list with the outputs from the output layer
    return [neuron.output for neuron in self.layers[self.OUTPUT_LAYER]]
```

Step 3.1: Test the network

Now before continuing to step 4, it’s a good practice to test that what we have done so far works. For this reason we will be testing on a small network, like the one shown in step 3 (layout: [3, 2, 2]). We will give it three random inputs and it will return two outputs. Then we will hand-check that the results are correct.

First, we need to implement a method inside the NeuralNetwork class that prints a Hinton diagram:

```python
def plotWeightMap(self):
    print("Hinton diagram:\n")

    output = "".ljust(3)
    for i in range(self.neuronCount+1):
        output += ("b" if i == 0 else str(i)).ljust(7)
    print(output)

    for i, row in enumerate(self.weightMap):
        output = ("b" if i == 0 else str(i)).ljust(3)
        ### Print weights
        for j, weight in enumerate(row):
            output += str(round(weight, 4)).ljust(7)  # Print numbers up to 4 decimal places
        print(output)
```

Now writing:

```python
npl = [3, 2, 2]
n = NeuralNetwork(npl)
n.plotWeightMap()
outputs = n.forwardPropagate([1.0, 0.0, 1.0])
print("\nOutputs: " + str(outputs))
```

You should get (with different numbers, as they are random):

Finally, calculate the outputs of the network yourself with the weights from the Hinton diagram and check that they correspond to the computed ones. I really recommend this step for debugging what we have done so far. It helped me find some mistakes in the logic that would otherwise have taken hours to catch.

Try a bigger network, say [3, 1000, 2] to see if there is any out-of-bounds exception or other issues.
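For bigger networks the hand check becomes tedious, so an independent recomputation can help. The following is a minimal sketch, assuming the weight map is a square matrix with the bias weights in row 0 and that `layers` holds the absolute neuron indexes of each layer; the function name `forward_check` and the tiny example network are hypothetical.

```python
import numpy as np

def forward_check(weight_map, layers, inputs):
    """Recompute the network outputs from a weight map, one layer at a time."""
    weights = np.asarray(weight_map, dtype=float)
    outputs = np.asarray(inputs, dtype=float)
    for previous, current in zip(layers, layers[1:]):
        block = weights[np.ix_(previous, current)]           # previous -> current weights
        sums = np.dot(outputs, block) + weights[0, current]  # add the bias weights (row 0)
        outputs = 1.0 / (1.0 + np.exp(-sums))                # logistic function
    return outputs

# Hypothetical [2, 1] network: neurons 1 and 2 feed neuron 3, bias weight 0.
example_layers = [[1, 2], [3]]
example_weight_map = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, -0.5],
    [0.0, 0.0, 0.0, 0.0],
]
print(forward_check(example_weight_map, example_layers, [1.0, 1.0]))  # [0.5]
```

With the class from this guide, the layer indexes could be collected as `[[neuron.index for neuron in layer] for layer in n.layers]` and compared against the result of `n.forwardPropagate`.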

Step 4: Write the backward propagation method

This is where all the black magic of learning occurs. This method will compare the outputs from the network to the desired outputs and will update all the weights in a way that the error will be reduced.

The derivation of the formulas used in this section is beyond the scope of this tutorial. Nonetheless, I recommend reading some of the references to get an idea of the reasoning that led to their derivation back in the 70s.

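The indicator of the accuracy of the network is the error. If we want high accuracy, the error function has to be at a minimum. The usual choice, and the one consistent with the derivative used in this guide, is half the sum of the squared differences between the current outputs $O_k$ and the target outputs $T_k$:

$$E = \frac{1}{2} \sum_k (O_k - T_k)^2$$

This error function is chosen because its derivative with respect to each output, which is what is really needed for backpropagation, is simply the current output minus the target output:

$$\frac{\partial E}{\partial O_k} = O_k - T_k$$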
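As a small illustration, the error and its derivative can be computed as follows. This is a sketch that assumes the half sum of squared differences as the error function; the function names are hypothetical and are not the helpers used by the NeuralNetwork class.

```python
def squared_error(outputs, targets):
    """Half the sum of squared differences between outputs and targets."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(outputs, targets))

def squared_error_derivatives(outputs, targets):
    """Derivative of the error with respect to each output: output minus target."""
    return [o - t for o, t in zip(outputs, targets)]

outputs = [0.8, 0.3]
targets = [1.0, 0.0]
print(squared_error(outputs, targets))              # approximately 0.065
print(squared_error_derivatives(outputs, targets))  # approximately [-0.2, 0.3]
```

A negative derivative means the output is below its target and has to grow; a positive one means it has to shrink.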

Because the error function is unknown (the outputs depend on all the connections and their weights, which can amount to a lot of variables), so is the condition for its minimum. Therefore we need a workaround to optimise the error function: gradient descent.

Gradient descent can be visualised in 3D (or 2D) as a ball rolling down a mountain. The ball rolls in the direction of the steepest slope until it reaches a valley (a minimum):

The algorithm ‘simulates’ this ball by computing the local derivative (that is, the slope) of the function and making small steps in the direction of decrease, since it only knows the local slope but not the overall shape. After several iterations (usually thousands), the algorithm eventually finds a set of weights from which the error function increases in all directions, that is, a minimum.

How is this done? The error function has n+1 dimensions, where n is the number of connections (weights) and the +1 is the dimension along which the error function has to be minimised. It’s easy to visualise in 3D:

In this case the error function only has 2 weights to optimise (x and y). The optimisation path that leads to the minimum is the result of iteratively taking small steps in the direction of steepest descent. Each weight has its own axis or dimension, and the slope along each axis is found by computing the partial derivative of the error function with respect to that axis.
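The two-weight case is small enough to run directly. The sketch below uses a made-up bowl-shaped error function (it is not the network's error function) whose minimum is known to be at x = 1, y = -2, so it is easy to see whether the steps converge.

```python
def error(x, y):
    """Hypothetical error surface with a single minimum at (1, -2)."""
    return (x - 1.0) ** 2 + 2.0 * (y + 2.0) ** 2

def gradient(x, y):
    """Partial derivatives of the error with respect to x and y."""
    return 2.0 * (x - 1.0), 4.0 * (y + 2.0)

def gradient_descent(x, y, learning_rate=0.1, steps=200):
    """Take small steps against the gradient and record the path."""
    path = [(x, y)]
    for _ in range(steps):
        dx, dy = gradient(x, y)
        x -= learning_rate * dx   # step along the x axis, in the sense of descent
        y -= learning_rate * dy   # step along the y axis, in the sense of descent
        path.append((x, y))
    return path

path = gradient_descent(0.0, 0.0)
final_x, final_y = path[-1]
print(round(final_x, 4), round(final_y, 4))  # 1.0 -2.0
```

The learning rate plays the role of the step size: much larger values make the path overshoot the valley, much smaller ones make it crawl.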

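So for each weight we need to compute the partial derivative and take a small step proportional to it (the slope), in the direction of descent:

$$\Delta W_{i,j} = -\eta \, \frac{\partial E}{\partial W_{i,j}}$$

where $\eta$ is the learning rate and $W_{i,j}$ is the weight of the connection from neuron $i$ to neuron $j$. The partial derivative is then converted to the following product (I encourage you to read the references to understand the derivations):

$$\frac{\partial E}{\partial W_{i,j}} = O_i \, \delta_j$$

where $O_i$ is the output of the neuron at the start of the connection and $\delta_j$ is the error signal of the neuron at its end. From this point, the principle is quite similar to forward propagation, except that here we are propagating the so-called error signals from the output layer to the input layer while also updating the weights of the connections.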

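The last point before being able to implement a simple version of backpropagation is the formulas for the error signal. For an output neuron $j$ with output $O_j$ and target $T_j$, and for a hidden neuron $j$ connected through the weights $W_{j,k}$ to the neurons $k$ of the next layer, they are:

$$\delta_j = O_j (1 - O_j)(O_j - T_j) \quad \text{(output neuron)}$$

$$\delta_j = O_j (1 - O_j) \sum_k \delta_k W_{j,k} \quad \text{(hidden neuron)}$$

So far, our ‘simulated ball’ follows the slope of the error function until it arrives at a minimum, but what if it gets stuck at a local minimum which is far from producing optimal results?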
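The error signals can be written as two small functions. This is a sketch of the formulas as the implementation in this guide uses them, assuming the logistic function (whose derivative is output times one minus output) and an error derivative equal to output minus target; the function names are hypothetical.

```python
def output_error_signal(output, target):
    """Error signal of an output neuron: logistic derivative times (output - target)."""
    return output * (1.0 - output) * (output - target)

def hidden_error_signal(output, forward_signals, forward_weights):
    """Error signal of a hidden neuron: logistic derivative times the weighted
    sum of the error signals of the neurons it feeds in the next layer."""
    weighted = sum(d * w for d, w in zip(forward_signals, forward_weights))
    return output * (1.0 - output) * weighted

# One output neuron (output 0.5, target 1.0) fed by one hidden neuron through weight 0.4.
delta_out = output_error_signal(0.5, 1.0)
delta_hidden = hidden_error_signal(0.5, [delta_out], [0.4])
print(delta_out)     # -0.125
print(delta_hidden)  # approximately -0.0125
```

The error signal of the output neuron has to be known before the hidden one can be computed, which is why the layers are visited backwards.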

For this reason, some acceleration is usually introduced in the gradient descent so the ball is able to overcome small bumps. This is done by adding a fraction of the previous ΔW to the new ΔW:

$$\Delta W_{i,j}(t) = -\eta \, O_i \, \delta_j + \alpha \, \Delta W_{i,j}(t-1)$$

where $\eta$ is the learning rate, $O_i \, \delta_j$ is the partial derivative of the error with respect to the weight and $\alpha$ is the momentum factor. The momentum factor is a number between 0 and 1 and emulates friction. The bigger this number, the bigger the bumps the ball will be able to get over, but the longer it will take for the ball to settle in a minimum. A value greater than one would add kinetic energy to the ball, thus preventing deceleration upon arrival at the minimum:

Remark: remember the weight of the bias also needs to be updated!
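The momentum update can be tried on a single weight. The sketch below minimises a made-up error function E(w) = w² (gradient 2w), not the network's error, and the function name `momentum_step` is hypothetical.

```python
def momentum_step(weight, gradient, previous_delta, learning_rate, momentum):
    """One gradient descent step with momentum; returns (new weight, delta used)."""
    delta = -learning_rate * gradient + momentum * previous_delta
    return weight + delta, delta

weight, delta = 1.0, 0.0   # start away from the minimum, with no previous step
for _ in range(200):
    weight, delta = momentum_step(weight, 2.0 * weight, delta,
                                  learning_rate=0.1, momentum=0.8)
print(abs(weight) < 1e-6)  # True
```

The delta returned by each call is fed back into the next one, which is the same role the returned delta weight map plays between successive backpropagation calls.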

Now it’s time for the implementation:

```python
def backPropagate(self, inputs, targetOutputs, previousDeltaWeightMap = []):
    ### Check if the number of outputs corresponds to the number of target outputs
    if len(targetOutputs) != len(self.layers[self.OUTPUT_LAYER]):
        sys.exit("Error: Number of target outputs does not correspond with number of output neurons.")
    ### Create temporary weight and delta weight maps
    tempWeightMap = [[0 for x in range(self.neuronCount+1)] for x in range(self.neuronCount+1)]
    tempDeltaWeightMap = [[0 for x in range(self.neuronCount+1)] for x in range(self.neuronCount+1)]

    ### Forward propagate
    currentOutputs = self.forwardPropagate(inputs)

    ### Compute the error for each output neuron
    currentErrors = self.computeOutputErrors(currentOutputs, targetOutputs)

    ### Iterate through layers backwards
    for layerNum, layer in reversed(list(enumerate(self.layers))):
        if layerNum == self.INPUT_LAYER:
            break   # Finish backpropagation

        else:
            for i, currentNeuron in enumerate(layer):
                outputDerivative = currentNeuron.output*(1-currentNeuron.output)
                ### Compute the current neuron error signal
                if layerNum == self.OUTPUT_LAYER:
                    currentNeuron.errorSignal = outputDerivative*currentErrors[i].derivative
                else:
                    forwardErrorSignalsWeightedSum = 0
                    for forwardNeuron in self.layers[layerNum+1]:
                        forwardErrorSignalsWeightedSum += forwardNeuron.errorSignal * self.weightMap[currentNeuron.index][forwardNeuron.index]
                    currentNeuron.errorSignal = outputDerivative * forwardErrorSignalsWeightedSum

                ### Compute the weight delta and update the weight
                for backNeuron in self.layers[layerNum-1]:
                    currentWeight = self.weightMap[backNeuron.index][currentNeuron.index]
                    deltaW = -self.learningRate * backNeuron.output * currentNeuron.errorSignal
                    if len(previousDeltaWeightMap) != 0:
                        ### If available, add the previous delta weight multiplied by the momentum factor
                        deltaW += previousDeltaWeightMap[backNeuron.index][currentNeuron.index] * self.momentum
                    tempWeightMap[backNeuron.index][currentNeuron.index] = currentWeight + deltaW
                    tempDeltaWeightMap[backNeuron.index][currentNeuron.index] = deltaW

                ### Handle the bias neuron (row 0 of the weight map; its output is always 1)
                currentWeight = self.weightMap[0][currentNeuron.index]
                deltaW = -self.learningRate * currentNeuron.errorSignal
                if len(previousDeltaWeightMap) != 0:
                    ### If available, add the previous delta weight multiplied by the momentum factor
                    deltaW += previousDeltaWeightMap[0][currentNeuron.index] * self.momentum
                tempWeightMap[0][currentNeuron.index] = currentWeight + deltaW
                tempDeltaWeightMap[0][currentNeuron.index] = deltaW

    ### Update the weight map with the new weights
    self.weightMap = tempWeightMap

    ### Return the tempDeltaWeightMap (It is given as input in following backpropagations to add momentum)
    return tempDeltaWeightMap
```

Step 5: Training

We are getting close to being able to use the neural network, and one of the last steps is to feed the network with training data and make it learn to produce the outputs we expect from it.

The training algorithm is quite simple: its job is to perform backpropagation with all training inputs multiple times. This is done in iterations and epochs:

• Iterations are the number of times each training input is sent to backpropagation within an epoch.
• Epochs are the number of times the complete training set is run through iterations.
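The nesting of epochs, training inputs and iterations can be made explicit by listing the order in which backpropagation would be called. This is only a sketch of the loop structure; the generator name `training_schedule` is hypothetical.

```python
def training_schedule(number_of_inputs, iterations, epochs):
    """Yield (epoch, input_index, iteration) in the order backpropagation is called."""
    for epoch in range(epochs):
        for input_index in range(number_of_inputs):
            for iteration in range(iterations):
                yield epoch, input_index, iteration

schedule = list(training_schedule(number_of_inputs=2, iterations=3, epochs=2))
print(len(schedule))  # 12
print(schedule[:4])   # [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0)]
```

The total number of backpropagation calls is therefore epochs × inputs × iterations, which is what drives the training time.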

Because training can take a long time, especially for large networks and large training data sets, it’s useful to have feedback from the network. For this reason, I implemented a stopwatch that gives an estimate of the remaining time plus the elapsed time and the current epoch. The second block of code below contains the stopwatch.

```python
def train(self, inputs, targetOutputs, iterations, epochs, verbose = False):
    ### Start a stopwatch (only displayed if not in verbose mode)
    stopWatch = StopWatch(totalNumberOfChecks = epochs*len(inputs))

    for epoch in range(epochs):
        for i in range(len(inputs)):
            ### Update the stopWatch
            if not verbose and i > 0:
                stopWatch.update(epoch) # For efficiency reasons, the stopwatch is updated per training input (not per iteration)

            previousDeltaWeightMap = []
            for iteration in range(iterations):
                if iteration == 0:
                    previousDeltaWeightMap = self.backPropagate(inputs[i], targetOutputs[i])
                else:
                    # If this is not the first iteration, use the previous delta weights to account for momentum
                    previousDeltaWeightMap = self.backPropagate(inputs[i], targetOutputs[i], previousDeltaWeightMap)
                if verbose:
                    print("Epoch: " + str(epoch+1) + "  input: " + str(i+1) + "  Iteration: " + str(iteration+1))
    if not verbose:
        stopWatch.stop()
```
```python
class StopWatch:
    def __init__(self, totalNumberOfChecks):
        self.totalChecks = totalNumberOfChecks
        self.checkCount = 0
        self.lastCheckTime = 0
        self.remainingSeconds = 0
        self.firstCheckTime = time.time()
        print("")

    def convertToHoursMinutesSeconds(self, seconds):
        minutes = 0
        hours = 0
        if seconds >= 60:
            minutes = int(seconds/60)
            seconds = seconds - minutes*60
        if minutes >= 60:
            hours = int(minutes/60)
            minutes = minutes - hours*60
        return (str(hours).zfill(2) + ":" + str(minutes).zfill(2) +
                ":" + str(seconds).zfill(2))

    def update(self, epoch):
        self.checkCount += 1
        currentTime = time.time()
        elapsedTime = currentTime - self.firstCheckTime
        # Refresh the display at most once per second
        if currentTime - self.lastCheckTime >= 1.0:
            self.lastCheckTime = currentTime
            self.remainingSeconds = int(((self.totalChecks-self.checkCount)*elapsedTime)/self.checkCount)

            # Only the current epoch is shown: the stopwatch does not know the total
            # number of epochs (the original line read it from a global variable).
            print(("\rTraining... | Time remaining: " + self.convertToHoursMinutesSeconds(self.remainingSeconds) +
                   "   Time elapsed: " + self.convertToHoursMinutesSeconds(int(elapsedTime)) +
                   "     Epoch: " + str(epoch+1)).ljust(95), end='')

    def stop(self):
        elapsedTime = time.time() - self.firstCheckTime
        print(("\rTraining done. Time elapsed: " + self.convertToHoursMinutesSeconds(int(elapsedTime))).ljust(95), end='')
```

Step 6: Test the network

Finally, time for the test! We will be ‘teaching’ the network to encode seven-segment-display digits into their binary counterparts:

As we can observe from the training data, we will need 7 input neurons and 4 output neurons. The number of hidden layers and neurons per hidden layer is up to you, but adding more neurons and layers doesn’t necessarily mean more precision. In fact, you may lose precision due to overfitting (more on this at the end of the article). A [7, 5, 4] layout seems to generate good results.
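The 4-bit binary targets for the ten digits do not have to be typed by hand. As a small sketch (the function name `digit_to_binary_target` is hypothetical), they can be generated from the digit itself:

```python
def digit_to_binary_target(digit, bits=4):
    """Return the binary representation of a digit as a list of floats,
    most significant bit first, e.g. 5 -> [0.0, 1.0, 0.0, 1.0]."""
    return [float(bit) for bit in format(digit, "0{}b".format(bits))]

targets = [digit_to_binary_target(digit) for digit in range(10)]
print(targets[5])  # [0.0, 1.0, 0.0, 1.0]
print(targets[9])  # [1.0, 0.0, 0.0, 1.0]
```

The seven-segment inputs still have to be written out explicitly, since they depend on which segments are lit for each digit.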

Before training the network, we need to add a method to our NeuralNetwork class to visualise the performance of the network, in order to assess whether or not it’s producing the correct outputs. There are many indicators of the evolution of the network, but probably the most useful one is the total accuracy. In my implementation, the overall network accuracy is displayed, as well as the output for each input along with the target output and the corresponding mean error.
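To make the idea of total accuracy concrete, here is a minimal sketch of how it could be computed. It assumes that the error of one input set is the mean absolute difference between outputs and targets; that is an assumption made for illustration, and both function names are hypothetical rather than the helpers used by the class.

```python
def mean_absolute_error(outputs, targets):
    """Mean absolute difference between the outputs and the targets of one input set."""
    return sum(abs(o - t) for o, t in zip(outputs, targets)) / len(targets)

def accuracy_percent(all_outputs, all_targets, max_error):
    """Percentage of input sets whose mean error does not exceed max_error."""
    correct = sum(1 for outputs, targets in zip(all_outputs, all_targets)
                  if mean_absolute_error(outputs, targets) <= max_error)
    return 100.0 * correct / len(all_targets)

all_outputs = [[0.05, 0.9], [0.6, 0.4]]
all_targets = [[0.0, 1.0], [0.0, 1.0]]
print(accuracy_percent(all_outputs, all_targets, max_error=0.1))  # 50.0
```

The first input set is counted as correct (mean error of about 0.075) and the second is not (mean error of about 0.6), hence the 50 %.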

```python
def binaryThreshold(items):
    results = []
    for item in items:
        if item >= 0.5:
            results.append(1.0)
        else:
            results.append(0.0)
    return results

def plotResults(self, inputs, targetOutputs, minErrorForValidation, applyBynaryThreshold = False, exhaustive = True):
    correctOutputs = 0
    print("\n--------------------------------------------------------------------------------")
    for i in range(len(inputs)):
        setIsCorrect = False
        currentOutputs = self.forwardPropagate(inputs[i])
        if applyBynaryThreshold:
            currentOutputs = binaryThreshold(currentOutputs)
        currentGlobalError = self.globalNetError(currentOutputs, targetOutputs[i])
        if abs(currentGlobalError) <= minErrorForValidation:
            correctOutputs += 1
            setIsCorrect = True

        if exhaustive:
            print("")
            print("Results for input set " + str(i+1) + ": " + ("(CORRECT)" if setIsCorrect else "(NOT CORRECT)"))
            print(" — Target outputs: " + str(targetOutputs[i]))
            print(" — Current outputs: " + str(currentOutputs))
            print(" — Mean error: " + str(currentGlobalError))
    if exhaustive:
        print("\n")
    print("Minimum absolute error for validation: " + str(minErrorForValidation))
    print("--------------------------------------------------------------------------------")

    accuracy = (correctOutputs/len(targetOutputs))*100
    print("")
    print(" ===========================")
    print(" | Total network accuracy: |")
    print(" | " + str(round(accuracy, 3)).ljust(6, '0') + " % |")
    print(" ===========================")
    print("")
```

Now the only remaining step is to create the network and feed it with the training samples:

```python
npl = [7, 5, 4]
n = NeuralNetwork(npl)

tInputs = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
[0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
[1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0],
[1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0],
[0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0],
[1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0],
[0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]]

tOutputs = [[0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 1.0],
[0.0, 0.0, 1.0, 0.0],
[0.0, 0.0, 1.0, 1.0],
[0.0, 1.0, 0.0, 0.0],
[0.0, 1.0, 0.0, 1.0],
[0.0, 1.0, 1.0, 0.0],
[0.0, 1.0, 1.0, 1.0],
[1.0, 0.0, 0.0, 0.0],
[1.0, 0.0, 0.0, 1.0]]

# The following parameters affect the accuracy of the trained network
n.learningRate = 0.7
n.momentum = 0.6
iterations = 20
epochs = 50

print("Neural network layout: " + str(npl) + "  (" + str(sum(npl)) + " neurons in total)")
print("\nTraining settings:")
print("   — Learning rate: " + str(n.learningRate))
print("   — Momentum     : " + str(n.momentum))
print("   — Iterations   : " + str(iterations))
print("   — Epochs       : " + str(epochs))
print("   — Total inputs : " + str(len(tInputs)))

n.train(tInputs, tOutputs, iterations, epochs, verbose = False)
n.plotResults(tInputs, tOutputs, minErrorForValidation = 0.1, applyBynaryThreshold = False, exhaustive = True)
```

During the training you should see some info about the network settings and the stopwatch:

Once finished, some data about accuracy is printed on the prompt:

Conclusions and remarks

As you may have thought, we don’t really need a neural network for translating between seven-segment codes and their binary equivalents. This could have been programmed in far fewer lines of code. But the nice thing about neural networks is that whatever inputs and outputs you want to map, these roughly 300 lines of code remain the same. For instance, you can train a network to recognise handwritten digits or spoken words, or to solve other data mapping problems for which you have enough training data, which is something that would otherwise take thousands of lines of code without even outperforming the neural network!

A successful neural network also requires some consideration of its layout. As presented at the beginning of the article, the number of neurons is directly related to the complexity of the function modelled by the network. But an excessive number of neurons can lower the accuracy, and this is due to overfitting. Remember the objective is to identify the input data by its qualities and classify it into delimited regions. Because the network is trained to identify these unknown regions, we want the number of neurons to be just enough to model the boundaries:

There are no exact rules on how to choose a layout; most of it is just trial and error. Even so, you can still narrow down your tests by considering the complexity of the data mapping you want to accomplish and knowing that adding an extra layer considerably increases the complexity of the network.

For example, in our seven-segment to binary test, the test data was simple to map, as we had just 10 different cases, the target outputs were either 0 or 1 and the inputs and outputs were related by simple logic. The right number of hidden layers in this case is 1. Test results have shown that without any hidden layer the network accuracy was below 50%, whilst with 2 or more hidden layers the network rarely reached 100% accuracy.

To compare with a more complex problem, Michael Nielsen, in his online book about neural networks, also uses a single hidden layer to successfully recognise hand-written digits from scanned pictures.

In fact, a neural network with more than one hidden layer is called a deep neural network. These kinds of networks are harder to train and usually need additional techniques on top of the plain backpropagation presented here. A workaround is to train smaller neural networks to compute subtasks and then group these smaller networks into a bigger one, just as if they were neurons.

You have reached the end of this guide! I hope it has been interesting or even helpful. I’ll be very happy to receive feedback about any mistake, unresolved question or feeling you might have about it. Just bear in mind that my knowledge of the topic is roughly that provided by the references listed below.

The full code:

```python
####################################################################################
#
#   This is a feedforward neural network implementation. The training
#   algorithm is backpropagation and it is optimized implementing momentum.
#
#   Written by Alejandro Daniel Noel on July 2015 in Python 3.x
#   Contact info: futuretechmaker@gmail.com
#                 www.futuretechmaker.com
#
#   Useful resources:
#       http://page.mi.fu-berlin.de/rojas/neural/chapter/K7.pdf
#       http://www.cse.unsw.edu.au/~cs9417ml/MLP2/
#       http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
#
#   This application is free software and is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
####################################################################################

import numpy as np
import random
import time
import sys

def binaryThreshold(items):
    results = []
    for item in items:
        if item >= 0.5:
            results.append(1.0)
        else:
            results.append(0.0)
    return results

class StopWatch:
    def __init__(self, totalNumberOfChecks):
        self.totalChecks = totalNumberOfChecks
        self.checkCount = 0
        self.lastCheckTime = 0
        self.remainingSeconds = 0
        self.firstCheckTime = time.time()
        print("")

    def convertToHoursMinutesSeconds(self, seconds):
        minutes = 0
        hours = 0
        if seconds >= 60:
            minutes = int(seconds/60)
            seconds = seconds - minutes*60
        if minutes >= 60:
            hours = int(minutes/60)
            minutes = minutes - hours*60
        return (str(hours).zfill(2) + ":" + str(minutes).zfill(2) +
                ":" + str(seconds).zfill(2))

    def update(self, epoch):
        self.checkCount += 1
        currentTime = time.time()
        elapsedTime = currentTime - self.firstCheckTime
        # Refresh the display at most once per second
        if currentTime - self.lastCheckTime >= 1.0:
            self.lastCheckTime = currentTime
            self.remainingSeconds = int(((self.totalChecks-self.checkCount)*elapsedTime)/self.checkCount)

            # Only the current epoch is shown: the stopwatch does not know the total
            # number of epochs (the original line read it from a global variable).
            print(("\rTraining... | Time remaining: " + self.convertToHoursMinutesSeconds(self.remainingSeconds) +
                   "   Time elapsed: " + self.convertToHoursMinutesSeconds(int(elapsedTime)) +
                   "     Epoch: " + str(epoch+1)).ljust(95), end='')

    def stop(self):
        elapsedTime = time.time() - self.firstCheckTime
        print(("\rTraining done. Time elapsed: " + self.convertToHoursMinutesSeconds(int(elapsedTime))).ljust(95), end='')

class OutputNeuronError:
    def __init__(self, primitive, derivative):
        self.primitive  = primitive
        self.derivative = derivative

class Neuron:
    def __init__(self, index):
        self.index = index
        self.output = 0.0
        self.errorSignal = 0.0

class NeuralNetwork:

    def __init__(self, neurons_per_layer):  # neurons_per_layer is a list containing the number of neurons per layer.
        self.outputErrors = []
        self.learningRate = 0.5     # Default value. Can be changed, but this seems to give good results.
        self.momentum = 0.8         # Default value. Can be changed, but this seems to give good results.
        self.neuronCount = sum(neurons_per_layer)
        self.INPUT_LAYER = 0                            # Input layer is always at index 0
        self.OUTPUT_LAYER = len(neurons_per_layer) - 1  # Output layer is the last index
        self.layers = []    # This is a list of layers. Each layer is a list of neurons.
        # The weights are stored in a Hinton diagram (a square matrix where each entry is the weight between the row and column index neurons).
        # Index 0 is reserved for the bias, hence the neuronCount+1 rows and columns.
        self.weightMap = [[0 for x in range(self.neuronCount+1)] for x in range(self.neuronCount+1)]

        ### Fill the layer list
        currentIndex = 1
        for neuronsInCurrentLayer in neurons_per_layer:
            newLayer = []
            for neuron in range(neuronsInCurrentLayer):
                newNeuron = Neuron(currentIndex)    # Create a neuron with a unique (absolute) index.
                currentIndex += 1
                newLayer.append(newNeuron)          # Add the neuron to the new layer
            self.layers.append(newLayer)            # Add the new layer to the layer list

        ### Set random weights between neurons
        for i in range(0, len(self.layers)-1):
            for neuron1 in self.layers[i]:
                for neuron2 in self.layers[i+1]:
                    ### Weights between neurons
                    self.weightMap[neuron1.index][neuron2.index] = random.random()
                    ### Weights between the bias (row 0) and each neuron (excluding the input neurons)
                    self.weightMap[0][neuron2.index] = random.random()

    def plotWeightMap(self, excludeBias = False):   # Prints the weight map (Hinton diagram). Not useful for big networks as the plot won't fit in the command prompt.
        print("Hinton diagram:\n")
        output = "".ljust(3)
        for i in range(1 if excludeBias else 0, self.neuronCount+1):
            output += ("b" if i == 0 else str(i)).ljust(7)
        print(output)
        for i, row in enumerate(self.weightMap):
            if i == 0 and excludeBias: continue
            output = ("b" if i == 0 else str(i)).ljust(3)

            for j, weight in enumerate(row):
                if j == 0 and excludeBias: continue
                output += str(round(weight, 4)).ljust(7)
                '''
                if weight == 0:
                    output += "0  "
                else:
                    output += "1  "
                '''
            print(output)

    def forwardPropagate(self, inputs):
        ### Check if the number of inputs corresponds to the number of input neurons
        if len(inputs) != len(self.layers[self.INPUT_LAYER]):
            sys.exit("Error: Number of inputs does not correspond with number of input neurons.")

        # Set input neurons' outputs as the inputs
        for i, neuron in enumerate(self.layers[self.INPUT_LAYER]):
            neuron.output = inputs[i]

        for layerNum in range(1, len(self.layers)):
            for currentNeuron in self.layers[layerNum]:
                ### Compute weighted sum of the inputs to the neuron
                weightedInputSum = 0
                for previousNeuron in self.layers[layerNum-1]:
                    weightedInputSum += previousNeuron.output * self.weightMap[previousNeuron.index][currentNeuron.index]
                ### Add the bias weight (row 0 of the weight map)
                weightedInputSum += self.weightMap[0][currentNeuron.index]
                ### Compute the output using the logistic function
                neuronOutput = 1/(1+np.exp(-weightedInputSum))
                currentNeuron.output = neuronOutput # Store the current output in the neuron

        ### Return a list with the outputs from the output layer
        return [neuron.output for neuron in self.layers[self.OUTPUT_LAYER]]

    def computeOutputErrors(self, outputs, targetOutputs):
        ### Compute the error vector, containing the primitive and the derivative of the output error for each output neuron.
        errorVector = []
        for i in range(len(outputs)):
            currentError = OutputNeuronError(primitive = 0.5*(outputs[i] - targetOutputs[i])**2,
                                             derivative = outputs[i] - targetOutputs[i])
            errorVector.append(currentError)
        return errorVector

    def globalNetError(self, currentOutputs, targetOutputs):    # Computes the mean signed error (target - current) of the network
        allErrors = self.computeOutputErrors(currentOutputs, targetOutputs)
        ### Sum all the errors
        globalError = 0
        for outputError in allErrors:
            globalError += -outputError.derivative # derivative => difference between current and target output, but we want target - current, that's the reason for the negative sign 😉
        return globalError/len(allErrors)

    def backPropagate(self, inputs, targetOutputs, previousDeltaWeightMap = []):
        ### Check if the number of outputs corresponds to the number of target outputs
        if len(targetOutputs) != len(self.layers[self.OUTPUT_LAYER]):
            sys.exit("Error: Number of target outputs does not correspond with number of output neurons.")
        ### Create temporary weight and delta weight maps
        tempWeightMap = [[0 for x in range(self.neuronCount+1)] for x in range(self.neuronCount+1)]
        tempDeltaWeightMap = [[0 for x in range(self.neuronCount+1)] for x in range(self.neuronCount+1)]

        ### Forward propagate
        currentOutputs = self.forwardPropagate(inputs)

        ### Compute the error for each output neuron
        currentErrors = self.computeOutputErrors(currentOutputs, targetOutputs)

        ### Iterate through layers backwards
        for layerNum, layer in reversed(list(enumerate(self.layers))):
            if layerNum == self.INPUT_LAYER:
                break   # Finish backpropagation

            else:
                for i, currentNeuron in enumerate(layer):
                    outputDerivative = currentNeuron.output*(1-currentNeuron.output)
                    ### Compute the current neuron error signal
                    if layerNum == self.OUTPUT_LAYER:
                        currentNeuron.errorSignal = outputDerivative*currentErrors[i].derivative
                    else:
                        forwardErrorSignalsWeightedSum = 0
                        for forwardNeuron in self.layers[layerNum+1]:
                            forwardErrorSignalsWeightedSum += forwardNeuron.errorSignal * self.weightMap[currentNeuron.index][forwardNeuron.index]
                        currentNeuron.errorSignal = outputDerivative * forwardErrorSignalsWeightedSum

                    ### Compute the weight delta and update the weight
                    for backNeuron in self.layers[layerNum-1]:
                        currentWeight = self.weightMap[backNeuron.index][currentNeuron.index]
                        deltaW = -self.learningRate * backNeuron.output * currentNeuron.errorSignal
                        if len(previousDeltaWeightMap) != 0:
                            ### If available, add the previous delta weight multiplied by the momentum factor
                            deltaW += previousDeltaWeightMap[backNeuron.index][currentNeuron.index] * self.momentum
                        tempWeightMap[backNeuron.index][currentNeuron.index] = currentWeight + deltaW
                        tempDeltaWeightMap[backNeuron.index][currentNeuron.index] = deltaW

                    ### Handle the bias neuron (row 0 of the weight map; its output is always 1)
                    currentWeight = self.weightMap[0][currentNeuron.index]
                    deltaW = -self.learningRate * currentNeuron.errorSignal
                    if len(previousDeltaWeightMap) != 0:
                        ### If available, add the previous delta weight multiplied by the momentum factor
                        deltaW += previousDeltaWeightMap[0][currentNeuron.index] * self.momentum
                    tempWeightMap[0][currentNeuron.index] = currentWeight + deltaW
                    tempDeltaWeightMap[0][currentNeuron.index] = deltaW

        ### Update the weight map with the new weights
        self.weightMap = tempWeightMap

        ### Return the tempDeltaWeightMap (It is given as input in following backpropagations to enhance learning)
        return tempDeltaWeightMap

    def train(self, inputs, targetOutputs, iterations, epochs, verbose = False):
        ### Start a stopwatch (only displayed if not in verbose mode)
        stopWatch = StopWatch(totalNumberOfChecks = epochs*len(inputs))

        for epoch in range(epochs):
            for i in range(len(inputs)):
                ### Update the stopWatch
                if not verbose and i > 0:
                    stopWatch.update(epoch) # For efficiency reasons, the stopwatch is updated once per input set (not per iteration)

                previousDeltaWeightMap = []
                for iteration in range(iterations):
                    if iteration == 0:
                        previousDeltaWeightMap = self.backPropagate(inputs[i], targetOutputs[i])
                    else:
                        # If this is not the first iteration, use the previous delta weights to account for momentum
                        previousDeltaWeightMap = self.backPropagate(inputs[i], targetOutputs[i], previousDeltaWeightMap)
                    if verbose:
                        print("Epoch: " + str(epoch+1) + "  input: " + str(i+1) + "  Iteration: " + str(iteration+1))
        if not verbose:
            stopWatch.stop()

    def plotResults(self, inputs, targetOutputs, minErrorForValidation, applyBynaryThreshold = False, exhaustive = False):
        correctOutputs = 0
        print("\n--------------------------------------------------------------------------------")
        for i in range(len(inputs)):
            setIsCorrect = False
            currentOutputs = self.forwardPropagate(inputs[i])
            if applyBynaryThreshold:
                currentOutputs = binaryThreshold(currentOutputs)
            currentGlobalError = self.globalNetError(currentOutputs, targetOutputs[i])
            if abs(currentGlobalError) <= minErrorForValidation:
                correctOutputs += 1
                setIsCorrect = True

            if exhaustive:
                print("")
                print("Results for input set " + str(i+1) + ": " + ("(CORRECT)" if setIsCorrect else "(NOT CORRECT)"))
                print(" - Target outputs:   " + str(targetOutputs[i]))
                print(" - Current outputs:  " + str(currentOutputs))
                print(" - Mean error:       " + str(currentGlobalError))
        if exhaustive:
            print("\n")
        print("Minimum absolute error for validation: " + str(minErrorForValidation))
        print("--------------------------------------------------------------------------------")

        accuracy = (correctOutputs/len(targetOutputs))*100
        print("")
        print("      ===========================")
        print("     | Total network accuracy:   |")
        print("     |        " + str(round(accuracy, 3)).ljust(6, '0') + " %           |")
        print("      ===========================")
        print("")

# Instantiate a neural network
npl = [7, 5, 4]     # 7 input neurons, 5 hidden neurons in one hidden layer and 4 output neurons
n = NeuralNetwork(npl)
#print("")
#n.plotWeightMap(excludeBias = False)

# Set inputs and target outputs
tInputs = [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
[0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
[1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0],
[1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0],
[0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0],
[1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0],
[0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
[1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]]

tOutputs = [[0.0, 0.0, 0.0, 0.0],
[0.0, 0.0, 0.0, 1.0],
[0.0, 0.0, 1.0, 0.0],
[0.0, 0.0, 1.0, 1.0],
[0.0, 1.0, 0.0, 0.0],
[0.0, 1.0, 0.0, 1.0],
[0.0, 1.0, 1.0, 0.0],
[0.0, 1.0, 1.0, 1.0],
[1.0, 0.0, 0.0, 0.0],
[1.0, 0.0, 0.0, 1.0]]

# Set training parameters:
n.learningRate = 0.7
n.momentum = 0.6
iterations = 20
epochs = 50

# Print info and train the network
print("Neural network layout: " + str(npl) + "  (" + str(sum(npl)) + " neurons in total)")
print("\nTraining settings:")
print("   - Learning rate: " + str(n.learningRate))
print("   - Momentum     : " + str(n.momentum))
print("   - Iterations   : " + str(iterations))
print("   - Epochs       : " + str(epochs))
print("   - Total inputs : " + str(len(tInputs)))
n.train(tInputs, tOutputs, iterations, epochs, verbose = False)
n.plotResults(tInputs, tOutputs, minErrorForValidation = 0.1, applyBynaryThreshold = False, exhaustive = True)
```
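
The `forwardPropagate` method in the listing above visits the neurons one by one, which is easy to follow but slow for large networks. As a rough sketch of an alternative (not part of the implementation above), the same computation can be written with one NumPy matrix product per layer. The names `logistic`, `forward_pass`, `weights` and `biases` are my own assumptions for this example; here each layer gets its own weight matrix instead of the single square weight map.

```python
import numpy as np

def logistic(z):
    """Logistic activation, the same function used in forwardPropagate."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_pass(inputs, weights, biases):
    """Propagate one input vector through fully connected layers.

    weights[k] has shape (neurons in layer k, neurons in layer k+1),
    biases[k] has shape (neurons in layer k+1,). Both are hypothetical
    containers used only in this sketch.
    """
    activation = np.asarray(inputs, dtype=float)
    for layer_weights, layer_biases in zip(weights, biases):
        # Weighted sum of the previous layer's outputs plus the bias weight
        activation = logistic(activation @ layer_weights + layer_biases)
    return activation

# Assumed setup mirroring the [7, 5, 4] layout, with random weights in [0, 1)
rng = np.random.default_rng(0)
layout = [7, 5, 4]
weights = [rng.random((n_in, n_out)) for n_in, n_out in zip(layout[:-1], layout[1:])]
biases = [rng.random(n_out) for n_out in layout[1:]]

outputs = forward_pass([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0], weights, biases)
print(outputs)  # four values, one per output neuron
```

The result should match what the neuron-by-neuron loop would produce for the same weights; the gain is only in speed and compactness.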

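A quick way to gain confidence in the error signal used in `backPropagate` for the output layer, `output*(1-output)*(output-target)`, is to compare it with a numerical derivative of the error `0.5*(output-target)**2`. The snippet below is only a sanity-check sketch for a single output neuron; the helper names `squared_error` and `error_signal` and the test values are assumptions made for this example.

```python
import numpy as np

def logistic(z):
    """Logistic activation, as used in the network above."""
    return 1.0 / (1.0 + np.exp(-z))

def squared_error(z, target):
    """Error 0.5*(output - target)**2 of one output neuron with weighted input z."""
    return 0.5 * (logistic(z) - target) ** 2

def error_signal(z, target):
    """Analytic derivative of the error with respect to the weighted input z."""
    output = logistic(z)
    return output * (1.0 - output) * (output - target)

# Arbitrary test point (assumed values), compared with a central finite difference
z, target, eps = 0.3, 1.0, 1e-6
numeric = (squared_error(z + eps, target) - squared_error(z - eps, target)) / (2 * eps)
analytic = error_signal(z, target)

print("numeric :", numeric)
print("analytic:", analytic)
```

If the two printed values agree to several decimal places, the analytic expression is consistent with the error function it is meant to differentiate.
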
References

 Online book about neural networks: http://neuralnetworksanddeeplearning.com

 Overfitting and underfitting: http://www.statsblogs.com/2014/03/20/machine-learning-lesson-of-the-day-overfitting-and-underfitting/

 Collection of articles about NN theory:  http://www.cse.unsw.edu.au/~cs9417ml/MLP2/

 Detailed explanation of the backpropagation algorithm: http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

 More on backpropagation: http://page.mi.fu-berlin.de/rojas/neural/chapter/K7.pdf