Theano is quite useful for training neural networks, where we must repeatedly compute the cost and its gradients to reach an optimum. On large datasets, this becomes computationally intensive. Theano handles it efficiently thanks to the internal optimizations of the computational graph that we saw earlier.
We shall now learn how to use the Theano library to train a network. We will take a simple case: we start with a dataset of four features and compute the sum of these features after applying a certain weight (importance) to each one.
The goal of the training is to modify the weights assigned to each feature so that the sum reaches a target value of 100.
sum = f1 * w1 + f2 * w2 + f3 * w3 + f4 * w4
Where f1, f2, ... are the feature values and w1, w2, ... are the weights.
Let us make the example concrete for a better understanding of the problem statement. We will assume an initial value of 1.0 for each feature, and we will take w1 = 0.1, w2 = 0.25, w3 = 0.15, and w4 = 0.3. There is no definite logic in assigning these weight values; it is just our intuition. Thus, the initial sum is as follows −
sum = 1.0 * 0.1 + 1.0 * 0.25 + 1.0 * 0.15 + 1.0 * 0.3
Which sums to 0.8. Now, we will keep modifying the weight assignment so that this sum approaches 100. The current value of 0.8 is far from our desired target of 100. In machine learning terms, we define the cost as the difference between the target value and the current output, typically squared to magnify the error. We reduce this cost in each iteration by calculating the gradients and updating our weights vector.
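As a quick sanity check, the initial sum and cost can be reproduced with a few lines of plain NumPy (this sketch is not part of the Theano program; the variable names are our own):

```python
import numpy as np

features = np.array([1.0, 1.0, 1.0, 1.0])   # f1 .. f4
weights = np.array([0.1, 0.25, 0.15, 0.3])  # w1 .. w4
target = 100.0

s = float(np.dot(features, weights))  # the weighted sum, 0.8 as computed above
cost = (target - s) ** 2              # squared error relative to the target

print(s)
print(cost)  # (100 - 0.8)^2 = 9840.64
```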
Let us see how this entire logic is implemented in Theano.
We first declare our input vector x as follows −
x = tensor.fvector('x')
Where x is a one-dimensional array of float values.
We define a scalar target variable as given below −
target = tensor.fscalar('target')
Next, we create a weights tensor W with the initial values as discussed above −
W = theano.shared(numpy.asarray([0.1, 0.25, 0.15, 0.3]), 'W')
We now calculate the output using the following expression −
y = (x * W).sum()
Note that in the above statement x and W are vectors, not simple scalar variables. We now calculate the error (cost) with the following expression −
cost = tensor.sqr(target - y)
The cost is the difference between the target value and the current output, squared.
To calculate the gradient, which tells us in which direction and by how much each weight should change to reduce the cost, we use the built-in grad method as follows −
gradients = tensor.grad(cost, [W])
We now update the weights vector by taking a learning rate of 0.1 as follows −
W_updated = W - (0.1 * gradients[0])
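For this particular cost, the gradient also has a simple closed form: with y = x · W and cost = (target − y)², the chain rule gives d(cost)/dW = −2 (target − y) x. A minimal NumPy sketch of a single update step (our own hand-derived reimplementation, not Theano's API) reproduces the first set of modified weights that appears in the program's output:

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0, 1.0])
W = np.array([0.1, 0.25, 0.15, 0.3])
target = 100.0

y = (x * W).sum()               # initial output, 0.8
grad = -2.0 * (target - y) * x  # d(cost)/dW by the chain rule
W_updated = W - 0.1 * grad      # one gradient-descent step, learning rate 0.1

print(W_updated)  # matches "iteration: 0" weights in the output, up to rounding
```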
Next, we tell Theano to replace the shared variable W with W_updated on every invocation. We do this by defining an updates pair in the following statement −
updates = [(W, W_updated)]
Lastly, we define a Theano function that computes the sum and applies the weight updates.
f = function([x, target], y, updates=updates)
To invoke the above function a certain number of times, we create a for loop as follows −
for i in range(10):
   output = f([1.0, 1.0, 1.0, 1.0], 100.0)
As said earlier, the input to the function is a vector containing the initial values of the four features; we assign the value 1.0 to each feature without any specific reason. You may assign different values of your choice and check whether the function still converges. We print the values of the weight vector and the corresponding output in each iteration, as shown in the code below −
   print("iteration: ", i)
   print("Modified Weights: ", W.get_value())
   print("Output: ", output)
The complete program listing is reproduced here for your quick reference −
from theano import *
import numpy

x = tensor.fvector('x')
target = tensor.fscalar('target')

W = theano.shared(numpy.asarray([0.1, 0.25, 0.15, 0.3]), 'W')
print("Weights: ", W.get_value())

y = (x * W).sum()
cost = tensor.sqr(target - y)
gradients = tensor.grad(cost, [W])
W_updated = W - (0.1 * gradients[0])
updates = [(W, W_updated)]

f = function([x, target], y, updates=updates)

for i in range(10):
   output = f([1.0, 1.0, 1.0, 1.0], 100.0)
   print("iteration: ", i)
   print("Modified Weights: ", W.get_value())
   print("Output: ", output)
When you run the program you will see the following output −
Weights: [0.1 0.25 0.15 0.3 ]
iteration: 0
Modified Weights: [19.94 20.09 19.99 20.14]
Output: 0.8
iteration: 1
Modified Weights: [23.908 24.058 23.958 24.108]
Output: 80.16000000000001
iteration: 2
Modified Weights: [24.7016 24.8516 24.7516 24.9016]
Output: 96.03200000000001
iteration: 3
Modified Weights: [24.86032 25.01032 24.91032 25.06032]
Output: 99.2064
iteration: 4
Modified Weights: [24.892064 25.042064 24.942064 25.092064]
Output: 99.84128
iteration: 5
Modified Weights: [24.8984128 25.0484128 24.9484128 25.0984128]
Output: 99.968256
iteration: 6
Modified Weights: [24.89968256 25.04968256 24.94968256 25.09968256]
Output: 99.9936512
iteration: 7
Modified Weights: [24.89993651 25.04993651 24.94993651 25.09993651]
Output: 99.99873024
iteration: 8
Modified Weights: [24.8999873 25.0499873 24.9499873 25.0999873]
Output: 99.99974604799999
iteration: 9
Modified Weights: [24.89999746 25.04999746 24.94999746 25.09999746]
Output: 99.99994920960002
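The Theano loop can be cross-checked with a pure-NumPy reimplementation of the same gradient-descent updates (a sketch under the same settings, using the hand-derived gradient; useful when Theano is not installed):

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0, 1.0])    # feature values
W = np.array([0.1, 0.25, 0.15, 0.3])  # initial weights
target, learning_rate = 100.0, 0.1

for i in range(10):
    y = (x * W).sum()               # current output
    grad = -2.0 * (target - y) * x  # gradient of (target - y)**2 w.r.t. W
    W = W - learning_rate * grad    # gradient-descent update
    print("iteration:", i, "output:", y)
```

Each iteration shrinks the error (target − y) by a factor of 1 − 2 · 0.1 · ‖x‖² = 0.2, which explains the rapid convergence visible in the output above.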
Observe that at iteration 5 the output is about 99.97, and at iteration 6 it is about 99.99, which is close to our desired target of 100.0.
Depending on the desired accuracy, you may safely conclude that the network is trained within five or six iterations. After training completes, look up the weights vector, which at iteration 5 takes the following values −
iteration: 5
Modified Weights: [24.8984128 25.0484128 24.9484128 25.0984128]
You may now use these values in your network for deploying the model.
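Deployment is then just the weighted sum with the trained weights; a minimal sketch (the weight values are copied from iteration 5 above):

```python
import numpy as np

trained_W = np.array([24.8984128, 25.0484128, 24.9484128, 25.0984128])
features = np.array([1.0, 1.0, 1.0, 1.0])

# Prediction is the same weighted sum, now with the trained weights.
prediction = float(np.dot(features, trained_W))
print(prediction)  # close to the target of 100
```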