This chapter covers neural network concepts as they apply to CNTK.
As we know, a neural network (NN) is built from several layers of neurons. So how do we model the layers of an NN in CNTK? We can do it with the layer functions defined in the layers module.
In CNTK, working with layers has a distinct functional programming feel. A layer function looks like a regular function, and it produces a mathematical function with a set of predefined parameters. Let’s see how to create the most basic layer type, Dense, with a layer function.
We can create the most basic layer type with the following steps −
Step 1 − First, we need to import the Dense layer function from the layers package of CNTK.
from cntk.layers import Dense
Step 2 − Next, we need to import the input_variable function from the CNTK root package.
from cntk import input_variable
Step 3 − Now, we need to create a new input variable using the input_variable function and provide its size.
feature = input_variable(100)
Step 4 − Finally, we create a new layer using the Dense function, providing the number of neurons we want.
layer = Dense(40)(feature)
Here, Dense(40) returns the configured layer function, and invoking it on feature connects the Dense layer to the input. The complete example is as follows −
from cntk.layers import Dense
from cntk import input_variable

feature = input_variable(100)
layer = Dense(40)(feature)
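To make the "function that produces a function" idea concrete, the following is a minimal pure-Python sketch of what a dense layer computes. It does not use CNTK and is not how CNTK implements Dense. The helper name dense, the lazily created parameter dictionary, and the small uniform initial weights are all assumptions made for illustration.

```python
import random

def dense(num_outputs, seed=0):
    """Hypothetical stand-in for a layer function (not the CNTK API).

    Returns a function; its weights are created on the first call,
    once the size of the input is known.
    """
    rng = random.Random(seed)
    params = {}  # filled lazily with "W" (weights) and "b" (biases)

    def layer(inputs):
        if not params:
            n_in = len(inputs)
            params["W"] = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                           for _ in range(num_outputs)]
            params["b"] = [0.0] * num_outputs
        # One weighted sum plus bias per neuron
        return [sum(w * x for w, x in zip(row, inputs)) + b
                for row, b in zip(params["W"], params["b"])]

    layer.params = params
    return layer

feature = [1.0] * 100   # plays the role of input_variable(100)
layer = dense(40)       # configure the layer: 40 neurons
output = layer(feature) # connect the layer to the input

num_params = sum(len(row) for row in layer.params["W"]) + len(layer.params["b"])
print(len(output), num_params)  # prints: 40 4040
```

With 100 inputs and 40 neurons, the layer holds 100 × 40 weights plus 40 biases, which is why the parameter count is 4040.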
As we have seen, CNTK provides a good set of defaults for building NNs. The behavior and performance of an NN depend on the activation function and other settings we choose, so it is worth understanding what we can configure.
Each layer in an NN has its own configuration options. For the Dense layer, the important settings are −
shape − As the name implies, it defines the output shape of the layer, which in turn determines the number of neurons in that layer.
activation − It defines the activation function of the layer, that is, the function applied to the weighted sum of the layer's inputs.
init − It defines the initialisation function of the layer, which sets the initial values of the layer's parameters when we start training the NN.
We can configure a Dense layer with the following steps −
Step 1 − First, we need to import the Dense layer function from the layers package of CNTK.
from cntk.layers import Dense
Step 2 − Next, we need to import the sigmoid operator from the CNTK ops package. We will use it as the activation function.
from cntk.ops import sigmoid
Step 3 − Now, we need to import the glorot_uniform initializer from the initializer package.
from cntk.initializer import glorot_uniform
Step 4 − Finally, we create a new layer using the Dense function, providing the number of neurons as the first argument. We also provide the sigmoid operator as the activation function and the initializer returned by calling glorot_uniform() as the init function for the layer.
layer = Dense(50, activation=sigmoid, init=glorot_uniform())
The complete example is as follows −
from cntk.layers import Dense
from cntk.ops import sigmoid
from cntk.initializer import glorot_uniform

layer = Dense(50, activation=sigmoid, init=glorot_uniform())
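To see what these two settings do numerically, here is a small pure-Python sketch. It uses the textbook Glorot uniform rule (weights drawn uniformly from [-limit, limit], with limit = sqrt(6 / (fan_in + fan_out))) and the standard sigmoid. It is not CNTK's implementation, and the input size of 100 is an assumption, since the configured layer above has not been connected to an input yet.

```python
import math
import random

def glorot_uniform_limit(fan_in, fan_out):
    """Bound of the uniform range in the textbook Glorot rule."""
    return math.sqrt(6.0 / (fan_in + fan_out))

def sigmoid(x):
    """Standard logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

fan_in, fan_out = 100, 50   # assumed: 100 inputs feeding 50 neurons
limit = glorot_uniform_limit(fan_in, fan_out)  # sqrt(6 / 150), about 0.2

rng = random.Random(42)
weights = [[rng.uniform(-limit, limit) for _ in range(fan_in)]
           for _ in range(fan_out)]
bias = [0.0] * fan_out

inputs = [1.0] * fan_in
# Weighted sum per neuron, then the activation function
activations = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
               for row, b in zip(weights, bias)]

print(round(limit, 4), len(activations))
```

The initialiser only decides where the weights start, while the activation decides how each neuron's weighted sum is squashed. Training then moves the weights away from these starting values.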
So far, we have seen how to create the structure of an NN and how to configure its settings. Next, we will see how to optimise the parameters of an NN. This is done by a combination of two components, learners and trainers.
The first component used to optimise the parameters of an NN is the trainer. It implements the backpropagation process. It first passes the data through the NN to obtain a prediction.
It then uses another component, the learner, to obtain new values for the parameters of the NN. Once it has the new values, it applies them and repeats the process until an exit criterion is met.
The second component used to optimise the parameters of an NN is the learner, which is responsible for performing the gradient descent algorithm.
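The division of labour between the two components can be sketched in plain Python. This is an illustration only, not CNTK's trainer or learner classes. The names train and sgd_learner, the single linear neuron, the mean squared error loss, and the toy data are all assumptions chosen to keep the example small.

```python
def sgd_learner(lr):
    """Learner: turns parameters and their gradients into new parameter values."""
    def update(params, grads):
        return [p - lr * g for p, g in zip(params, grads)]
    return update

def train(data, learner, max_epochs=2000, tolerance=1e-10):
    """Trainer: predicts, measures the error, and asks the learner for new values."""
    w, b = 0.0, 0.0
    loss = float("inf")
    for _ in range(max_epochs):
        # Forward pass: prediction error for every sample
        errors = [(w * x + b) - y for x, y in data]
        loss = sum(e * e for e in errors) / len(data)
        if loss < tolerance:  # exit criterion
            break
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * e * x for e, (x, _) in zip(errors, data)) / len(data)
        grad_b = sum(2 * e for e in errors) / len(data)
        # Delegate the update rule to the learner, then apply the new values
        w, b = learner([w, b], [grad_w, grad_b])
    return w, b, loss

# Toy data generated from y = 2x + 1
data = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
w, b, loss = train(data, sgd_learner(lr=0.05))
print(round(w, 3), round(b, 3))
```

Because the update rule lives entirely inside the learner, swapping in a different learner changes how the parameters move without touching the training loop.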
The following are some of the interesting learners included in the CNTK library −
Stochastic Gradient Descent (SGD) − This learner implements basic stochastic gradient descent, without any extras.
Momentum Stochastic Gradient Descent (MomentumSGD) − This learner adds momentum to SGD to overcome the problem of local minima.
RMSProp − This learner uses decaying learning rates to control the rate of descent.
Adam − This learner uses decaying momentum to decrease the rate of descent over time.
Adagrad − This learner uses different learning rates for frequently and infrequently occurring features.
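The differences between some of these learners are easier to see from their update rules. The sketch below shows textbook forms of the SGD, momentum, and Adagrad updates for a single parameter. These are not CNTK's exact implementations, which may differ in details such as how momentum is scaled. The function names, the learning rates, and the toy objective f(p) = p² are assumptions for illustration.

```python
import math

def sgd_step(p, g, lr):
    """Plain SGD: move against the gradient."""
    return p - lr * g

def momentum_step(p, g, velocity, lr, momentum=0.9):
    """SGD with momentum: accumulate a running direction from past gradients."""
    velocity = momentum * velocity + g
    return p - lr * velocity, velocity

def adagrad_step(p, g, accum, lr, eps=1e-8):
    """Adagrad: scale the step by the history of squared gradients."""
    accum = accum + g * g
    return p - lr * g / (math.sqrt(accum) + eps), accum

# Minimise f(p) = p**2, whose gradient is 2 * p, starting from p = 5.0
p_sgd = 5.0
p_mom, velocity = 5.0, 0.0
p_ada, accum = 5.0, 0.0
for _ in range(200):
    p_sgd = sgd_step(p_sgd, 2 * p_sgd, lr=0.1)
    p_mom, velocity = momentum_step(p_mom, 2 * p_mom, velocity, lr=0.1)
    p_ada, accum = adagrad_step(p_ada, 2 * p_ada, accum, lr=1.0)

print(p_sgd, p_mom, p_ada)
```

All three end up close to the minimum at p = 0 on this simple objective. On real networks the learners behave differently because the loss surface is not a single smooth bowl.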