Guide To Deep Learning
Page Count: 111
Neural Networks : A 30,000 Feet View for Beginners
Learn OpenCV, May 2, 2017, by Satya Mallick (https://www.learnopencv.com/author/spmallick/)
Source: https://www.learnopencv.com/neural-networks-a-30000-feet-view-for-beginners/ (printed 10/27/2018)

In this article, I am going to provide a 30,000 feet view of Neural Networks. The post is written for absolute beginners who are trying to dip their toes in Machine Learning and Deep Learning. We will keep this short, sweet and math-free.

This post is part of the series on Deep Learning for Beginners, which consists of the following tutorials:

1. Neural Networks : A 30,000 Feet View for Beginners
2. Installation of Deep Learning frameworks (Tensorflow and Keras with CUDA support) (/installing-deep-learning-frameworks-on-ubuntu-with-cuda-support/)
3. Introduction to Keras (/deep-learning-using-keras-the-basics/)
4. Understanding Feedforward Neural Networks (/understanding-feedforward-neural-networks/)
5. Image Classification using Feedforward Neural Networks (/image-classification-using-feedforward-neural-network-in-keras/)
6. Image Recognition using Convolutional Neural Network (/image-classification-using-convolutional-neural-networks-in-keras/)
7. Understanding Activation Functions (/understanding-activation-functions-in-deep-learning/)
8. Understanding AutoEncoders using Tensorflow (/understanding-autoencoders-using-tensorflow-python/)
9. Image Classification using pre-trained models in Keras (/keras-tutorial-using-pre-trained-imagenet-models/)
10. Transfer Learning using pre-trained models in Keras (/keras-tutorial-transfer-learning-using-pre-trained-models/)
11. Fine-tuning pre-trained models in Keras (/keras-tutorial-fine-tuning-using-pre-trained-models)
12. More to come . . .

Neural Networks as Black Box

We will start by treating a Neural Network as a magical black box. You don’t know what’s inside the black box. All you know is that it has one input and three outputs. The input is an image of any size, color, kind etc.
The three outputs are numbers between 0 and 1. The outputs are labeled “Cat”, “Dog”, and “Other”. The three numbers always add up to 1.

(Figure: /wp-content/uploads/2017/05/neural-network-as-blackbox.jpg)

Understanding the Neural Network Output

The magic it performs is very simple. If you input an image to the black box, it will output three numbers. A perfect neural network would output (1, 0, 0) for a cat, (0, 1, 0) for a dog and (0, 0, 1) for anything that is not a cat or a dog. In reality, though, even a well trained neural network will not give such clean results. For example, if you input the image of a cat, the number under the label “Cat” could say 0.97, the number under “Dog” could say 0.01 and the number under the label “Other” could say 0.02. The outputs can be interpreted as probabilities. This specific output means that the black box “thinks” there is a 97% chance that the input image is that of a cat and a small chance that it is either a dog or something it does not recognize. Note that the output numbers add up to 1.

This particular problem is called image classification: given an image, you use the label with the highest probability to assign it a class (Cat, Dog, Other).

Understanding the Neural Network Input

As a programmer, you can see that floats or doubles are enough to represent the output of the Neural Network. But how do you input an image? An image is just an array of numbers. A 256×256 image with three channels is simply an array of 256x256x3 = 196,608 numbers. Most libraries you use for reading the image will read a 256×256 color image into a contiguous block of 196,608 numbers in memory.
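To make the two ideas above concrete, here is a minimal sketch in plain Python. It treats an image as one flat block of numbers and picks the class with the highest output. The row-major, channel-interleaved layout and the helper name `flat_index` are assumptions made for illustration; real image libraries may order the values differently.

```python
# Assumed image geometry, matching the 256x256 three-channel example in the text.
HEIGHT, WIDTH, CHANNELS = 256, 256, 3


def flat_index(row, col, channel, width=WIDTH, channels=CHANNELS):
    """Position of one value in a row-major, channel-interleaved buffer.

    This layout is an assumption for illustration only.
    """
    return (row * width + col) * channels + channel


# One contiguous block of numbers (an all-black image).
image = [0] * (HEIGHT * WIDTH * CHANNELS)
num_values = len(image)

# The last channel of the bottom-right pixel is the last number in the block.
last_index = flat_index(HEIGHT - 1, WIDTH - 1, CHANNELS - 1)
image[last_index] = 255

# The three outputs from the cat example in the text.
labels = ("Cat", "Dog", "Other")
outputs = (0.97, 0.01, 0.02)

# Image classification: choose the label with the highest probability.
predicted = labels[outputs.index(max(outputs))]

print(num_values, last_index, predicted)
```

The block holds exactly 256 x 256 x 3 values, and the predicted label is simply the position of the largest output.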
With this new knowledge, we know the input is slightly more complicated. It is actually 196,608 numbers. Let us update our black box to reflect this new reality.

(Figure: /wp-content/uploads/2017/05/neural-network-as-blackbox-2.jpg)

I know what you are thinking. What about images that are not 256×256? Well, you can always convert any image to size 256×256 using the following steps.

1. Non-square aspect ratio: If the input image is not square, you can resize the image so that the smaller dimension is 256. Then, crop 256×256 pixels from the center of the image.
2. Grayscale image: If the input image is not a color image, you can create a 3 channel image by copying the grayscale image into three channels.

People use many different tricks to convert an image to a fixed size (e.g. 256×256), but since I promised to keep it simple, I won’t go into those tricks. The important thing to note is that any image can be converted into a fixed size image, even though we lose some information when we crop and resize an image to that fixed size.

What does it mean to train a Neural Network?

The black box has knobs that can be used to “tune” it. In technical jargon, these knobs are called weights. When the knobs are in the right position, the neural network gives the right output more often for different inputs. Training the neural net simply means finding the right knob settings (or weights).

(Figure: /wp-content/uploads/2017/05/neural-network-as-blackbox-3.jpg)

How do you train a Neural Network?

If you had this magical black box but did not know the right knob settings, it would be a useless box. The good news is that you can find the right knob settings by “training” the Neural Network. Training a Neural Network is very similar to training a little child. You show the child a ball and tell her that it is a “ball”.
When you do that many times with different kinds of balls, the child figures out that it is the shape of the ball that makes it a ball and not the color, texture or size. You then show the child an egg and ask, “What is this?” She responds “Ball.” You correct her that it is not a ball, but an egg. When this process is repeated several times, the child is able to tell the difference between a ball and an egg.

To train a Neural Network, you show it several thousand examples of the classes (e.g. Cat, Dog, Other) you want it to learn. This kind of training is called Supervised Learning because you are providing the Neural Network an image of a class and explicitly telling it that it is an image from that class. To train a neural network, we therefore need three things.

1. Training data: Thousands of images of each class and the expected output. For example, for all images of cats in this dataset, the expected output is (1, 0, 0).
2. Cost function: We need to know if the current knob setting is better than the previous knob setting. A cost function sums up the errors made by the neural network over all images in the training set. For example, a common cost function is called sum of squared errors (SSE). If the expected output for an image is a cat, or (1, 0, 0), and the neural network outputs (0.37, 0.5, 0.13), the squared error made by the neural network on this particular image is $(1 - 0.37)^2 + (0 - 0.5)^2 + (0 - 0.13)^2 = 0.6638$. The total cost over all images is simply the sum of squared errors over all images. The goal of training is to find the knob settings that will minimize the cost function.
3. How to update the knob settings: Finally, we need a way to update the knob settings based on the error we observe over all training images.
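The squared-error calculation for the cat example above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the second (dog) image with its outputs is a hypothetical addition to show how errors are summed over more than one image.

```python
def squared_error(expected, predicted):
    """Squared error for one image: sum of squared differences per output."""
    return sum((e - p) ** 2 for e, p in zip(expected, predicted))


def sum_of_squared_errors(expected_list, predicted_list):
    """Total cost: squared errors added up over all images."""
    return sum(
        squared_error(e, p) for e, p in zip(expected_list, predicted_list)
    )


# The cat image from the text: expected (1, 0, 0), network says (0.37, 0.5, 0.13).
cat_error = squared_error((1, 0, 0), (0.37, 0.5, 0.13))

# A second, hypothetical dog image to illustrate summing over a training set.
expected_outputs = [(1, 0, 0), (0, 1, 0)]
network_outputs = [(0.37, 0.5, 0.13), (0.1, 0.8, 0.1)]
total_cost = sum_of_squared_errors(expected_outputs, network_outputs)

print(round(cat_error, 4), round(total_cost, 4))
```

A perfect network would give a cost of zero; the worse the outputs, the larger the cost.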
Training a neural network with a single knob

Let’s say we have a thousand images of cats, a thousand images of dogs, and a thousand images of random objects that are not cats or dogs. These three thousand images are our training set. If our neural network has not been trained, it will have some random knob settings, and when you input these three thousand images, the output will be right only about one time in three.

For the purpose of simplicity, let’s say our neural network has just one knob. Since we have just one knob, we could test a thousand different knob settings spanning the range of expected knob values and find the knob setting that minimizes the cost function. This would complete our training.

However, real world neural networks do not have a single knob. For example, VGG-Net, a popular neural network architecture, has 138 million knobs!

Training a neural network with multiple knobs

When we had just one knob, we could easily find the best setting by testing all (or a very large number of) possibilities. This quickly becomes unrealistic: with a thousand settings per knob, even three knobs would require testing a billion combinations. Imagine the number of possibilities with something as large as VGG-Net. Needless to say, a brute force search for the optimal knob settings is not feasible.

Fortunately, there is a way out. When the cost function is convex (i.e. shaped like a bowl), there is a principled way to iteratively find the best weight by a method called Gradient Descent.

Gradient Descent

Let’s go back to our Neural Network with just one knob and assume that we have a current estimate of the knob setting (or weight). If our cost function is shaped like a bowl, we can find the slope of the cost function at that estimate and move a step closer to the optimum knob setting.
This procedure is called Gradient Descent because we are moving down (descending) the curve based on the slope (gradient). When you reach the bottom of the bowl, the gradient or slope goes to zero and that completes your training. These bowl-shaped functions are technically called convex functions.

(Figure: /wp-content/uploads/2017/05/gradient-descent.png)

How do you come up with the first estimate? You can pick a random number.

Note: If you are using popular neural network architectures like GoogLeNet or VGG-Net, you can use the weights trained on ImageNet instead of picking random initial weights to get much faster convergence.

Gradient Descent works similarly when there are multiple knobs. For example, when there are two knobs, the cost function is a bowl in 3D. If we place a ball on any part of this bowl, it will roll down to the bottom following the path of the maximum downward slope. This is exactly how gradient descent works. Also, note that if you let the ball roll down at full velocity, it will overshoot the bottom and take much more time to settle down at the bottom compared to a ball that is rolled down slowly in a more controlled manner. Similarly, while training a neural network, we use a parameter called the learning rate to control convergence of the cost to its minimum.

When we have millions of knobs (weights), the shape of the cost function is a bowl in this higher dimensional space. Even though such a bowl is impossible to visualize, the concept of slope and Gradient Descent works just as well. Therefore, Gradient Descent allows us to converge to a solution, thus making the problem tractable.

Backpropagation

There is one piece left in the puzzle. Given our current knob settings, how do we know the slope of the cost function?
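For a single knob and a simple bowl, the slope can be written down by hand, which makes Gradient Descent easy to try out. The sketch below assumes a hypothetical cost function (w - 3)^2 whose bottom is at w = 3; the cost, the starting point and the learning rates are all illustrative choices, not values from a real network.

```python
def cost(w):
    """Hypothetical bowl-shaped cost with its minimum at w = 3."""
    return (w - 3.0) ** 2


def slope(w):
    """Slope (derivative) of the hypothetical cost above."""
    return 2.0 * (w - 3.0)


def gradient_descent(w_start, learning_rate, steps):
    """Repeatedly step the knob against the slope, scaled by the learning rate."""
    w = w_start
    for _ in range(steps):
        w = w - learning_rate * slope(w)
    return w


# A small learning rate walks steadily down to the bottom of the bowl.
w_final = gradient_descent(w_start=-4.0, learning_rate=0.1, steps=100)

# After the same small number of steps, a large learning rate that
# overshoots the bottom on every step is still further from the optimum.
w_controlled = gradient_descent(w_start=-4.0, learning_rate=0.1, steps=20)
w_overshooting = gradient_descent(w_start=-4.0, learning_rate=0.95, steps=20)

print(w_final, abs(w_controlled - 3.0), abs(w_overshooting - 3.0))
```

With millions of knobs the slope can no longer be written down by hand like this, which is exactly the problem the rest of this section addresses.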
First, let’s remember that the cost function, and therefore its gradient, depends on the difference between the true output and the current output for all images in the training set. In other words, every image in the training set contributes to the final gradient calculation based on how badly the Neural Network performs on that image.

The algorithm used for computing the gradient of the cost function is called Backpropagation. We will cover backpropagation in a future post, and yes, it does involve calculus. You would be surprised, though, that backpropagation is simply repeated application of the chain rule that you might have learned in high school.

COPYRIGHT © 2018 · BIG VISION LLC

Deep learning using Keras - The Basics
Learn OpenCV, September 25, 2017, by Vikas Gupta (https://www.learnopencv.com/author/vikas/)
Source: https://www.learnopencv.com/deep-learning-using-keras-the-basics/

(Figure: /wp-content/uploads/2017/09/import-keras.png)

This post is the third tutorial, Introduction to Keras, in the series on Deep Learning for Beginners; the full list of tutorials is given at the start of the first post above.

1. Deep Learning Frameworks

Deep Learning is a branch of AI which uses Neural Networks for Machine Learning. In recent years, it has shown dramatic improvements over traditional machine learning methods, with applications in Computer Vision, Natural Language Processing and Robotics, among many others. A very light introduction to Convolutional Neural Networks (a type of Neural Network) is covered in this article (/neural-networks-a-30000-feet-view-for-beginners/).

Deep Learning has been a household name for AI engineers since 2012, when Alex Krizhevsky (https://scholar.google.com/citations?user=xegzhJcAAAAJ) and his team won the ImageNet challenge.
The ImageNet (http://image-net.org) challenge is a computer vision competition in which the computer is required to correctly classify the image of an object into one of 1000 categories. The objects include different types of animals, plants, instruments, furniture and vehicles, to name a few.

This attracted a lot of attention from the Computer Vision community, and almost everyone started working on Neural Networks. But at that time, there were not many tools available to get you started in this new domain. A lot of effort has been put in by the community of researchers to create useful libraries that make it easy to work in this emerging field. Some popular deep learning frameworks at present are Tensorflow (https://www.tensorflow.org/), Theano (http://deeplearning.net/software/theano/), Caffe (http://caffe.berkeleyvision.org/), PyTorch (http://pytorch.org/), CNTK (https://www.microsoft.com/en-us/cognitive-toolkit/), MXNet (https://mxnet.incubator.apache.org/), Torch (http://torch.ch/), deeplearning4j (https://deeplearning4j.org/) and Caffe2 (https://caffe2.ai/), among many others.

Keras is a high-level API, written in Python and capable of running on top of Tensorflow, Theano, or CNTK. The above deep learning libraries are written in a general way with a lot of functionality. This can be overwhelming for a beginner who has limited knowledge of deep learning. Keras provides a simple and modular API to create and train Neural Networks, hiding most of the complicated details under the hood. This makes it easy to get started on your Deep Learning journey.

Once you get familiar with the main concepts and want to dig deeper and take control of the process, you may choose to work with any of the above frameworks.

2. Keras installation and configuration

As mentioned above, Keras is a high-level API that uses deep learning libraries like Theano or Tensorflow as the backend. These libraries, in turn, talk to the hardware via lower level libraries.
For example, if you run the program on a CPU, Tensorflow or Theano use BLAS libraries. On the other hand, when you run on a GPU, they use CUDA and cuDNN libraries.

If you are setting up a new system, you might want to look at this article (/installing-deep-learning-frameworks-on-ubuntu-with-cuda-support/) for installing the most common deep learning frameworks. We will mention only the Keras specific part here.

It is advisable to install everything in virtual environments. If virtual environment is not installed on the system, then check step 5 of the above article.

We will install Theano and Tensorflow as backend libraries for Keras, along with some more libraries which are useful for working with data (h5py) and visualization (pydot, graphviz and matplotlib).

Create virtual environment

Create the virtual environment for either python 2 or python 3, whichever you want to use.

    mkvirtualenv virtual-py2 -p python2
    # Activate the virtual environment
    workon virtual-py2

Or

    mkvirtualenv virtual-py3 -p python3
    # Activate the virtual environment
    workon virtual-py3

Install libraries

    pip install Theano
    #If using only CPU
    pip install tensorflow
    #If using GPU
    pip install tensorflow-gpu
    pip install keras
    pip install h5py pydot matplotlib

Also install graphviz

    #For Ubuntu
    sudo apt-get install graphviz

    #For MacOs
    brew install graphviz

Configure Keras

By default, Keras is configured to use Tensorflow as the backend since it is the most popular choice.
However, if you want to change it to Theano, open the file ~/.keras/keras.json, which looks as shown:

    {
        "epsilon": 1e-07,
        "floatx": "float32",
        "image_data_format": "channels_last",
        "backend": "tensorflow"
    }

and change it to

    {
        "epsilon": 1e-07,
        "floatx": "float32",
        "image_data_format": "channels_first",
        "backend": "theano"
    }

3. Keras Workflow

Keras provides a very simple workflow for training and evaluating models. It is described in the following diagram.

(Figure: /wp-content/uploads/2017/09/keras-workflow.jpg)

Basically, we create the model and train it using the training data. Once the model is trained, we use it to perform inference on test data. Let us understand the function of each of the blocks.

3.1. Keras Layers

Layers can be thought of as the building blocks of a Neural Network. They process the input data and produce different outputs, depending on the type of layer, which are then used by the layers connected to them. We will cover the details of every layer in future posts.

Keras provides a number of core layers, which include:

- Dense layers, also called fully connected layers, since each node in the input is connected to every node in the output
- Activation layer, which includes activation functions like ReLU, tanh and sigmoid, among others
- Dropout layer, used for regularization during training
- Flatten, Reshape, etc.

Apart from these core layers, some important layers are:

- Convolution layers, used for performing convolution
- Pooling layers, used for down sampling
- Recurrent layers
- Locally-connected, normalization, etc.
We can use the following code snippet to import the respective layers:

    from keras.layers import Dense, Activation, Conv2D, MaxPooling2D

3.2. Keras Models

Keras provides two ways to define a model:

- Sequential (https://keras.io/getting-started/sequential-model-guide/), used for stacking up layers. This is the most commonly used.
- Functional API (https://keras.io/getting-started/functional-api-guide/), used for designing complex model architectures like models with multiple outputs, shared layers etc.

    from keras.models import Sequential

For creating a Sequential model, we can either pass the list of layers as an argument to the constructor or add the layers sequentially using the model.add() function. For example, the two code snippets below, which create a model with a single dense layer with 10 outputs, are equivalent. In both, nFeatures stands for the number of input features.

    from keras.models import Sequential
    from keras.layers import Dense, Activation

    # Pass the list of layers to the constructor
    model = Sequential([Dense(10, input_shape=(nFeatures,)),
                        Activation('linear')])

    from keras.models import Sequential
    from keras.layers import Dense, Activation

    # Add the layers one at a time
    model = Sequential()
    model.add(Dense(10, input_shape=(nFeatures,)))
    model.add(Activation('linear'))

An important thing to note in the model definition is that we need to specify the input shape for the first layer. This is done in the above snippets using the input_shape parameter passed along with the first Dense layer. The shapes of the other layers are inferred automatically.

3.3. Configuring the training process

Once the model is ready, we need to configure the learning process. This means:

- Specify an Optimizer, which determines how the network weights are updated.
- Specify the type of cost function or loss function.
- Specify the metrics you want to evaluate during training and testing.
- Create the model graph using the backend.
- Any other advanced configuration.
This is done in Keras using the model.compile() function. The code snippet shows the usage.

    model.compile(optimizer='rmsprop', loss='mse', metrics=['mse', 'mae'])

The mandatory parameters to be specified are the optimizer and the loss function.

Optimizers

Keras provides a lot of optimizers to choose from, which include:

- Stochastic Gradient Descent (SGD)
- Adam
- RMSprop
- AdaGrad
- AdaDelta, etc.

RMSprop is often a reasonable first choice of optimizer.

Loss functions

In a supervised learning problem, we have to find the error between the actual values and the predicted values. Different measures can be used to evaluate this error. Such a measure is often called a loss function, cost function or objective function. The right loss function depends on what you are doing with the error. In general, we use:

- binary cross-entropy for a binary classification problem,
- categorical cross-entropy for a multi-class classification problem,
- mean squared error for a regression problem, and so on.

3.4. Training

Once the model is configured, we can start the training process. This can be done using the model.fit() function in Keras. The usage is described below.

    model.fit(trainFeatures, trainLabels, batch_size=4, epochs=100)

We just need to specify the training data, batch size and number of epochs. Keras automatically figures out how to pass the data iteratively to the optimizer for the number of epochs specified. The rest of the information was already given to the optimizer in the previous step.

3.5. Evaluating the model

Once the model is trained, we need to check the accuracy on unseen test data. This can be done in two ways in Keras.

model.evaluate(): It finds the loss and metrics specified in the model.compile() step.
It takes both the test data and labels as input and gives a quantitative measure of the accuracy. It can also be used to perform cross-validation and further finetune the parameters to get the best model.

model.predict(): It finds the output for the given test data. It is useful for checking the outputs qualitatively.

Now, let’s see how to use Keras models and layers to create a simple Neural Network.

4. Linear Regression Example

We will learn how to create a simple network with a single layer to perform linear regression (https://en.wikipedia.org/wiki/Linear_regression). We will use the Boston Housing dataset (https://keras.io/datasets/) available in Keras as an example. Samples contain 13 attributes of houses at different locations around the Boston suburbs in the late 1970s. Targets are the median values of the houses at a location (in thousands of dollars). Using the 13 features, we have to train a model which predicts the price of the houses in the test data.

4.1. Training

We use the Sequential model to create the network graph. Then we add a Dense layer with the number of inputs equal to the number of features in the data and a single output. Then we follow the workflow explained in the previous section. We compile the model and train it using the fit command. Finally, we use the model.summary() function to check the configuration of the model. All Keras datasets come with a load_data() function which returns tuples of training and testing data as shown in the code.
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.datasets import boston_housing

    # Load the training and test splits
    (X_train, Y_train), (X_test, Y_test) = boston_housing.load_data()

    # Number of input features (13 for this dataset)
    nFeatures = X_train.shape[1]

    # A single Dense layer with one linear output performs linear regression
    model = Sequential()
    model.add(Dense(1, input_shape=(nFeatures,), activation='linear'))

    model.compile(optimizer='rmsprop', loss='mse', metrics=['mse', 'mae'])

    model.fit(X_train, Y_train, batch_size=4, epochs=1000)

    model.summary()

The output of model.summary() is given below. It shows 14 parameters: 13 parameters for the weights and 1 for the bias.

    _______________________________________________________
    Layer (type)            Output Shape          Param #
    =======================================================
    dense_1 (Dense)         (None, 1)             14
    =======================================================
    Total params: 14
    Trainable params: 14
    Non-trainable params: 0

4.2. Inference

After the model has been trained, we want to do inference on the test data. We can find the loss on the test data using the model.evaluate() function. We get the predictions on test data using the model.predict() function. Here we compare the ground truth values with the predictions from our model for the first 5 test samples.

    model.evaluate(X_test, Y_test, verbose=True)

    Y_pred = model.predict(X_test)

    # Ground truth, then predictions, for the first 5 test samples
    print(Y_test[:5])
    print(Y_pred[:5, 0])

The output is

    [ 7.2   18.8   19.    27.    22.2 ]
    [ 7.2   18.26  21.38  29.28  23.72]

It can be seen that the predictions follow the ground truth values, but there are some errors in the predictions.
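To put a number on "some errors", the five printed values can be compared by hand. The sketch below uses plain Python rather than Keras, and it assumes the printed output reads as ground truth 7.2, 18.8, 19.0, 27.0, 22.2 and predictions 7.2, 18.26, 21.38, 29.28, 23.72. It computes the same two quantities that were passed as metrics to model.compile(), but only over these five samples, so the results are not the test-set metrics.

```python
# First five ground truth values and predictions, as read from the output above.
y_true = [7.2, 18.8, 19.0, 27.0, 22.2]
y_pred = [7.2, 18.26, 21.38, 29.28, 23.72]

# Per-sample absolute error, in thousands of dollars.
abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]

# Mean absolute error ('mae') and mean squared error ('mse') over the five samples.
mae = sum(abs_errors) / len(abs_errors)
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print([round(e, 2) for e in abs_errors])
print(round(mae, 3), round(mse, 3))
```

On these five samples the predictions are off by roughly 1.3 thousand dollars on average, with the first sample predicted almost exactly.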
References

https://keras.io

Understanding Feedforward Neural Networks
Learn OpenCV, October 9, 2017, by Vikas Gupta (https://www.learnopencv.com/author/vikas/)
Source: https://www.learnopencv.com/understanding-feedforward-neural-networks/

(Figure: /wp-content/uploads/2017/10/mlp-diagram.jpg)

This post is the fourth tutorial, Understanding Feedforward Neural Networks, in the series on Deep Learning for Beginners; the full list of tutorials is given at the start of the first post above.

In this article, we will learn about feedforward Neural Networks, also known as Deep feedforward Networks or Multi-layer Perceptrons. They form the basis of many important Neural Networks in use today, such as Convolutional Neural Networks (used extensively in computer vision applications), Recurrent Neural Networks (widely used in natural language understanding and sequence learning) and so on. We will try to understand the important concepts involved in an intuitive and interactive way, without going into the mathematics involved. If you are interested in diving into deep learning but don’t have much background in statistics and machine learning, then this article is a perfect starting point.

We will use the feedforward network to solve a binary classification (https://en.wikipedia.org/wiki/Binary_classification) problem.

In Machine Learning, Classification (https://en.wikipedia.org/wiki/Classification) is a type of Supervised Learning method, where the task is to divide the data samples into predefined groups by a Decision Function. When there are only two groups, it is called Binary Classification. The figure given below shows an example. The points in blue belong to one group (or class) and orange points belong to the other.
The imaginary line(s) which separate the groups are called Decision Boundaries. The decision function is learned from a set of labeled samples, which is called the Training Data, and the process of learning the decision function is called Training.

(/wp-content/uploads/2017/10/sample-data-mlp.jpg)

In the above example, the top row shows two different data distributions and the bottom row shows the decision boundary. The left image shows an example of data which is Linearly Separable (https://en.wikipedia.org/wiki/Linear_separability). This means that a linear boundary ( e.g. a straight line ) is enough to separate the data into groups. On the other hand, the image on the right shows an example of data which is not linearly separable. The decision boundary, in this case, has to be circular or polygonal as shown in the figure.

1. Understanding the Neural Network Jargon

Given below is an example of a feedforward Neural Network. It is a directed acyclic graph, which means that there are no feedback connections or loops in the network. It has an input layer, an output layer, and a hidden layer. In general, there can be multiple hidden layers. Each node in a layer is a Neuron, which can be thought of as the basic processing unit of a Neural Network.

(/wp-content/uploads/2017/10/mlp-diagram.jpg)

1.1. What is a Neuron?

An Artificial Neuron is the basic unit of a neural network. A schematic diagram of a neuron is given below.

(/wp-content/uploads/2017/10/neuron-diagram.jpg)

As seen above, it works in two steps: it calculates the weighted sum of its inputs and then applies an activation function to normalize the sum. The activation function can be linear or nonlinear. There is also a weight associated with each input of a neuron. These weights are the parameters which the network has to learn during the training phase.

1.2. Activation Functions

The activation function is used as a decision-making body at the output of a neuron. The neuron learns linear or non-linear decision boundaries based on the activation function. It also has a normalizing effect on the neuron output, which prevents the output of neurons after several layers from becoming very large due to the cascading effect. The three most widely used activation functions are:

Sigmoid (https://en.wikipedia.org/wiki/Sigmoid_function)
It maps the input ( x axis ) to values between 0 and 1.
(/wp-content/uploads/2017/10/sigmoid.png)

Tanh
It is similar to the sigmoid function but maps the input to values between -1 and 1.
(/wp-content/uploads/2017/10/tanh.png)

Rectified Linear Unit (ReLU) (https://en.wikipedia.org/wiki/Rectifier_(neural_networks))
It allows only positive values to pass through it. The negative values are mapped to zero.
(/wp-content/uploads/2017/10/relu.png)

There are other functions like the Unit Step function, leaky ReLU, Noisy ReLU, Exponential LU etc., which have their own merits and demerits.

1.3. Input Layer

This is the first layer of a neural network. It is used to provide the input data or features to the network.

1.4. Output Layer

This is the layer which gives out the predictions. The activation function used in this layer depends on the problem. For a binary classification problem, we want the output to be either 0 or 1. Thus, a sigmoid activation function is used, since its output lies between 0 and 1 and can be thresholded to pick a class. For a multiclass classification problem, a Softmax (https://en.wikipedia.org/wiki/Softmax_function) ( think of it as a generalization of sigmoid to multiple classes ) is used.
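The activation functions named in Sections 1.2 and 1.4 are short enough to write out directly. The following is a minimal NumPy sketch, not code from the original post: the input values, weights and the bias term `b` are made up for illustration (the text above does not mention a bias), and the `neuron` helper is a hypothetical name that simply combines the two steps from Section 1.1.

```python
import numpy as np

def sigmoid(z):
    """Map any real input to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Map any real input to a value between -1 and 1."""
    return np.tanh(z)

def relu(z):
    """Pass positive values through and map negative values to zero."""
    return np.maximum(0.0, z)

def softmax(z):
    """Turn a vector of scores into probabilities that sum to 1."""
    shifted = z - np.max(z)          # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / np.sum(exps)

def neuron(inputs, weights, bias, activation):
    """Step 1: weighted sum of the inputs. Step 2: apply the activation."""
    return activation(np.dot(weights, inputs) + bias)

x = np.array([0.5, -1.0, 2.0])       # made-up input features
w = np.array([0.4, 0.3, -0.2])       # made-up weights
b = 0.1                              # made-up bias (an assumption, see above)

print("sigmoid neuron:", neuron(x, w, b, sigmoid))
print("tanh neuron   :", neuron(x, w, b, tanh))
print("relu neuron   :", neuron(x, w, b, relu))
print("softmax       :", softmax(np.array([1.0, 2.0, 3.0])))
```

The weighted sum here is negative, so the ReLU neuron outputs zero while the sigmoid and tanh neurons output small values inside their respective ranges.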
For a regression problem, where the output is not a predefined category, we can simply use a linear unit.

1.5. Hidden Layer

A feedforward network applies a series of functions to the input. By having multiple hidden layers, we can compute complex functions by cascading simpler functions. Suppose we want to compute the 7th power of a number, but want to keep things simple ( as simple operations are easy to understand and implement ). We can use simpler powers like square and cube to build the higher power, for example x^7 = (x^2)^2 · x^3. Similarly, highly complex functions can be computed by this cascading effect.

The most widely used hidden unit is the one which uses a Rectified Linear Unit (ReLU) as the activation function. The choice of hidden units is a very active research area in Machine Learning. The type of hidden layer distinguishes the different types of Neural Networks like CNNs, RNNs etc. The number of hidden layers is termed the depth of the neural network. One question you might ask is exactly how many layers in a network make it deep? There is no right answer to this. In general, deeper networks can learn more complex functions.

1.6. How does the network learn?

The training samples are passed through the network and the output obtained from the network is compared with the actual output. The resulting error is used to change the weights of the neurons such that the error decreases gradually. This is done using the Backpropagation (http://neuralnetworksanddeeplearning.com/chap2.html) algorithm, also called backprop. Iteratively passing batches of data through the network and updating the weights, so that the error is decreased, is known as Stochastic Gradient Descent ( SGD ) (https://en.wikipedia.org/wiki/Stochastic_gradient_descent). The amount by which the weights are changed is determined by a parameter called the Learning rate. The details of SGD and backprop will be covered in a separate post.

2. Why use Hidden Layers?
To understand the significance of hidden layers we will try to solve the binary classification problem without hidden layers. For this, we will use an interactive platform from Google, playground.tensorflow.org (http://playground.tensorflow.org), which is a web app where you can create simple feedforward neural networks and see the effects of training in real time. You can play around by changing the number of hidden layers, number of units in a hidden layer, type of activation function, type of data, learning rate, regularization parameters etc. Given below is a screenshot of the web page.

(/wp-content/uploads/2017/10/sample-playground-tensorflow.png)

In the above page, you can select the data and click on the play button to start training. It will show you the learned decision boundary and the loss curves at the top right corner.

2.1. No hidden layer

We want a network without a hidden layer, which I have created in this link (http://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=gauss&regDataset=regplane&learningRate=0.1&regularizationRate=0&noise=0&networkShape=&seed=0.75972&showTestData=false&discretize=false

Here there are no hidden layers, so the network becomes a simple neuron, which is capable of learning a linear decision boundary. We can select the type of data from the top left corner. In case of linearly separable data ( 3rd type ), it will be able to learn ( when you click the play button ) a linear boundary as shown below.

(/wp-content/uploads/2017/10/linear-output.png)

However, if you choose the 1st data it will not be able to learn the circular decision boundary.
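The same limitation can be reproduced outside the playground. The sketch below is an illustration under stated assumptions, not the playground's own code: the circular dataset is a synthetic stand-in generated with NumPy, the single sigmoid neuron is trained with plain full-batch gradient descent, and the function names, learning rate and step count are arbitrary choices. It trains the neuron once on the raw features and once on the squared features, the transformation that the text goes on to discuss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the circular data (an assumption): points in the
# square [-1, 1] x [-1, 1], labelled 1 when they fall inside a circle whose
# area is half of the square, so the two classes are roughly balanced.
X = rng.uniform(-1.0, 1.0, size=(2000, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 2.0 / np.pi).astype(float)

def train_single_neuron(features, labels, lr=1.0, steps=5000):
    """Fit one sigmoid neuron with full-batch gradient descent."""
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))   # neuron output
        error = p - labels        # cross-entropy gradient w.r.t. the weighted sum
        w -= lr * (features.T @ error) / n
        b -= lr * np.mean(error)
    return w, b

def accuracy(features, labels, w, b):
    """Fraction of samples on the correct side of the learned boundary."""
    predictions = (features @ w + b > 0).astype(float)
    return np.mean(predictions == labels)

# Raw features x1, x2: the neuron can only draw a straight line.
w_raw, b_raw = train_single_neuron(X, y)
acc_raw = accuracy(X, y, w_raw, b_raw)

# Squared features x1^2, x2^2: a straight line in this space is a circle
# (more generally an ellipse) in the original space.
X_sq = X ** 2
w_sq, b_sq = train_single_neuron(X_sq, y)
acc_sq = accuracy(X_sq, y, w_sq, b_sq)

print("accuracy with raw features    :", acc_raw)
print("accuracy with squared features:", acc_sq)
```

On this synthetic data the raw-feature neuron should stay close to chance level, while the squared-feature neuron should separate the classes far better. The neuron itself is unchanged between the two runs; only the inputs differ.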
(/wp-content/uploads/2017/10/circular-output.png)

Since the data lies in a circular region, one may say that using squared values of the features as inputs might help. As it turns out, upon training, the neuron will be able to find the circular decision boundary.

(/wp-content/uploads/2017/10/circular-output-correct.png)

Now, if you select the 2nd data, the same configuration will not be able to learn the appropriate decision boundary.

(/wp-content/uploads/2017/10/parabola-output.png)

Again by intuition, it looks like the decision boundary is a conic section ( like a parabola or hyperbola ). So, if we include the product of the features ( i.e. X1X2 ), the neuron is able to learn the desired decision boundary.

(/wp-content/uploads/2017/10/parabola-output-correct.png)

From the above experiment, we observed the following:

1. Using a single neuron we can only learn a linear decision boundary.
2. We had to come up with feature transformations ( like square of features or product of features ) by visualizing the data. This step can be tricky for data which is not easy to visualize.

2.2. Adding a hidden layer

By adding a hidden layer as shown in this link (http://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=xor&regDataset=regplane&learningRate=0.1&regularizationRate=0&noise=0&networkShape=3&seed=0.63075&showTestData=false&discretize=fals

we can get rid of this feature engineering and have a single network which can learn all three decision boundaries. A Neural Network with a single hidden layer with nonlinear activation functions is considered to be a Universal Function Approximator (https://en.wikipedia.org/wiki/Universal_approximation_theorem): given enough hidden units, it can approximate any continuous function to a desired accuracy. However, the number of units needed in the hidden layer is not fixed. The result of adding a hidden layer with just 3 neurons is shown below:

(/wp-content/uploads/2017/10/nonlinear-output-correct.png)

3. Regularization

As we saw in the previous section, a multilayer network can learn nonlinear decision boundaries. However, if there is noise in the data ( which is often the case ), the network may try to learn the nonlinearity introduced by the noise too, trying to fit the noisy samples. In such cases, the noisy samples should be treated as outliers. In this link (http://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=gauss&regDataset=regplane&learningRate=0.1&regularizationRate=0.03&noise=30&networkShape=8&seed=0.46428&showTestData=false&discretize=

I have added some noise to the linearly separable data. Also, to demonstrate the idea, I have increased the number of hidden units.

(/wp-content/uploads/2017/10/overfitting-linear-data.png)

In the above figure, it can be seen that the decision boundary is trying very hard to accommodate the noisy samples in order to reduce the error. But as you can see, it is being misguided by the noisy samples. In other words, the network will be fragile in the presence of noise. This phenomenon is called Overfitting. In such cases, the error on training data might decrease but the network performs badly on unseen data. This can be seen from the loss curves at the top right corner: the training loss is decreasing but the test loss is increasing. Also, you can see that some weights have become very large ( shown as very thick connections; you can also see the weight values if you hover above the connections ).
This can be rectified by putting some restrictions on the values of weights ( like not allowing the weights to become very high ). This is called Regularization. Restrictions can also be imposed on the other parameters of the network. In a sense, we don't trust the training data fully and want the network to learn "nice" decision boundaries. I have added L2 regularization to the above configuration in this link (http://playground.tensorflow.org/#activation=relu&regularization=L2&batchSize=10&dataset=gauss&regDataset=regplane&learningRate=0.1&regularizationRate=0.03&noise=30&networkShape=8&seed=0.46428&showTestData=false&discretize=

and the output is shown below.

(/wp-content/uploads/2017/10/regulariztion-output.png)

After including L2 regularization, the decision boundary learned by the network is smoother and similar to the case when there was no noise. The effect of regularization can also be seen from the loss curves and the values of the weights.

In the next post, we will learn how to implement a feedforward neural network in Keras for solving a multi-class classification problem and learn more about feedforward networks.
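To make the idea of restricting the weights concrete, here is a small NumPy sketch of an L2 penalty and of one gradient step that includes it. It is an illustration only: the function names are hypothetical, the weights are made up, and the penalty is written as `rate * sum(w^2)`, which is one common convention (some libraries include an extra factor of 1/2). The regularization rate 0.03 and learning rate 0.1 are the values that appear in the playground link.

```python
import numpy as np

def l2_penalty(weights, rate):
    """Term added to the training loss: rate * sum of squared weights."""
    return rate * np.sum(weights ** 2)

def sgd_step_with_l2(weights, data_gradient, learning_rate, rate):
    """One gradient step on (data loss + L2 penalty).

    The gradient of rate * sum(w^2) is 2 * rate * w, so every step also
    pulls each weight a little towards zero.
    """
    total_gradient = data_gradient + 2.0 * rate * weights
    return weights - learning_rate * total_gradient

w = np.array([3.0, -4.0])                 # made-up weights
penalty = l2_penalty(w, rate=0.03)
print(penalty)                            # 0.03 * (9 + 16) = 0.75

# With a zero data gradient, the step does nothing except shrink the weights.
w_next = sgd_step_with_l2(w, data_gradient=np.zeros(2),
                          learning_rate=0.1, rate=0.03)
print(w_next)                             # each weight scaled by 1 - 0.006
```

Because large weights contribute most to the penalty, they are also shrunk the most at each step, which is why very thick connections tend to disappear once regularization is switched on.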
Image Classification using Feedforward Neural Network in Keras

OCTOBER 23, 2017 BY VIKAS GUPTA (HTTPS://WWW.LEARNOPENCV.COM/AUTHOR/VIKAS/)

This post is the fifth tutorial in the series on Deep Learning for Beginners.

In this article, we will learn how to implement a Feedforward Neural Network in Keras. We will use handwritten digit classification as an example to illustrate the effectiveness of a feedforward network. We will also see how to spot and overcome Overfitting during training.
MNIST (https://en.wikipedia.org/wiki/MNIST_database) is a commonly used handwritten digit dataset consisting of 60,000 images in the training set and 10,000 images in the test set. So, each digit has roughly 6,000 images in the training set. The digits are size-normalized and centered in a fixed-size ( 28×28 ) image. The task is to train a machine learning algorithm to recognize a new sample from the test set correctly.

1. The Network

For a quick understanding of Feedforward Neural Networks, you can have a look at our previous article (/understanding-feedforward-neural-networks/). We will use raw pixel values as input to the network. The images are matrices of size 28×28. So, we reshape the image matrix to an array of size 784 ( 28*28 ) and feed this array to the network. We will use a network with 2 hidden layers having 512 neurons each. The output layer will have 10 units for the 10 digits. A schematic diagram is shown below.

(/wp-content/uploads/2017/10/mlp-mnist-schematic.jpg)

Check out this post (/deep-learning-using-keras-the-basics/) if you don't have Keras installed yet! Also, download the code from the link below to follow along with the post.

Download Code (https://bigvisionllc.leadpages.net/leadbox/143948b73f72a2%3A173c9390c346dc/5649050225344512/)

Let us dive into the code!

2. Load the Data

Keras comes with the MNIST data loader. It has a function mnist.load_data() which downloads the data from its servers if it is not present on your computer. The data loaded using this function is divided into training and test sets.
This is done by the following :

    from keras.datasets import mnist
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data()

3. Checkout the Data

Let's see what the data looks like. The data consists of handwritten numbers ranging from 0 to 9, along with their ground truth. It has 60,000 train samples and 10,000 test samples. Each sample is a 28×28 grayscale image.

    import numpy as np
    import matplotlib.pyplot as plt
    from keras.utils import to_categorical

    print('Training data shape : ', train_images.shape, train_labels.shape)
    print('Testing data shape : ', test_images.shape, test_labels.shape)

    # Find the unique numbers from the train labels
    classes = np.unique(train_labels)
    nClasses = len(classes)
    print('Total number of outputs : ', nClasses)
    print('Output classes : ', classes)

    plt.figure(figsize=[10,5])

    # Display the first image in training data
    plt.subplot(121)
    plt.imshow(train_images[0,:,:], cmap='gray')
    plt.title("Ground Truth : {}".format(train_labels[0]))

    # Display the first image in testing data
    plt.subplot(122)
    plt.imshow(test_images[0,:,:], cmap='gray')
    plt.title("Ground Truth : {}".format(test_labels[0]))

Output:

(/wp-content/uploads/2017/10/sample-data-mnist.png)

4. Process the data

The images are grayscale and the pixel values range from 0 to 255. We will apply the following preprocessing to the data before feeding it to the network.

1. Convert each image matrix ( 28×28 ) to an array ( 28*28 = 784 dimensional ) which will be fed to the network as a single feature vector.

    # Flatten each 28x28 image matrix into an array of dimension 784
    dimData = np.prod(train_images.shape[1:])
    train_data = train_images.reshape(train_images.shape[0], dimData)
    test_data = test_images.reshape(test_images.shape[0], dimData)

2. Convert the data to float and scale the values to lie between 0 and 1.

    # Change to float datatype
    train_data = train_data.astype('float32')
    test_data = test_data.astype('float32')

    # Scale the data to lie between 0 and 1
    train_data /= 255
    test_data /= 255

3. Convert the labels from integer to categorical ( one-hot ) encoding since that is the format required by Keras to perform multiclass classification. One-hot encoding is a type of boolean representation of integer data. It converts the integer to an array of all zeros except a 1 at the index of the integer. For example, using a one-hot encoding for 10 classes, the integer 5 will be encoded as 0000010000.

    # Change the labels from integer to categorical data
    train_labels_one_hot = to_categorical(train_labels)
    test_labels_one_hot = to_categorical(test_labels)

    # Display the change for category label using one-hot encoding
    print('Original label 0 : ', train_labels[0])
    print('After conversion to categorical ( one-hot ) : ', train_labels_one_hot[0])

Output:

    Original label 0 : 5
    After conversion to categorical ( one-hot ) : [ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]

5. Keras Workflow for training the network

We have described the Keras Workflow in our previous post (/deep-learning-using-keras-the-basics/). The block diagram is given here for reference. Basically, once you have the training and test data, you can follow these steps to train a neural network in Keras.

(/wp-content/uploads/2017/09/keras-workflow.jpg)

5.1. Create the Network

We had mentioned that we will be using a network with 2 hidden layers and an output layer with 10 units. The number of units in each hidden layer is set to 512.
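To get a feel for the size of this network, the number of trainable parameters can be counted by hand. The following sketch is plain Python rather than Keras; `dense_params` is a hypothetical helper name, and it assumes each fully connected layer has one weight per input-unit pair plus one bias per unit, which is the default for Keras Dense layers.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Input size, two hidden layers of 512 units, and 10 output units.
layer_sizes = [784, 512, 512, 10]

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)   # [401920, 262656, 5130]
print(total)       # 669706
```

Most of the parameters sit in the first layer, because every one of the 784 input pixels is connected to every one of the 512 hidden units.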
The input to the network is the 784-dimensional array converted from the 28×28 image.

We will use the Sequential model for building the network. In the Sequential model, we can just stack up layers by adding the desired layer one by one. We use the Dense layer, also called a fully connected layer, since we are building a feedforward network in which every neuron in one layer is connected to every neuron in the previous layer. Apart from the Dense layer, we add the ReLU activation function, which is required to introduce non-linearity to the model. This will help the network learn non-linear decision boundaries. The last layer is a softmax layer as it is a multiclass classification problem. For binary classification, we can use sigmoid.

    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential()
    model.add(Dense(512, activation='relu', input_shape=(dimData,)))
    model.add(Dense(512, activation='relu'))
    model.add(Dense(nClasses, activation='softmax'))

5.2. Configure the Network

In this step, we configure the optimizer to be rmsprop. We also specify the loss type, categorical cross entropy, which is used for multiclass classification, and the metrics ( accuracy in this case ) which we want to track during the training process. You can also try using any other optimizer such as adam or SGD.

    model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

5.3. Train the Model

The network is ready to be trained. This is done using the fit() function in Keras. We specify the number of epochs as 20. This means that the whole dataset will be fed to the network 20 times. We will be using the test data for validation.
    history = model.fit(train_data, train_labels_one_hot, batch_size=256, epochs=20, verbose=1,
                        validation_data=(test_data, test_labels_one_hot))

5.4. Evaluate the trained model

We check the performance on the whole test data using the evaluate() method.

    [test_loss, test_acc] = model.evaluate(test_data, test_labels_one_hot)
    print("Evaluation result on Test Data : Loss = {}, accuracy = {}".format(test_loss, test_acc))

Output:

    Evaluation result on Test Data : Loss = 0.135059975359, accuracy = 0.9807

The results look good. However, we would want to have another look at the results.

6. Check for Overfitting

The fit() function returns a history object which has a dictionary of all the metrics which were required to be tracked during training. We can use the data in the history object to plot the loss and accuracy curves to check how the training process went.

You can use the history.history.keys() function to check what metrics are present in the history. It should look like the following:

    ['acc', 'loss', 'val_acc', 'val_loss']

Let us plot the loss and accuracy curves.
    # Plot the Loss Curves
    plt.figure(figsize=[8,6])
    plt.plot(history.history['loss'],'r',linewidth=3.0)
    plt.plot(history.history['val_loss'],'b',linewidth=3.0)
    plt.legend(['Training loss', 'Validation Loss'],fontsize=18)
    plt.xlabel('Epochs ',fontsize=16)
    plt.ylabel('Loss',fontsize=16)
    plt.title('Loss Curves',fontsize=16)

    # Plot the Accuracy Curves
    # ( newer Keras releases name these keys 'accuracy' and 'val_accuracy' )
    plt.figure(figsize=[8,6])
    plt.plot(history.history['acc'],'r',linewidth=3.0)
    plt.plot(history.history['val_acc'],'b',linewidth=3.0)
    plt.legend(['Training Accuracy', 'Validation Accuracy'],fontsize=18)
    plt.xlabel('Epochs ',fontsize=16)
    plt.ylabel('Accuracy',fontsize=16)
    plt.title('Accuracy Curves',fontsize=16)

(/wp-content/uploads/2017/10/loss-curve-without-reg.png)

(/wp-content/uploads/2017/10/acc-curve-without-reg.png)

Although the accuracy obtained above is very good, if you look at the loss and accuracy curves in the above figures, you'll notice that the validation loss initially decreases, but then starts increasing gradually. Also, there is a substantial difference between the training and test accuracy. This is a clear sign of Overfitting, which means that the network has memorized the training data very well but is not guaranteed to work on unseen data. This explains the difference between the training and test accuracy.

7. Add Regularization to the model

Overfitting occurs mainly because the network parameters are getting too biased towards the training data. We can add a dropout layer to overcome this problem to a certain extent. In case of dropout, a fraction of neurons is randomly turned off during the training process, reducing the dependency on the training set by some amount.
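What "randomly turning off a fraction of neurons" means can be sketched in a few lines of NumPy. This is an illustration of the general technique ( often called inverted dropout ), not the Keras implementation: the function name, the made-up activations and the seed are all assumptions. Units that survive are scaled by 1 / (1 - rate) so that the average activation stays roughly the same.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero each unit with probability `rate` during training.

    Surviving units are scaled by 1 / (1 - rate) so the expected value of
    each activation is unchanged.
    """
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob   # True where a unit is kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
hidden = np.ones(1000)                  # made-up hidden-layer activations
dropped = dropout(hidden, rate=0.5, rng=rng)

fraction_off = np.mean(dropped == 0.0)
print("fraction of units switched off:", fraction_off)
print("mean activation after dropout :", dropped.mean())
```

With a rate of 0.5, about half of the units are zeroed on each call and the rest are doubled. A different random subset is dropped every time, so no single unit can be relied upon too heavily during training.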
    from keras.layers import Dropout

    model_reg = Sequential()
    model_reg.add(Dense(512, activation='relu', input_shape=(dimData,)))
    model_reg.add(Dropout(0.5))
    model_reg.add(Dense(512, activation='relu'))
    model_reg.add(Dropout(0.5))
    model_reg.add(Dense(nClasses, activation='softmax'))

8. Check performance after regularization

We will train the network again in the same way we did earlier and check the loss and accuracy curves.

    model_reg.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
    history_reg = model_reg.fit(train_data, train_labels_one_hot, batch_size=256, epochs=20, verbose=1,
                                validation_data=(test_data, test_labels_one_hot))

    # Plot the Loss Curves
    plt.figure(figsize=[8,6])
    plt.plot(history_reg.history['loss'],'r',linewidth=3.0)
    plt.plot(history_reg.history['val_loss'],'b',linewidth=3.0)
    plt.legend(['Training loss', 'Validation Loss'],fontsize=18)
    plt.xlabel('Epochs ',fontsize=16)
    plt.ylabel('Loss',fontsize=16)
    plt.title('Loss Curves',fontsize=16)

    # Plot the Accuracy Curves
    plt.figure(figsize=[8,6])
    plt.plot(history_reg.history['acc'],'r',linewidth=3.0)
    plt.plot(history_reg.history['val_acc'],'b',linewidth=3.0)
    plt.legend(['Training Accuracy', 'Validation Accuracy'],fontsize=18)
    plt.xlabel('Epochs ',fontsize=16)
    plt.ylabel('Accuracy',fontsize=16)
    plt.title('Accuracy Curves',fontsize=16)

(/wp-content/uploads/2017/10/loss-curve-with-reg.png)

(/wp-content/uploads/2017/10/acc-curve-with-reg.png)

From the above loss and accuracy curves, we can observe that

1. The validation loss is not increasing.
2. The difference between the train and validation accuracy is not very high.

Thus, we can say that the model has better generalization capability, as the performance does not decrease drastically on unseen data.

9. Inference on a single image

We have seen that the first image in the test set is the number 7. Let us see what the model predicts.

9.1. Getting the predicted class

During the inference stage, it might be sufficient to know the class of the input data. It can be done as follows.

    # Predict the most likely class
    # ( predict_classes was removed in later Keras releases; there, taking the
    #   argmax of model_reg.predict(...) over the last axis gives the same result )
    model_reg.predict_classes(test_data[[0],:])

Output:

    array([7])

9.2. Getting the probabilities

In the above method there is no score which tells us about the confidence with which the model makes the prediction. In some cases, for example when there are many classes, we may want the probabilities of the different classes, which indicate how confident the model is about the occurrence of a particular class. We can take the decision based on these scores.

    # Predict the probabilities for each class
    model_reg.predict(test_data[[0],:])

Output:

    array([[ 1.46786899e-23, 1.73912635e-15, 3.05286026e-12, 3.48179753e-12,
             2.16374247e-22, 3.82367185e-19, 2.31083363e-30, 1.00000000e+00,
             2.78843536e-18, 1.55856298e-14]], dtype=float32)

This gives the probability score for each class. We can see that the score at index 7 ( the 8th entry ) is almost 1, which indicates that the predicted class is 7 with a confidence score of nearly 1.

10. Exercise

We had used 2 hidden layers and relu activation. Try to change the number of hidden layers and the activation to tanh or sigmoid and see what happens. Also change the dropout ratio and check the performance.
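While experimenting, it can be handy to recover the predicted class of Section 9.1 directly from the probabilities of Section 9.2, which also works in more recent Keras releases where `predict_classes` is no longer provided. The sketch below uses only NumPy and the probability values printed above; it does not call the model, and the variable names are illustrative.

```python
import numpy as np

# Probabilities copied from the predict() output shown in Section 9.2.
probabilities = np.array([[1.46786899e-23, 1.73912635e-15, 3.05286026e-12,
                           3.48179753e-12, 2.16374247e-22, 3.82367185e-19,
                           2.31083363e-30, 1.00000000e+00, 2.78843536e-18,
                           1.55856298e-14]], dtype=np.float32)

# The predicted class is the index of the largest probability in each row.
predicted_class = np.argmax(probabilities, axis=-1)
# The largest probability itself serves as the confidence score.
confidence = np.max(probabilities, axis=-1)

print(predicted_class)   # [7]
print(confidence)        # [1.]
```

Indices start at 0, so the largest value sits at index 7, which is the eighth entry of the array and corresponds to the digit 7.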
Although the performance is pretty impressive with this model, we will see how to improve it further using a Convolutional Neural Network in the next post. Stay tuned!

Image Classification using Convolutional Neural Networks in Keras

NOVEMBER 29, 2017 BY VIKAS GUPTA (HTTPS://WWW.LEARNOPENCV.COM/AUTHOR/VIKAS/)

(https://www.learnopencv.com/wp-content/uploads/2017/11/cnn-schema1.jpg)

In this tutorial, we will learn the basics of Convolutional Neural Networks ( CNNs ) and how to use them for an Image Classification task. We will also see how data augmentation helps in improving the performance of the network. We discussed Feedforward Neural Networks (/understanding-feedforward-neural-networks/), Activation Functions (/understanding-activation-functions-in-deep-learning/), and Basics of Keras (/deep-learning-using-keras-the-basics/) in the previous tutorials. We will use the MNIST and CIFAR10 datasets for illustrating various concepts.

This post is the sixth tutorial in the series on Deep Learning for Beginners.

1. Motivation

In our previous article on Image Classification (/image-classification-using-feedforward-neural-network-in-keras/), we used a Multilayer Perceptron on the MNIST digits dataset. The performance was pretty good as we achieved 98.3% accuracy on test data. But there was a problem with that approach. In our training dataset, all images are centered. If the images in the test set are off-center, then the MLP approach fails miserably. We want the network to be Translation-Invariant.

Given below is an example of the number 7 being pushed to the top-left and bottom-right. The classifier predicts it correctly for the centered image but fails in the other two cases.
To make it work for these images, either we have to train separate MLPs for different locations or we have to make sure that we have all these variations in the training set as well, which I would say is difficult, if not impossible.

(/wp-content/uploads/2017/11/failure-mlp-mnist.jpg)

The fully connected network tries to learn global features or patterns. It acts as a good classifier.

Another major problem with a fully connected classifier is that the number of parameters increases very fast, since each node in layer L is connected to every node in layer L-1. So it is not feasible to design very deep networks using an MLP structure alone.

Both the above problems are solved to a great extent by using Convolutional Neural Networks, which we will see in the next section. We will first describe the concepts involved in a Convolutional Neural Network in brief and then see an implementation of a CNN in Keras so that you get hands-on experience.

2. Convolutional Neural Network

Convolutional Neural Networks are a form of Feedforward Neural Networks (/understanding-feedforward-neural-networks/). Given below is a schema of a typical CNN. The first part consists of Convolutional and max-pooling layers which act as the feature extractor. The second part consists of the fully connected layer which performs non-linear transformations of the extracted features and acts as the classifier.

(https://www.learnopencv.com/wp-content/uploads/2017/11/cnn-schema1.jpg)

In the above diagram, the input is fed to the network of stacked Conv, Pool and Dense layers. The output can be a softmax layer indicating whether there is a cat or something else. You can also have a sigmoid layer to give you a probability of the image being a cat. Let us see the two layers in detail.

2.1. Convolutional Layer

The convolutional layer can be thought of as the eyes of the CNN. The neurons in this layer look for specific features. If they find the features they are looking for, they produce a high activation.

Convolution can be thought of as a weighted sum between two signals (in signal processing jargon) or functions (in mathematical terms). In image processing, to calculate the convolution at a particular location $(x, y)$, we extract a $k \times k$ sized chunk from the image centered at location $(x, y)$. We then multiply the values in this chunk element-by-element with the convolution filter (also sized $k \times k$) and then add them all to obtain a single output. That's it! Note that $k$ is termed the kernel size.

An example of the convolution operation on a matrix of size 5×5 with a kernel of size 3×3 is shown below:

(/wp-content/uploads/2017/11/convolution-example-matrix.gif)

The convolution kernel is slid over the entire matrix to obtain an activation map.

Let's look at a concrete example and understand the terms. Suppose the input image is of size 32x32x3. This is nothing but a 3D array of depth 3. Any convolution filter we define at this layer must have a depth equal to the depth of the input. So we can choose convolution filters of depth 3 (e.g. 3x3x3 or 5x5x3 or 7x7x3 etc.). Let's pick a convolution filter of size 3x3x3. So, referring to the above example, here the convolutional kernel will be a cube instead of a square.

(/wp-content/uploads/2017/11/convolution-demo-diagram.jpg)

If we perform the convolution operation by sliding the 3x3x3 filter over the entire 32x32x3 sized image, we will obtain an output image of size 30x30x1.
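The sliding-window computation described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not how Keras implements it: the function name `conv2d_valid` and the all-ones image and filter are assumptions made for the example, and like most deep learning libraries it does not flip the kernel (strictly speaking, it computes a cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return a 2D activation map.

    image  : nested list indexed as image[row][col][channel]
    kernel : nested list indexed as kernel[row][col][channel], same depth as image
    """
    height, width, depth = len(image), len(image[0]), len(image[0][0])
    k_h, k_w = len(kernel), len(kernel[0])
    # The filter must stay fully inside the image, so the output is smaller.
    out_h, out_w = height - k_h + 1, width - k_w + 1

    output = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            total = 0.0
            # Element-by-element multiply over the k x k x depth chunk, then sum.
            for i in range(k_h):
                for j in range(k_w):
                    for d in range(depth):
                        total += image[r + i][c + j][d] * kernel[i][j][d]
            output[r][c] = total
    return output


# Hypothetical data: a 32x32x3 image of ones and a 3x3x3 filter of ones.
image = [[[1.0] * 3 for _ in range(32)] for _ in range(32)]
kernel = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]

activation_map = conv2d_valid(image, kernel)
print(len(activation_map), len(activation_map[0]))  # 30 30
print(activation_map[0][0])                         # 27.0 (3 * 3 * 3 ones summed)
```

Each output value is a single number even though the filter has depth 3, which is why one filter produces a 30x30x1 map.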
This is because the convolution operation is not defined for a strip 1 pixel wide around the border of the image: we have to ensure the filter is always inside the image. So 1 pixel is stripped away from the left, right, top and bottom of the image, which removes 2 pixels from each dimension.

The same filters are slid over the entire image to find the relevant features. This makes the CNNs Translation Invariant.

2.1.1. Activation Maps

For a 32x32x3 input image and filter size of 3x3x3, we have 30x30x1 locations and there is a neuron corresponding to each location. The 30x30x1 outputs or activations of all these neurons are called the activation map. The activation map of one layer serves as the input to the next layer.

2.1.2. Shared weights and biases

In our example, there are 30×30 = 900 neurons because there are that many locations where the 3x3x3 filter can be applied. Unlike traditional neural nets where weights and biases of neurons are independent of each other, in case of CNNs the neurons corresponding to one filter in a layer share the same weights and biases.

2.1.3. Stride

In the above case, we slid the window by 1 pixel at a time. We can also slide the window by more than 1 pixel. This number is called the stride.

2.1.4. Multiple Filters

Typically, we use more than 1 filter in one convolution layer. If we use 32 filters we will have an activation map of size 30x30x32. Please refer to the figure below for a graphical view. Note that all neurons associated with the same filter share the same weights and biases. So the number of weights while using 32 filters is simply 3x3x3x32 = 864 and the number of biases is 32.

The 32 activation maps obtained from applying the convolutional kernels are shown below.

(/wp-content/uploads/2017/11/activation-maps-32-kernel.jpg)

2.1.5. Zero padding

As you can see, after each convolution, the output reduces in size (as in this case we are going from 32×32 to 30×30). For convenience, it's a standard practice to pad zeros to the boundary of the input layer such that the output is the same size as the input layer. So, in this example, if we add a padding of size 1 on both sides of the input layer, the size of the output layer will be 32x32x32, which makes implementation simpler as well.

Let's say you have an input of size $N \times N$, a filter of size $F \times F$, you are using stride $S$, and a zero padding of size $P$ is added to the input image. Then, the output will be of size $M \times M$ where

$$M = \frac{N - F + 2P}{S} + 1$$

We can calculate the padding required so that the input and the output dimensions are the same by setting $M = N$ in the above equation and solving for $P$.

2.2. CNNs learn Hierarchical features

Let's discuss how CNNs learn hierarchical features.

(/wp-content/uploads/2017/11/cnn-hierarchical-features.jpg)

In the above figure, the big squares indicate the region over which the convolution operation is performed and the small squares indicate the output of the operation, which is just a number. The following observations are to be noted:

In the first layer, the square marked 1 is obtained from the area in the image where the leaves are painted.
In the second layer, the square marked 2 is obtained from the bigger square in Layer 1. The numbers in this square are obtained from multiple regions from the input image. Specifically, the whole area around the left ear of the cat is responsible for the value at the square marked 2.
Similarly, in the third layer, this cascading effect results in the square marked 3 being obtained from a large region around the leg area.

We can say from the above that the initial layers are looking at smaller regions of the image and thus can only learn simple features like edges, corners etc. As we go deeper into the network, the neurons get information from larger parts of the image and from various other neurons. Thus, the neurons at the later layers can learn more complicated features like eyes, legs and so on.

2.3. Max Pooling Layer

A pooling layer is mostly used immediately after the convolutional layer to reduce the spatial size (only width and height, not depth). The pooling layer has no weights of its own, but the smaller activation maps reduce the number of parameters in the layers that follow, so computation is reduced. Having fewer parameters also helps to reduce overfitting.

Note: Overfitting is the condition when a trained model works very well on training data, but does not work very well on test data.

The most common form of pooling is Max pooling, where we take a filter of size $F \times F$ and apply the maximum operation over the $F \times F$ sized part of the image.

(/wp-content/uploads/2017/11/max-pooling-demo.jpg)
Figure: Max pool layer with filter size 2×2 and stride 2 is shown. The output is the max value in a 2×2 region shown using encircled digits.

The most common pooling operation is done with a filter of size 2×2 with a stride of 2. It essentially halves the width and height of the input.

Now let's take a break from the theoretical discussion and jump into the implementation of a CNN.

3. Implementing CNNs in Keras

3.1. The Dataset – CIFAR10

The CIFAR10 dataset comes bundled with Keras. It has 50,000 training images and 10,000 test images. There are 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. The images are of size 32×32. Given below are a few examples.

(/wp-content/uploads/2017/11/cifar10-display-images.png)
Image Credit : Alex Krizhevsky

3.2. The Network

For implementing a CNN, we will stack up Convolutional Layers, followed by Max Pooling layers. We will also include Dropout to avoid overfitting. Finally, we will add a fully connected (Dense) layer followed by a softmax layer. Given below is the model structure. The listing assumes that input_shape and nClasses have already been defined earlier in the script.

    from keras.models import Sequential
    from keras.layers import Dense, Conv2D, MaxPooling2D, Dropout, Flatten

    def createModel():
        model = Sequential()
        model.add(Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=input_shape))
        model.add(Conv2D(32, (3, 3), activation='relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
        model.add(Conv2D(64, (3, 3), activation='relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
        model.add(Conv2D(64, (3, 3), activation='relu'))
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        model.add(Flatten())
        model.add(Dense(512, activation='relu'))
        model.add(Dropout(0.5))
        model.add(Dense(nClasses, activation='softmax'))

        return model

In the above code, we use 6 convolutional layers, followed by a fully-connected layer of 512 units and a softmax output layer. Lines 6 and 7 of the listing add convolutional layers with 32 filters / kernels with a window size of 3×3. Similarly, in line 11, we add a conv layer with 64 filters.
In line 8, we add a max pooling layer (MaxPooling2D) with window size 2×2. In line 9, we add a dropout layer with a dropout ratio of 0.25. In the final lines, we add the dense layer which performs the classification among 10 classes using a softmax layer.

If we check the model summary we can see the shapes of each layer.

(/wp-content/uploads/2017/11/keras-cnn-cifar-model-summary.png)

It shows that since we have used padding in the first layer, the output shape is the same as the input (32×32). But the output of the second conv layer shrinks by 2 pixels in both dimensions. Also, the output size after the pooling layer decreases by half since we have used a stride of 2 and a window size of 2×2. The last dropout layer before the flatten layer has an output of 2x2x64. This has to be converted to a single array. This is done by the flatten layer, which converts the 3D array into a 1D array of size 2x2x64 = 256. The final layer has 10 nodes since there are 10 classes.

3.3. Training the network

For training the network, we will follow the simple workflow of create -> compile -> fit described here (/deep-learning-using-keras-the-basics/). Since it is a 10 class classification problem, we will use a categorical cross entropy loss and the RMSProp optimizer to train the network. We will run it for some number of epochs. Here we run it for 100 epochs.
    model1 = createModel()
    batch_size = 256
    epochs = 100
    model1.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

    history = model1.fit(train_data, train_labels_one_hot, batch_size=batch_size, epochs=epochs, verbose=1,
                         validation_data=(test_data, test_labels_one_hot))

    model1.evaluate(test_data, test_labels_one_hot)

3.4. Loss & Accuracy Curves

Given below are the loss and accuracy curves.

    import matplotlib.pyplot as plt

    # Loss Curves
    plt.figure(figsize=[8,6])
    plt.plot(history.history['loss'],'r',linewidth=3.0)
    plt.plot(history.history['val_loss'],'b',linewidth=3.0)
    plt.legend(['Training loss', 'Validation Loss'],fontsize=18)
    plt.xlabel('Epochs ',fontsize=16)
    plt.ylabel('Loss',fontsize=16)
    plt.title('Loss Curves',fontsize=16)

    # Accuracy Curves
    # (newer Keras versions name these keys 'accuracy' and 'val_accuracy')
    plt.figure(figsize=[8,6])
    plt.plot(history.history['acc'],'r',linewidth=3.0)
    plt.plot(history.history['val_acc'],'b',linewidth=3.0)
    plt.legend(['Training Accuracy', 'Validation Accuracy'],fontsize=18)
    plt.xlabel('Epochs ',fontsize=16)
    plt.ylabel('Accuracy',fontsize=16)
    plt.title('Accuracy Curves',fontsize=16)

(/wp-content/uploads/2017/11/cnn-keras-curves-without-aug.jpg)

From the above curves, we can see that there is a considerable difference between the training and validation loss. This indicates that the network has tried to memorize the training data and is thus able to get better accuracy on it. This is a sign of Overfitting. But we have already used Dropout in the network, so why is it still overfitting? Let us see if we can further reduce overfitting using something else.

4. Using Data Augmentation

One of the major reasons for overfitting is that you don't have enough data to train your network. Apart from regularization, another very effective way to counter Overfitting is Data Augmentation.
It is the process of artificially creating more images from the images you already have by changing the size, orientation etc. of the image. It can be a tedious task but fortunately, this can be done in Keras using an ImageDataGenerator instance.

    from keras.preprocessing.image import ImageDataGenerator

    ImageDataGenerator(
        rotation_range=10.,
        width_shift_range=0.1,
        height_shift_range=0.1,
        shear_range=0.,
        zoom_range=0.1,
        horizontal_flip=True,
        vertical_flip=True)

In the above code, we have provided some of the operations that can be done using the ImageDataGenerator for data augmentation. These include rotating the image, shifting the image left/right/up/down by some amount, flipping the image horizontally or vertically, and shearing or zooming the image.

For the complete list, check the documentation (https://keras.io/preprocessing/image/). Some generated images are shown below.

(/wp-content/uploads/2017/11/data-aug.png)

4.1. Training with Data Augmentation

Similar to the previous section, we will create the model, but use data augmentation while training. We will use ImageDataGenerator for creating a generator which will feed the network.
    import numpy as np
    from keras.preprocessing.image import ImageDataGenerator

    model2 = createModel()

    model2.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

    batch_size = 256
    epochs = 100
    datagen = ImageDataGenerator(
    #         zoom_range=0.2, # randomly zoom into images
    #         rotation_range=10, # randomly rotate images in the range (degrees, 0 to 180)
            width_shift_range=0.1, # randomly shift images horizontally (fraction of total width)
            height_shift_range=0.1, # randomly shift images vertically (fraction of total height)
            horizontal_flip=True, # randomly flip images horizontally
            vertical_flip=False) # do not flip images vertically

    # Fit the model on the batches generated by datagen.flow().
    history2 = model2.fit_generator(datagen.flow(train_data, train_labels_one_hot, batch_size=batch_size),
                                    steps_per_epoch=int(np.ceil(train_data.shape[0] / float(batch_size))),
                                    epochs=epochs,
                                    validation_data=(test_data, test_labels_one_hot),
                                    workers=4)

    model2.evaluate(test_data, test_labels_one_hot)

In the above code,

1. We first create the model and configure it.
2. Then we create an ImageDataGenerator object and configure it using parameters for horizontal flip and image translation.
3. The datagen.flow() function generates batches of data, after performing the data transformations / augmentation specified during the instantiation of the data generator.
4. The fit_generator function trains the model using the data obtained in batches from the datagen.flow function.

4.2. Loss & Accuracy Curves

    import matplotlib.pyplot as plt

    # Loss Curves
    plt.figure(figsize=[8,6])
    plt.plot(history2.history['loss'],'r',linewidth=3.0)
    plt.plot(history2.history['val_loss'],'b',linewidth=3.0)
    plt.legend(['Training loss', 'Validation Loss'],fontsize=18)
    plt.xlabel('Epochs ',fontsize=16)
    plt.ylabel('Loss',fontsize=16)
    plt.title('Loss Curves',fontsize=16)

    # Accuracy Curves
    # (newer Keras versions name these keys 'accuracy' and 'val_accuracy')
    plt.figure(figsize=[8,6])
    plt.plot(history2.history['acc'],'r',linewidth=3.0)
    plt.plot(history2.history['val_acc'],'b',linewidth=3.0)
    plt.legend(['Training Accuracy', 'Validation Accuracy'],fontsize=18)
    plt.xlabel('Epochs ',fontsize=16)
    plt.ylabel('Accuracy',fontsize=16)
    plt.title('Accuracy Curves',fontsize=16)

(/wp-content/uploads/2017/11/cnn-keras-curves-with-aug.jpg)

The test accuracy is greater than the training accuracy. This suggests that the model has generalized very well. It comes from the fact that the model has been trained on harder data (for example, flipped and shifted images), so it finds the normal test data easier to classify. Part of the gap likely also comes from dropout, which is active during training but not during evaluation.

5. What next?

It looks like there were a lot of parameters to choose from, and training took a long time. We would not want to get tied down with these two problems when we are working on simple problems. Many researchers working in this field very generously open-source their trained models, which have been trained on millions of images and for hundreds of hours on many GPUs. We can leverage their trained models as the starting point rather than starting from scratch. We will learn how to do Transfer Learning and Fine-tuning in our next post.
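As a recap of the shape arithmetic used for the network in section 3.2, the layer sizes reported by the model summary can be reproduced with the standard output-size formula (N - F + 2P)/S + 1. The helper names `conv_output_size` and `same_padding` below are hypothetical, and the sketch assumes a 32×32 input, stride 1 for the convolutions, and pooling that discards any leftover row or column.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output width/height for input size n, filter size f, padding p and stride s."""
    return (n - f + 2 * p) // s + 1


def same_padding(f):
    """Padding that keeps the size unchanged for stride 1 and an odd filter size f."""
    return (f - 1) // 2


size = 32          # CIFAR10 images are 32x32
trace = []
for block in range(3):
    size = conv_output_size(size, 3, p=same_padding(3))  # Conv2D with padding='same'
    size = conv_output_size(size, 3)                     # Conv2D without padding
    size = conv_output_size(size, 2, s=2)                # 2x2 max pooling, stride 2
    trace.append(size)

flattened_size = size * size * 64   # 64 filters in the last conv block
first_layer_params = 3 * 3 * 3 * 32 + 32   # weights + biases of the first Conv2D

print(trace)               # [15, 6, 2]
print(flattened_size)      # 256
print(first_layer_params)  # 896
```

The trace matches the description in section 3.2: each block shrinks the maps by 2 pixels in the unpadded convolution and then halves them in the pooling layer, ending at 2x2x64 = 256 values.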
References

https://github.com/fchollet/keras/blob/master/examples

Learn OpenCV

Understanding Activation Functions in Deep Learning

OCTOBER 30, 2017 BY ADITYA SHARMA (HTTPS://WWW.LEARNOPENCV.COM/AUTHOR/ADITYASHARMA/)

This post is part of the series on Deep Learning for Beginners, which consists of the following tutorials:

1. Neural Networks : A 30,000 Feet View for Beginners (/neural-networks-a-30000-feet-view-for-beginners/)
2. Installation of Deep Learning frameworks (Tensorflow and Keras with CUDA support) (/installing-deep-learning-frameworks-on-ubuntu-with-cuda-support/)
3. Introduction to Keras (/deep-learning-using-keras-the-basics/)
4. Understanding Feedforward Neural Networks (/understanding-feedforward-neural-networks/)
5. Image Classification using Feedforward Neural Networks (/image-classification-using-feedforward-neural-network-in-keras/)
6. Image Recognition using Convolutional Neural Network (/image-classification-using-convolutional-neural-networks-in-keras/)
7. Understanding Activation Functions
8. Understanding AutoEncoders using Tensorflow (/understanding-autoencoders-using-tensorflow-python/)
9. Image Classification using pre-trained models in Keras (/keras-tutorial-using-pre-trained-imagenet-models/)
10. Transfer Learning using pre-trained models in Keras (/keras-tutorial-transfer-learning-using-pre-trained-models/)
11. Fine-tuning pre-trained models in Keras (/keras-tutorial-fine-tuning-using-pre-trained-models)
12. More to come . . .

In this post, we will learn about different kinds of activation functions; we will also see which activation function is better than the other. This post assumes that you have a basic idea of Artificial Neural Networks (ANN), but in case you don't, I recommend you first read the post on understanding feedforward neural networks (/understanding-feedforward-neural-networks/).

1. What is an Activation Function?

Biological neural networks inspired the development of artificial neural networks. However, ANNs are not even an approximate representation of how the brain works. It is still useful to understand the relevance of an activation function in a biological neural network before we look at why we use it in an artificial neural network.

A typical neuron has a physical structure that consists of a cell body, an axon that sends messages to other neurons, and dendrites that receive signals or information from other neurons.

(/wp-content/uploads/2017/10/biologicalneural-network.jpg)
Biological Neural Network (Image Credit (https://www.tutorialspoint.com/artificial_intelligence/artificial_intelligence_neural_networks

In the above picture, the red circle indicates the region where the two neurons communicate.
The neuron receives signals from other neurons through the dendrites. The weight (strength) associated with a dendrite, called the synaptic weight, gets multiplied by the incoming signal. The signals from the dendrites are accumulated in the cell body, and if the strength of the resulting signal is above a certain threshold, the neuron passes the message to the axon. Otherwise, the signal is killed by the neuron and is not propagated further.

The activation function takes the decision of whether or not to pass the signal. In this case, it is a simple step function with a single parameter: the threshold. Now, when we learn something new (or unlearn something), the threshold and the synaptic weights of some neurons change. This creates new connections among neurons, making the brain learn new things.

Let us understand the same concept again, but this time using an artificial neuron.

(/wp-content/uploads/2017/10/neuron-diagram.jpg)

In the above figure, $(x_1, x_2, \ldots, x_n)$ is the signal vector that gets multiplied with the weights $(w_1, w_2, \ldots, w_n)$. This is followed by accumulation (i.e. summation + addition of bias). Finally, an activation function is applied to this sum. Note that the weights and the bias transform the input signal linearly. The activation, on the other hand, transforms the signal non-linearly, and it is this non-linearity that allows us to learn arbitrarily complex transformations between the input and the output.

Over the years, various functions have been used, and it is still an active area of research to find a proper activation function that makes the neural network learn better and faster.

2. How does the network learn?

It is essential to get a basic idea of how the neural network learns. Let's say that the desired output of the network is $y$. The network produces an output $y'$.
The difference between the predicted output and the desired output is converted to a metric known as the loss function ($J$). The loss is high when the neural network makes a lot of mistakes, and it is low when it makes fewer mistakes. The goal of the training process is to find the weights and bias that minimise the loss function over the training set.

In the figure below, the loss function is shaped like a bowl. At any point in the training process, the partial derivatives of the loss function w.r.t. the weights are nothing but the slope of the bowl at that location. One can see that by moving downhill, in the direction opposite to the slope given by the partial derivatives, we can reach the bottom of the bowl and therefore minimize the loss function. This idea of using the partial derivatives of a function to iteratively find its local minimum is called gradient descent.

(/wp-content/uploads/2017/10/gradient-descent-2d-diagram.png)

In artificial neural networks the weights are updated using a method called Backpropagation. The partial derivatives of the loss function w.r.t. the weights are used to update the weights. In a sense, the error is backpropagated through the network using derivatives. This is done in an iterative manner, and after many iterations the loss approaches a minimum value and the derivative of the loss approaches zero.

We plan to cover backpropagation in a separate blog post. The main thing to note here is the presence of derivatives in the training process.

3. Types of Activation Functions

Linear Activation Function: It is a simple linear function of the form $f(x) = x$. Basically, the input passes to the output without any modification.
(/wp-content/uploads/2017/10/linearactivation-function-1.png)
Figure: Linear Activation Function

Non-Linear Activation Functions: These functions are used to separate data that is not linearly separable and are the most used activation functions. A non-linear equation governs the mapping from inputs to outputs. A few examples of non-linear activation functions are sigmoid, tanh, relu, lrelu, prelu, swish, etc. We will be discussing all these activation functions in detail.

(/wp-content/uploads/2017/10/nonlinear-activation-function.png)
Figure: Non-Linear Activation Function

4. Why do we need a non-linear activation function in an artificial neural network?

Neural networks are used to implement complex functions, and non-linear activation functions enable them to approximate arbitrarily complex functions. Without the non-linearity introduced by the activation function, multiple layers of a neural network are equivalent to a single layer neural network.

Let's see a simple example to understand why, without non-linearity, it is impossible to approximate even simple functions like the XOR and XNOR gates. In the figure below, we graphically show an XOR gate. There are two classes in our dataset, represented by a cross and a circle. When the two features, $x_1$ and $x_2$, are the same, the class label is a red cross; otherwise, it is a blue circle. The two red crosses have an output of 0 for input values (0,0) and (1,1), and the two blue circles have an output of 1 for input values (0,1) and (1,0).
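The claim that layers without a non-linearity collapse into a single layer can be checked numerically. The sketch below uses small hand-picked weights (hypothetical values chosen only for illustration) and plain Python lists: two stacked linear layers give exactly the same outputs as one linear layer with weights W2·W1 and bias W2·b1 + b2. It also checks a property every linear function satisfies on the four XOR inputs, which is why no threshold on a linear output can separate the crosses from the circles.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


# Hypothetical weights for a 2 -> 2 -> 1 network with no activation function.
W1, b1 = [[1.0, -2.0], [0.5, 3.0]], [0.5, -1.0]
W2, b2 = [[2.0, 1.0]], [0.25]


def two_linear_layers(x):
    hidden = add(matvec(W1, x), b1)       # first layer, no non-linearity
    return add(matvec(W2, hidden), b2)    # second layer


# The equivalent single layer: W = W2 W1, b = W2 b1 + b2.
W, b = matmul(W2, W1), add(matvec(W2, b1), b2)


def one_linear_layer(x):
    return add(matvec(W, x), b)


xor_inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
outputs = {}
for x in xor_inputs:
    outputs[tuple(x)] = two_linear_layers(x)[0]
    print(x, two_linear_layers(x), one_linear_layer(x))

# For any linear function f: f(0,0) + f(1,1) == f(0,1) + f(1,0).
# So the two crosses cannot both lie below a threshold that both circles exceed.
print(outputs[(0.0, 0.0)] + outputs[(1.0, 1.0)],
      outputs[(0.0, 1.0)] + outputs[(1.0, 0.0)])
```

Changing the weights changes the numbers but not the two equalities, which is the point of the argument.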
(/wp-content/uploads/2017/10/xor.png)
Figure: Graphical Representation of XOR gate

From the above picture, we can see that the data points are not linearly separable. In other words, we cannot draw a straight line to separate the blue circles and the red crosses from each other. Hence, we will need a non-linear decision boundary to separate them.

The activation function is also crucial for squashing the output of the neural network to be within certain bounds. The output of a neuron can take on very large values. This output, when fed to the next layer neuron without modification, can be transformed to even larger numbers, thus making the process computationally intractable. One of the tasks of the activation function is to map the output of a neuron to something that is bounded (e.g., between 0 and 1).

With this background, we are ready to understand different types of activation functions.

5. Types of Non-Linear Activation Functions

5.1. Sigmoid

It is also known as the Logistic Activation Function. It takes a real-valued number and squashes it into a range between 0 and 1. It is also used in the output layer where our end goal is to predict a probability. It converts large negative numbers to 0 and large positive numbers to 1. Mathematically it is represented as

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

The figures below show the sigmoid function and its derivative graphically.

(/wp-content/uploads/2017/10/sigmoidactivation-function.png)
Figure: Sigmoid Activation Function

(/wp-content/uploads/2017/10/sigmoidderivative.png)
Figure: Sigmoid Derivative

The three major drawbacks of sigmoid are:

1. Vanishing gradients: Notice that the sigmoid function is flat where its output is near 0 or 1.
In other words, the gradient of the sigmoid is close to 0 when its output is near 0 or 1. During backpropagation through a network with sigmoid activations, the gradients in neurons whose output is near 0 or 1 are nearly 0. These neurons are called saturated neurons, and their weights barely update. Not only that, the weights of neurons connected to such neurons are also updated slowly. This problem is known as the vanishing gradient problem. So, in a large network of sigmoid neurons where many are in the saturated regime, the error signal can barely propagate backward, and the network learns very slowly or not at all.
2. Not zero-centered: Sigmoid outputs are not zero-centered.
3. Computationally expensive: The exp() function is computationally expensive compared with the operations used by the other non-linear activation functions.

The next non-linear activation function that I am going to discuss addresses the zero-centered problem of sigmoid.

5.2. Tanh

Figure: Tanh Activation Function

Figure: Tanh Derivative

It is also known as the hyperbolic tangent activation function. Similar to sigmoid, tanh takes a real-valued number, but squashes it into a range between -1 and 1. Unlike sigmoid, tanh outputs are zero-centered since the range is between -1 and 1. You can think of tanh as a scaled and shifted sigmoid: $\tanh(x) = 2\sigma(2x) - 1$, where $\sigma$ is the sigmoid. In practice, tanh is preferable to sigmoid. Strongly negative inputs are mapped to strongly negative outputs, inputs near zero are mapped near zero, and positive inputs are mapped to positive outputs. The only drawback of tanh is:

1. The tanh function also suffers from the vanishing gradient problem and therefore kills gradients when saturated.

To address the vanishing gradient problem, let us discuss another non-linear activation function known as the rectified linear unit (ReLU), which often works better than the previous two activation functions and is the most widely used these days.

5.3. Rectified Linear Unit (ReLU)

Figure: ReLU Activation Function

Figure: ReLU Derivative

ReLU is half-rectified from the bottom, as you can see from the figure above. Mathematically, it is given by this simple expression:

$f(x) = \max(0, x)$

This means that when the input x < 0 the output is 0, and when x ≥ 0 the output is x. This activation often makes the network converge much faster. It does not saturate in the positive region (when x > 0), which makes it resistant to the vanishing gradient problem there, so the neurons do not backpropagate all zeros over at least half of their input range. ReLU is computationally very efficient because it is implemented using simple thresholding. But there are a few drawbacks of the ReLU neuron:

1. Not zero-centered: The outputs are not zero-centered, similar to the sigmoid activation function.
2. The other issue with ReLU is that if x < 0 during the forward pass, the neuron remains inactive and it kills the gradient during the backward pass. Thus the weights do not get updated, and the neuron does not learn. At x = 0 the slope is undefined, but this is taken care of during implementation by picking either the left or the right gradient.
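The saturation behaviour described for sigmoid, tanh and ReLU can be checked numerically. The sketch below evaluates the three derivatives at a few inputs; the function names are illustrative, and the ReLU slope at exactly x = 0 is set to 0 here, which is one of the two conventions mentioned above.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2


def d_relu(x):
    # Slope at exactly x = 0 is taken as 0 (the "left" gradient).
    return (x > 0).astype(float)


x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
grads = {"sigmoid": d_sigmoid(x), "tanh": d_tanh(x), "relu": d_relu(x)}
for name, g in grads.items():
    print(name, g)
```

At x = ±10 the sigmoid and tanh gradients are tiny (the saturated regime), while the ReLU gradient stays at 1 for every positive input and is exactly 0 for every negative one.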
To address the vanishing gradient issue in the ReLU activation function when x < 0, we have Leaky ReLU, an attempt to fix the dead ReLU problem. Let’s understand leaky ReLU in detail.

5.4. Leaky ReLU

Figure: Leaky ReLU activation function

The function computes

$f(x) = \max(0.1x, x)$

The concept of leaky ReLU is that when x < 0, it has a small positive slope of 0.1. This function somewhat mitigates the dying ReLU problem, but the results achieved with it are not consistent. It still has all the characteristics of a ReLU activation function: it is computationally efficient, converges much faster, and does not saturate in the positive region.

The idea of leaky ReLU can be extended even further. Instead of multiplying x by a constant, we can multiply it by a learnable parameter, which seems to work better than leaky ReLU. This extension to leaky ReLU is known as Parametric ReLU.

5.5. Parametric ReLU

The PReLU function is given by

$f(x) = \max(\alpha x, x)$

where $\alpha$ is a parameter (with $\alpha \le 1$). The idea here is to introduce an arbitrary parameter $\alpha$ that can be learned, since you can backpropagate into it. This gives the neurons the ability to choose what slope is best in the negative region, and with this ability, they can become a ReLU or a leaky ReLU.

In summary, it is better to use ReLU, but you can experiment with Leaky ReLU or Parametric ReLU to see if they give better results for your problem.

5.6. Swish

Figure: Swish Activation Function

Swish, also known as a self-gated activation function, was recently released by researchers at Google.
Mathematically, it is represented as

$f(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}$

According to the paper (https://arxiv.org/abs/1710.05941v1), the Swish activation function tends to perform better than ReLU on deeper models.

From the above figure, we can observe that in the negative region of the x-axis the shape of the tail is different from that of the ReLU activation function. Because of this, the output of the Swish activation function may decrease even when the input value increases. Most activation functions are monotonic, i.e., their value never decreases as the input increases. Swish has one-sided boundedness (it is bounded below but unbounded above), it is smooth, and it is non-monotonic. It will be interesting to see how well it performs, since trying it requires changing just one line of code.
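As a quick reference, the three ReLU variants from sections 5.4 to 5.6 can be written in a few lines of NumPy. This is a minimal sketch: the function names are illustrative, and the PReLU slope `alpha` is passed in by hand here, whereas in a real network it would be learned by backpropagation.

```python
import numpy as np


def leaky_relu(x, alpha=0.1):
    # Fixed small slope alpha in the negative region
    return np.maximum(alpha * x, x)


def prelu(x, alpha):
    # Same shape as leaky ReLU, but alpha is meant to be a learned parameter
    return np.where(x > 0, x, alpha * x)


def swish(x):
    # x * sigmoid(x)
    return x / (1.0 + np.exp(-x))


x = np.array([-4.0, -2.0, -1.0, 0.0, 1.0])
leaky_vals = leaky_relu(x)
prelu_as_relu = prelu(x, alpha=0.0)     # alpha = 0 recovers plain ReLU
prelu_as_leaky = prelu(x, alpha=0.1)    # alpha = 0.1 recovers leaky ReLU
swish_vals = swish(x)
print(leaky_vals, swish_vals)
```

Note that `swish_vals` goes down as x moves from -4 to -1 before rising again, which is the non-monotonic tail discussed above.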
Understanding Autoencoders using Tensorflow (Python)

NOVEMBER 15, 2017 BY ADITYA SHARMA (HTTPS://WWW.LEARNOPENCV.COM/AUTHOR/ADITYASHARMA/)

This post is part of the series on Deep Learning for Beginners, which consists of the following tutorials:

1. Neural Networks : A 30,000 Feet View for Beginners (/neural-networks-a-30000-feet-view-for-beginners/)
2. Installation of Deep Learning frameworks (Tensorflow and Keras with CUDA support) (/installing-deep-learning-frameworks-on-ubuntu-with-cuda-support/)
3. Introduction to Keras (/deep-learning-using-keras-the-basics/)
4. Understanding Feedforward Neural Networks (/understanding-feedforward-neural-networks/)
5. Image Classification using Feedforward Neural Networks (/image-classification-using-feedforward-neural-network-in-keras/)
6. Image Recognition using Convolutional Neural Network (/image-classification-using-convolutional-neural-networks-in-keras/)
7. Understanding Activation Functions (/understanding-activation-functions-in-deep-learning/)
8. Understanding AutoEncoders using Tensorflow
9. Image Classification using pre-trained models in Keras (/keras-tutorial-using-pre-trained-imagenet-models/)
10. Transfer Learning using pre-trained models in Keras (/keras-tutorial-transfer-learning-using-pre-trained-models/)
11. Fine-tuning pre-trained models in Keras (/keras-tutorial-fine-tuning-using-pre-trained-models)
12. More to come . . .

In this article, we will learn about autoencoders in deep learning. We will show a practical implementation of a Denoising Autoencoder on the MNIST (https://en.wikipedia.org/wiki/MNIST_database) handwritten digits dataset as an example.
In addition, we are sharing an implementation of the idea in Tensorflow.

1. What is an autoencoder?

An autoencoder is an unsupervised machine learning algorithm that takes an image as input and reconstructs it from a compressed representation that uses fewer bits. That may sound like image compression, but the biggest difference between an autoencoder and a general-purpose image compression algorithm is that, in the case of autoencoders, the compression is learned from a training set of data. While reasonable compression is achieved when an image is similar to the training set used, autoencoders are poor general-purpose image compressors; JPEG compression will do vastly better.

Autoencoders are similar in spirit to dimensionality reduction techniques like principal component analysis (https://en.wikipedia.org/wiki/Principal_component_analysis). They create a space where the essential parts of the data are preserved, while non-essential (or noisy) parts are removed.

There are two parts to an autoencoder:

1. Encoder: This is the part of the network that compresses the input into a smaller number of bits. The space represented by these bits is called the “latent space”, and the point of maximum compression is called the bottleneck. The compressed bits that represent the original input are together called an “encoding” of the input.
2. Decoder: This is the part of the network that reconstructs the input image using the encoding of the image.

Let’s look at an example to understand the concept better.

Figure: 2-layer Autoencoder

In the above picture, we show a vanilla autoencoder: a 2-layer autoencoder with one hidden layer. The input and output layers have the same number of neurons.
We feed five real values into the autoencoder, which the encoder compresses into three real values at the bottleneck (middle layer). Using these three real values, the decoder tries to reconstruct the five real values that we fed as input to the network. In practice, there are far more hidden layers between the input and the output.

There are various kinds of autoencoders, such as the sparse autoencoder, the variational autoencoder, and the denoising autoencoder. In this post, we will learn about the denoising autoencoder.

2. Denoising Autoencoder

Figure: Denoising Autoencoder

The idea behind a denoising autoencoder is to learn a representation (latent space) that is robust to noise. We add noise to an image and then feed this noisy image as input to our network. The encoder part of the autoencoder transforms the image into a different space that preserves the handwritten digits but removes the noise. As we will see later, the original image is a 28 x 28 x 1 image, and the transformed image is 7 x 7 x 32. You can think of the 7 x 7 x 32 image as a 7 x 7 image with 32 color channels.

The decoder part of the network then reconstructs the original image from this 7 x 7 x 32 image, and voila, the noise is gone!

How does this magic happen? During training, we define a loss (cost function) that measures the difference between the reconstructed image and the original noise-free image, and we minimize it. In other words, we learn a 7 x 7 x 32 space that is noise free.

3. Implementation of Denoising Autoencoder

This implementation is inspired by the excellent post Building Autoencoders in Keras (https://blog.keras.io/building-autoencoders-in-keras.html).

3.1 The Network

The images are matrices of size 28 x 28. We reshape each image to size 28 x 28 x 1, convert the reshaped image matrix to an array, rescale it between 0 and 1, and feed this as input to the network. The encoder transforms the 28 x 28 x 1 image to a 7 x 7 x 32 image. You can think of this 7 x 7 x 32 image as a point in a 1568-dimensional space (because 7 x 7 x 32 = 1568). This 1568-dimensional space is called the bottleneck or the latent space. The architecture is graphically shown below.

Figure: Architecture of Encoder Model

The decoder does the exact opposite of the encoder; it transforms this 1568-dimensional vector back to a 28 x 28 x 1 image. We call this output image a “reconstruction” of the original image. The structure of the decoder is shown below.

Figure: Architecture of Decoder Model

Let’s dive into the implementation of an autoencoder using Tensorflow.

3.2 Encoder

The encoder has two convolutional layers and two max pooling layers. Both Convolution layer-1 and Convolution layer-2 have 32 filters of size 3 x 3. There are two max-pooling layers, each of size 2 x 2.
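The shape bookkeeping above (28 x 28 x 1 in, 7 x 7 x 32 at the bottleneck, 28 x 28 x 1 out) can be verified with plain Python before touching Tensorflow. This is a sketch under stated assumptions: the convolutions use 'SAME' padding with stride 1, the 2 x 2 pooling uses stride 2 without padding, and the decoder is assumed to mirror the encoder with two stride-2 upsampling steps. The helper names are hypothetical.

```python
import math


def conv_same(size, stride=1):
    # 'SAME' padding: output = ceil(input / stride)
    return math.ceil(size / stride)


def pool_valid(size, pool=2, stride=2):
    # Pooling without padding: output = floor((input - pool) / stride) + 1
    return (size - pool) // stride + 1


def upsample_same(size, stride=2):
    # Transposed convolution with 'same' padding: output = input * stride
    return size * stride


size = 28
encoder_sizes = [size]
for step in (conv_same, pool_valid, conv_same, pool_valid):
    size = step(size)
    encoder_sizes.append(size)

latent_dim = size * size * 32  # 32 filters at the bottleneck

decoder_sizes = [size]
for step in (conv_same, upsample_same, upsample_same, conv_same):
    size = step(size)
    decoder_sizes.append(size)

print(encoder_sizes, latent_dim, decoder_sizes)
```

Only the spatial size is tracked here; the channel count is set by the `filters` argument of each layer (32 everywhere except the final single-channel output).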
    # Encoder
    # Assumes `import tensorflow as tf` (TensorFlow 1.x), the input placeholder inputs_,
    # and the lrelu function defined in section 3.4.
    with tf.name_scope('en-convolutions'):
        # The source listing is cut off after "activation=lr"; completed here as lrelu.
        conv1 = tf.layers.conv2d(inputs_, filters=32, kernel_size=(3,3), strides=(1,1), padding='SAME', use_bias=True, activation=lrelu)
    # Now 28x28x32
    with tf.name_scope('en-pooling'):
        maxpool1 = tf.layers.max_pooling2d(conv1, pool_size=(2,2), strides=(2,2), name='pool1')
    # Now 14x14x32
    with tf.name_scope('en-convolutions'):
        # Also cut off in the source after "activation=l"; completed as lrelu.
        conv2 = tf.layers.conv2d(maxpool1, filters=32, kernel_size=(3,3), strides=(1,1), padding='SAME', use_bias=True, activation=lrelu)
    # Now 14x14x32
    with tf.name_scope('encoding'):
        encoded = tf.layers.max_pooling2d(conv2, pool_size=(2,2), strides=(2,2), name='encoding')
    # Now 7x7x32 (latent space)

Figure: Encoder Block Diagram

3.3 Decoder

The decoder has two Conv2d_transpose layers, two Convolution layers, and one Sigmoid activation function. Conv2d_transpose is used for upsampling, which is the opposite of the downsampling performed in the encoder. The Conv2d_transpose layer upsamples the compressed image by a factor of two each time we use it.
    # Decoder
    with tf.name_scope('decoder'):
        # The source listing is cut off after "use_bias=True,"; activation=lrelu is an assumption.
        conv3 = tf.layers.conv2d(encoded, filters=32, kernel_size=(3,3), strides=(1,1), name='conv3', padding='SAME', use_bias=True, activation=lrelu)
        # Now 7x7x32
        upsample1 = tf.layers.conv2d_transpose(conv3, filters=32, kernel_size=3, padding='same', strides=2, name='upsample1')
        # Now 14x14x32
        upsample2 = tf.layers.conv2d_transpose(upsample1, filters=32, kernel_size=3, padding='same', strides=2, name='upsample2')
        # Now 28x28x32
        # Cut off in the source after "use_bias=Tr"; closed here with no activation,
        # since the sigmoid is applied separately below.
        logits = tf.layers.conv2d(upsample2, filters=1, kernel_size=(3,3), strides=(1,1), name='logits', padding='SAME', use_bias=True)
        # Now 28x28x1
        # Pass logits through sigmoid to get the denoised image
        decoded = tf.sigmoid(logits, name='recon')

Figure: Decoder Block Diagram

Finally, we calculate the loss of the output using the cross-entropy (https://en.wikipedia.org/wiki/Cross_entropy) loss function and use the Adam optimizer (https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/) to minimize it.

3.4 Why do we use a leaky ReLU and not a ReLU as an activation function?

We want gradients to flow while we backpropagate through the network. We stack many layers, and in such a system there are some neurons whose values drop to zero or become negative. Using a ReLU as an activation function clips the negative values to zero, and in the backward pass the gradients do not flow through those neurons where the values became zero. Because of this, the weights do not get updated, and the network stops learning for those values. So using ReLU is not always a good idea. However, we encourage you to change the activation function to ReLU and see the difference.

    def lrelu(x, alpha=0.1):
        return tf.maximum(alpha*x, x)

Therefore, we use a leaky ReLU which, instead of clipping the negative values to zero, scales them down by a factor given by the hyperparameter alpha. This ensures that the network learns something even when a neuron’s input is below zero.
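The argument in section 3.4 can be illustrated with a single neuron whose pre-activation is negative. The sketch below uses NumPy rather than Tensorflow and a made-up squared-error loss, so every name and number in it is an assumption chosen for illustration; it only shows that the weight gradient is exactly zero with ReLU and non-zero with leaky ReLU.

```python
import numpy as np


def relu_grad(z):
    return np.where(z > 0, 1.0, 0.0)


def lrelu_grad(z, alpha=0.1):
    return np.where(z > 0, 1.0, alpha)


# One neuron: z = w * x, loss = 0.5 * (activation(z) - target) ** 2
x, w, target = 1.0, -2.0, 1.0
z = w * x                       # -2.0, i.e. in the negative region

relu_out = max(0.0, z)          # clipped to 0.0
lrelu_out = max(0.1 * z, z)     # scaled to -0.2

# Chain rule: dLoss/dw = (output - target) * activation'(z) * x
dw_relu = float((relu_out - target) * relu_grad(z) * x)
dw_lrelu = float((lrelu_out - target) * lrelu_grad(z) * x)
print(dw_relu, dw_lrelu)
```

With ReLU the update for `w` is zero, so this neuron stays stuck; with leaky ReLU a small gradient still flows and `w` can move back toward the active region.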
3.5 Load the data

Once the architecture has been defined, we load the training and validation data.

As shown below, Tensorflow allows us to easily load the MNIST data. The training and testing data are stored in the variables train_X and test_X respectively. Since it’s an unsupervised task, we do not care about the labels.

    from tensorflow.examples.tutorials.mnist import input_data

    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
    train_X = mnist.train.images
    test_X = mnist.test.images

3.6 Data Analysis

Before training a neural network, it is always a good idea to do a sanity check on the data. Let’s see what the data looks like. The data consists of handwritten digits ranging from 0 to 9, along with their ground truth labels. It has 55,000 train samples and 10,000 test samples. Each sample is a 28×28 grayscale image.
    import numpy as np
    import matplotlib.pyplot as plt

    print('Training data shape :', train_X.shape)
    print('Testing data shape :', test_X.shape)

    nsample = 1
    rand_train_idx = np.random.randint(mnist.train.images.shape[0], size=nsample)

    # Show one random training image with its label
    for i in rand_train_idx:
        curr_img = np.reshape(mnist.train.images[i, :], (28,28))
        curr_lbl = np.argmax(mnist.train.labels[i, :])
        plt.matshow(curr_img, cmap=plt.get_cmap('gray'))
        plt.title(""+str(i)+"th Training Image "+ "(label: " + str(curr_lbl) + ")")
        plt.show()

    rand_test_idx = np.random.randint(mnist.test.images.shape[0], size=nsample)

    # Show one random test image with its label
    for i in rand_test_idx:
        curr_img = np.reshape(mnist.test.images[i, :], (28,28))
        curr_lbl = np.argmax(mnist.test.labels[i, :])
        plt.matshow(curr_img, cmap=plt.get_cmap('gray'))
        plt.title(""+str(i)+"th Test Image "+ "(label: " + str(curr_lbl) + ")")
        plt.show()

Output:

    Training data shape : (55000, 784)
    Testing data shape : (10000, 784)

3.7 Preprocessing the data

The raw images are grayscale with pixel values ranging from 0 to 255; the loader used above already rescales them to the range 0 to 1. We apply the following preprocessing to the data before feeding it to the network.

1. Convert each 784-dimensional vector into a matrix of size 28 x 28 x 1, which is fed into the network.

    # batch_size is the number of images per batch (64 in this post)
    batch_train_x = mnist.train.next_batch(batch_size)
    batch_test_x = mnist.test.next_batch(batch_size)

    imgs_train = batch_train_x[0].reshape((-1, 28, 28, 1))
    imgs_test = batch_test_x[0].reshape((-1, 28, 28, 1))

2. Add noise to both the train and test images, which we then feed into the network. The noise factor is a hyperparameter and can be tuned accordingly.

    noise_factor = 0.5
    x_train_noisy = imgs_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs_train.shape)
    x_test_noisy = imgs_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=imgs_test.shape)
    # Keep pixel values inside the valid [0, 1] range
    x_train_noisy = np.clip(x_train_noisy, 0., 1.)
    x_test_noisy = np.clip(x_test_noisy, 0., 1.)

3.8 Training the model

The network is ready to be trained. We specify the number of epochs as 25 with a batch size of 64. This means that the whole dataset will be fed to the network 25 times. We will be using the test data for validation.

    # The targets are the clean images (imgs_train from section 3.7); the inputs are the noisy ones
    batch_cost, _ = sess.run([cost, opt], feed_dict={inputs_: x_train_noisy, targets_: imgs_train, learning_rate: lr})

3.9 Evaluate the model

We check the performance of the model on our test set by checking the cost (loss).

    batch_cost_test = sess.run(cost, feed_dict={inputs_: x_test_noisy, targets_: imgs_test})

Output

    ('Epoch: 25/25...', 'Training loss: 0.1196', 'Validation loss: 0.1171')

After 25 epochs, we can see that our training loss and validation loss are quite low, which means our network did a pretty good job. Let’s now look at the loss plot for the training and validation data.

3.10 Training Vs. Validation Loss Plot

    # e is the index of the current epoch; loss and valid_loss are lists with one entry per epoch
    loss.append(batch_cost)
    valid_loss.append(batch_cost_test)
    plt.plot(range(e+1), loss, 'bo', label='Training loss')
    plt.plot(range(e+1), valid_loss, 'r', label='Validation loss')
    plt.title('Training and validation loss')
    plt.xlabel('Epochs ', fontsize=16)
    plt.ylabel('Loss', fontsize=16)
    plt.legend()
    plt.figure()
    plt.show()

From the above loss plot, we can observe that the validation loss and training loss both decrease steadily in the first ten epochs. The training loss and the validation loss are also very close to each other. This suggests that our model has generalized well to unseen test data.

We can further validate our results by observing the original, noisy and reconstructed test images.
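The corruption step from section 3.7 and the kind of loss minimized in section 3.8 can be reproduced without Tensorflow. The sketch below is an assumption-laden illustration: it uses a random stand-in batch instead of real MNIST images, and it assumes a per-pixel sigmoid cross-entropy computed on the logits, which the post does not spell out. The helper name `sigmoid_cross_entropy` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in batch of "clean" images in [0, 1]; not real MNIST data
clean = rng.uniform(0.0, 1.0, size=(4, 28, 28, 1))

# Corrupt with Gaussian noise and clip back to the valid pixel range
noise_factor = 0.5
noisy = np.clip(clean + noise_factor * rng.normal(0.0, 1.0, size=clean.shape), 0.0, 1.0)


def sigmoid_cross_entropy(logits, targets):
    # Numerically stable form: max(l, 0) - l * t + log(1 + exp(-|l|))
    return np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))


# A network that outputs all-zero logits predicts 0.5 for every pixel
cost_untrained = sigmoid_cross_entropy(np.zeros_like(clean), clean).mean()

# Logits that reproduce the clean image exactly give the lowest achievable cost
p = np.clip(clean, 1e-6, 1 - 1e-6)
cost_ideal = sigmoid_cross_entropy(np.log(p / (1 - p)), clean).mean()

print(noisy.shape, cost_untrained, cost_ideal)
```

Note that the target passed to the loss is the clean image, never the noisy one; that is what pushes the network to remove the noise rather than copy it.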
3.11 Results

Figure: Original Images

Figure: Images with Noise

Figure: Reconstruction of Noisy Images

From the above figures, we can observe that our model did a good job of denoising the noisy images that we fed into it.

Keras Tutorial : Using pre-trained Imagenet models

DECEMBER 26, 2017 BY VIKAS GUPTA (HTTPS://WWW.LEARNOPENCV.COM/AUTHOR/VIKAS/)

This post is part of the series on Deep Learning for Beginners, which consists of the following tutorials:

1. Neural Networks : A 30,000 Feet View for Beginners (/neural-networks-a-30000-feet-view-for-beginners/)
2. Installation of Deep Learning frameworks (Tensorflow and Keras with CUDA support) (/installing-deep-learning-frameworks-on-ubuntu-with-cuda-support/)
3. Introduction to Keras (/deep-learning-using-keras-the-basics/)
4. Understanding Feedforward Neural Networks (/understanding-feedforward-neural-networks/)
5. Image Classification using Feedforward Neural Networks (/image-classification-using-feedforward-neural-network-in-keras/)
6. Image Recognition using Convolutional Neural Network (/image-classification-using-convolutional-neural-networks-in-keras/)
7. Understanding Activation Functions (/understanding-activation-functions-in-deep-learning/)
8. Understanding AutoEncoders using Tensorflow (/understanding-autoencoders-using-tensorflow-python/)
9. Image Classification using pre-trained models in Keras
10. Transfer Learning using pre-trained models in Keras (/keras-tutorial-transfer-learning-using-pre-trained-models/)
11. Fine-tuning pre-trained models in Keras (/keras-tutorial-fine-tuning-using-pre-trained-models)
12. More to come . . .

In this post, we will learn how to use models pre-trained on large datasets like ILSVRC, and also how to use them for a task different from the one they were trained on. We will be covering the following topics in the next three posts:

1. Image classification using different pre-trained models (this post)
2. Training a classifier for a different task, using the features extracted by the above-mentioned models. This is also referred to as Transfer Learning.
3. Training a classifier for a different task, by modifying the weights of the above models. This is called Fine-tuning.

What is ImageNet

ImageNet (http://www.image-net.org/) is a project which aims to provide a large image database for research purposes. It contains more than 14 million images which belong to more than 20,000 classes (or synsets). They also provide bounding box annotations for around 1 million images, which can be used in Object Localization tasks.
It should be noted that they only provide URLs of the images, and you need to download the images yourself.

What is ILSVRC

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC (http://image-net.org/challenges/LSVRC/2017/index)) is an annual competition organized by the ImageNet team since 2010, where research teams evaluate their computer vision algorithms on various visual recognition tasks such as Object Classification and Object Localization. The training data is a subset of ImageNet with 1.2 million images belonging to 1000 classes. Deep Learning came into the limelight in 2012 when Alex Krizhevsky and his team won the competition by a whopping margin of 11%. ILSVRC and ImageNet are sometimes used interchangeably.

Why use pre-trained models?

Allow me a little digression.

Imagine two people, Mr. Couch Potato and Mr. Athlete. They sign up for soccer training at the same time. Neither of them has ever played soccer, and skills like dribbling, passing and kicking are new to both of them.

Mr. Couch Potato does not move much, and Mr. Athlete does. That is the core difference between the two even before the training has started. As you can imagine, the skills Mr. Athlete has developed as an athlete (e.g. stamina, speed and even sporting instincts) are going to be very useful for learning soccer, even though Mr. Athlete has never trained for soccer. Mr. Athlete benefits from his pre-training.

The same holds true for using pre-trained models in Neural Networks. A pre-trained model is trained on a different task than the task at hand, but it provides a very useful starting point because the features learned while training on the old task are useful for the new task.
We have seen earlier that we can create and train small convolutional networks (CNNs) to classify digits (using MNIST) or different objects (using CIFAR10). These small networks fall short when there are many classes and the objects vary in size, shape, appearance etc., as the model lacks the complexity required to model such large variations in data.

Although it is theoretically possible to approximate any function using just a single hidden layer, the number of neurons required to do so would be very large, making the network difficult to train. Thus, we use deep networks with many hidden layers, which try to learn different features at different layers, as we saw in the previous post on CNNs.

Deep networks have a large number of unknown parameters (in the millions). The task of training a network is to find the optimum parameters using the training data. From linear algebra, we know that in order to solve for three unknown parameters, we need three independent equations (data). With only two equations, the system is underdetermined and the three parameters cannot all be pinned down exactly. Similarly, for finding all the unknown parameters of a network accurately, we would need a lot of data (in the millions). If we have very little data, we will get only approximate values for most of the parameters, which we don’t want. The moral of the story is:

For Deep Networks: More data -> Better learning.

The problem is that it is difficult to get such huge labeled datasets for training the network. Another problem with deep networks is that even if you get the data, it takes a large amount of time to train the network (hundreds of hours). Thus, it takes a lot of time, money and effort to train a deep network successfully.
Fortunately, we can leverage models already trained on very large amounts of data for difficult tasks with thousands of classes. Many research groups share the models they have trained for competitions like ILSVRC. These models have been trained on millions of images for hundreds of hours on powerful GPUs. Most often, we use these models as a starting point for our training process instead of training our own model from scratch.

Enough background; let’s see how to use pre-trained models for image classification in Keras.

Pre-trained models present in Keras

The winners of ILSVRC have been very generous in releasing their models to the open-source community. There are many models such as AlexNet, VGGNet, Inception, ResNet, Xception and many more which we can choose from for our own task. Apart from the ILSVRC winners, many research groups also share models which they have trained for similar tasks, e.g., MobileNet, SqueezeNet etc.

These networks are trained to classify images into one of 1000 categories or classes.

Keras comes bundled with many models. A trained model has two parts: Model Architecture and Model Weights. The weights are large files and thus they are not bundled with Keras. However, the weights file is automatically downloaded (one time) if you specify that you want to load the weights trained on ImageNet data.
Keras has the following models (as of Keras version 2.1.2): VGG16, InceptionV3, ResNet, MobileNet, Xception, InceptionResNetV2

Loading a Model in Keras

We can load the models in Keras using the following code:

    import keras
    import numpy as np
    from keras.applications import vgg16, inception_v3, resnet50, mobilenet

    # Load the VGG model
    vgg_model = vgg16.VGG16(weights='imagenet')

    # Load the Inception_V3 model
    inception_model = inception_v3.InceptionV3(weights='imagenet')

    # Load the ResNet50 model
    resnet_model = resnet50.ResNet50(weights='imagenet')

    # Load the MobileNet model
    mobilenet_model = mobilenet.MobileNet(weights='imagenet')

In the above code, we first import the Python modules containing the respective models. Then we load the model architecture and the ImageNet weights for the networks. If you don’t want to initialize the network with ImageNet weights, replace 'imagenet' with None.

Loading and pre-processing an image

We can load the image using any library such as OpenCV, PIL, skimage etc. Keras also provides an image module with functions to import images and perform some basic pre-processing required before feeding them to the network for prediction. We will use the Keras functions for loading and pre-processing the image. Specifically, we perform the following steps on an input image:

1. Load the image. This is done using the load_img() function. Keras uses the PIL format for loading images. Thus, the image is in width x height x channels format.
2. Convert the image from PIL format to Numpy format (height x width x channels) using the img_to_array() function.
3. Add a batch dimension. The networks accept a 4-dimensional Tensor as input, of the form (batchsize, height, width, channels). This is done using the expand_dims() function in Numpy.
    from keras.preprocessing.image import load_img
    from keras.preprocessing.image import img_to_array
    from keras.applications.imagenet_utils import decode_predictions
    import matplotlib.pyplot as plt
    # Jupyter notebook magic; remove this line when running as a plain script
    %matplotlib inline

    filename = 'images/cat.jpg'

    # load an image in PIL format
    original = load_img(filename, target_size=(224, 224))
    print('PIL image size',original.size)
    plt.imshow(original)
    plt.show()

    # convert the PIL image to a numpy array
    # In PIL - image size is reported as (width, height)
    # In Numpy - image is in (height, width, channel)
    numpy_image = img_to_array(original)
    plt.imshow(np.uint8(numpy_image))
    plt.show()
    print('numpy array size',numpy_image.shape)

    # Convert the image / images into batch format
    # expand_dims will add an extra dimension to the data at a particular axis
    # We want the input matrix to the network to be of the form (batchsize, height, width, channels)
    # Thus we add the extra dimension to the axis 0.
    image_batch = np.expand_dims(numpy_image, axis=0)
    print('image batch size', image_batch.shape)
    plt.imshow(np.uint8(image_batch[0]))

Output

    ('PIL image size', (224, 224))
    ('numpy array size', (224, 224, 3))
    ('image batch size', (1, 224, 224, 3))

(https://www.learnopencv.com/wp-content/uploads/2017/12/cat.jpg)

Predicting the Object Class
Once we have the image in the right format, we can feed it to the network and get the predictions. The image from the previous step should first be normalized by subtracting the mean of the ImageNet data, because the network was trained on images that had gone through this pre-processing.
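As a rough illustration of this normalization, the sketch below reproduces in plain NumPy what the VGG16 pre-processing in Keras is understood to do: reorder the channels from RGB to BGR and subtract the per-channel ImageNet means. The helper name subtract_imagenet_mean and the dummy batch are assumptions made for illustration; for real use, call the Keras preprocess_input() function instead.

```python
import numpy as np

# ImageNet channel means, in B, G, R order.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def subtract_imagenet_mean(rgb_batch):
    """Hypothetical helper: RGB batch (N, H, W, 3) -> mean-centred BGR batch."""
    bgr = rgb_batch[..., ::-1].astype(np.float32)  # reverse the channel axis: RGB -> BGR
    return bgr - IMAGENET_MEAN_BGR                 # broadcasts over every pixel

# Dummy batch: one 224 x 224 image where every pixel value is 128.
batch = np.full((1, 224, 224, 3), 128.0, dtype=np.float32)
processed = subtract_imagenet_mean(batch)

print(processed.shape)      # same shape as the input batch
print(processed[0, 0, 0])   # one pixel after mean subtraction, in B, G, R order
```

The result keeps the (batchsize, height, width, channels) layout; only the channel order and the value range change.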
We follow these steps to get the classification results.
1. Preprocess the input by subtracting the mean value from each channel of the images in the batch. The mean is an array of three elements obtained by averaging each colour channel over all ImageNet images. The values for ImageNet are [ 103.939, 116.779, 123.68 ]; they are listed in B, G, R order, because for VGG16 the pre-processing also converts the image from RGB to BGR. This is done using the preprocess_input() function.
2. Get the classification result, which is a Tensor of dimension ( batchsize x 1000 ). This is done by the model.predict() function.
3. Convert the result to human-readable labels. The vector obtained above has too many values to make sense of directly. Keras provides a function decode_predictions() which takes the classification results, sorts them according to the confidence of prediction and returns the class name ( instead of a class number ). We can also specify how many results we want, using the top argument of the function. The output shows the class ID, class name and the confidence of prediction.

    # prepare the image for the VGG model
    processed_image = vgg16.preprocess_input(image_batch.copy())

    # get the predicted probabilities for each class
    predictions = vgg_model.predict(processed_image)
    # print predictions

    # convert the probabilities to class labels
    # We will get top 5 predictions which is the default
    label = decode_predictions(predictions)
    print(label)

Output

    [[(u'n02123597', u'Siamese_cat', 0.30934173), (u'n01877812', u'wallaby', 0.080341272), (u'n02326432', u'hare', 0.075098492), (u'n02325366', u'wood_rabbit', 0.050530687), (u'n03223299', u'doormat', 0.048173629)]]

Comparison of Results from various Models
Let us see what the different models say for a few images. Giving a cat image as input and running it on the 4 models, we get the following output.
(https://www.learnopencv.com/wp-content/uploads/2017/12/cat_output.jpg)
Given a dog image as input, this is the output.
(https://www.learnopencv.com/wp-content/uploads/2017/12/dog_output.jpg)
With an orange,
(https://www.learnopencv.com/wp-content/uploads/2017/12/orange_output.jpg)
For a tomato, we get
(https://www.learnopencv.com/wp-content/uploads/2017/12/tomato_output.jpg)
For a watermelon, we get
(https://www.learnopencv.com/wp-content/uploads/2017/12/watermelon_output.jpg)

Well, it looks like the models trained on ILSVRC do not recognize tomatoes and watermelons. In the next two posts, we will see how to train a classifier using these same models with our own data, so that it recognizes objects which are not present in the ILSVRC dataset. Stay tuned!

Subscribe & Download Code
If you liked this article and would like to download the code and example images used in this post, please subscribe (https://bigvisionllc.leadpages.net/leadbox/143948b73f72a2%3A173c9390c346dc/5649050225344512/) to our newsletter. You will also receive a free Computer Vision Resource (https://bigvisionllc.leadpages.net/leadbox/143948b73f72a2%3A173c9390c346dc/5649050225344512/) Guide.
In our newsletter, we share OpenCV tutorials and examples written in C++/Python, and Computer Vision and Machine Learning algorithms and news.
Subscribe Now (https://bigvisionllc.leadpages.net/leadbox/143948b73f72a2%3A173c9390c346dc/5649050225344512/)
COPYRIGHT © 2018 · BIG VISION LLC

10/27/2018 Keras Tutorial : Transfer Learning using pre-trained models | Learn OpenCV
Learn OpenCV
Keras Tutorial : Transfer Learning using pre-trained models
JANUARY 3, 2018 BY VIKAS GUPTA (HTTPS://WWW.LEARNOPENCV.COM/AUTHOR/VIKAS/)

This post is part of the series on Deep Learning for Beginners, which consists of the following tutorials :
1. Neural Networks : A 30,000 Feet View for Beginners (/neural-networks-a-30000-feet-view-for-beginners/)
2. Installation of Deep Learning frameworks (Tensorflow and Keras with CUDA support ) (/installing-deep-learning-frameworks-on-ubuntu-with-cuda-support/)
3. Introduction to Keras (/deep-learning-using-keras-the-basics/)
4. Understanding Feedforward Neural Networks (/understanding-feedforward-neural-networks/)
5. Image Classification using Feedforward Neural Networks (/image-classification-using-feedforward-neural-network-in-keras/)
6. Image Recognition using Convolutional Neural Network (/image-classification-using-convolutional-neural-networks-in-keras/)
7. Understanding Activation Functions (/understanding-activation-functions-in-deep-learning/)
8. Understanding AutoEncoders using Tensorflow (/understanding-autoencoders-using-tensorflow-python/)
9. Image Classification using pre-trained models in Keras (/keras-tutorial-using-pre-trained-imagenet-models/)
10. Transfer Learning using pre-trained models in Keras
11. Fine-tuning pre-trained models in Keras (/keras-tutorial-fine-tuning-using-pre-trained-models)
12. More to come . . .

In our previous tutorial (/keras-tutorial-using-pre-trained-imagenet-models/), we learned how to use models which were trained for image classification on the ILSVRC data. In this tutorial, we will discuss how to use those models as a feature extractor and train a new model for a different classification task.

Suppose you want to make a household robot which can cook food. The first step would be to identify different vegetables. In this tutorial, we will try to build a model which identifies Tomato, Watermelon, and Pumpkin. In the previous tutorial, we saw that the pre-trained models were not able to identify them because these categories were not learned by the models.

Transfer Learning vs Fine-tuning
The pre-trained models are trained on very large scale image classification problems. The convolutional layers act as a feature extractor and the fully connected layers act as classifiers.
(/wp-content/uploads/2017/11/cnn-schema1.jpg)
Since these models are very large and have seen a huge number of images, they tend to learn very good, discriminative features. We can either use the convolutional layers merely as a feature extractor, or we can tweak the already trained convolutional layers to suit our problem at hand. The former approach is known as Transfer Learning and the latter as Fine-tuning.

As a rule of thumb, when we have a small training set and our problem is similar to the task for which the pre-trained models were trained, we can use transfer learning. If we have enough data, we can try to tweak the convolutional layers so that they learn more robust features relevant to our problem.
You can get a detailed overview of fine-tuning and transfer learning here (http://cs231n.github.io/transfer-learning/). We will discuss Transfer Learning in Keras in this post.

ImageNet Jargon
(/wp-content/uploads/2018/01/imagenet-tomato.png)
ImageNet is based upon WordNet, which groups words into sets of synonyms (synsets). Each synset is assigned a “wnid” ( WordNet ID ). Note that a general category can contain many subcategories, each belonging to a different synset. For example, Working Dog ( synset = n02103406 ), Guide Dog ( synset = n02109150 ), and Police Dog ( synset = n02106854 ) are three different synsets. The wnids of the 3 object classes we are considering are given below.
n07734017 -> Tomato
n07735510 -> Pumpkin
n07756951 -> WaterMelon

Download and prepare Data
For downloading ImageNet images by wnid, there is a nice code repository written by Tzuta Lin which is available on Github (https://github.com/tzutalin/ImageNet_Utils). You can use it to download the images of a specific “wnid”: visit the Github page and follow the instructions there for any of the wnids.

However, if you are just starting out and do not want to download full-size images, you can use another Python library available through pip: imagenetscraper (https://pypi.python.org/pypi/imagenetscraper). It is easy to use and also provides resizing options. Installation and usage instructions are provided below. Note that it works with python3 only.

    # Install imagenetscraper
    pip3 install imagenetscraper

    # Download the images for the three wnids and keep them in separate folders.
    imagenetscraper n07756951 watermelon
    imagenetscraper n07734017 tomato
    imagenetscraper n07735510 pumpkin

I found that the data is very noisy, i.e. there is a lot of clutter, the objects are occluded etc. So, I shortlisted around 250 images for each class. We need to create two directories, namely “train” and “validation”, so that we can use the Keras functions for loading images in batches.

Load the pre-trained model

    from keras.applications import VGG16

    vgg_conv = VGG16(weights='imagenet',
                     include_top=False,
                     input_shape=(224, 224, 3))

In the above code, we load the VGG model along with the ImageNet weights, as in our previous tutorial. There is, however, one change: include_top=False. We have not loaded the fully connected layers at the top of the network, which act as the classifier. We are just loading the convolutional layers. It should be noted that the output of the last layer has a shape of 7 x 7 x 512.

Extract Features
The data is divided in an 80:20 ratio and kept in separate train and validation folders. Each folder should contain 3 folders, one for each of the classes. You can change the directories according to your system.

    train_dir = './clean-dataset/train'
    validation_dir = './clean-dataset/validation'

    nTrain = 600
    nVal = 150

We will use the ImageDataGenerator class to load the images and the flow_from_directory function to generate batches of images and labels.
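The labels produced by flow_from_directory come from the sub-folder names. As a minimal sketch, assuming the class folders are named pumpkin, tomato and watermelon as in the download commands, and assuming class indices follow the alphabetical order of the folder names (which is how Keras is understood to assign them), the one-hot labels for class_mode='categorical' would look like this. The one_hot helper is hypothetical and only mimics the label format.

```python
# Class folder names, in the order they were downloaded.
class_folders = ['watermelon', 'tomato', 'pumpkin']

# Assumption: indices are assigned after sorting the folder names alphabetically.
class_indices = {name: idx for idx, name in enumerate(sorted(class_folders))}

def one_hot(label, class_indices):
    """Hypothetical helper: build the one-hot vector for a class name."""
    vec = [0.0] * len(class_indices)
    vec[class_indices[label]] = 1.0
    return vec

print(class_indices)
print(one_hot('tomato', class_indices))
```

This is why the label array used for feature extraction has 3 columns: one per class folder.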
    from keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rescale=1./255)
    batch_size = 20

    train_features = np.zeros(shape=(nTrain, 7, 7, 512))
    train_labels = np.zeros(shape=(nTrain,3))

    # shuffle=False keeps the images in directory order
    train_generator = datagen.flow_from_directory(
        train_dir,
        target_size=(224, 224),
        batch_size=batch_size,
        class_mode='categorical',
        shuffle=False)

Then we use the model.predict() function to pass the images through the network, which gives us a 7 x 7 x 512 dimensional Tensor for each image. We reshape the Tensor into a vector. Similarly, we find the validation_features.

    i = 0
    for inputs_batch, labels_batch in train_generator:
        features_batch = vgg_conv.predict(inputs_batch)
        train_features[i * batch_size : (i + 1) * batch_size] = features_batch
        train_labels[i * batch_size : (i + 1) * batch_size] = labels_batch
        i += 1
        # the generator loops forever, so stop once every training image has been seen
        if i * batch_size >= nTrain:
            break

    train_features = np.reshape(train_features, (nTrain, 7 * 7 * 512))

Create your own model
We will create a simple feedforward network with a softmax output layer having 3 classes.

    from keras import models
    from keras import layers
    from keras import optimizers

    model = models.Sequential()
    model.add(layers.Dense(256, activation='relu', input_dim=7 * 7 * 512))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(3, activation='softmax'))

Train the model
Training a network in Keras is as simple as calling the model.fit() function, as we have seen in our earlier tutorials.

    model.compile(optimizer=optimizers.RMSprop(lr=2e-4),
                  loss='categorical_crossentropy',
                  metrics=['acc'])

    history = model.fit(train_features,
                        train_labels,
                        epochs=20,
                        batch_size=batch_size,
                        validation_data=(validation_features,validation_labels))

Check Performance
We would like to visualize which images were wrongly classified.
    fnames = validation_generator.filenames
    ground_truth = validation_generator.classes
    label2index = validation_generator.class_indices

    # Getting the mapping from class index to class label
    idx2label = dict((v,k) for k,v in label2index.items())

    predictions = model.predict_classes(validation_features)
    prob = model.predict(validation_features)

    errors = np.where(predictions != ground_truth)[0]
    print("No of errors = {}/{}".format(len(errors),nVal))

Let us see which images were predicted wrongly.

    for i in range(len(errors)):
        pred_class = np.argmax(prob[errors[i]])
        pred_label = idx2label[pred_class]

        print('Original label:{}, Prediction :{}, confidence : {:.3f}'.format(
            fnames[errors[i]].split('/')[0],
            pred_label,
            prob[errors[i]][pred_class]))

        original = load_img('{}/{}'.format(validation_dir,fnames[errors[i]]))
        plt.imshow(original)
        plt.show()

(/wp-content/uploads/2018/01/result-transfer-learning-image.jpg)

We will try to improve on the limitations of transfer learning by using another approach called Fine-tuning in our next post.

References
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Deep Learning with Python Github Repository (https://github.com/fchollet/deep-learning-with-python-notebooks)

10/27/2018 Keras Tutorial : Fine-tuning pre-trained models | Learn OpenCV
Learn OpenCV
Keras Tutorial : Fine-tuning using pre-trained models
FEBRUARY 6, 2018 BY VIKAS GUPTA (HTTPS://WWW.LEARNOPENCV.COM/AUTHOR/VIKAS/)
(/wp-content/uploads/2018/01/keras-ft-result.jpg)

This post is part of the series on Deep Learning for Beginners, which consists of the following tutorials :
1. Neural Networks : A 30,000 Feet View for Beginners (/neural-networks-a-30000-feet-view-for-beginners/)
2. Installation of Deep Learning frameworks (Tensorflow and Keras with CUDA support ) (/installing-deep-learning-frameworks-on-ubuntu-with-cuda-support/)
3. Introduction to Keras (/deep-learning-using-keras-the-basics/)
4. Understanding Feedforward Neural Networks (/understanding-feedforward-neural-networks/)
5. Image Classification using Feedforward Neural Networks (/image-classification-using-feedforward-neural-network-in-keras/)
6. Image Recognition using Convolutional Neural Network (/image-classification-using-convolutional-neural-networks-in-keras/)
7. Understanding Activation Functions (/understanding-activation-functions-in-deep-learning/)
8. Understanding AutoEncoders using Tensorflow (/understanding-autoencoders-using-tensorflow-python/)
9. Image Classification using pre-trained models in Keras (/keras-tutorial-using-pre-trained-imagenet-models/)
10. Transfer Learning using pre-trained models in Keras (/keras-tutorial-transfer-learning-using-pre-trained-models/)
11. Fine-tuning pre-trained models in Keras
12. More to come . . .

In the previous two posts, we learned how to use pre-trained models and how to extract features from them for training a model for a different task. In this tutorial, we will learn how to fine-tune a pre-trained model for a task different from the one it was originally trained for.

We will try to improve on the problem of classifying pumpkin, watermelon, and tomato discussed in the previous post (/keras-tutorial-transfer-learning-using-pre-trained-models/). We will be using the same data for this tutorial.

What is Fine-tuning of a network
We have already explained (/keras-tutorial-using-pre-trained-imagenet-models/#why-pretrained-models) the importance of using pre-trained networks in our previous article. Just to recap, when we train a network from scratch, we encounter the following two limitations :
1. Huge data required: since the network has millions of parameters, we need a lot of data to find an optimal set of parameters.
2. Huge computing power required: even if we have a lot of data, training generally requires multiple iterations and it takes a toll on the computing resources.

The task of fine-tuning a network is to tweak the parameters of an already trained network so that it adapts to the new task at hand. As explained here (/image-classification-using-convolutional-neural-networks-in-keras/#cnn-hierachical), the initial layers learn very general features, and as we go higher up the network, the layers tend to learn patterns more specific to the task the network is being trained on. Thus, for fine-tuning, we want to keep the initial layers intact ( or freeze them ) and retrain the later layers for our task.
Thus, fine-tuning avoids both the limitations discussed above. The amount of data required for training is small for two reasons. First, we are not training the entire network. Second, the part that is being trained is not trained from scratch. Since fewer parameters need to be updated, the amount of time needed will also be less.

Fine-tuning in Keras
Let us dive directly into the code. We will be using the same data which we used in the previous post. You can choose a larger dataset if you have a GPU; on a CPU, training on a large dataset will take much longer. We will use the VGG model for fine-tuning.

Load the pre-trained model
First, we will load a VGG model without the top layer ( which consists of fully connected layers ).

    from keras.applications import VGG16

    # Input resolution; assumed here to be 224, the size used in the previous post
    image_size = 224

    # Load the VGG model
    vgg_conv = VGG16(weights='imagenet', include_top=False, input_shape=(image_size, image_size, 3))

Freeze the required layers
In Keras, each layer has an attribute called “trainable”. To freeze the weights of a particular layer, we set this attribute to False, indicating that the layer should not be trained. That’s it! We go over each layer and select which layers we want to train.
    # Freeze the layers except the last 4 layers
    # (for VGG16 these are the three block-5 convolutional layers and the final pooling layer)
    for layer in vgg_conv.layers[:-4]:
        layer.trainable = False

    # Check the trainable status of the individual layers
    for layer in vgg_conv.layers:
        print(layer, layer.trainable)

(/wp-content/uploads/2018/01/keras-ft-trainable-layers.png)

Create a new model
Now that we have set the trainable parameters of our base network, we would like to add a classifier on top of the convolutional base. We will simply add a fully connected layer followed by a softmax layer with 3 outputs. This is done as given below.

    from keras import models
    from keras import layers
    from keras import optimizers

    # Create the model
    model = models.Sequential()

    # Add the vgg convolutional base model
    model.add(vgg_conv)

    # Add new layers
    model.add(layers.Flatten())
    model.add(layers.Dense(1024, activation='relu'))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(3, activation='softmax'))

    # Show a summary of the model. Check the number of trainable parameters
    model.summary()

(https://www.learnopencv.com/wp-content/uploads/2018/01/keras-ft-model-summary.png)

Setup the data generators
We have already separated the data into train and validation sets and kept them in the “train” and “validation” folders. We can use the ImageDataGenerator available in Keras to read images in batches directly from these folders and optionally perform data augmentation. We will use two different data generators for the train and validation folders.
    from keras.preprocessing.image import ImageDataGenerator

    # train_dir and validation_dir are the folders prepared in the previous post
    train_datagen = ImageDataGenerator(
          rescale=1./255,
          rotation_range=20,
          width_shift_range=0.2,
          height_shift_range=0.2,
          horizontal_flip=True,
          fill_mode='nearest')

    validation_datagen = ImageDataGenerator(rescale=1./255)

    # Change the batchsize according to your system RAM
    train_batchsize = 100
    val_batchsize = 10

    train_generator = train_datagen.flow_from_directory(
            train_dir,
            target_size=(image_size, image_size),
            batch_size=train_batchsize,
            class_mode='categorical')

    validation_generator = validation_datagen.flow_from_directory(
            validation_dir,
            target_size=(image_size, image_size),
            batch_size=val_batchsize,
            class_mode='categorical',
            shuffle=False)

Train the model
Till now, we have created the model and set up the data for training. So, we should proceed with the training and check out the performance. We have to specify the optimizer and the learning rate, and then start training using the model.fit_generator() function. After the training is over, we will save the model.

    # Compile the model
    model.compile(loss='categorical_crossentropy',
                  optimizer=optimizers.RMSprop(lr=1e-4),
                  metrics=['acc'])

    # Train the model
    history = model.fit_generator(
          train_generator,
          steps_per_epoch=train_generator.samples/train_generator.batch_size ,
          epochs=30,
          validation_data=validation_generator,
          validation_steps=validation_generator.samples/validation_generator.batch_size,
          verbose=1)

    # Save the model
    model.save('small_last4.h5')

Check Performance
We obtained an accuracy of 90% with the transfer learning approach discussed in our previous post. Here we are getting a much better accuracy of 98%. Let us see the loss and accuracy curves.
    acc = history.history['acc']
    val_acc = history.history['val_acc']
    loss = history.history['loss']
    val_loss = history.history['val_loss']

    epochs = range(len(acc))

    plt.plot(epochs, acc, 'b', label='Training acc')
    plt.plot(epochs, val_acc, 'r', label='Validation acc')
    plt.title('Training and validation accuracy')
    plt.legend()

    plt.figure()

    plt.plot(epochs, loss, 'b', label='Training loss')
    plt.plot(epochs, val_loss, 'r', label='Validation loss')
    plt.title('Training and validation loss')
    plt.legend()

    plt.show()

(/wp-content/uploads/2018/01/keras-ft-accuracy-curve.png)
(/wp-content/uploads/2018/01/keras-ft-loss-curve.png)

Also, let us visually see the errors that we got.

    # Create a generator for prediction
    validation_generator = validation_datagen.flow_from_directory(
            validation_dir,
            target_size=(image_size, image_size),
            batch_size=val_batchsize,
            class_mode='categorical',
            shuffle=False)

    # Get the filenames from the generator
    fnames = validation_generator.filenames

    # Get the ground truth from generator
    ground_truth = validation_generator.classes

    # Get the label to class mapping from the generator
    label2index = validation_generator.class_indices

    # Getting the mapping from class index to class label
    idx2label = dict((v,k) for k,v in label2index.items())

    # Get the predictions from the model using the generator
    # (the end of the next statement is cut off in the source)
    predictions = model.predict_generator(validation_generator, steps=validation_generator.samples/validation_generator.bat
    predicted_classes = np.argmax(predictions,axis=1)

    errors = np.where(predicted_classes != ground_truth)[0]
    print("No of errors = {}/{}".format(len(errors),validation_generator.samples))

    # Show the errors
    for i in range(len(errors)):
        pred_class = np.argmax(predictions[errors[i]])
        pred_label = idx2label[pred_class]

        title = 'Original label:{}, Prediction :{}, confidence : {:.3f}'.format(
            fnames[errors[i]].split('/')[0],
            pred_label,
            predictions[errors[i]][pred_class])

        original = load_img('{}/{}'.format(validation_dir,fnames[errors[i]]))
        plt.figure(figsize=[7,7])
        plt.axis('off')
        plt.title(title)
        plt.imshow(original)
        plt.show()

(/wp-content/uploads/2018/01/keras-ft-error1.png)
(/wp-content/uploads/2018/01/keras-ft-error2.png)
(/wp-content/uploads/2018/01/keras-ft-error3.png)

Experiments
We have done 3 experiments to see the effect of fine-tuning and data augmentation. We kept the validation set the same as in the previous post, i.e. 50 images per class.
1. Freezing all layers and learning a classifier on top of them, similar to transfer learning. The number of errors was 15 out of 150 images, which is similar to what we got in the previous post.
2. Training the last 3 convolutional layers. We got 9 errors out of 150.
3. Training the last 3 convolutional layers with data augmentation. The number of errors reduced to 3 out of 150.

I hope you find this useful. Try doing your own experiments and post your findings in the comments section.
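To relate the error counts in the experiments to the accuracy figures quoted earlier, the short sketch below converts errors out of 150 validation images into a validation accuracy. The dictionary keys and the accuracy_percent helper are names chosen here for illustration only.

```python
# Number of validation images (50 per class, 3 classes).
n_val = 150

# Error counts reported for the three experiments.
errors_by_experiment = {
    'all layers frozen': 15,
    'last 3 conv layers trained': 9,
    'last 3 conv layers trained + augmentation': 3,
}

def accuracy_percent(n_errors, n_total):
    """Validation accuracy in percent for a given number of misclassified images."""
    return 100.0 * (n_total - n_errors) / n_total

for name, n_errors in errors_by_experiment.items():
    print('{}: {:.0f}%'.format(name, accuracy_percent(n_errors, n_val)))
```

The first and last experiments correspond to the 90% and 98% accuracies mentioned under Check Performance; training the last 3 convolutional layers without augmentation sits in between at 94%.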
References
Keras Blog (https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html)
Deep Learning with Python Github Repository (https://github.com/fchollet/deep-learning-with-python-notebooks)