Flower Image Recognition
Although in practice it's most common to use pre-trained networks, we won't cover them here. If you're interested in understanding the network architecture and how it's coded, this should be an interesting read for you. I hope so, anyway :)
It's worth saying the network built here is based on the explanations and knowledge acquired from the Deep Learning Specialization offered by DeepLearning.AI. Great courses!
What flower is this?
While travelling, I would sometimes see a flower or plant and wonder what its name was. At one point I thought: wouldn't it be great if there were an application where I could just take a picture and know right away? With that memory in mind, I picked this Flower Image Dataset to build an image classification system.
This dataset contains a collection of 4323 images of 5 types of flowers: daisy, dandelion, rose, sunflower and tulip.
ResNet to the Rescue
Residual Networks are a very powerful model for image recognition. The introduction of ResNet made it possible to train much deeper networks than earlier architectures such as LeNet-5, AlexNet and VGG.
Very deep networks can represent very complex functions, but in practice they are hard to train because of vanishing gradients. Skip connections (which jump over layers) help address this problem by reusing activations from an earlier layer until the skipped layers learn useful weights.
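The idea of reusing an earlier activation can be written as y = ReLU(F(x) + x): the block's output is the residual function F plus the unchanged input. A minimal NumPy sketch, where F is just a hypothetical linear map standing in for a stack of convolutional layers:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Hypothetical residual function F(x): in a real network this is a
# stack of conv / batch-norm layers; here it is just a linear map.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))

def residual_block(x):
    # Output is ReLU(F(x) + x): the skip connection adds the input back,
    # so even if F contributes almost nothing (weights near zero), the
    # block can still pass x through, approximating an identity mapping.
    return relu(W @ x + x)

x = rng.normal(size=4)
y = residual_block(x)
```

Because the block only has to learn the *residual* F(x) = y - x rather than the whole mapping, gradients also flow back through the addition unchanged, which is what eases training in very deep stacks.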
Residual networks are built by stacking two main types of blocks: the identity block and the convolutional block. Typical implementations skip two or three layers, with nonlinearities (ReLU) and batch normalization in between.
- Identity block: the standard block used in ResNet; used when the input activation has the same dimensions as the output activation.
- Convolutional block: adds a CONV2D layer in the shortcut path; used when the input and output dimensions don't match.
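As an illustration, the two block types can be sketched with the Keras functional API. This is a simplified two-convolution variant for clarity (the actual ResNet-50 blocks use a 1x1/3x3/1x1 bottleneck design), and the filter counts and strides are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def identity_block(x, filters):
    # Main path: two 3x3 convs with batch norm. Dimensions are
    # preserved, so the input can be added back directly.
    f = layers.Conv2D(filters, 3, padding="same")(x)
    f = layers.BatchNormalization()(f)
    f = layers.Activation("relu")(f)
    f = layers.Conv2D(filters, 3, padding="same")(f)
    f = layers.BatchNormalization()(f)
    out = layers.Add()([f, x])  # skip connection: plain addition
    return layers.Activation("relu")(out)

def convolutional_block(x, filters, strides=2):
    # Main path downsamples with a stride; the shortcut needs its own
    # 1x1 Conv2D so both tensors match in shape before the addition.
    f = layers.Conv2D(filters, 3, strides=strides, padding="same")(x)
    f = layers.BatchNormalization()(f)
    f = layers.Activation("relu")(f)
    f = layers.Conv2D(filters, 3, padding="same")(f)
    f = layers.BatchNormalization()(f)
    shortcut = layers.Conv2D(filters, 1, strides=strides, padding="same")(x)
    shortcut = layers.BatchNormalization()(shortcut)
    out = layers.Add()([f, shortcut])
    return layers.Activation("relu")(out)

# Stack one of each, as a stage of a residual network would.
inputs = layers.Input(shape=(64, 64, 3))
x = convolutional_block(inputs, 64)   # downsamples 64x64 -> 32x32
x = identity_block(x, 64)             # shape-preserving
model = tf.keras.Model(inputs, x)
```

The convolutional block changes the spatial size (and channel count), so every stage starts with one convolutional block followed by several identity blocks.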
Altogether, the classic ResNet-50 has the following architecture.
Now that we've implemented our model, we can think about training it!
The dataset has a good distribution of images; it's relatively balanced, so that won't be an issue. I used 90% of the images to train the model and 10% for validation, which I believe is a good proportion.
To make training faster, I resized all images to 64 x 64 x 3. To prevent overfitting, I took advantage of data augmentation and added regularization to the Conv2D and Dense layers with l2 = 1e-4.
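A sketch of how this setup might look in Keras. The 90/10 split and l2 = 1e-4 come from the text; the specific augmentation parameters (rotation, shifts, flips) and layer sizes are my assumptions:

```python
from tensorflow.keras import layers, regularizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data augmentation: random rotations, shifts and flips give the network
# more varied views of each flower without collecting new images.
# (Parameter values here are illustrative, not the article's exact ones.)
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    validation_split=0.1,  # 90% train / 10% validation
)

# l2 = 1e-4 weight decay applied to a Conv2D and a Dense layer.
conv = layers.Conv2D(
    64, 3, padding="same",
    kernel_regularizer=regularizers.l2(1e-4),
)
dense = layers.Dense(
    5, activation="softmax",  # 5 flower classes
    kernel_regularizer=regularizers.l2(1e-4),
)
```

The l2 penalty adds 1e-4 times the sum of squared weights to the loss, which discourages large weights and smooths the learned function.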
For the learning process I used the Adam optimizer, which combines the best properties of the AdaGrad and RMSProp algorithms.
Training the model for 64 epochs with a batch size of 40 gives an accuracy of 77.60%. Other choices of epochs and batch size will affect the model's performance: doing a few experiments (re-training the model a couple of times to see how this would change), I saw test accuracy oscillate up to 82%.
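The compile-and-fit step can be sketched as follows. The Adam optimizer and the batch size of 40 match the setup above; the tiny stand-in model, the random data, and the single epoch are placeholders just to make the call concrete (the real run trained the ResNet on the flower images for 64 epochs):

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in model; the real one is the ResNet built earlier.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),
    tf.keras.layers.Dense(5, activation="softmax"),
])

# Adam keeps a moving average of squared gradients (RMSProp-style
# per-parameter learning rates) plus a momentum-like first-moment average.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# Smoke-train on random data to show the call shape; the article's run
# used epochs=64 on the actual 64x64x3 flower images.
x = np.random.rand(40, 64, 64, 3).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 5, 40), 5)
history = model.fit(x, y, epochs=1, batch_size=40, verbose=0)
```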
There is always room for improvement, but since the purpose of this article is to understand the architecture of the network, I didn't spend more time on it here.
Possible reasons behind the fluctuation over epochs:
(i) All neural networks have some inherently stochastic behavior
(ii) A large network but a small dataset: 23M+ parameters for only 4323 images
(iii) The choice of batch size and learning rate
To overcome overfitting:
(i) Add more regularization and tune the hyperparameters
(ii) Use data augmentation
(iii) Collect more data samples
(iv) Reduce the number of hidden layers
- Fine-tuning and training a ResNet model can consume a great deal of time.
- For more practical usage, open-source implementations of pre-trained network architectures might be the best idea: they can achieve great results and save a lot of time.
- ResNet can be a very powerful model, but when the computational budget is limited, consider transfer learning.
- Check this notebook for the complete analysis.