Is it easy to detect whether an image contains a dog or a human? And if it is a dog, which breed is it?

Deep learning helps us answer!

Damaris Hernandez
15 min read · Feb 4, 2021

Project Definition

Project Overview

This post is about my Udacity Data Science Nanodegree capstone project. The project I chose uses Convolutional Neural Networks (CNNs) to identify dog breeds.

A CNN is a deep learning algorithm that takes images as input and learns the importance of features (shapes, textures, objects) needed to distinguish one image from another and assign it to its class.

Problem Statement

The main problem to solve is to classify images with the best possible performance: determine whether an image contains a human or a dog and, if it is a dog, which breed it is. This task is not easy when the number of images is large and there are strong similarities between breeds.

How can this problem be solved? Deep learning is the best alternative for classifying images and detecting whether an image contains a human or a dog. It can also identify the breed when a dog is present, which is hard when breeds look very similar.

Deep learning techniques have become much easier to use in recent years. Techniques like transfer learning improve model performance with the help of a CNN that has already been pre-trained. The idea is simple: a model trained on a large dataset can be reused to classify similar images.
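As a minimal sketch of the idea in Keras (the base network and head here are illustrative, not the project's exact code; the detailed implementation used in this project appears in the Refinement section), a network pre-trained on ImageNet is frozen and only a small classification head is trained on the new dataset:

# Minimal transfer-learning sketch (illustrative): reuse a network
# pre-trained on ImageNet as a frozen feature extractor and train
# only a small classification head on the smaller target dataset.
from keras.applications.xception import Xception
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Sequential

base = Xception(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the pre-trained weights fixed

model = Sequential()
model.add(base)
model.add(GlobalAveragePooling2D())          # summarize the extracted feature maps
model.add(Dense(133, activation='softmax'))  # one probability per dog breed
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])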

Using these techniques is suitable because they let us obtain good performance despite our limitations in data and compute. With them we can build an algorithm that decides whether an image contains a human or a dog, composed of three parts:

  1. Human Face Detector.
  2. Dog Detector.
  3. Dog breed classifier.

Metrics

To evaluate performance, the main metrics used were classification accuracy and F-score. Classification accuracy is the ratio of the number of correct predictions to the total number of input samples. It was used to evaluate the simpler classification tasks, but it is not suitable when the classes do not have roughly equal numbers of samples.

Accuracy = 100 * count(predictions == test_targets) / total predictions

To account for misclassification, especially in the dog breed classifier, the F-score was also used; it is common in deep learning. The F-score is the harmonic mean of precision and recall, so it summarizes the trade-off shown by the precision-recall (PR) curve, a two-dimensional plot with precision on the y-axis and recall on the x-axis. It is most often used when learning from imbalanced data.

F = 2 * (precision * recall) / (precision + recall)
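For instance, both metrics can be computed with scikit-learn from the true and predicted class indices (a sketch with toy values; 'micro' averaging matches the choice used later in this post):

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Toy true and predicted class indices for 8 samples
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1, 0, 2, 2])

print('Accuracy: %.1f%%' % (100 * accuracy_score(y_true, y_pred)))
print('F-score : %.3f' % f1_score(y_true, y_pred, average='micro'))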

Analysis

Data Exploration and Data Visualization

The dataset used in this project has 8,351 images of 133 different breeds. It was divided into three datasets: train, validation, and test.

from sklearn.datasets import load_files
from keras.utils import np_utils
from glob import glob
import numpy as np

# define function to load train, test, and validation datasets
def load_dataset(path):
    data = load_files(path)
    dog_files = np.array(data['filenames'])
    dog_targets = np_utils.to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets

# load train, test, and validation datasets
train_files, train_targets = load_dataset('../../../data/dog_images/train')
valid_files, valid_targets = load_dataset('../../../data/dog_images/valid')
test_files, test_targets = load_dataset('../../../data/dog_images/test')

# load list of dog names
dog_names = [item[20:-1] for item in sorted(glob("../../../data/dog_images/train/*/"))]

# print statistics about the dataset
print('There are %d total dog categories.' % len(dog_names))
print('There are %s total dog images.\n' % len(np.hstack([train_files, valid_files, test_files])))
print('There are %d training dog images.' % len(train_files))
print('There are %d validation dog images.' % len(valid_files))
print('There are %d test dog images.' % len(test_files))

The 133 breeds in the dataset are imbalanced: some breeds, such as the Border Collie or the Basset Hound, have more images than others.
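A quick way to see the imbalance is to count the training images per breed from the directory names (a sketch reusing the dataset path from the loading code above):

from collections import Counter
from glob import glob
import os

# Count training images per breed; each breed has its own folder
train_paths = glob('../../../data/dog_images/train/*/*')
breed_counts = Counter(os.path.basename(os.path.dirname(p)) for p in train_paths)

for breed, n in breed_counts.most_common(5):
    print('%-35s %d images' % (breed, n))
print('Fewest images for one breed:', min(breed_counts.values()))
print('Most images for one breed  :', max(breed_counts.values()))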

One relevant aspect of image classification is the shape of the images. To ease training, the images must be resized to a square matching the network architecture, in this case 224×224 pixels. Resizing can degrade image quality or effectively add noise, and upscaling hurts model accuracy more than downscaling does. There are 46 images in the training dataset that need upscaling.

(Figure: distribution of image shapes, height × width)
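These shapes can be inspected directly, for example to count how many training images are smaller than the 224×224 target and therefore need upscaling (a sketch using the train_files list loaded earlier):

import cv2

# Collect (height, width) for every training image
shapes = [cv2.imread(f).shape[:2] for f in train_files]

# Images smaller than the 224x224 network input must be upscaled
needs_upscaling = [s for s in shapes if s[0] < 224 or s[1] < 224]
print('Images that need upscaling: %d' % len(needs_upscaling))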

Methodology

Data Preprocessing

The images must be preprocessed before they can be used by the models. These tasks include converting images to grayscale and resizing them, which improves model performance in terms of time and compute resources.

# TO GRAY
import cv2

def to_gray(img_path):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return gray

# RESIZE
from keras.preprocessing import image
from tqdm import tqdm

def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    # convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return 4D tensor
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)

Implementation

This project is divided into three main parts, because the solution must detect whether an image contains a human or a dog and, if there is a dog, which breed it is. Each part follows a different process, and the parts are joined into a single algorithm.

Human Face Detector.

This detector uses OpenCV's implementation of Haar feature-based cascade classifiers to detect human faces in images. OpenCV provides many pre-trained face detectors; this one was trained on many images with faces (positives) and without faces (negatives). detectMultiScale returns the coordinates of all detected faces as a list of rectangles, and the function built here returns True if that list is non-empty.

face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

def face_detector(img_path):
    '''
    This function detects whether a human face is in an image.

    Parameters:
        img_path: path of image

    Returns:
        True if the number of detected faces is greater than zero, else False
    '''
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0

This model was evaluated; its performance was good but could be improved. Of the 100 dog images tested, 11 were misclassified as containing a human face.

human_files_short = human_files[:100]
dog_files_short = train_files[:100]

## Test the performance of the face_detector algorithm
## on the images in human_files_short and dog_files_short.

# Counters of detected faces
counter_in_human = 0
counter_in_dog = 0

# For each face detected, the loops add one to the corresponding counter.
for img_path in human_files_short:
    if face_detector(img_path):
        counter_in_human += 1

for img_path in dog_files_short:
    if face_detector(img_path):
        counter_in_dog += 1

print("Percentage of the first 100 images in human_files with a detected human face: {}%.".format(counter_in_human))
print("Percentage of the first 100 images in dog_files with a detected human face: {}%.".format(counter_in_dog))

Dog Detector

This detector uses a ResNet-50 model pre-trained on the ImageNet dataset, which spans 1,000 categories. Given an image, the model returns the ImageNet category with the highest predicted probability; the categories with indices 151 to 268 (inclusive) correspond to dog breeds, so the function built here returns True when the predicted index falls in that range.

from keras.applications.resnet50 import ResNet50
from keras.applications.resnet50 import preprocess_input, decode_predictions

# define ResNet50 model pre-trained on ImageNet
ResNet50_model = ResNet50(weights='imagenet')

# path_to_tensor and paths_to_tensor, defined in Data Preprocessing above, are reused here

def ResNet50_predict_labels(img_path):
    # returns prediction vector for image located at img_path
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(ResNet50_model.predict(img))

### returns "True" if a dog is detected in the image stored at img_path
def dog_detector(img_path):
    '''
    This function detects whether a dog is in an image.

    Parameters:
        img_path: path of image

    Returns:
        True if the predicted ImageNet index is between 151 and 268 (inclusive), else False
    '''
    prediction = ResNet50_predict_labels(img_path)
    return (prediction <= 268) & (prediction >= 151)

This model was evaluated and its performance was good: of the 100 human face images tested, none was misclassified as containing a dog.

### Test the performance of the dog_detector function
### on the images in human_files_short and dog_files_short.

# Counters of detected dogs
counter_dogs_in_human = 0
counter_dogs_in_dog = 0

# For each dog detected, the loops add one to the corresponding counter.
for img in human_files_short:
    if dog_detector(img):
        counter_dogs_in_human += 1

for img in dog_files_short:
    if dog_detector(img):
        counter_dogs_in_dog += 1

print("Percentage of the first 100 images in human_files with a detected dog: {}%.".format(counter_dogs_in_human))
print("Percentage of the first 100 images in dog_files with a detected dog: {}%.".format(counter_dogs_in_dog))

Dog Breeds classifier

To identify the breed, a CNN was built from scratch. A CNN is a type of deep learning algorithm well suited to classification tasks: each layer of the network specializes in extracting information about different features, with the first layers detecting simple patterns such as lines or curves. The training images were normalized before use by dividing each pixel value by 255.

from PIL import ImageFile                            
ImageFile.LOAD_TRUNCATED_IMAGES = True
# pre-process the data for Keras
train_tensors = paths_to_tensor(train_files).astype('float32')/255
valid_tensors = paths_to_tensor(valid_files).astype('float32')/255
test_tensors = paths_to_tensor(test_files).astype('float32')/255

The CNN was composed of four layers. The input was an image preprocessed to a 224×224 matrix with three RGB channels, i.e. an input shape of (224, 224, 3):

First layer: its purpose is to identify low-level features such as edges in the image. It was composed of a two-dimensional convolution layer with a kernel size of 2 and 'same' padding (so the output keeps the same spatial size), with a ReLU activation, followed by a two-dimensional MaxPooling layer that reduces the spatial dimensions of the output volume from (224, 224, 16) to (112, 112, 16).

Second layer: similar to the first layer, but with the number of output filters changed to 32.

Third layer: similar to the first layer, but with the number of output filters changed to 64; its output shape is (28, 28, 64).

Fourth layer: composed of a two-dimensional GlobalAveragePooling layer, which averages each feature map down to a single value (reducing the output to one value per output filter, 64 in total), followed by a Dense layer with 133 nodes and a softmax activation to obtain a probability for each dog breed.

from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.layers import Dropout, Flatten, Dense
from keras.models import Sequential
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=2, padding="same", activation="relu", input_shape=(224,224,3)))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=32, kernel_size=2 , padding='same' , activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=64, kernel_size=2, padding="same", activation="relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(GlobalAveragePooling2D())
model.add(Dense(133, activation="softmax"))
model.summary()

# Compile the Model
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

The model was trained and tested. Its performance was poor: 5.9% accuracy and an F-score of 0.06. This type of algorithm needs a large training dataset and significant compute resources, and both were limited here.

from keras.callbacks import ModelCheckpoint

### specify the number of epochs used to train the model
epochs = 30

checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.from_scratch.hdf5',
verbose=1, save_best_only=True)
model.fit(train_tensors, train_targets,
          validation_data=(valid_tensors, valid_targets),
          epochs=epochs, batch_size=20, callbacks=[checkpointer], verbose=1)
model.load_weights('saved_models/weights.best.from_scratch.hdf5')

# get index of predicted dog breed for each image in test set
dog_breed_predictions = [np.argmax(model.predict(np.expand_dims(tensor, axis=0))) for tensor in test_tensors]
# report test accuracy
test_accuracy = 100*np.sum(np.array(dog_breed_predictions)==np.argmax(test_targets, axis=1))/len(dog_breed_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)
# f-score
from sklearn.metrics import f1_score
score = f1_score(np.argmax(test_targets, axis=1), np.array(dog_breed_predictions),average='micro')
print('F-Measure: %.3f' % score)

Refinement

Human Face Detector.

Another option for detecting faces is dlib's HOG face detector, a widely used face detection model based on HOG features and an SVM. With this model, of the 100 dog images tested, only 6 were misclassified as containing a human face, so it performed better than the Haar cascades.

# import the necessary packages
import numpy as np
import cv2
import dlib

# Get Face Detector from dlib
face_detector_d = dlib.get_frontal_face_detector()

def face_detector_dlib(img_path):
    '''
    This function detects whether a human face is in an image.

    Parameters:
        img_path: path of image

    Returns:
        True if the number of detected faces is greater than zero, else False
    '''
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_detector_d(gray)
    return len(faces) > 0

human_files_short = human_files[:100]
dog_files_short = train_files[:100]

## Test the performance of the face_detector_dlib algorithm
## on the images in human_files_short and dog_files_short.

# Counters of detected faces
counter_in_human_r = 0
counter_in_dog_r = 0

# For each face detected, the loops add one to the corresponding counter.
for img_path in human_files_short:
    if face_detector_dlib(img_path):
        counter_in_human_r += 1

for img_path in dog_files_short:
    if face_detector_dlib(img_path):
        counter_in_dog_r += 1

print("Percentage of the first 100 images in human_files with a detected human face: {}%.".format(counter_in_human_r))
print("Percentage of the first 100 images in dog_files with a detected human face: {}%.".format(counter_in_dog_r))

Dog Breeds classifier

This part of the algorithm was improved with transfer learning. Four networks available in Keras were used, and the best one was selected: VGG-19, ResNet-50, Inception, and Xception. This technique requires less compute and training time, and it also helps when the dataset is small.

### Obtain bottleneck features from the pre-trained CNNs.
def Obtain_bottleneck_features(bottleneck):
    '''
    This creates train, valid and test arrays for a pre-trained model.

    Parameters:
        bottleneck: pre-trained model name in Keras.

    Returns:
        train, valid and test arrays
    '''
    bottleneck_features = np.load('/data/bottleneck_features/Dog{}Data.npz'.format(bottleneck))
    train = bottleneck_features['train']
    valid = bottleneck_features['valid']
    test = bottleneck_features['test']
    return train, valid, test

bottleneck = ['VGG19', 'Resnet50', 'InceptionV3', 'Xception']

# for each pre-trained model apply the function Obtain_bottleneck_features
for b in bottleneck:
    globals()["train_" + str(b)], globals()["valid_" + str(b)], globals()["test_" + str(b)] = Obtain_bottleneck_features(b)

The deployed architecture was simple because the models were already pre-trained, which keeps the number of trainable parameters small. It uses the features extracted by each pre-trained model and adds only a two-dimensional GlobalAveragePooling layer, which averages each feature map down to one value per output filter, and a Dense layer with 133 nodes and a softmax activation to obtain a probability for each dog breed.

# for each pre-trained model define the same architecture
for b in bottleneck:
    globals()["model_" + str(b)] = Sequential()
    globals()["model_" + str(b)].add(GlobalAveragePooling2D(input_shape=globals()["train_" + str(b)].shape[1:]))
    globals()["model_" + str(b)].add(Dense(133, activation='softmax'))

    print('Model architecture: {}'.format(b))
    print(globals()["model_" + str(b)].summary())

# for each pre-trained model define the same compilation.
for b in bottleneck:
    globals()["model_" + str(b)].compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

# for each pre-trained model run the same training task.
for b in bottleneck:
    globals()["checkpointer_" + str(b)] = ModelCheckpoint(filepath='saved_models/weights.best.{}.hdf5'.format(b), verbose=1, save_best_only=True)

    print('Train model: {}'.format(b))

    globals()["model_" + str(b)].fit(globals()["train_" + str(b)], train_targets,
                                     validation_data=(globals()["valid_" + str(b)], valid_targets),
                                     epochs=25, batch_size=20, callbacks=[globals()["checkpointer_" + str(b)]], verbose=1)

# for each pre-trained model load the model weights with the best validation loss.
for b in bottleneck:
    globals()["model_" + str(b)].load_weights('saved_models/weights.best.{}.hdf5'.format(b))

The performance of all these models was better than the model trained from scratch. ResNet-50, InceptionV3 and Xception all obtained accuracy above 80% and an F-score above 0.8. Xception was selected for the dog breed classifier.

# for each pre-trained model calculate classification accuracy on the test dataset.
for b in bottleneck:
    globals()["predictions_" + str(b)] = [np.argmax(globals()["model_" + str(b)].predict(np.expand_dims(feature, axis=0))) for feature in globals()["test_" + str(b)]]

    # report test accuracy
    globals()["accuracy_" + str(b)] = 100 * np.sum(np.array(globals()["predictions_" + str(b)]) == np.argmax(test_targets, axis=1)) / len(globals()["predictions_" + str(b)])

    globals()["fscore" + str(b)] = f1_score(np.argmax(test_targets, axis=1), np.array(globals()["predictions_" + str(b)]), average='micro')

    print('Test accuracy {}'.format(b))
    print(globals()["accuracy_" + str(b)])

    print('F score {}'.format(b))
    print(globals()["fscore" + str(b)])

# extract_Xception is provided by the project's extract_bottleneck_features helper module
from extract_bottleneck_features import extract_Xception

def Xception_predict_dog_breed(img_path):
    # extract the bottleneck features
    bottleneck_feature = extract_Xception(path_to_tensor(img_path))
    # get a vector of predicted values
    predicted_vector = model_Xception.predict(bottleneck_feature)
    # return the breed
    return dog_names[np.argmax(predicted_vector)]
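As a usage sketch (the image path below is a placeholder for any local photo):

# Illustrative call; 'images/sample_dog.jpg' is a placeholder path
print(Xception_predict_dog_breed('images/sample_dog.jpg'))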

Results

Final Algorithm

Finally, we can build a function that joins the three parts into an algorithm that detects whether an image contains a human or a dog and, if it is a dog, which breed it is.

def this_image_is(img_path):
    '''
    This function determines whether the image shows a dog, a human, or neither.

    Parameters:
        img_path: path of image

    Returns:
        prints whether the image shows a dog, a human, or neither
    '''
    # Prediction of breed (clean_breed is a small helper from the notebook
    # that tidies the breed string).
    dog_breed = clean_breed(Xception_predict_dog_breed(img_path))

    # Detection of dog or human face.
    if dog_detector(img_path):
        print('This is a dog of breed {}'.format(dog_breed))
    elif face_detector_dlib(img_path):
        print('This is a human')
    else:
        print('This image is neither dog nor human')
    plt.imshow(cv2.imread(img_path))
    plt.show()
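The function can then be called on any image path (the paths below are placeholders):

# Illustrative calls on placeholder paths
this_image_is('images/sample_dog.jpg')    # expected: 'This is a dog of breed ...'
this_image_is('images/sample_human.jpg')  # expected: 'This is a human'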

Model Evaluation and Validation

Xception Model

The Xception model is robust: after k-fold cross validation, the accuracy was stable and did not fluctuate much, which indicates the model is robust against small perturbations in the training data. The mean accuracy is 85.13% (+/- 1.08%).

from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import KFold
import numpy

# define 10-fold cross validation test harness
kfold = KFold(n_splits=10, shuffle=True)
cvscores = []
X = train_Xception
Y = train_targets
fold_no = 1

for train, test in kfold.split(X, Y):
    # create model
    model_evaluation = Sequential()
    model_evaluation.add(GlobalAveragePooling2D(input_shape=X.shape[1:]))
    model_evaluation.add(Dense(133, activation='softmax'))

    # Compile model
    model_evaluation.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

    # Generate a print
    print('------------------------------------------------------------------------')
    print(f'Training for fold {fold_no} ...')

    # Fit the model
    model_evaluation.fit(X[train], Y[train],
                         validation_data=(valid_Xception, valid_targets),
                         epochs=25, batch_size=20, callbacks=[checkpointer_Xception], verbose=1)

    # evaluate the model
    scores = model_evaluation.evaluate(X[test], Y[test], verbose=0)
    print("%s: %.2f%%" % (model_evaluation.metrics_names[1], scores[1]*100))

    cvscores.append(scores[1] * 100)
    fold_no = fold_no + 1

print("%.2f%% (+/- %.2f%%)" % (numpy.mean(cvscores), numpy.std(cvscores)))

The k-fold cross validation accuracy fluctuated between 83% and 87% across the 10 folds.

Final Algorithm

The performance of the final algorithm is good; however, there are images the algorithm classifies wrongly, such as a drawing of a human, and some breeds are misclassified.

A sample of 80 images was taken from test_files to test the performance of the dog breed classifier: 77.5% of the images were classified correctly and 22.5% were misclassified. The breed with the most misclassified images was the English Cocker Spaniel, which is similar to the Boykin Spaniel.
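A sketch of how such a check can be done, assuming the classifier defined above (the seed is illustrative, and the predicted label is matched against the breed folder name by substring, since the exact label format depends on how dog_names was sliced):

import numpy as np

# Draw a random sample of 80 test images
np.random.seed(42)
sample = np.random.choice(test_files, 80, replace=False)

correct = 0
for img_path in sample:
    true_breed = img_path.split('/')[-2]   # folder name encodes the breed, e.g. '001.Affenpinscher'
    pred_breed = Xception_predict_dog_breed(img_path)
    # loose substring match between predicted label and folder name
    if pred_breed in true_breed or true_breed in pred_breed:
        correct += 1

print('Correctly classified: %.1f%%' % (100 * correct / len(sample)))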

Justification

The achieved solution responds to the proposed problem: it can estimate whether an image contains a human face, a dog, or neither, and, if it is a dog, which breed it is. However, in a real-world application this solution might not be optimal, because the F-score reached is lower than in other deep learning (CNN) deployments, where this metric reaches values above 0.9.

Conclusion

Deep learning is a field that grows more important every year. It is interesting how flexibly it can solve problems such as image classification: techniques like transfer learning allow us to build and deploy models with the help of pre-trained models, transferring their knowledge to a smaller dataset. This is possible because the convolutional layers extract general, low-level features that are applicable across images.

The performance of the algorithm was good, but it could still be improved. The following points could be considered:

  • Provide a larger training set.
  • Use a face detector that does not require the face to be well oriented; OpenCV-DNN and HOG-based methods are recommended for this task.
  • Use more images of dogs in the training task.
  • Increase the number of epochs and use techniques to avoid overfitting, such as early stopping, where training stops once the validation performance has stopped improving for a few epochs (see the sketch below).
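
A minimal early-stopping sketch in Keras, reusing the Xception training setup from above (the patience value is illustrative, and restore_best_weights requires a recent Keras version):

from keras.callbacks import EarlyStopping

# Stop training once the validation loss has not improved for 5 epochs
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

model_Xception.fit(train_Xception, train_targets,
                   validation_data=(valid_Xception, valid_targets),
                   epochs=100, batch_size=20,
                   callbacks=[early_stop, checkpointer_Xception], verbose=1)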
