Image-based amphibian recognition

The canton of Vaud has built several tunnels under the road in the town of Gimel, to allow animals to cross safely. In order to justify the costs of this project, cameras have been installed to count the number of animal crossings. These cameras are programmed to take a series of pictures when movement is detected, during the day or at night.

Goals

Originally, the newts, frogs and toads on the images captured in the tunnels were counted manually. This task is time-consuming, and it is easy to miss an animal. The main goal of this project is therefore to automate it by training a machine learning model on the images that have already been labeled.

The tasks to be performed can be broken down as follows:

Image selection
Image processing
Choice of features
Choice of a pre-trained model and its parameters
Evaluation of the quality of the results

Work done

This section explains the main Machine Learning techniques used to solve this problem.

Data augmentation

The image database consists of just under one million images captured automatically in the various tunnels. However, this large number is of limited use to us, since only about 2,000 images are labeled. In addition, many images are false positives that contain no animal at all, triggered by moving dead leaves or by visits from the employees in charge of the installation. We therefore applied data augmentation, generating modified variants of the labeled images in order to enlarge our training set.
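The exact transformations we applied are not listed here. As an illustration only, the sketch below assumes two common ones, a horizontal flip and a brightness shift, on a toy grayscale image stored as nested lists. The point it shows is specific to object detection: any geometric transformation must also be applied to the bounding boxes, otherwise the labels no longer match the animals. The function names `hflip`, `adjust_brightness` and `augment` are hypothetical.

```python
import random
from typing import List, Tuple

# A bounding box in pixel coordinates: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]
# Toy image: a list of rows of grayscale pixel values (0-255)
Image = List[List[int]]


def hflip(image: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    """Mirror the image left-right and move each box accordingly."""
    width = len(image[0])
    flipped = [list(reversed(row)) for row in image]
    new_boxes = [(width - x_max, y_min, width - x_min, y_max)
                 for (x_min, y_min, x_max, y_max) in boxes]
    return flipped, new_boxes


def adjust_brightness(image: Image, delta: int) -> Image:
    """Shift every pixel by delta, clipped to [0, 255]. Boxes are unaffected."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def augment(image: Image, boxes: List[Box],
            rng: random.Random) -> Tuple[Image, List[Box]]:
    """Return one randomly augmented variant of a labeled image (assumed recipe)."""
    if rng.random() < 0.5:
        image, boxes = hflip(image, boxes)
    image = adjust_brightness(image, rng.randint(-40, 40))
    return image, boxes


img = [[10, 20, 30, 40],
       [50, 60, 70, 80]]
flipped, flipped_boxes = hflip(img, [(0, 0, 1, 2)])
print(flipped)        # [[40, 30, 20, 10], [80, 70, 60, 50]]
print(flipped_boxes)  # [(3, 0, 4, 2)]
```

Calling `augment` several times with the same `rng` yields several different variants of one labeled image, which is how a small labeled set can be enlarged.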

Object recognition

Each image can contain several scenarios: one frog, several frogs, a frog and a newt, nothing at all, etc. Predicting a single label per image is therefore not enough: the model must also locate each animal within the image, which makes this an object detection problem rather than a classification problem.
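To make the difference with classification concrete, here is a minimal sketch of what a detector returns for one image: a list of boxes, each with a class and a confidence score, from which the per-image count can be derived. The `Detection` class, the class names and the 0.5 threshold are assumptions for illustration, not the project's actual output format.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    label: str                              # e.g. "frog" or "newt"
    score: float                            # confidence in [0, 1]
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def count_animals(detections: List[Detection], min_score: float = 0.5) -> Counter:
    """Count detections per class, ignoring those below the confidence threshold."""
    return Counter(d.label for d in detections if d.score >= min_score)


# One image with a frog and a newt, plus a low-confidence false alarm
image_detections = [
    Detection("frog", 0.91, (120, 40, 210, 130)),
    Detection("newt", 0.77, (300, 200, 380, 240)),
    Detection("frog", 0.12, (10, 10, 40, 30)),
]
print(count_animals(image_detections))  # Counter({'frog': 1, 'newt': 1})
```

The threshold trades missed animals against false alarms; its value would have to be tuned on held-out images.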

Architecture of a convolutional neural network with an SSD detector

We ended up using a Single Shot Detector (SSD) for this task. SSD relies on a pre-trained convolutional network (the backbone), such as ResNet or MobileNet, to extract features from the images. It then adds its own convolutional layers on top of this backbone, which predict, in a single forward pass, the bounding boxes of the objects along with their classes and confidence scores.
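SSD's extra layers do not draw boxes from nothing: each cell of several feature maps carries a fixed set of "default boxes" (anchors) of different sizes and shapes, and the network predicts, for each one, class scores and small offsets to fit the object. The sketch below generates such boxes following the scale scheme of the original SSD paper (scales spaced linearly between `s_min = 0.2` and `s_max = 0.9`), simplified by omitting the paper's extra box for aspect ratio 1. The feature-map sizes and aspect ratios are assumptions; the anchor settings of our TensorFlow models are not given here.

```python
import math
from typing import List, Tuple

# (center_x, center_y, width, height), all relative to the image size (0..1)
Anchor = Tuple[float, float, float, float]


def ssd_scales(num_maps: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """Linearly spaced box scales, one per feature map."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def default_boxes(feature_map_sizes: List[int],
                  aspect_ratios: Tuple[float, ...] = (1.0, 2.0, 0.5)) -> List[Anchor]:
    """Generate default boxes for square feature maps of the given sizes."""
    anchors: List[Anchor] = []
    scales = ssd_scales(len(feature_map_sizes))
    for f, scale in zip(feature_map_sizes, scales):
        for i in range(f):          # row index
            for j in range(f):      # column index
                cx, cy = (j + 0.5) / f, (i + 0.5) / f
                for ar in aspect_ratios:
                    w = scale * math.sqrt(ar)
                    h = scale / math.sqrt(ar)
                    anchors.append((cx, cy, w, h))
    return anchors


# Fine maps (4x4) get many small boxes, coarse maps (1x1) a few large ones.
boxes = default_boxes([4, 2, 1])
print(len(boxes))  # (16 + 4 + 1) * 3 = 63
```

This multi-scale design is what lets a single pass detect both a small newt far from the camera and a frog close to it.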

Transfer learning

We used pre-trained models to reduce training time. These models were pre-trained on the COCO 2017 dataset, which contains over a hundred thousand images with manually annotated objects from 80 everyday categories. Since frogs and newts are not among these categories, the models must first be fine-tuned on our labeled images before they can detect them.
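Why can most of a COCO model be reused while still having to learn new classes? The sketch below illustrates the idea with made-up layer names and shapes (it does not use the TensorFlow Object Detection API, and the exact layers our pipeline restored are not described here). Layers whose shape does not depend on the classes keep their pre-trained weights; the classification head, whose output size is anchors × (classes + 1 background), changes shape and must be trained from scratch. The two target classes (frog, newt) and the 6 anchors per cell are assumptions.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def class_head_shape(num_anchors: int, num_classes: int, channels: int = 256) -> Shape:
    """3x3 conv kernel producing (num_classes + 1 background) scores per anchor."""
    return (3, 3, channels, num_anchors * (num_classes + 1))


def split_layers(pretrained: Dict[str, Shape],
                 target: Dict[str, Shape]) -> Tuple[List[str], List[str]]:
    """Return (layers restored from the checkpoint, layers trained from scratch)."""
    restored, reinitialized = [], []
    for name, shape in target.items():
        if pretrained.get(name) == shape:
            restored.append(name)       # same name and shape: reuse COCO weights
        else:
            reinitialized.append(name)  # new or resized layer: random init
    return restored, reinitialized


# Hypothetical layer names and shapes, for illustration only.
ANCHORS_PER_CELL = 6
shared = {
    "backbone/conv1": (7, 7, 3, 64),
    "backbone/block4": (3, 3, 512, 512),
    "box_head": (3, 3, 256, ANCHORS_PER_CELL * 4),  # 4 box offsets per anchor
}
coco_model = {**shared, "class_head": class_head_shape(ANCHORS_PER_CELL, 80)}
amphibian_model = {**shared, "class_head": class_head_shape(ANCHORS_PER_CELL, 2)}

restored, fresh = split_layers(coco_model, amphibian_model)
print(restored)  # ['backbone/conv1', 'backbone/block4', 'box_head']
print(fresh)     # ['class_head']
```

Only the small, freshly initialized head starts from zero, which is why fine-tuning converges much faster than training the whole network on 2,000 images.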

Performance

The following models were all trained with the same settings on 640x640 pixel images. Results vary slightly from one run to another, because both the split between training and test data and the generation of image variants are random. The first mAP column gives the score reported by TensorFlow for each pre-trained model on COCO; the second gives the score we obtained on our own test images.

Model	COCO mAP (reported by TensorFlow)	mAP on our data
SSD ResNet50	0.343	0.25
SSD ResNet152	0.354	0.20
SSD MobileNet V1	0.291	0.11
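The run-to-run variability mentioned above comes from unseeded random choices. A minimal sketch, assuming an 80/20 split and hypothetical file names, shows how fixing a seed makes the split reproducible so that models can be compared on identical test images; the same applies to the random generator used for augmentation.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(items: Sequence[T], test_fraction: float = 0.2,
                     seed: Optional[int] = None) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of items and split it; the same seed gives the same split."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


images = [f"img_{i:04d}.jpg" for i in range(2000)]  # hypothetical file names
train_a, test_a = train_test_split(images, seed=42)
train_b, test_b = train_test_split(images, seed=42)
print(len(train_a), len(test_a))  # 1600 400
print(test_a == test_b)           # True
```

Without a seed, each call draws a different split, which is enough to move the mAP by a few hundredths on a dataset this small.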

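Performance was measured with the mean Average Precision (mAP) metric. A predicted box counts as a true positive when its intersection over union (IoU) with a labeled box exceeds a threshold; the Average Precision (AP) of a class is the area under its precision-recall curve, and the mAP is the mean of these values over all classes. Because it summarizes the trade-off between precision and recall, the mAP accounts for both false positives (FP) and false negatives (FN), which makes it well suited to most detection applications.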
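The definition above can be sketched in a few lines. This is a simplified, Pascal VOC-style AP at a single IoU threshold of 0.5 with all-point interpolation; the COCO-style mAP reported by TensorFlow additionally averages over IoU thresholds from 0.50 to 0.95 and uses 101-point interpolation, so numbers from this sketch are not directly comparable to the table. The function names and data layout are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[int, float, Box]],
                      truths: List[Tuple[int, Box]],
                      iou_threshold: float = 0.5) -> float:
    """AP for one class. preds: (image_id, score, box); truths: (image_id, box)."""
    if not truths:
        return 0.0
    matched = set()
    tp = fp = 0
    precisions, recalls = [], []
    # Greedily match predictions, most confident first, to unmatched labels
    for img, _, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best, best_idx = 0.0, None
        for idx, (t_img, t_box) in enumerate(truths):
            if t_img == img and idx not in matched:
                overlap = iou(box, t_box)
                if overlap > best:
                    best, best_idx = overlap, idx
        if best_idx is not None and best >= iou_threshold:
            matched.add(best_idx)
            tp += 1
        else:
            fp += 1  # no labeled box overlaps enough: false positive
        precisions.append(tp / (tp + fp))
        recalls.append(tp / len(truths))
    # Make precision non-increasing with recall (all-point interpolation)
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the interpolated precision-recall curve
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Two labeled frogs; one correct, one wrong, then one correct prediction
truths = [(0, (0, 0, 10, 10)), (1, (5, 5, 15, 15))]
preds = [(0, 0.9, (0, 0, 10, 10)), (1, 0.8, (30, 30, 40, 40)), (1, 0.6, (5, 5, 15, 15))]
print(round(average_precision(preds, truths), 3))  # 0.833
```

The mAP is then the mean of `average_precision` over the classes (frogs, newts, ...). Note how the confident false positive at 0.8 lowers the AP even though both frogs are eventually found.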

Conclusion

Our models were able to detect some of the frogs and newts, but they are still far from perfect: the best one reaches an mAP of only 0.25 on our data. Here are several ways to improve the results:

Increasing the amount of labeled data
Using other models, like YOLOv3 or R-CNN

This work shows that Machine Learning can take over tedious, time-consuming work that would otherwise have to be done by hand. The field also evolves quickly: even a model that is excellent today may well be surpassed within a few years.

Footnotes

I worked on this project for the "Machine Learning on Big Data" course given at the HES-SO in Lausanne, in collaboration with Dylan Mamié and Jérôme Vial.