The successes and pitfalls: Deep learning effectiveness in a Chernobyl
field camera trap application
Abstract
Camera traps have become in-situ sensors for collecting the data used to
estimate animal abundance and occupancy. When deployed over a large
landscape, they are well suited to measuring the health of ecosystems,
particularly in unstable habitats where conventional observation can be
dangerous or even impossible. However, manual processing of camera trap
imagery is extremely time- and labor-intensive.
Because of the associated expense, many studies have begun to employ
machine learning tools, such as convolutional neural networks (CNNs).
One drawback is that most networks require very large training sets, on
the order of millions of images, to produce an effective identification
or classification model. This study examines specific factors of camera
trap placement in the field that may influence the accuracy metrics of a
deep learning model trained on a small image set. False negatives and
false positives may arise from a variety of environmental factors that
make images difficult for even a human observer to classify, including
local weather patterns and available daylight. We
transfer-trained a CNN to detect 16 different object classes (14 animal
species, humans, and fires) across 9,576 images taken from camera traps
placed in the Chernobyl Exclusion Zone. Analysis of wind speed, cloud
cover, temperature, and image contrast revealed a significant positive
association between CNN success and temperature. Furthermore, we found
that the model was more successful on images taken during daylight and
in the absence of precipitation. We show
that external variables at camera trap locations have a noticeable
effect on CNN accuracy. Qualitative site-specific factors can confuse
quantitative classification algorithms such as CNNs. This study suggests
that further exploration into the causes of error in classification
modeling is necessary given the unique challenges posed by the analysis
of camera trap imagery.
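
The abstract does not name a framework, so the following is a minimal
sketch of transfer training under the assumption of PyTorch with a
pretrained ResNet-18 backbone, posed here as whole-image classification
for brevity (the study's detection of 16 object classes may have been
implemented differently). The dataset directory, hyperparameters, and
training loop are illustrative, not the authors' pipeline; only the
16-class output (14 animal species, humans, and fires) comes from the
abstract.

    # Transfer-training sketch (assumption: PyTorch/torchvision; the paper's
    # actual framework and architecture are not stated in the abstract).
    import torch
    import torch.nn as nn
    from torchvision import datasets, models, transforms

    NUM_CLASSES = 16  # 14 animal species + humans + fires, per the abstract

    # Standard ImageNet preprocessing for a pretrained backbone.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    # Hypothetical directory of labeled camera trap images, one folder per class.
    train_set = datasets.ImageFolder("camera_trap_images/train",
                                     transform=preprocess)
    loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

    # Start from ImageNet weights, freeze the feature extractor, and replace
    # the final fully connected layer with a fresh 16-way classifier. Training
    # only the head is what makes a small image set workable.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for param in model.parameters():
        param.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for epoch in range(5):  # a few epochs often suffice when only the head trains
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()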
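
The reported associations between CNN success and environmental
conditions could be tested with a per-image binary regression of the
kind sketched below. This assumes pandas and statsmodels; the CSV file
and its column names are entirely hypothetical, standing in for whatever
per-image weather and outcome records the study assembled.

    # Covariate-analysis sketch: regress per-image CNN success (1 = correct)
    # on environmental conditions. Assumption: pandas + statsmodels; file and
    # column names are hypothetical.
    import pandas as pd
    import statsmodels.formula.api as smf

    # Expected columns: success (0/1), temperature (deg C), wind_speed (m/s),
    # cloud_cover (%), contrast (unitless), daytime (0/1), precipitation (0/1).
    records = pd.read_csv("per_image_conditions.csv")

    model = smf.logit(
        "success ~ temperature + wind_speed + cloud_cover + contrast"
        " + daytime + precipitation",
        data=records,
    ).fit()

    # A positive, significant temperature coefficient would mirror the
    # abstract's finding; likewise a positive daytime coefficient and a
    # negative precipitation coefficient.
    print(model.summary())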