Building a Geological Cyber-infrastructure: Automatically Detecting
Clasts in Photomicrographs
Abstract
To incentivize participation in, and contributions to, the growth of an
earth-science-based cyberinfrastructure, analytical environments need to
be developed that allow automatic analysis and classification of data
from connected data repositories. The purpose of this study is to
investigate a machine learning technique for automatically detecting
shear-sense-indicating clasts (i.e., sigma or delta clasts and mica
fish) in photomicrographs and determining their shear sense (i.e.,
sinistral (CCW) or dextral (CW) shearing). Previous work employed
transfer learning, a technique in which a pre-trained Convolutional
Neural Network (CNN) is repurposed for a new task, together with
artificially augmented image datasets, to distinguish between CCW and CW
shearing. Preprocessing
images by denoising, a process in which noise at different scales is
removed while preserving edges of an image, improved classification
accuracy. However, when the denoising parameters were randomized, the CNN
model did not converge because of a severe lack of data. While efforts to
acquire more labeled data are ongoing, this work compensates for the
shortage by implementing a preprocessing “detection” system that
automatically crops images to the regions containing clasts. This is
done using YOLOv3, a CNN-based object detection system that outputs a
bounding box around each object of interest. YOLOv3 was trained on 93
photomicrographs annotated with bounding boxes for 344 shear-sense-indicating
clasts. The retrained detector was tested on two sets: set A, with 10
photomicrographs containing clasts, and set B, with 100 photomicrographs
not containing clasts. All but one of the clasts in set A were correctly
detected, with an average confidence score of 96.6%. On set B, the
detector correctly reported no clasts in 72% of the images. In the
remaining 28%, where clasts were incorrectly identified, the average
confidence score was 78.3%. Because these false detections scored lower
on average than the correct detections in set A, applying a threshold to
the confidence scores could make the system more accurate. Future work involves
using the bounding boxes output by the detection system to refine
and improve the CNN model for classifying the shear sense of clasts in
photomicrographs.
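
The thresholding and cropping steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Detection` class, the `(x_min, y_min, x_max, y_max)` box layout, the function names, and the default threshold of 0.9 are all assumptions made for illustration, and the image is represented as a plain list of pixel rows rather than in any particular imaging library's format.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max); assumed layout


@dataclass(frozen=True)
class Detection:
    """One detector output: a confidence score and a pixel bounding box."""
    confidence: float  # expected to lie in [0, 1]
    box: Box


def filter_by_confidence(detections: Sequence[Detection],
                         threshold: float) -> List[Detection]:
    """Keep only detections whose confidence is at least `threshold`."""
    return [d for d in detections if d.confidence >= threshold]


def crop_to_box(image: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Crop a row-major image to `box`, clamping the box to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    x_min, y_min, x_max, y_max = box
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    return [list(row[x_min:x_max]) for row in image[y_min:y_max]]


def crop_confident_detections(image: Sequence[Sequence[int]],
                              detections: Sequence[Detection],
                              threshold: float = 0.9) -> List[List[List[int]]]:
    """Return one cropped region per detection that passes the threshold.

    The default of 0.9 is an illustrative value, not one reported in the study.
    """
    return [crop_to_box(image, d.box)
            for d in filter_by_confidence(detections, threshold)]


# Toy 4x6 "image" whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(6)] for r in range(4)]
detections = [
    Detection(confidence=0.97, box=(1, 1, 4, 3)),  # kept at threshold 0.9
    Detection(confidence=0.78, box=(0, 0, 2, 2)),  # discarded at threshold 0.9
]
crops = crop_confident_detections(image, detections, threshold=0.9)
print(crops)  # [[[11, 12, 13], [21, 22, 23]]]
```

In practice the threshold would need to be chosen on held-out data, since the averages reported above do not describe how the individual confidence scores are distributed.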