Abstract
Previous crop yield improvements have been largely due to the
implementation of new management strategies, mechanization, and
application of emerging technologies. While these approaches have led to
stable, linear improvements, increases in crop yields are currently
plateauing. The use and improvement of rapid, automated, and accurate
phenomic selection methods leveraging high-resolution data collected
throughout a growing season could help identify stress-adaptive traits
to meet the growing global food demand. As the capacity of phenomics to
generate larger and higher dimensional data sets improves, there is an
urgent need to develop and implement robust and scalable data processing
pipelines for rapid turnaround of processed results. Current phenomics
processing pipelines lack modularity and the ability to exploit the
distributed computational infrastructure required for machine learning
(ML)-based workloads. To address these challenges, we developed
PhytoOracle (PO), a suite of modular, scalable pipelines that aim to
improve data processing efficiency for plant science research. PO
integrates open-source frameworks for distributed task management on
local, cloud, or high-performance computing (HPC) systems. Each pipeline
component is available as a standalone container that can be
independently deployed or linked into a pipeline. Additionally,
researchers can swap between available containers or integrate new ones
suited to their specific research. PO extracts phenotype trait values
such as volume, height, canopy temperature, and maximum quantum
efficiency (Fv/Fm) of photosystem II from data captured in field
settings, enabling the study of phenotypic variation for elucidation of
the genetic components of quantitative traits.
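
The modular design described above can be illustrated with a minimal sketch. It is not PO's actual code or interface: the stage functions, the record field names (`f0`, `fm`, `top_z_m`, `ground_z_m`), and `run_pipeline` are hypothetical names chosen for illustration, and plain Python functions stand in for containers. The only domain fact used is the standard definition of maximum quantum efficiency, Fv/Fm = (Fm - F0) / Fm, where F0 and Fm are the minimal and maximal dark-adapted fluorescence.

```python
from typing import Callable, Dict, List

# A record holds per-plot measurements; field names here are hypothetical.
Record = Dict[str, float]
# A stage is any function that takes a record and returns an extended record.
Stage = Callable[[Record], Record]


def fv_fm(record: Record) -> Record:
    """Add maximum quantum efficiency of PSII: Fv/Fm = (Fm - F0) / Fm."""
    f0, fm = record["f0"], record["fm"]
    if fm <= 0:
        raise ValueError("fm must be positive")
    return {**record, "fv_fm": (fm - f0) / fm}


def canopy_height(record: Record) -> Record:
    """Add plant height as top-of-canopy elevation minus ground elevation."""
    return {**record, "height_m": record["top_z_m"] - record["ground_z_m"]}


def run_pipeline(stages: List[Stage], record: Record) -> Record:
    """Apply each stage in order; stages can be reordered, swapped, or added."""
    for stage in stages:
        record = stage(record)
    return record


if __name__ == "__main__":
    # Illustrative input values only, not measured data.
    sample = {"f0": 200.0, "fm": 1000.0, "top_z_m": 1.5, "ground_z_m": 0.25}
    print(run_pipeline([fv_fm, canopy_height], sample))
```

Because every stage shares the same signature, replacing `canopy_height` with another trait extractor, or appending a new one, requires no change to `run_pipeline`. This is the same property the abstract attributes to swappable containers.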