Data-driven trait quantification across a maize diversity panel using
hyperspectral leaf reflectance
Abstract
Scoring plant phenotypes across large populations in multiple
environments is a necessary precondition for using natural genetic
diversity to build genotype-to-phenotype models, for studying
genotype-by-environment interactions, and for breeding high-yielding
and more resilient cultivars. Here we explore data-driven
approaches using latent representations of leaf reflectance data
collected from a large field experiment consisting of a subset of
diverse maize lines drawn from the Wisconsin diversity panel (Mazaheri
et al., 2019). In this experiment, 2 replicates of 752 inbred lines from
the Wisconsin diversity panel were grown in field conditions. An ASD
spectrometer was used to measure the intensity of light reflected by
leaves at 1 nm intervals from 350 to 2,500 nm, resulting
in a total of 2,151 reflectance intensity values measured for each plot.
Two dimensionality reduction approaches were evaluated for this dataset:
conventional principal component analysis (PCA) and an autoencoder-based
neural network. Ten principal components were sufficient to summarize
99% of variance in the dataset. An autoencoder neural network
comprising an encoder with three dense layers and a decoder with
four dense layers was able to summarize variation within the dataset at a
validation loss of 0.006 using 10 latent variables. A number of
principal components and latent variables were correlated with several
phenotypes quantified for a subset of the same field grown research
plots (Figure 2A; 2C). Chlorophyll, the major photosynthetic pigment in
plant leaves, plays a substantial role in determining the overall
pattern of reflectance for maize leaves. The abundance of chlorophyll
was significantly correlated with PC2 (R² = 0.31; Figure 2B), which
explained 11% of the total variance in the hyperspectral reflectance
data. However, the autoencoder-based summary of the same dataset
appears to have captured variation in chlorophyll abundance within this
field trial more accurately, with LV8 exhibiting an R² of 0.59 with
ground truth chlorophyll measurements (Figure 2D). Both PCA and
autoencoder-based dimensionality reduction captured a mix of variables
which were heritable (i.e. a large proportion of total variance was
attributable to differences between genotypes) and variables that were
not heritable. Two of the ten PCs evaluated exhibited H² values > 0.5,
as did four of the ten latent variables generated (Figure 3A; 3B).
Genome-wide association studies (GWAS) conducted using high
heritability principal components and latent variables identified
significant signals in two of six cases (Figure 4A; 4B). Further work is
needed to evaluate the potential of using candidate genes underlying
GWAS peaks to assign putative biological roles to latent variables
estimated from raw sensor data by autoencoders or other dimensional
reduction approaches.
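
The notion of heritability used above, the proportion of total variance attributable to differences between genotypes, can be illustrated with a minimal sketch. It assumes a balanced design (the same number of replicates for every genotype) and a one-way ANOVA estimate of variance components, with H² = σ²_G / (σ²_G + σ²_e) on a per-plot basis. The abstract does not state which estimator was used, so this is one plausible choice rather than the authors' method; the function name `broad_sense_heritability` and the toy values are hypothetical.

```python
from statistics import mean


def broad_sense_heritability(plots):
    """Estimate per-plot H2 for one PC or latent variable.

    `plots` maps a genotype name to a list of values, one per replicate.
    Assumption: the design is balanced (equal replicates per genotype).
    """
    rep_counts = {len(values) for values in plots.values()}
    if len(rep_counts) != 1:
        raise ValueError("design must be balanced")
    r = rep_counts.pop()   # replicates per genotype
    g = len(plots)         # number of genotypes
    if r < 2 or g < 2:
        raise ValueError("need at least 2 genotypes and 2 replicates")

    grand_mean = mean(x for values in plots.values() for x in values)

    # Between-genotype mean square (g - 1 degrees of freedom).
    ms_genotype = r * sum(
        (mean(values) - grand_mean) ** 2 for values in plots.values()
    ) / (g - 1)

    # Within-genotype (residual) mean square (g * (r - 1) degrees of freedom).
    ms_residual = sum(
        (x - mean(values)) ** 2 for values in plots.values() for x in values
    ) / (g * (r - 1))

    # Method-of-moments variance components; negative estimates floor at 0.
    var_genotype = max((ms_genotype - ms_residual) / r, 0.0)
    var_residual = ms_residual

    total = var_genotype + var_residual
    return var_genotype / total if total > 0 else 0.0


# Toy example (made-up values): three genotypes, two replicates each.
toy_scores = {"line_A": [1.0, 1.2], "line_B": [2.0, 2.2], "line_C": [3.0, 2.8]}
h2 = broad_sense_heritability(toy_scores)
print(f"H2 = {h2:.2f}; passes the 0.5 threshold: {h2 > 0.5}")
```

Applying such a function to each of the ten PCs and ten latent variables in turn would give one H² value per variable, which could then be compared against the 0.5 threshold.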