Chensong Chen - ESS Open Archive

Sugarcane has a complex, highly polyploid genome with multi-species ancestry. Additive models for genomic prediction of clonal performance might not capture interactions between genes and alleles from different ploidies and ancestral species. As such, genomic prediction in sugarcane presents an interesting case for machine learning methods, which are purportedly able to deal with high levels of complexity in prediction. Here we investigate deep learning (DL) networks, including multilayer perceptrons (MLP) and convolutional neural networks (CNN), and Random Forest (RF) for genomic prediction in sugarcane. The data set comprised 2,912 sugarcane clones, scored for 26,086 genome-wide SNP markers, with final assessment trial (FAT) data for total cane harvested (TCH), commercial cane sugar (CCS) and fibre content (Fibre). The clones in the latest trial (2017) were used as a validation set. We compared the performance of these methods to GBLUP extended to include dominance and epistatic effects. The prediction accuracies from the GBLUP models were 0.37 for TCH, 0.37 for CCS and 0.48 for Fibre, while the DL models had accuracies of 0.33 for TCH, 0.38 for CCS and 0.43 for Fibre. Optimised RF achieved prediction accuracies of 0.35 for TCH, 0.38 for CCS and 0.48 for Fibre. Both DL and RF predictions were more accurate than additive GBLUP but generally less accurate than extended GBLUP. Finally, we identified a partially shared distribution of SNP selections between the RF and GBLUP models. We conclude that RF may have some utility for genomic prediction in crops with highly complex genomes, particularly if non-additive interactions can be captured with clonal propagation.
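
To make the baseline comparison concrete, the following is a minimal sketch of additive GBLUP with a held-out validation set, scored by the Pearson correlation between predicted and observed phenotypes. It is not the authors' pipeline. The genotypes and phenotypes are simulated, the 0/1/2 diploid marker coding is a simplifying assumption that ignores sugarcane's polyploidy, the heritability value `h2` is a hypothetical fixed input rather than an estimate, and the function names are illustrative. The abstract does not define prediction accuracy, so the correlation measure is also an assumption.

```python
import numpy as np

def vanraden_grm(M):
    """Additive genomic relationship matrix from 0/1/2 genotype codes.
    Assumption: diploid-style coding, not sugarcane's true allele dosage."""
    p = M.mean(axis=0) / 2.0              # allele frequency per marker
    Z = M - 2.0 * p                       # centre each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))   # scaling factor
    return Z @ Z.T / denom

def gblup_predict(G, y_train, train_idx, test_idx, h2=0.5):
    """Predict phenotypes of validation clones from training clones.
    h2 is an assumed heritability; in practice the variance components
    would be estimated (e.g. by REML) rather than fixed."""
    lam = (1.0 - h2) / h2                 # ratio of residual to genetic variance
    G_tt = G[np.ix_(train_idx, train_idx)]
    G_vt = G[np.ix_(test_idx, train_idx)]
    mu = y_train.mean()
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + G_vt @ alpha

def prediction_accuracy(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Simulated data only; sizes are far smaller than the 2,912 clones
# and 26,086 SNPs described in the abstract.
rng = np.random.default_rng(0)
n_clones, n_snps = 300, 200
freq = rng.uniform(0.1, 0.9, size=n_snps)
M = rng.binomial(2, freq, size=(n_clones, n_snps)).astype(float)

effects = rng.normal(0.0, 1.0, size=n_snps)
g = (M - M.mean(axis=0)) @ effects
g = g / g.std()                                   # genetic values, unit variance
y = g + rng.normal(0.0, 1.0, size=n_clones)       # simulated h2 of about 0.5

# Hold out the last 60 clones, mimicking the use of the latest trial
# as a validation set.
train_idx = np.arange(0, 240)
test_idx = np.arange(240, n_clones)

G = vanraden_grm(M)
y_hat = gblup_predict(G, y[train_idx], train_idx, test_idx, h2=0.5)
acc = prediction_accuracy(y[test_idx], y_hat)
print(f"validation accuracy (Pearson r): {acc:.2f}")
```

Extending this sketch towards the models compared in the abstract would mean adding dominance and epistatic relationship matrices to the GBLUP, or fitting RF, MLP or CNN models to the same training and validation split.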