Processing of genomic data and SNP calling for rice accessions
Demultiplexing of raw GBS data, mapping and SNP calling were implemented
as a pipeline in TOGGLE v0.3.3 (Monat et al., 2015).
Reads were demultiplexed with process_radtags and mapped to the
IRGSP-1.0 Nipponbare reference genome (Kawahara et al., 2013) using BWA
(Li & Durbin, 2009) with the option -n 5 for the aln and samse
sub-commands.
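In the study these calls were generated by TOGGLE, but the mapping step can be sketched as two standalone BWA command lines. This is a minimal illustration, not the pipeline's own code: it assumes single-end reads (implied by the use of samse), a reference already indexed with `bwa index`, and hypothetical file names. The same `-n 5` value is passed to both sub-commands, although the option means something different in each.

```python
from __future__ import annotations

import shlex


def bwa_single_end_commands(index_prefix: str, reads_fastq: str, out_prefix: str,
                            n: int = 5) -> list[list[str]]:
    """Return the `bwa aln` and `bwa samse` argument lists for one sample.

    The same -n value is passed to both sub-commands, as described in the text;
    it means different things for each:
      * aln:   maximum number of differences allowed in an alignment
      * samse: maximum number of alternative hits reported in the XA tag
    """
    sai = f"{out_prefix}.sai"  # intermediate suffix-array coordinates from aln
    sam = f"{out_prefix}.sam"  # final single-end SAM alignments
    aln = ["bwa", "aln", "-n", str(n), "-f", sai, index_prefix, reads_fastq]
    samse = ["bwa", "samse", "-n", str(n), "-f", sam, index_prefix, sai, reads_fastq]
    return [aln, samse]


if __name__ == "__main__":
    # Hypothetical file names; the index must already exist (`bwa index IRGSP-1.0.fa`).
    for cmd in bwa_single_end_commands("IRGSP-1.0.fa", "sample01.fq", "sample01"):
        print(shlex.join(cmd))  # or run it with subprocess.run(cmd, check=True)
```

Returning argument lists rather than shell strings keeps the commands safe to pass to `subprocess.run` without shell quoting issues.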
The alignments were sorted and processed with Picard SortSam and
samtools view (http://broadinstitute.github.io/picard/; Li, 2011). The
GATK suite (McKenna et al., 2017) was used for downstream processing. We used
RealignerTargetCreator to define suitable intervals for local
realignment and IndelRealigner to realign reads locally around indels.
Duplicate reads were removed with MarkDuplicates from Picard tools. The
resulting BAM files were split into per-chromosome BAM files with
BamTools. SNP
calling was performed separately for each chromosome with GATK
HaplotypeCaller, using the BadCigar read filter to discard reads with
malformed CIGAR strings. High-confidence SNPs were retained with GATK
VariantFiltration, using the thresholds DP > 10 and QUAL > 30.
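The per-chromosome calling and the hard-filtering thresholds can be sketched in Python as follows. This is an illustrative reconstruction rather than the pipeline's actual code: it assumes GATK 3.x command-line syntax (`-T`, `-rf`, `-L`), hypothetical chromosome and file names, and it reads DP from the INFO column of each VCF data line. Note that GATK VariantFiltration itself flags failing records in the FILTER column instead of deleting them; the predicate below only mirrors the stated thresholds.

```python
from __future__ import annotations

# Hypothetical chromosome names: IRGSP-1.0 FASTA headers differ between releases.
CHROMOSOMES = [f"chr{i:02d}" for i in range(1, 13)]


def haplotypecaller_commands(ref: str, sample: str,
                             gatk_jar: str = "GenomeAnalysisTK.jar") -> list[list[str]]:
    """Build one GATK 3.x HaplotypeCaller command per chromosome BAM."""
    commands = []
    for chrom in CHROMOSOMES:
        commands.append([
            "java", "-jar", gatk_jar, "-T", "HaplotypeCaller",
            "-R", ref,
            "-I", f"{sample}.{chrom}.bam",  # hypothetical per-chromosome BAM name
            "-L", chrom,                    # restrict calling to this chromosome
            "-rf", "BadCigar",              # read filter: drop reads with malformed CIGARs
            "-o", f"{sample}.{chrom}.vcf",
        ])
    return commands


def info_value(info: str, key: str) -> str | None:
    """Return the value of `key` in a VCF INFO column such as 'AC=2;DP=14'."""
    for entry in info.split(";"):
        name, _, value = entry.partition("=")
        if name == key:
            return value
    return None


def passes_hard_filters(vcf_line: str, min_dp: int = 10,
                        min_qual: float = 30.0) -> bool:
    """True if a VCF data line (not a '#' header) has INFO/DP > min_dp and QUAL > min_qual."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual, info = fields[5], fields[7]
    dp = info_value(info, "DP")
    if qual == "." or dp is None:
        return False  # a missing value cannot satisfy the threshold
    return float(qual) > min_qual and int(dp) > min_dp


if __name__ == "__main__":
    record = "chr01\t1000\t.\tA\tG\t45.2\t.\tAC=2;DP=14\tGT\t1/1"
    print(passes_hard_filters(record))
```

For example, a record with QUAL 45.2 and `DP=14` passes, whereas the same record with `DP=10` fails, because the thresholds are strict inequalities (DP > 10, QUAL > 30).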
Genomic data from the 216 worldwide rice accessions were mapped against
the IRGSP-1.0 Nipponbare reference genome using the same procedure.
Mapping data were post-processed as described above. Analyses were
conducted on the intersection of the SNP set identified from GBS data for
the YYT landraces and the SNP set identified from whole-genome data for
the worldwide accessions.
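The intersection step amounts to a set operation on SNP keys. This sketch assumes both call sets are plain-text VCF files on the same IRGSP-1.0 coordinates, and that two SNPs are shared when chromosome, position, reference allele and alternative allele all match; the text does not state whether alleles were compared, so matching on (CHROM, POS) alone is an equally plausible reading. Indels and multi-allelic records are skipped.

```python
from __future__ import annotations

from typing import Iterable


def snp_keys(vcf_lines: Iterable[str]) -> set[tuple[str, int, str, str]]:
    """Collect (CHROM, POS, REF, ALT) keys for single-base substitutions in a VCF."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and column-header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Keep biallelic SNPs only: one reference base, one alternative base.
        if len(ref) == 1 and len(alt) == 1 and alt != ".":
            keys.add((chrom, int(pos), ref, alt))
    return keys


def shared_snps(gbs_lines: Iterable[str],
                wgs_lines: Iterable[str]) -> set[tuple[str, int, str, str]]:
    """SNPs present in both the GBS call set and the whole-genome call set."""
    return snp_keys(gbs_lines) & snp_keys(wgs_lines)


if __name__ == "__main__":
    # Hypothetical file names for the two call sets.
    with open("yyt_gbs.vcf") as gbs, open("worldwide_wgs.vcf") as wgs:
        print(len(shared_snps(gbs, wgs)))
```

Using tuples as set members makes the intersection a single hash-based operation, which scales to genome-wide SNP sets without sorting either file.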