Sample Preparation
Our sample data originated from a Pinus contorta linkage mapping
population consisting of a single parent and its 106 F1 offspring from
the interior of British Columbia, Canada, and was provided to us by
collaborators at the University of British Columbia
(https://coadaptree.forestry.ubc.ca). For each F1 sample, haploid
megagametophyte tissue was excised from embryonic seeds for sequencing.
Sample preparation, probe design, DNA extraction, and library
preparation were performed as described in Lind et al. 2021. DNA samples
were sequenced at the Genome Quebec Innovation Centre at McGill
University, Montreal, Canada, which generated 351 Gbp of
~150-bp paired-end reads on an Illumina HiSeq4000 instrument.
We used fastp v0.19.5 (Chen et al. 2018) to process and trim sample
reads and BWA-MEM v0.7.17 (Li & Durbin 2009) to align them against the
congeneric loblolly pine (Pinus taeda) reference genome v2.01
(Zimin et al. 2017; https://treegenesdb.org/FTP/Genomes/Pita/v2.01), as
a reference genome does not yet exist for P. contorta. We then
sorted, indexed, and converted the aligned reads to BAM files with
SAMtools v1.9 and processed them with PICARD v2.18.9
(http://broadinstitute.github.io/picard). Where applicable, we converted
the BAM files to mpileup files using SAMtools.
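The read-processing steps above can be sketched as an ordered list of command lines. This is a minimal illustration only, not the exact invocations we ran: the file-naming scheme and the function names build_pipeline_steps and run_steps are hypothetical, all tool options beyond input and output paths are left at their defaults, and the reference is assumed to be already indexed for BWA. The PICARD step is omitted because the specific PICARD tools are not named in the text above.

```python
import subprocess


def build_pipeline_steps(sample, r1, r2, ref):
    """Return (argv, stdout_path) pairs for one sample.

    stdout_path is None when the tool writes its own output file;
    otherwise the tool's standard output should be redirected there.
    All file names are hypothetical and chosen for illustration.
    """
    t1 = f"{sample}_R1.trimmed.fastq.gz"
    t2 = f"{sample}_R2.trimmed.fastq.gz"
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    pileup = f"{sample}.mpileup"
    return [
        # Trim and quality-process the paired-end reads.
        (["fastp", "-i", r1, "-I", r2, "-o", t1, "-O", t2], None),
        # Align to the reference; bwa mem writes SAM to standard output.
        (["bwa", "mem", ref, t1, t2], sam),
        # Sort the alignments and write them as BAM.
        (["samtools", "sort", "-o", bam, sam], None),
        # Index the sorted BAM.
        (["samtools", "index", bam], None),
        # Produce mpileup text for callers that need it.
        (["samtools", "mpileup", "-f", ref, bam], pileup),
    ]


def run_steps(steps):
    """Run each step in order, stopping at the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, stdout=out, check=True)
```

Calling build_pipeline_steps("s1", "s1_R1.fastq.gz", "s1_R2.fastq.gz", "ref.fa") returns the commands without executing anything, so they can be inspected before being passed to run_steps.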
We then called SNPs from these common BAM (or mpileup) files using five
variant callers: FreeBayes, HaplotypeCaller, SAMtools, UnifiedGenotyper,
and VarScan. After calling SNPs, we applied a baseline level of
filtering consisting of criteria specific to each caller followed by a
common set of criteria (Table 1). The common filters, applied to the
output of every caller, required that sites were called in both the
parent and the F1 sample, had no more than 50% missingness, and were
not multiallelic.
We used both VCFtools v0.1.14 (Danecek et al. 2011) and R v1.4.1106 (R
Core Team 2021) to apply these filters.
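The logic of the common filters can be sketched for a single site as follows. This is an illustration under stated assumptions rather than the VCFtools and R code we used: the function name passes_common_filters and its inputs are hypothetical, missing genotypes are represented as None, missingness is computed over the F1 offspring only, and "called in the F1 sample" is interpreted as at least one offspring having a genotype call. The caller-specific criteria in Table 1 are not represented.

```python
def passes_common_filters(alt_alleles, parent_gt, offspring_gts, max_missing=0.5):
    """Apply the three common site filters to one site.

    alt_alleles   -- list of alternate alleles reported at the site
    parent_gt     -- parent genotype, or None if not called
    offspring_gts -- list of F1 genotypes, None where not called
    max_missing   -- highest tolerated fraction of missing F1 calls
    """
    # Reject multiallelic sites (more than one alternate allele).
    if len(alt_alleles) > 1:
        return False
    # The site must be called in the parent.
    if parent_gt is None:
        return False
    # Assumption: at least one F1 offspring must carry a call.
    n_offspring = len(offspring_gts)
    n_missing = sum(gt is None for gt in offspring_gts)
    if n_offspring == 0 or n_missing == n_offspring:
        return False
    # Reject sites with greater than max_missing missingness.
    return n_missing / n_offspring <= max_missing
```

A site with exactly 50% missing offspring calls is retained under this sketch, matching the requirement that missingness not be greater than 50%.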