Association testing
Genotyping-in-thousands by sequencing panel optimization:
Genotyping-in-thousands by sequencing (GT-seq; Campbell et al. 2015) was
used to genotype 308 genetic markers for the association testing
analyses. The 308 GT-seq loci were a subset of markers developed from
the paired-end consensus reads of the Hess et al. (2013) RAD-seq
dataset. Locus selection and development began with a group of 457 SNP
loci considered in round 1, which included 120 that had already been
designed for TaqMan assays (Hess et al. 2015). We chose 337 SNPs for
which assays had not been designed previously, and we ensured
that all SNP sites were located at base pair position 30 or higher to
accommodate the assay primer site in flanking DNA. We established the
following guidelines for choosing SNPs: 1) passed QC filters for the
Rangewide dataset; 2) aligned to only one locus in a Bowtie
self-alignment test; 3) overlapped with loci in the linkage map (Smith
et al. 2018); 4) showed high concordance in alignments to the sea
lamprey genome across overlapping markers in the Hess et al. (2013) and
Smith et al. (2018) datasets; 5) were previously developed as TaqMan 96
assays, plus some species ID loci; 6) were spaced 5 cM or greater apart
on a linkage group; 7) were mostly neutral with high minor allele
frequency (MAF) for parentage power; and 8) for adaptive SNPs, were
equally representative across four groups of statistically linked loci.
A Perl
script was run to screen out loci that appeared to have too many
heterozygotes and were likely duplicated regions. There were 401 loci
that passed this filter. Although primer pairs for 120 loci had already
been designed in previous work, we had to construct consensus sequences
for the remaining loci using paired-end sequence data from Hess et al.
(2013), and we successfully developed 266 primer pairs for these loci. A
Perl script identified 28 primer interactions, which were resolved by
dropping 26 primer pairs. This step left a set of 360 loci
(240 new + 120 original primer pairs). Final optimization left 308
markers that worked best in GT-seq genotyping. For all samples used in
the association testing, we filtered out individuals missing >10% of
genotypes at the 308 loci. Excluding the four species-diagnostic loci
and two duplicate loci left 302 unique loci for association tests.
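
The final two filtering steps (removing individuals missing >10% of genotypes, then excluding species-diagnostic and duplicate loci) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the data layout (one list of calls per individual, with None for a missing genotype), the function names, and the locus and sample names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical layout: one list of calls per individual, in locus order.
# None marks a missing genotype call.
Genotype = Optional[str]
GenotypeTable = Dict[str, List[Genotype]]


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of loci with no genotype call for one individual."""
    if not calls:
        return 1.0  # treat an individual with no loci as fully missing
    return sum(c is None for c in calls) / len(calls)


def filter_individuals(genotypes: GenotypeTable,
                       max_missing: float = 0.10) -> GenotypeTable:
    """Keep individuals whose missing rate does not exceed max_missing."""
    return {ind: calls for ind, calls in genotypes.items()
            if missing_rate(calls) <= max_missing}


def drop_loci(locus_names: List[str], genotypes: GenotypeTable,
              excluded: Set[str]) -> Tuple[List[str], GenotypeTable]:
    """Remove the named loci (e.g. diagnostic or duplicate loci)."""
    keep = [i for i, name in enumerate(locus_names) if name not in excluded]
    kept_names = [locus_names[i] for i in keep]
    kept = {ind: [calls[i] for i in keep] for ind, calls in genotypes.items()}
    return kept_names, kept


# Toy data with 10 loci (the study used 308); all names are made up.
loci = [f"L{i:02d}" for i in range(1, 11)]
table: GenotypeTable = {
    "ind_A": ["AG"] * 9 + [None],        # 10% missing: retained
    "ind_B": ["AG"] * 8 + [None, None],  # 20% missing: filtered out
}

retained = filter_individuals(table)             # individual filter first
names, final = drop_loci(loci, retained, {"L01", "L02"})
print(sorted(final), len(names))                 # ['ind_A'] 8
```

The individual filter runs before loci are dropped, so the missing rate is computed over the full panel, matching the order described above. Whether an individual at exactly 10% missing is retained follows from reading ">10%" as a strict threshold.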