Association testing
Genotyping-in-thousands by sequencing panel optimization:
Genotyping-in-thousands by sequencing (GT-seq; Campbell et al. 2015) was used to genotype 308 genetic markers for the association analyses. The 308 GT-seq loci were a subset of markers developed from the paired-end consensus reads of the Hess et al. (2013) RAD-seq dataset. Development began with 457 candidate SNP loci in round 1, 120 of which had already been designed as TaqMan assays (Hess et al. 2015). The remaining 337 SNPs had not been designed previously, and we required every SNP site to lie at base-pair position 30 or higher so that the flanking DNA could accommodate an assay primer site. SNPs were chosen according to the following guidelines: 1) pass the QC filters for the range-wide dataset; 2) align to only one locus in a Bowtie self-alignment test; 3) overlap with loci in the linkage map (Smith et al. 2018); 4) show high concordance in alignments to the sea lamprey genome across overlapping markers in the Hess et al. (2013) and Smith et al. (2018) datasets; 5) have been previously developed as TaqMan 96 assays, plus some species ID loci; 6) be spaced 5 cM or more apart on a linkage group; 7) be mostly neutral with high minor allele frequency (MAF), for parentage power; and 8) for adaptive SNPs, be equally representative of four groups of statistically linked loci.
A Perl script was used to screen out loci that appeared to have an excess of heterozygotes and were therefore likely to be duplicated regions; 401 loci passed this filter. Primers for 120 loci were already available from previous work. For the rest, we constructed consensus sequences from the paired-end sequence data of Hess et al. (2013) and successfully developed 266 primer pairs. A second Perl script identified 28 primer interactions, which were resolved by dropping 26 primer pairs, leaving 360 loci (240 new + 120 original primer pairs). Final optimization retained the 308 markers that performed best in GT-seq genotyping.
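Guideline 6 (at least 5 cM between markers on the same linkage group) can be illustrated with a greedy thinning sketch. This is not the authors' procedure: the `Snp` record, the `thin_by_spacing` function, the toy candidates, and the choice to try higher-MAF SNPs first (a nod to guideline 7) are all hypothetical assumptions.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Snp(NamedTuple):
    name: str
    linkage_group: str
    position_cm: float  # map position in centimorgans
    maf: float          # minor allele frequency


def thin_by_spacing(snps: List[Snp], min_gap_cm: float = 5.0) -> List[Snp]:
    """Greedily keep SNPs so that any two kept SNPs on the same linkage
    group are at least `min_gap_cm` apart. Higher-MAF SNPs are tried
    first (an assumed priority, not taken from the source)."""
    kept_positions: Dict[str, List[float]] = defaultdict(list)
    selected: List[Snp] = []
    for snp in sorted(snps, key=lambda s: s.maf, reverse=True):
        positions = kept_positions[snp.linkage_group]
        # Keep the SNP only if it is far enough from every SNP already kept on its group.
        if all(abs(snp.position_cm - p) >= min_gap_cm for p in positions):
            positions.append(snp.position_cm)
            selected.append(snp)
    return selected


# Hypothetical candidates: snp_a sits only 3 cM from the higher-MAF snp_b.
candidates = [
    Snp("snp_a", "LG1", 0.0, 0.30),
    Snp("snp_b", "LG1", 3.0, 0.45),
    Snp("snp_c", "LG1", 9.0, 0.20),
    Snp("snp_d", "LG2", 1.0, 0.10),
]
print([s.name for s in thin_by_spacing(candidates)])  # ['snp_b', 'snp_c', 'snp_d']
```

Greedy thinning is simple and deterministic, but it does not guarantee the largest possible marker set; the priority order (here MAF) decides which of two nearby SNPs survives.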
For all samples used in the association tests, we removed individuals missing more than 10% of genotypes across the 308 loci. Excluding the four species-diagnostic loci and two duplicate loci left 302 unique loci for the association tests.
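The individual missingness filter and the locus exclusions can be sketched as follows. The function names, the dictionary layout, the locus names, and the use of `None` for a missing call are hypothetical; only the 10% threshold and the 308 − 4 − 2 = 302 locus count come from the text.

```python
from typing import Dict, List, Optional, Sequence, Set

Genotype = Optional[str]  # e.g. "AG"; None marks a missing call (assumed encoding)


def filter_individuals(genotypes: Dict[str, List[Genotype]],
                       max_missing: float = 0.10) -> Dict[str, List[Genotype]]:
    """Keep individuals missing at most `max_missing` of their genotype calls."""
    kept = {}
    for ind, calls in genotypes.items():
        if not calls:
            continue  # no calls at all: treat as fully missing and drop
        missing_fraction = sum(call is None for call in calls) / len(calls)
        if missing_fraction <= max_missing:
            kept[ind] = calls
    return kept


def association_loci(panel: Sequence[str], species_id: Set[str],
                     duplicates: Set[str]) -> List[str]:
    """Drop species-diagnostic and duplicate loci, preserving panel order."""
    excluded = species_id | duplicates
    return [locus for locus in panel if locus not in excluded]


# Hypothetical locus names standing in for the 308-marker panel.
panel = [f"locus_{i:03d}" for i in range(308)]
species_id = set(panel[:4])             # four species-diagnostic loci
duplicates = {panel[100], panel[200]}   # two duplicate loci
print(len(association_loci(panel, species_id, duplicates)))  # 302
```

With 308 loci, an individual with 30 missing calls (about 9.7%) is retained, while one with 31 missing calls (about 10.1%) is removed.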