Association testing
Genotyping-in-thousands by sequencing (GT-seq, Campbell et al. 2015) was employed to genotype 308 genetic markers for the association
testing analyses. The GT-seq 308 loci were a subset of markers developed
from the paired end consensus reads from the Hess et al. (2013)
RAD-seq dataset. The selection of loci and steps in development are
described in detail in Supplemental Materials. Locus selection began
with a group of 457 total SNP loci considered in round 1, which included
120 that had already been designed for TaqMan assays (Hess et al. 2015). Final optimization left 308 loci that worked best in GT-seq
genotyping. For all samples used below in the association testing we
filtered out individuals missing >10% of genotypes at the
308 loci. Excluding the four species-diagnostic loci and two duplicated
loci provided 302 unique loci for association tests.
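The individual-level filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name, the dictionary layout, and the toy genotypes are hypothetical, and an individual missing exactly 10% of genotypes is assumed to be retained because only those missing >10% were removed.

```python
# Hypothetical genotype table: keys are individual IDs, values are lists of
# genotype calls (one per locus); None marks a missing genotype.
def filter_individuals(genotypes, max_missing=0.10):
    """Keep individuals whose fraction of missing genotypes is <= max_missing."""
    kept = {}
    for sample_id, calls in genotypes.items():
        missing_fraction = sum(call is None for call in calls) / len(calls)
        if missing_fraction <= max_missing:
            kept[sample_id] = calls
    return kept


# Toy example with 10 loci per individual (the study genotyped 308).
toy = {
    "ind_A": ["AA"] * 10,                # 0% missing, retained
    "ind_B": ["AG"] * 9 + [None],        # 10% missing, retained
    "ind_C": ["GG"] * 8 + [None, None],  # 20% missing, removed
}
retained = filter_individuals(toy)
```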
There were six samples, five comprised of adults (JDD, S_BON, T_BON,
WFA♀, and WFA♂) and one comprised of larvae (GAR), with which we
performed association testing (Table S1). Adult samples were from the
following three locations: males (WFA♂, N=136) and females (WFA♀, N=133)
from Willamette Falls collected in 2016 (Willamette River, Oregon City,
OR; 205.6 Rkm upstream from the Columbia River mouth), two samples
(S_BON, N=295 and T_BON, N=883) from Bonneville Dam in 2014 (235.1 Rkm
upstream from the Columbia River mouth), and one sample (JDD, N=656)
from John Day Dam in 2014 and 2015 (346.9 Rkm upstream from the Columbia
River mouth). The following five adult traits were measured on all adult
samples: ordinal “day” of collection (timing of migration to the
sample point), girth (mm), total “length” (mm), weight (g), and
distance between dorsal fins (“interdorsal”, mm). Interdorsal
measurements have been suggested to serve as an indicator of maturation
status in Pacific lamprey because the distance tends to decrease with
maturation (Clemens et al. 2009). We measured an additional
migration trait for three adult samples (S_BON, T_BON, and JDD) via a
combination of passive integrated transponder (PIT) and radio tagging of
individual fish and observing their furthest upstream detection from the
release location (“Rkm”). Further, since the males and females
collected at Willamette Falls (WFA♂ and WFA♀) were being harvested, we
were able to measure gonad weight as a proxy for maturity in those
samples. Finally, a subset of the adult sample from Bonneville Dam
(S_BON) was used in a swim trial experiment within a flume (Kirk et al. 2016), in which the following three swimming behavioral
traits were measured: “approached” experiment, passed challenge
(“pass”), and passed challenge without fallback (“passrep”). Details
of these swimming performance experiments can be found in Kirk et al.
(2016) and Supplemental Materials.
A single group of larvae was artificially propagated using adults
captured at Bonneville Dam. These larvae were reared in a common garden
experiment to generate early larval growth rate data (sample “GAR”, N=337).
All larvae were spawned in the spring of 2015 and allowed to rear from
30 to 163 days after hatching. Growth rate was measured as length / time
(“growth”) and also as a corrected growth rate [“growth rate_b”;
(length – 4 mm) / time] that accounts for length at hatch (~4 mm).
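The two growth measures can be written out as a short sketch. The function names and the example larva are hypothetical, and time is assumed to be expressed in days after hatching, so both rates are in mm per day.

```python
HATCH_LENGTH_MM = 4.0  # approximate length at hatch given in the text


def growth_rate(length_mm, days):
    """Uncorrected growth rate ("growth"): length / time."""
    return length_mm / days


def corrected_growth_rate(length_mm, days, hatch_length_mm=HATCH_LENGTH_MM):
    """Corrected growth rate: (length - length at hatch) / time."""
    return (length_mm - hatch_length_mm) / days


# Hypothetical larva measured at 14 mm, 100 days after hatching.
raw = growth_rate(14.0, 100)                  # 14 / 100 mm per day
corrected = corrected_growth_rate(14.0, 100)  # (14 - 4) / 100 mm per day
```

Subtracting the length at hatch matters most for larvae sampled early, where the ~4 mm starting length is a large share of total length.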
Intercorrelation among all measured traits in these six samples (i.e.
JDD, S_BON, T_BON, WFA♀, WFA♂, and GAR) was examined (based on
Pearson’s r) to avoid excessive redundancy of predictor variables
(|r| > 0.95), and P-values were calculated
(SAS Institute, Inc. 2000). We performed univariate analyses using a
general linear model (GLM) and a mixed linear model (MLM) with TASSEL v.
5.1.0 (Bradbury et al. 2007). The GLM is a fixed effects linear
model that is used in TASSEL to identify significant associations
between phenotypes and genotypes. TASSEL takes population structure into
account by using genetic principal coordinate axes as covariates in the
model. The MLM is similar to the GLM but includes both fixed effects (e.g.
population structure and genetic marker) and random effects (i.e.
relationships among individuals) and can thus account for both
population structure and kinship to reduce false positive associations
(Yu et al. 2006). Details on the covariates and on how loci were used
to account for population structure and relatedness in the GLM and MLM
tests are provided in the Supplemental Materials. To
account for multiple tests, only those associations with P-values
less than the critical value as determined using the false discovery
rate procedure described by Benjamini and Hochberg (1995) were
considered significant. The Benjamini and Hochberg (1995) false
discovery rate approach has more power to detect significant differences
than sequential Bonferroni correction (Narum 2006). Critical values were
calculated using the function p.adjust within the R package stats (RDC
Team 2019).
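For readers unfamiliar with the Benjamini and Hochberg (1995) procedure, the sketch below reproduces the adjustment that R's p.adjust applies with method = "BH". It is an illustrative pure-Python reimplementation, not the code used in the study; the function name, the example P-values, and the 0.05 threshold are assumptions made for the example.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(p_values)
    # Visit P-values from largest to smallest so that a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # ascending rank: 1 = smallest P-value
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical P-values from five marker-trait tests.
p_values = [0.04, 0.001, 0.5, 0.012, 0.2]
adjusted = bh_adjust(p_values)

alpha = 0.05  # assumed false discovery rate threshold
significant = [i for i, q in enumerate(adjusted) if q < alpha]
```

Comparing each adjusted value with the chosen threshold is equivalent to comparing each raw P-value with its rank-specific critical value, rank × alpha / n.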