Application 2: Barcoded individuals in population samples of steelhead
Hatcheries have an important but controversial role in supplementing dwindling fish stocks in the Columbia River basin (Busby, Wainwright, & Bryant, 1999), including, in a few cases, selection for particular traits in hatchery stocks that differ from the stocks into which they are outplanted or stray (disperse to non-natal areas). One of the most abundant and widely outplanted hatchery stocks of steelhead trout in the Columbia Basin comes from Skamania Hatchery (Washougal, WA). The Skamania stock has a long history of deliberate selection for earlier spawning and larger fish (Ayerst, 1976), which has resulted in the evolution of fish that migrate notably earlier than conspecifics and do so almost exclusively after two or more years in the ocean (Hess et al., 2021). Rather than choosing individuals with known phenotypes, we sampled individuals without regard to phenotype from the Skamania hatchery stock and from two nearby natural-origin stocks (Lewis River and Eagle Creek-Willamette River) in the same steelhead lineage (Coastal). We then tested whether genomic regions previously associated with these traits, or with others, would appear strongly differentiated in the Skamania stock.
Library preparation followed the individual barcoding protocol from Horn et al. (2020), and sequencing was done separately for each population on the Illumina NextSeq 550 with 150-bp paired-end reads. The number of individuals per pool ranged from 60 to 78. Data were processed with PoolParty2. Reads were trimmed using sliding windows with a minimum mean PHRED quality of 20 and discarded if trimmed below 50 bp. SNPs were filtered out if they were below a PHRED quality of 20, were three or fewer bases from an insertion-deletion position, were observed in fewer than 10 reads in each sample pool or in more than 1,500 reads globally, if the number of individuals surveyed per population was fewer than three, or if the global minor allele frequency was less than 0.005. The allele frequency data were normalized in PPalign to account for non-uniform read contribution among individuals. Using the PPstats module, we assessed data coverage distributions, the proportion of the genome covered at specified depths, and evenness of coverage across chromosomes. Normalized allele frequencies were filtered and analyzed with PPanalyze, including calculation of FST, sliding window FST (100Kbp windows in 5Kbp steps), and Fisher's exact test (FET). Significance values from the exact tests were used in local score analyses, using three replicate runs with ξ representing the 80th, 90th, 95th, and 99th quantiles of significance values (the 70th quantile did not produce a mean local score distribution below zero). Filtered read alignment files (BAMs) created by PPalign were used as input for angsd, which was directed to consider only the variants retained after filtering in PPanalyze. The genotype likelihoods provided by angsd were then used as input to ngsLD to estimate linkage disequilibrium (LD) for the three chromosomes with the most significant and consistent outlier regions in the Local Score results, considering only sites ≤ 100Kbp from one another.
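The local score step can be illustrated with a short sketch. It assumes the Lindley-process formulation commonly used for local score analyses, in which each site receives a score of -log10(p) - ξ and the running sum is reset to zero whenever it turns negative. The text above does not spell out these details, so that formulation, the choice to take ξ as a quantile of the -log10(p) values, the floor applied to very small p-values, and the function names `quantile` and `local_score` are all assumptions made for illustration, not the PoolParty2 implementation.

```python
import math


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty sequence, with 0 <= q <= 1."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def local_score(p_values, q):
    """Lindley-process local score for one chromosome (hypothetical helper).

    p_values: exact-test p-values ordered by genomic position.
    q: quantile (0-1) of -log10(p) used as xi (an assumption, see lead-in).
    Returns (xi, mean_score, lindley), where lindley[i] is the running score.
    """
    # Floor p-values so that log10 is defined if a p-value underflows to 0.
    logp = [-math.log10(max(p, 1e-300)) for p in p_values]
    xi = quantile(logp, q)
    scores = [v - xi for v in logp]
    # The process only returns to zero between peaks if this mean is negative.
    mean_score = sum(scores) / len(scores)
    lindley = []
    h = 0.0
    for s in scores:
        h = max(0.0, h + s)  # accumulate, resetting to zero when negative
        lindley.append(h)
    return xi, mean_score, lindley
```

Calling `local_score(p_values, 0.8)` on one chromosome's ordered p-values returns the ξ that was used, the mean per-site score, and the running score. Under this formulation a negative mean score is the condition for a usable ξ, which is the kind of check that would exclude a quantile such as the 70th.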
As above, we calculated mean LD in 100Kbp windows in 5Kbp steps in R, but identified outlier regions as contiguous series of ≥20 windows exceeding 2× the interquartile range (2×IQR) for mean windowed LD. When multiple contiguous outlier window series were present in the range identified by the lowest Local Score quantile, we report all of those series.
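The windowing and outlier rule can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the analysis, and it rests on several assumptions: each LD value has already been assigned a single genomic position, windows start at coordinate 0, and "exceeding 2×IQR" is read as exceeding the third quartile plus 2×IQR (one possible interpretation, and the threshold can be replaced by any other value). The function names are hypothetical.

```python
import statistics


def window_means(positions, values, window=100_000, step=5_000):
    """Mean of values in [start, start + window), with start advancing by step.

    positions must be sorted ascending. Returns (start, mean) pairs, where
    mean is None for windows that contain no sites.
    """
    out = []
    n = len(positions)
    if n == 0:
        return out
    lo = hi = 0
    start = 0  # assumption: the first window begins at coordinate 0
    while start <= positions[-1]:
        end = start + window
        while lo < n and positions[lo] < start:
            lo += 1
        while hi < n and positions[hi] < end:
            hi += 1
        mean = sum(values[lo:hi]) / (hi - lo) if hi > lo else None
        out.append((start, mean))
        start += step
    return out


def upper_fence(means, k=2.0):
    """Q3 + k * IQR over non-empty windows (assumed reading of '2xIQR')."""
    vals = [m for m in means if m is not None]
    q1, _, q3 = statistics.quantiles(vals, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def outlier_runs(means, threshold, min_run=20):
    """(first, last) index pairs of >= min_run consecutive windows above threshold."""
    runs = []
    run_start = None
    for i, m in enumerate(means):
        above = m is not None and m > threshold
        if above and run_start is None:
            run_start = i
        elif not above and run_start is not None:
            if i - run_start >= min_run:
                runs.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the final window.
    if run_start is not None and len(means) - run_start >= min_run:
        runs.append((run_start, len(means) - 1))
    return runs
```

In use, the means from `window_means` would be passed to `upper_fence` to obtain a threshold and then to `outlier_runs`. Every run returned within the range of interest corresponds to one contiguous outlier series, so several series in the same range are all reported.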