Divergence mapping
Two new Pacific lamprey genome assemblies were constructed from whole-genome sequence of the milt and blood of a male (representing the gametic and somatic genomes, respectively) and the blood of a female, using a high-density linkage map (Smith et al. 2018) to validate and extend higher-order scaffolding of chromosomes. High molecular weight DNA was extracted from these tissues by Amplicon Express (Pullman, WA, USA), and 10X sequencing was performed on an Illumina NovaSeq (University of Illinois Urbana-Champaign). NT-10X Genomics linked-reads from male milt and blood were first deduplicated with the hts_SuperDeduper tool, which is part of the HTStream pipeline (https://ibest.github.io/HTStream/), and then combined, providing 54X effective read coverage and an estimated mean molecule size of 57 kb. De novo assembly was performed with the Supernova assembler v2.1 (Weisenfeld et al. 2017), and ALLMAPS (Tang et al. 2015) was then used for further scaffolding based on the linkage map, placing 63% of the assembled sequence on the 83 linkage groups. These steps produced a Pacific lamprey male genome assembly of 974 Mb with a scaffold N50 of 7.8 Mb and a longest scaffold of 21 Mb.
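For readers unfamiliar with the scaffold N50 statistic reported here, the following is a minimal sketch of how it is conventionally computed: scaffolds are sorted from longest to shortest, and N50 is the length of the scaffold at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative assumptions and are not taken from the assemblies described above.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L
    together contain at least half of the total assembled sequence.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths in Mb (hypothetical, not the lamprey scaffolds).
toy_lengths = [21, 10, 8, 5, 3, 2, 1]
print(scaffold_n50(toy_lengths))  # prints 10
```

In the toy example the total is 50 Mb, and the two longest scaffolds (21 + 10 = 31 Mb) are the first to exceed 25 Mb, so the N50 is 10 Mb.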
Linked-reads sequenced from female blood had a longer mean molecule size (87 kb) and an effective coverage of 42X. They were also assembled with Supernova v2.1, and ALLMAPS then placed 69% of the assembled sequence on the linkage groups. These steps generated a Pacific lamprey female genome assembly of 997 Mb with a longest scaffold of 22 Mb and a scaffold N50 of 10 Mb.
For characterization of SNP densities and FST statistics, we used a set of 7,716 unique SNP loci from previously published RAD-seq datasets (Hess et al. 2013; Smith et al. 2018) that passed the following set of population genetic QC filters. The 518 individuals, distributed among 21 samples across the species’ range (described in Table 1, Hess et al. 2013), each had no more than 20% missing genotypes; SNP loci had >1% minor allele frequency averaged across the subset of 16 samples with N > 20; and SNP loci had <3 Hardy-Weinberg deviations within 5 aggregated samples (individuals were pooled into five test populations to minimize potential Wahlund effects, as described in Hess et al. (2013)). This set of 7,716 SNPs combined a group of SNPs from a previous dataset (Hess et al. 2013; N = 8,772 SNPs, of which 6,295 passed these population genetic QC filters) and a group of SNPs discovered de novo for a linkage mapping dataset (Smith et al. 2018; N = 7,977 SNPs, of which 3,670 passed these population genetic QC filters). BOWTIE2 (Langmead and Salzberg 2012) was used to align the datasets of 8,772 (Hess et al. 2013) and 7,977 SNPs (Smith et al. 2018) to the male reference assembly to define homologous loci. Of the 7,716 total SNPs passing the QC filters, 4,046 loci were unique to Hess et al. (2013), 1,418 loci were unique to Smith et al. (2018), and 2,252 SNPs were shared across datasets. Marker positions based on BOWTIE2 alignments were compared between the Pacific lamprey male and female genomes, and between the Pacific lamprey male and sea lamprey male gametic genome (GenBank assembly accession: GCA_002833325.1), to characterize synteny.
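The three QC thresholds described above can be expressed as simple predicates. The sketch below is only an illustration of the stated cut-offs, not the code used in the original analyses. The function names and input layouts are hypothetical. It also assumes that the minor allele frequency is obtained by averaging one allele's frequency across the qualifying samples and then taking the rarer allele, a detail the text does not specify.

```python
def passes_individual_filter(genotypes, max_missing=0.20):
    """Keep an individual with no more than 20% missing genotypes.

    genotypes: list of genotype calls for one individual; None marks a
    missing call (hypothetical encoding).
    """
    missing = sum(g is None for g in genotypes)
    return missing / len(genotypes) <= max_missing


def passes_locus_filter(allele_freqs, sample_sizes, n_hwe_deviations,
                        min_maf=0.01, min_n=20, max_hwe=3):
    """Keep a SNP locus with mean MAF > 1% and fewer than 3 HWE deviations.

    allele_freqs: frequency of one allele in each sample.
    sample_sizes: number of individuals in each sample; only samples
        with N > min_n contribute to the average.
    n_hwe_deviations: number of aggregated test populations in which the
        locus deviated from Hardy-Weinberg expectations.
    """
    kept = [p for p, n in zip(allele_freqs, sample_sizes) if n > min_n]
    if not kept:
        return False
    mean_p = sum(kept) / len(kept)
    # Assumption: the minor allele is defined from the averaged frequency.
    maf = min(mean_p, 1 - mean_p)
    return maf > min_maf and n_hwe_deviations < max_hwe


# Toy example: the second sample (N = 10) is ignored when averaging.
print(passes_locus_filter([0.05, 0.5, 0.01], [30, 10, 25], n_hwe_deviations=1))
```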
The program minimap2.1 with parameters (--cs=long -cx asm20) was used to align the Pacific lamprey male and sea lamprey genomes. The function maf-convert from LAST (Kiełbasa et al. 2011) was used to generate a chain file, which was then used by CrossMap (Zhao et al. 2013) to lift over gene annotations from the sea lamprey to the Pacific lamprey male genome assembly.
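To illustrate what lifting annotations through a chain file means in practice, the sketch below maps coordinates from a source genome to a target genome through a list of ungapped aligned blocks. It is a deliberately simplified model and not CrossMap's implementation: it assumes same-strand blocks on a single chromosome pair, half-open coordinates, and it drops any feature that spans more than one block. The block tuples and function names are hypothetical.

```python
from bisect import bisect_right


def find_block(pos, blocks):
    """Return the aligned block containing pos, or None.

    blocks: list of (src_start, src_end, dst_start) tuples, sorted by
    src_start, non-overlapping, half-open [src_start, src_end).
    """
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    src_start, src_end, dst_start = blocks[i]
    if pos >= src_end:
        return None  # pos falls in an unaligned gap
    return blocks[i]


def lift_interval(start, end, blocks):
    """Lift the half-open interval [start, end) to target coordinates.

    Returns (new_start, new_end), or None when the interval is not fully
    contained in a single aligned block (simplifying assumption).
    """
    block = find_block(start, blocks)
    if block is None:
        return None
    src_start, src_end, dst_start = block
    if end > src_end:
        return None
    offset = dst_start - src_start
    return (start + offset, end + offset)


# Two toy aligned blocks (hypothetical coordinates).
blocks = [(100, 200, 1000), (300, 400, 1150)]
print(lift_interval(310, 350, blocks))  # prints (1160, 1200)
print(lift_interval(190, 310, blocks))  # prints None
```

A production liftover tool additionally handles reverse-strand alignments, multiple chromosomes, and features that split across blocks, which is why a dedicated program was used for the annotation transfer.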