Divergence mapping
Two new Pacific lamprey genome assemblies were constructed using the
whole genome sequence from the milt and blood from a male (representing
the gametic and somatic genomes) and the blood of a female, and using a
high density linkage map (Smith et al. 2018) to validate and extend
higher order scaffolding of chromosomes. High molecular weight DNA was
extracted from these tissues by Amplicon Express (Pullman, WA, USA), and
10X sequencing was performed on an Illumina NovaSeq (University of
Illinois Urbana-Champaign). NT-10X Genomics linked-reads from male milt
and blood were first deduplicated with the hts_SuperDeduper tool, which is
part of the HTStream pipeline (https://ibest.github.io/HTStream/), and
then combined, providing 54X effective read coverage and an estimated
mean molecule size of 57 Kb. De novo assembly was performed with the
Supernova assembler v2.1 (Weisenfeld et al. 2017), and ALLMAPS (Tang et
al. 2015) was then used for further scaffolding based on the linkage
map, placing 63% of the assembled sequence on the 83 linkage groups. These
steps resulted in a Pacific lamprey male genome assembly of 974 Mb
with a scaffold N50 of 7.8 Mb and a longest scaffold of
21 Mb.
Linked-reads sequenced from female blood had a longer mean molecule size
(87 Kb) and an effective coverage of 42X. They were also assembled with
Supernova v2.1, and ALLMAPS then placed 69% of the assembled sequence on
the linkage groups. These steps generated a Pacific lamprey female genome
assembly of 997 Mb with a longest scaffold of 22 Mb and a scaffold N50
of 10 Mb.
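For readers unfamiliar with the scaffold N50 statistic reported for both assemblies, the following is a minimal sketch of how it is conventionally computed: the length of the scaffold at which the running total of lengths, sorted from longest to shortest, first reaches half of the total assembly size. The function name and the toy lengths are hypothetical and are not taken from either assembly or from the Supernova or ALLMAPS outputs.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)   # longest scaffold first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that brings the cumulative length to at
        # least half of the assembly defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy lengths, for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(scaffold_n50(toy_lengths))
```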
For characterization of SNP densities and FST statistics, we used a set
of 7,716 unique SNP loci from previously published RAD-seq datasets
(Hess et al. 2013; Smith et al. 2018) that passed the following set of
population genetic QC filters. The 518 individuals, distributed among 21
samples across the species’ range (described in Table 1, Hess et al.
2013), each had no more than 20% missing genotypes; SNP loci had >1%
minor allele frequency averaged across the subset of 16 samples with N
> 20; and SNP loci had <3 Hardy-Weinberg deviations within 5 aggregated
samples (following the methods of Hess et al. (2013), in which
individuals were pooled into five test populations to minimize
potential Wahlund effects). This set of 7,716
SNPs was a combination of a group of SNPs from a previous dataset (Hess
et al. 2013; SNPs N = 8,772, of which 6,295 passed these
population genetic QC filters) and a group of SNPs discovered de
novo for a linkage mapping dataset (Smith et al. 2018; SNPs N =
7,977, of which 3,670 passed these population genetic QC filters). BOWTIE2
(Langmead and Salzberg 2012) was used to align the datasets of 8,772
(Hess et al. 2013) and 7,977 SNPs (Smith et al. 2018) to the
male reference assembly to define homologous loci. Of the 7,716 total
SNPs passing the QC filters, 4,046 loci were unique to Hess et
al. 2013, 1,418 loci were unique to Smith et al. 2018, and 2,252
SNPs were shared across datasets. Marker positions based on BOWTIE2
alignments were compared between Pacific lamprey male and female genomes
and the Pacific lamprey male and sea lamprey male gametic genome
(GenBank assembly accession: GCA_002833325.1) to characterize synteny.
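As a rough illustration of the first two population genetic QC filters described earlier in this section (individual missingness and mean minor allele frequency), a minimal sketch is given below. It is not the code used for the study. It assumes that genotypes are coded as alternate-allele counts (0, 1, 2, or None for missing), that individuals are filtered before allele frequencies are computed, and that the minor allele frequency is obtained by averaging the alternate-allele frequency across samples with N > 20 and then folding it; all function and variable names are hypothetical. The Hardy-Weinberg filter is omitted.

```python
MAX_MISSING = 0.20   # individuals: no more than 20% missing genotypes
MIN_MAF = 0.01       # loci: mean minor allele frequency above 1%
MIN_SAMPLE_N = 20    # only samples with N > 20 enter the MAF average


def missing_fraction(genotypes):
    """Fraction of missing (None) genotype calls for one individual."""
    return sum(g is None for g in genotypes) / len(genotypes)


def alt_allele_freq(calls):
    """Alternate-allele frequency at one locus, ignoring missing calls."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return None
    return sum(observed) / (2 * len(observed))


def filter_individuals(samples):
    """Drop individuals whose missing-genotype fraction exceeds MAX_MISSING."""
    return {
        name: [ind for ind in inds if missing_fraction(ind) <= MAX_MISSING]
        for name, inds in samples.items()
    }


def mean_maf(samples, locus):
    """Folded mean allele frequency at `locus` over samples with N > MIN_SAMPLE_N."""
    freqs = []
    for inds in samples.values():
        if len(inds) > MIN_SAMPLE_N:
            freq = alt_allele_freq([ind[locus] for ind in inds])
            if freq is not None:
                freqs.append(freq)
    if not freqs:
        return 0.0
    mean_alt = sum(freqs) / len(freqs)
    return min(mean_alt, 1.0 - mean_alt)


def passing_loci(samples, n_loci):
    """Indices of loci whose mean MAF exceeds MIN_MAF after individual filtering."""
    kept = filter_individuals(samples)
    return [locus for locus in range(n_loci) if mean_maf(kept, locus) > MIN_MAF]
```

Here `samples` is assumed to be a dictionary mapping a sample name to a list of individuals, each individual being a list of genotype calls in a fixed locus order.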
The program minimap2.1 with parameters (--cs=long -cx asm20) was used
for alignment between the Pacific lamprey male and sea lamprey genomes.
The function maf-convert from LAST (Kiełbasa et al. 2011) was used to
generate a chain file, which was used by CrossMap (Zhao et al. 2013) to
lift over gene annotations from the sea lamprey to the Pacific lamprey
male genome assembly.
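The idea behind a chain-based liftover can be sketched as follows. This is a simplified illustration only and not CrossMap's implementation: it assumes a single chain on the forward strand, represented as a sorted list of ungapped aligned blocks given as hypothetical (source start, target start, size) tuples with 0-based coordinates, and it drops any feature whose ends fall outside an aligned block.

```python
from bisect import bisect_right


def lift_position(blocks, pos):
    """Map a 0-based source position through sorted ungapped blocks.

    Returns None when the position falls in an unaligned gap.
    """
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1     # last block starting at or before pos
    if i < 0:
        return None
    src_start, dst_start, size = blocks[i]
    if pos >= src_start + size:           # past the end of that block
        return None
    return dst_start + (pos - src_start)


def lift_interval(blocks, start, end):
    """Lift a half-open interval [start, end); None unless both ends map in order."""
    new_start = lift_position(blocks, start)
    new_last = lift_position(blocks, end - 1)
    if new_start is None or new_last is None or new_last < new_start:
        return None
    return new_start, new_last + 1


# Hypothetical toy chain: two aligned blocks separated by a gap in the source.
toy_blocks = [(0, 100, 50), (60, 150, 40)]
print(lift_interval(toy_blocks, 10, 20))
print(lift_position(toy_blocks, 55))
```

In this toy chain, a feature lying entirely inside the first block is shifted by a constant offset, whereas a position inside the unaligned gap cannot be lifted over.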