Escalating concern regarding the impacts of reduced genetic diversity on the conservation of endangered species has spurred efforts to obtain chromosome-level genomes through consortia such as the Vertebrate Genomes Project. However, assembling reference genomes for many threatened species remains challenging due to difficulties obtaining optimal input samples (e.g., fresh tissue, cell lines), a difficulty that can characterize long-term conservation collections. Here, we present a pipeline that leverages genome synteny to construct high-quality genomes for species of conservation concern despite less-than-optimal samples and/or sequencing data, demonstrating its use on Hector’s and Māui dolphins. These endemic New Zealand dolphins are threatened by human activities due to their coastal habitat and small population sizes. Hector’s dolphins are classified as endangered by the IUCN, while the Māui dolphin is among the most critically endangered marine mammals. To assemble reference genomes for these dolphins, we created a pipeline combining de novo assembly tools with reference-guided techniques, utilizing chromosome-level genomes of closely related species. The pipeline assembled highly contiguous chromosome-level genomes (scaffold N50: 110 Mb, scaffold L50: 9, miniBUSCO completeness scores >96.35%), despite non-optimal input tissue samples. We demonstrate that these genomes can provide insights relevant for conservation, including historical demography revealing long-term small population sizes, with subspecies divergence occurring ~20 kya, potentially linked to the Last Glacial Maximum. Māui dolphin heterozygosity was 40% lower than Hector’s and comparable to other cetacean species noted for reduced genetic diversity. Through these exemplar genomes, we demonstrate that our pipeline can provide high-quality genomic resources to facilitate ongoing conservation genomics research.
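For readers less familiar with the contiguity metrics quoted above, the following is a minimal sketch of how scaffold N50 and L50 are conventionally computed from a list of scaffold lengths. It is not the pipeline's own code; the function name `n50_l50` and the toy lengths are illustrative assumptions.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of scaffold lengths.

    N50 is the length of the scaffold at which the running total of
    lengths, taken from longest to shortest, first reaches half of the
    assembly size. L50 is the number of scaffolds needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("at least one scaffold length is required")


# Toy example (hypothetical lengths, arbitrary units).
toy_lengths = [80, 70, 50, 40, 30, 20]
n50, l50 = n50_l50(toy_lengths)
```

A scaffold L50 of 9, as reported above, means that the nine longest scaffolds together cover at least half of the assembly.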

Ann McCartney

and 7 more

We used long-read sequencing data generated from Knightia excelsa R.Br., a nectar-producing Proteaceae tree endemic to Aotearoa New Zealand, to explore how sequencing data type, volume and workflows can impact final assembly accuracy and chromosome construction. Establishing a high-quality genome for this species has specific cultural importance to Māori, the indigenous people, as well as commercial importance to honey producers in Aotearoa New Zealand. Assemblies were produced by five long-read assemblers using data subsampled based on read lengths, two polishing strategies, and two Hi-C mapping methods. Our results from subsampling the data by read length showed that each assembler tested performed differently depending on the coverage and the read length of the data. Assemblies that used longer read lengths (>30 kb) and lower coverage were the most contiguous and the most complete in k-mer and gene content. The final genome assembly was constructed into pseudo-chromosomes using all available data assembled with Flye, polished using Racon/Medaka/Pilon combined, scaffolded using SALSA2 and ALLHiC, curated using Juicebox, and validated by synteny with Macadamia. We highlighted the importance of developing assembly workflows based on the volume and type of sequencing data and establishing a set of robust quality metrics for generating high-quality assemblies. Scaffolding analyses highlighted that problems found in the initial assemblies could not be resolved accurately by utilizing Hi-C data and that scaffolded assemblies were more accurate when the underlying contig assembly was of higher accuracy. These findings provide insight into what is required for future high-quality de novo assemblies of non-model organisms.
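The trade-off between read length and coverage described above can be illustrated with a minimal sketch of length-based subsampling. The abstract does not state how the subsampling was implemented, so the approach here (discard reads below a length cutoff, then keep randomly chosen reads until a target coverage is reached) is an assumption, as are the function name `subsample_by_length`, its parameters, and the toy values.

```python
import random


def subsample_by_length(read_lengths, min_length, genome_size,
                        target_coverage, seed=0):
    """Keep reads of at least min_length until target coverage is met.

    Returns the kept read lengths and the coverage they provide, where
    coverage = total kept bases / genome_size. All names and the
    sampling strategy are illustrative assumptions.
    """
    eligible = [n for n in read_lengths if n >= min_length]
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    rng.shuffle(eligible)

    budget = genome_size * target_coverage
    kept, bases = [], 0
    for n in eligible:
        if bases >= budget:
            break
        kept.append(n)
        bases += n
    return kept, bases / genome_size


# Toy example: hypothetical read lengths (bp) and a 50 kb "genome".
toy_reads = [10_000, 35_000, 40_000, 5_000, 32_000, 50_000]
kept, coverage = subsample_by_length(
    toy_reads, min_length=30_000, genome_size=50_000, target_coverage=2
)
```

Raising `min_length` shrinks the pool of eligible reads, so a stricter length cutoff generally comes at the cost of lower achievable coverage, which is the balance the subsampling experiments explore.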