Small population paradigm and structural variants

Like SNPs, SVs vary within and among individuals, and SV variation is subject to genetic drift, selection, mutation and gene flow, and is impacted by demography, particularly effective population size (Ne). Species and/or populations with large Ne should harbor a higher number of SVs, whereas populations with small Ne should harbor fewer (e.g., lower SV counts in the critically endangered Hawaiian crow versus non-threatened corvid relatives; Weissensteiner et al., 2020). Although genome composition and architecture (e.g., overall chromosome repeat content) also influence the mutation rates and genomic locations of SNP variants, they have a much larger impact on the number, type, location and mutation rate of SVs (Collins et al., 2020). Further, while both SNPs and SVs can be neutral, the larger size of SVs relative to single-base substitutions means that SVs are more likely to overlap gene and regulatory regions and impact fitness. The evolutionary dynamics of SVs in large populations will therefore be determined by the interplay between the functional effects of these SVs, if any, and variation in mutation rates of different SV sizes and types, which may lead to enrichment of SVs of particular types and/or in particular genomic regions (Conrad & Hurles, 2007).
In contrast, populations of conservation concern are likely to be fragmented or chronically small, and the evolutionary dynamics of both SNP and SV variation will be dominated by genetic drift and founder effects, which interact with small Ne to both decrease the overall level of genetic variation and reduce the efficacy of natural (purifying) selection (Henn, Botigué, Bustamante, Clark, & Gravel, 2015). Decreasing genetic variation leads to further reductions in Ne, which increases the probability of extinction through both the loss of adaptive variants and the increase in frequency of deleterious variants. This can lead to a downward spiral of further loss of diversity and increasing extinction risk, a process termed an ‘extinction vortex’ (Gilpin & Soulé, 1986). Further, inbreeding is unavoidable in small populations, leading to a higher frequency of individuals homozygous for deleterious recessive alleles, with the resulting inbreeding depression placing a further fitness burden on the threatened species (Kardos, Taylor, Ellegren, Luikart, & Allendorf, 2016; Stoffel, Johnston, Pilkington, & Pemberton, 2021).
While both SNPs and SVs can have deleterious fitness effects, the size of SVs and their potential to overlap gene regions mean that SVs may be enriched in the pool of deleterious variants that are maintained in small populations. While SVs of very large deleterious effect may be purged from the population, small populations may disproportionately accumulate slightly deleterious SVs, because purifying selection against them is weak relative to drift. Such enrichment suggests that SVs likely underpin some of the negative fitness traits associated with inbreeding depression. Although there are some clues as to the evolutionary dynamics of SVs in small populations, further study is critical for resolving the significance of SVs in population persistence and species extinction risk.
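
The argument that small populations retain slightly deleterious variants can be illustrated numerically. The sketch below is not drawn from the studies cited above; it assumes Kimura's diffusion approximation for the fixation probability of a new mutation under genic (additive) selection, and further assumes that census size equals Ne. The function names and the selection coefficient used are hypothetical choices for illustration only.

```python
import math


def fixation_probability(s: float, ne: float) -> float:
    """Fixation probability of a new mutation (initial frequency 1 / (2 * ne)).

    Assumptions (not from the text): Kimura's diffusion approximation,
    genic selection with heterozygote selection coefficient s
    (negative = deleterious), and census size equal to ne.
    """
    if s == 0.0:
        return 1.0 / (2.0 * ne)  # neutral expectation
    exponent = -4.0 * ne * s
    if exponent > 700.0:
        # Strongly deleterious relative to drift: the probability is
        # vanishingly small and exp() would overflow, so return zero.
        return 0.0
    # expm1 keeps precision when |s| is very small.
    return math.expm1(-2.0 * s) / math.expm1(exponent)


def relative_to_neutral(s: float, ne: float) -> float:
    """Fixation probability expressed as a fraction of the neutral value."""
    return fixation_probability(s, ne) * 2.0 * ne


# Hypothetical slightly deleterious variant (s = -0.001) at three values of Ne.
for ne in (100, 1_000, 10_000):
    print(ne, relative_to_neutral(-0.001, ne))
```

Under these assumptions, the same variant fixes at a rate close to the neutral expectation when Ne = 100 but is effectively never fixed when Ne = 10,000, which is the sense in which purifying selection loses efficacy in small populations.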

Relating structural variants to fitness traits

Advancements in WGS technologies, coupled with increased computational capacities, have renewed interest in the role of SVs in determining trait differences (Chiang et al., 2017; Pang et al., 2010; Sadowski et al., 2019; Yi & Ju, 2018). For example, characterizing SVs in agricultural species has led to the identification of variants associated with economically significant traits in crops like grapevine (Zhou et al., 2019), maize (Yang et al., 2019), soybean (Liu et al., 2020) and tomato (Alonge et al., 2020), and identified specific genes and gene regions associated with domestication in vertebrates (Bertolotti et al., 2020; Cagan & Blass, 2016; vonHoldt et al., 2017). The impact of these SVs on trait variation may be direct or indirect. For example, SVs can lead to direct changes to gene coding regions and gene regulation that change the function or expression of proteins (Bickhart & Liu, 2014; Collins et al., 2020). In contrast, in some cases where a particular SV is associated with a trait, the SV may not be the causal variant, but may instead increase the likelihood of de novo causal variants nearby. This is the case for a relatively common 1.3 Mb inversion on the human Y chromosome, where microdeletions accrue at inversion breakpoints and can have profound impacts on male fertility (Hallast et al., 2021). There is also growing evidence that SVs can further impact the gene regulatory landscape by altering the formation of topologically associating domains (TADs), genomic regions that physically interact with themselves more frequently than with regions elsewhere (Sadowski et al., 2019; Shanta et al., 2020).
There is growing evidence that SVs–in particular, the suppression of recombination and subsequent evolution of ‘supergenes’, or tightly linked co-adapted alleles–can impact fitness traits in natural populations (Huynh et al., 2011; Jay et al., 2018; K.-W. Kim et al., 2017). For example, sperm swimming speed in zebra finch is determined by inversion haplotypes, with heterokaryotypic males producing faster sperm than homokaryotypic males (K.-W. Kim et al., 2017; Knief et al., 2017). In addition, a supergene resulting from an inversion has been found to determine mating strategy and morphology in the Eurasian ruff (Küpper et al., 2016). Notably, this inversion is likely a recessive lethal variant (Lamichhaney et al., 2016). Similarly, a large inversion resulting in a supergene underlies significant morphological and behavioral differences between white-striped and tan-striped morphs in white-throated sparrows, with aggressiveness being monogenic in the white-striped morph (Merritt et al., 2020). Although many fitness traits are likely to be polygenic, including traits of conservation interest such as disease susceptibility, reduced fertility and developmental abnormalities (e.g., Moran et al., 2021; Murchison et al., 2012; Roelke, Martenson, & O’Brien, 1993; Savage, Crane, Team, & Hemmings, 2020), those impacted by supergenes are likely to have relatively simple inheritance patterns, which will enable their characterization and management.
To date, there are two general approaches to investigating the genomic basis of traits, both of which require clearly defined traits, well-curated data sets and high-quality, well-annotated reference genomes: 1) Comparative genomics, where well-characterized groups (individuals, populations or species) are used to identify highly differentiated genomic regions (Alonge et al., 2020; McHale et al., 2012; vonHoldt et al., 2017; Weissensteiner et al., 2020); and 2) Association studies, where associations between specific markers (i.e., SNPs and/or SVs) and traits are assessed (Chakraborty et al., 2019). For example, association studies and/or comparative approaches using SVs have revealed signatures of domestication (e.g., aquaculture salmon and dogs; Bertolotti et al., 2020; Cagan & Blass, 2016; vonHoldt et al., 2017), and the relative contribution of SVs to species evolution (e.g., Atlantic cod, Corvids, Heliconius butterflies, Sunflowers; Berg et al., 2017; Joron et al., 2011, 2006; Rieseberg, Whitton, & Gardner, 1999; Weissensteiner et al., 2020).
Whole-genome SNP-based association studies have been extensively applied to model organisms, agriculturally significant species and humans, and are increasingly applied to wild and non-model species (Santure & Garant, 2018). However, for many SNP-based association studies, significantly associated loci do not explain the vast majority of known trait heritability (i.e., the proportion of trait variation that is due to genetic rather than environmental differences between individuals; Clarke & Cooper, 2010; Eichler et al., 2010; Manolio et al., 2009). This “missing heritability” is often attributed to genetic variation not explained by genotyped loci; that is, causal variants may remain uncharacterized (Manolio et al., 2009). Another hypothesis to explain this missing heritability is that SVs are a significant source of trait variation (Eichler et al., 2010; Frazer, Murray, Schork, & Topol, 2009). Further, although a given SV is likely to be linked to a nearby SNP (Wilder, Palumbi, Conover, & Therkildsen, 2020), the two may not be in strong linkage disequilibrium with each other, so that the SNP will not capture the impact of the SV on the trait (Pang et al., 2010). Structural variants may also lead to challenges in aligning reads and calling SNPs in a region, in turn preventing the identification of SVs from sequence data (Pang et al., 2010). The inclusion of SVs in association studies is therefore likely to lead to better power to detect causative variants, particularly given advances in GWAS methods, including machine-learning approaches, that can more effectively prioritize variants in a region and account for non-additive interactions (Brieuc, Ono, Drinan, & Naish, 2015; Ramzan, Gültas, Bertram, Cavero, & Schmitt, 2020).
This is particularly promising as evidence suggests that SVs have larger phenotypic effects and may be deleterious more often than SNPs (Chakraborty et al., 2019; Conrad et al., 2010; Cridland, Macdonald, Long, & Thornton, 2013; Emerson, Cardoso-Moreira, Borevitz, & Long, 2008; Rogers et al., 2015). For example, a quantitative trait locus (QTL) study for Drosophila melanogaster found that about half of all candidate genes underpinning mapped QTL were impacted by SVs, and that a large proportion of genes contained multiple rare SVs (Chakraborty et al., 2019).
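
The point that a nearby SNP may fail to capture the effect of an SV can be made concrete by measuring how well SNP genotypes predict SV genotypes. The following is a minimal sketch that assumes unphased diploid genotypes coded as dosages (0, 1 or 2 copies of the non-reference allele) and uses the squared Pearson correlation between dosages as a simple proxy for linkage disequilibrium. The function name and the eight-individual genotype vectors are hypothetical and are not taken from any cited data set.

```python
from statistics import mean


def genotype_r2(a, b):
    """Squared Pearson correlation between two genotype dosage vectors.

    Used here as a simple proxy for linkage disequilibrium when phase is
    unknown (an assumption of this sketch, not a method from the text).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined when a site is monomorphic")
    return cov * cov / (var_a * var_b)


# Hypothetical dosages for eight individuals.
sv_dosage = [0, 0, 1, 1, 2, 0, 1, 2]
tag_snp_dosage = [0, 0, 1, 1, 2, 0, 1, 2]   # SNP that perfectly tags the SV
weak_snp_dosage = [0, 1, 0, 1, 1, 0, 2, 1]  # SNP only loosely correlated

print("tagging SNP r2:", genotype_r2(sv_dosage, tag_snp_dosage))
print("weakly linked SNP r2:", genotype_r2(sv_dosage, weak_snp_dosage))
```

In this toy example, an association test on the weakly linked SNP would recover only a fraction of the signal attributable to the SV, which is why genotyping the SV directly can improve power.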

Structural variant discovery and genotyping in threatened species

A high-quality reference genome is an invaluable investment

While linkage mapping and cytological techniques can be used to characterize SVs (Deakin et al., 2019), here we focus on the approaches most likely to be accessible to the conservation genetics community. Partnerships with global genome consortia (e.g., Vertebrate Genomes Project, Rhie et al., 2021; Earth BioGenome Project, Lewin et al., 2018; Nature and Zoonomia Consortium, Genereux et al., 2020) are providing increased accessibility to highly contiguous, well-annotated genome assemblies generated from both short and long read data and long-range scaffolding approaches for many species, including those of conservation concern and their close relatives (Whibley, Kelley, & Narum, 2020). As costs of generating and analyzing WGS data continue to drop, a growing number of conservation genomicists working on species beyond global genome consortia are investing in the assembly and annotation of high-quality reference genomes. We readily recognize that not all conservation programs are able to access the resources or sample quality to generate a high-quality, well-annotated reference genome, nor do we recommend this action for all threatened species. Where there is a clear hypothesis that SVs may be impacting a particular gene or region, comparative approaches (Alonge et al., 2020; Weissensteiner et al., 2020) and target capture methods (vonHoldt et al., 2017) may be the most cost-effective options, especially if a reference genome from a closely related species can be leveraged instead. However, as described below, investment in a high-quality reference genome is a near necessity for accurate and precise SV discovery and genotyping, particularly if the goal is to characterize a broad range of SV types for downstream functional analyses (e.g., investigating the genomic basis of traits linked to fitness).
Further, if the goal is to characterize a large proportion of SVs–including large and/or complex SVs–across the genome, integration of multiple reference genomes provides a powerful tool for accurately characterizing genome-wide variation and may facilitate genotyping across multiple variant classes (i.e., both SNPs and SVs; Figure 2).