Detecting selective sweeps
We detected selective sweeps in the Bermuda and Kauai populations by
calculating three relevant statistics, keeping the top 1% regions per
statistic, and then selecting regions that were recognised as a sweep by
at least two out of the three methods, using the R package GenomicRanges
version 1.46.1 (Lawrence, Huber et al. 2013). Only autosomal regions
were used. First, we estimated pooled Tajima’s D. Each statistic was
calculated in 40 kb sliding windows along the autosomal genome. We
standardized the Tajima’s D values by subtracting the mean and dividing
by the standard deviation. VCFtools version 0.1.16 was used (Danecek et al., 2011).
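The selection steps described above (standardizing a statistic, keeping its most extreme 1% of regions, and retaining regions supported by at least two of the three methods) can be sketched as follows. This is a minimal illustration, not the GenomicRanges code used in the study: the function names, the half-open (chromosome, start, end) region tuples and the `lowest` switch are assumptions made for the example, and the text does not state which tail of each statistic was kept.

```python
from statistics import mean, stdev


def standardize(values):
    """Z-score: subtract the mean, divide by the sample standard deviation."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def top_fraction(regions, scores, fraction=0.01, lowest=False):
    """Keep the regions whose scores fall in the most extreme `fraction`.

    lowest=True keeps the smallest scores instead of the largest (an
    assumption that would suit Tajima's D; the text does not specify it).
    """
    n_keep = max(1, round(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=not lowest)
    return [regions[i] for i in sorted(order[:n_keep])]


def supported_by(candidate_sets, min_support=2):
    """Return stretches covered by candidates from >= min_support methods.

    candidate_sets holds one list of (chrom, start, end) regions per method.
    """
    events = []
    for method_id, regions in enumerate(candidate_sets):
        for chrom, start, end in regions:
            events.append((chrom, start, 1, method_id))
            events.append((chrom, end, -1, method_id))
    # At equal positions, process starts before ends so that abutting
    # stretches are merged; zero-length stretches are dropped below.
    events.sort(key=lambda e: (e[0], e[1], -e[2]))

    depth = {}          # per-method coverage, so one method never counts twice
    merged, open_start = [], None
    for chrom, pos, delta, method_id in events:
        depth[method_id] = depth.get(method_id, 0) + delta
        support = sum(1 for d in depth.values() if d > 0)
        if support >= min_support and open_start is None:
            open_start = pos
        elif support < min_support and open_start is not None:
            if pos > open_start:
                merged.append((chrom, open_start, pos))
            open_start = None
    return merged
```

Working on interval coverage rather than on a shared window grid lets candidates from scans with different window sizes be combined directly.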
As a second method, we ran a composite likelihood ratio test for
positive selection (Kim and Stephan 2002). In this test, the likelihood
under the null hypothesis is calculated from the neutral
(genome-wide) frequency spectrum, whilst the likelihood under the
alternative hypothesis is calculated using a model where the neutral
frequency spectrum has been altered by a recent selective sweep. This
was calculated by using SweepFinder2 version 1.0
(DeGiorgio, Huber et al. 2016). In particular, this technique can
separate footprints of positive selection from those of background
selection, that is, the loss of neutral variation caused by the purging
of linked deleterious alleles through negative selection (Charlesworth
2012). To
conduct this, we first computed an empirically derived allele frequency
file based on all chromosomal data. This file serves as the null model
for the likelihood ratio used to detect positive selection. A
whole-genome scan for selective sweeps was then conducted following the
recommendations of DeGiorgio, Huber et al. (2016). A 20 kb window was
used to detect selective sweeps.
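To make the role of the genome-wide frequency spectrum concrete, the sketch below builds an empirical spectrum from per-site allele counts and evaluates the null log-likelihood of a set of sites under it. It is a conceptual illustration only: SweepFinder2 computes its own spectrum and composite likelihoods, and the function names, the (x, n) input format and the restriction to polymorphic sites are assumptions made for this example.

```python
from collections import Counter
from math import log


def empirical_sfs(allele_counts):
    """Proportion of polymorphic sites in each (x, n) class.

    allele_counts: iterable of (x, n) pairs, where x is the number of
    copies of the counted allele and n the number of sampled chromosomes.
    Sites with x == 0 or x == n are skipped (an assumption of this sketch).
    """
    counts = Counter((x, n) for x, n in allele_counts if 0 < x < n)
    total = sum(counts.values())
    return {key: c / total for key, c in sorted(counts.items())}


def null_log_likelihood(sites, sfs):
    """Composite log-likelihood of `sites` under the background spectrum."""
    return sum(log(sfs[(x, n)]) for x, n in sites)
```

A likelihood ratio statistic then compares this null value with the likelihood under a sweep model at each tested location; the sweep model itself is not reproduced here.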
The third method used the concept of ‘Extended Haplotype Homozygosity’.
Genomic regions with high local haplotype homozygosity can indicate
positive selection, and such haplotype structure is useful for detecting
selective sweeps (Sabeti, Reich et al. 2002). Strong selection with
commensurate linkage disequilibrium should lead to an expansion of such
haplotypes in the population, before they are slowly broken down by
recombination.
This premise led Sabeti et al. to develop the Extended Haplotype
Homozygosity test, which was later expanded upon by Voight et al.
(Voight, Kudaravalli et al. 2006), Sabeti et al. (Sabeti, Varilly et al.
2007) and Tang et al. (Tang, Thornton et al. 2007). This test measures
the extent to which an extended haplotype has been transmitted without
recombination. First, an allele-specific integrated haplotype
homozygosity (iHH) is calculated; this is then used to calculate the
integrated haplotype score (iHS), a standardized log ratio of the iHH
for the ancestral and derived alleles. We used
the R package rehh version 3.2.2 (Gautier, Klassmann et al. 2017) to
calculate the iHS statistic for each individual SNP by running the
data2haplohh, scan_hh and ihh2ihs functions. We then used the per-SNP
iHS statistic to calculate the maximum iHS statistic for each 20 kb
window using the R package tidyverse version 2.0.0 (Wickham, Averick et
al. 2019).
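The per-window summary of the per-SNP iHS values can be sketched as below. This is an illustration in Python rather than the tidyverse code used in the study; the function name, the (chromosome, position, iHS) input tuples, the use of non-overlapping windows and the 1-based coordinates are assumptions made for the example.

```python
def max_ihs_per_window(snps, window=20_000):
    """Maximum iHS per fixed-size window.

    snps: iterable of (chrom, pos, ihs) with 1-based positions; SNPs
    whose iHS is missing (None) are ignored.
    Returns (chrom, window_start, window_end, max_ihs) tuples, 1-based
    and inclusive, sorted by chromosome name and window.
    """
    best = {}
    for chrom, pos, ihs in snps:
        if ihs is None:
            continue
        key = (chrom, (pos - 1) // window)
        if key not in best or ihs > best[key]:
            best[key] = ihs
    return [
        (chrom, idx * window + 1, (idx + 1) * window, value)
        for (chrom, idx), value in sorted(best.items())
    ]
```

Windows that contain no SNP with a defined iHS simply do not appear in the output.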
Gene annotation
We downloaded the Gallus gallus Biomart files from the Ensembl ftp
server
(https://ftp.ensembl.org/pub/release-104/mysql/gallus_gallus_core_104_6/)
and added them to a local PostgreSQL v15 database. The selective sweep
regions detected earlier were added to the same database. Custom SQL
queries were written to select all known genes located in these
regions.
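The kind of interval-overlap query involved can be sketched as follows. The study's own queries and schema are not given, so everything here is illustrative: the example uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the table names, column names and gene identifiers are hypothetical simplifications rather than the Ensembl core schema.

```python
import sqlite3

# Hypothetical, simplified tables (not the Ensembl core schema).
SCHEMA = """
CREATE TABLE gene  (stable_id TEXT, chrom TEXT,
                    gene_start INTEGER, gene_end INTEGER);
CREATE TABLE sweep (sweep_id INTEGER, chrom TEXT,
                    sweep_start INTEGER, sweep_end INTEGER);
"""

# Two closed intervals overlap when each one starts before the other ends.
QUERY = """
SELECT s.sweep_id, g.stable_id
FROM sweep AS s
JOIN gene AS g
  ON g.chrom = s.chrom
 AND g.gene_start <= s.sweep_end
 AND g.gene_end >= s.sweep_start
ORDER BY s.sweep_id, g.gene_start
"""


def genes_in_sweeps(genes, sweeps):
    """Return (sweep_id, gene_id) pairs for genes overlapping each sweep."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?)", genes)
        conn.executemany("INSERT INTO sweep VALUES (?, ?, ?, ?)", sweeps)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

This version reports any gene that overlaps a sweep region, including partial overlaps; requiring genes to lie entirely within a region would need the two inequalities tightened accordingly.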
Overlap tests
We used a simulation test to determine whether the number of overlaps
observed between sweep regions on Bermuda and Kauai was greater than
expected by chance. The test consisted of placing two sets of regions
uniformly at random on an interval the size of the autosomal sequenced
chicken genome, and counting the overlaps. The two sets had numbers and
lengths equal to the number and average length of sweeps observed on
Bermuda and Kauai. A permutation procedure with 5000 replicates was used
to calculate the significance: the observed number of overlaps was
compared with the distribution of overlap counts obtained by chance
(https://github.com/mrtnj/bermuda_overlaps).
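The logic of the simulation can be sketched as follows. This is not the code from the linked repository: the function names, the definition of an overlap (a region in the first set that intersects at least one region in the second set) and the add-one p-value estimator are assumptions made for the illustration.

```python
import random


def count_overlaps(set_a, set_b):
    """Number of regions in set_a that intersect any region in set_b."""
    return sum(
        any(a0 < b1 and b0 < a1 for b0, b1 in set_b) for a0, a1 in set_a
    )


def place_regions(n, length, genome_length, rng):
    """Place n regions of equal length uniformly at random on the genome."""
    starts = [rng.uniform(0, genome_length - length) for _ in range(n)]
    return [(s, s + length) for s in starts]


def overlap_test(n_a, len_a, n_b, len_b, genome_length, observed,
                 replicates=5000, seed=1):
    """One-sided empirical p-value for observing >= `observed` overlaps."""
    rng = random.Random(seed)
    null = []
    for _ in range(replicates):
        a = place_regions(n_a, len_a, genome_length, rng)
        b = place_regions(n_b, len_b, genome_length, rng)
        null.append(count_overlaps(a, b))
    # Add-one correction keeps the estimate away from exactly zero.
    p_value = (1 + sum(k >= observed for k in null)) / (replicates + 1)
    return p_value, null
```

Treating the genome as a single interval ignores chromosome boundaries, which matches the description above of placing regions on one interval the size of the sequenced autosomes.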
Chromosome painting
We used ChromoPainterV2 (Lawson et al., 2012) to compare the
Bermuda sweep regions with the other populations (Kauai, Red Junglefowl
and Domestic chickens). First we combined the vcf files from the
separate populations into one vcf file per chromosome using bcftools
v1.14 (Danecek, Bonfield et al. 2021). Next, we lifted an earlier Gallus
gallus recombination map (Elferink, van As et al. 2010) over to Galgal6
using LiftOver (https://genome.ucsc.edu/cgi-bin/hgLiftOver). We then
converted our vcf files and the lifted recombination map to a format
accepted by ChromoPainter using the vcf2cp.pl and
convertrecfile.pl scripts included in the fineSTRUCTURE version 4.1.1
library (Lawson, Hellenthal et al. 2012). SNPs were then phased using
SHAPEIT v5.1 (Hofmeister, Ribeiro et al. 2023). We then ran
ChromoPainterV2, using the default parameters, on each selective sweep
region flanked by 20 kb on each side. We painted the Kauai and Bermuda
populations using Red Junglefowl and domestic sequences as donors.
Images were created with the R package tidyverse version 2.0.0 (Wickham,
Averick et al. 2019).
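The flanking of sweep regions before painting can be sketched as below. The function name, the 1-based inclusive coordinates and the clamping to chromosome ends are assumptions made for this illustration; the text specifies only the 20 kb flank on each side.

```python
def flank_regions(regions, chrom_lengths, pad=20_000):
    """Extend each (chrom, start, end) region by `pad` bp on both sides.

    Coordinates are 1-based and inclusive; extended regions are clamped
    to the chromosome so they never start before 1 or run past its end.
    """
    flanked = []
    for chrom, start, end in regions:
        new_start = max(1, start - pad)
        new_end = min(chrom_lengths[chrom], end + pad)
        flanked.append((chrom, new_start, new_end))
    return flanked
```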