Running title: De novo assembly of Takin (B. taxicolor)
Anning Li1#, Qimeng Yang1,2#, Ran Li1,2, Xuelei Dai1,2, Keli Cai1, Yinghu Lei3, Kangsheng Jia3, Yu Jiang1,2*, Linsen Zan1,3*
1 College of Animal Science and Technology, Northwest A&F University, Yangling 712100, Shaanxi, P. R. China
2 Center for Ruminant Genetic and Evolution, Northwest A&F University, Yangling 712100, Shaanxi, P. R. China
3 Research Center for the Qinling Giant Panda (Shaanxi Rare Wildlife Rescue Base), Shaanxi Academy of Forestry Sciences, Zhouzhi 710402, Shaanxi, P. R. China
# These authors contributed equally to this study.
* Corresponding E-mail: yu.jiang@nwafu.edu.cn ;zanlinsen@163.com .
Abstract: The takin (Budorcas taxicolor ) is one of the largest bovid herbivores across caprinae subfamily. The takin is at high risk of extinction, however, its taxonomic status is still unclear. In this study, we constructed the first reference genome of B. taxicolor using PacBio long High-Fidelity reads and Hi-C technology. The assembled genome is ~2.95 Gb with a contig N50 of 68.05 Mb and a scaffold N50 of 101.27 Mb, which were anchored onto 25+XY chromosomes. Compared to the common ancestral karyotype of bovidae (2n=60), we found the takin (2n=52) experienced four chromosome fusions and one large translocation. We also found that the takin was most closely related to muskox, not other caprinae species. Further, we re-sequenced nine golden takins from the main distribution area, Qinling Mountains, and identified 3.3 million SNPs. The genetic diversity of takin was very low (θπ=0.00028 and heterozygosity=0.00038), which was among the lowest detected in the domestic and wild mammals. We also found takin genomes showed high inbreeding coefficient (FROH=0.217) suggesting severe inbreeding depression. The genome analysis show that the effective population size of takins declined significantly from ~100,000 years ago. Our results provide valuable information for protection of takins and insights into its evolution.
Keywords : Takin; PacBio HiFi; Hi-C; Chromosomal evolution; Inbreeding depression
Introduction
The takin (Budorcas taxicolor ) is a large bovid herbivore, belonging to the Caprinae subfamily. The number of takins was estimated only about 7,000 ~ 12,000 in the world (Cheng et al. 2020). Takin has been listed as vulnerable by the International Union for Conservation of Nature (IUCN) (Zeng and Song 2002). It was divided into four subspecies according to the physiological characteristics and geographical location (Wu 1986). Qinling takin (B. t. bedfordi ) and Sichuan takin (B. t. tibetana ) are only confined in China, while the Mishmi takin (B. t. taxicolor ) has a distributed area from Gaoligong Mountains in northwestern Yunnan province, China to India and Myanmar (Wu 1986). In addition, Bhutan takin (B. t. whieti ) is found in Bhutan and the Yarlung Zangbo River in Tibet, China (Li et al. 2003). Qinling takin (B. t. bedfordi ), also known as golden takin, is mainly distributed in the Qinling Mountains of China (Figure 1a ) (Li et al. 2021).
Compared to other Caprinae animals, the morphology, ecological traits and G-banded karyotype of takin were found to be similar to muskox (Ovibos moschatus ) (Pasitschniak-Arts et al. 1994). However, the analysis of mitochondrial cytochrome b genes (Cytb ) sequences showed that muskox and takin were not close with each other but under convergent evolution (Groves and Shields 1997, Ren et al. 2012). Recently, based on the complete mitochondrial genome, takin was found to be closely related to goat (Capra hircus ) rather than sheep (Ovis aries ) (Feng et al. 2016, Kumar et al. 2019, Zhou et al. 2019). In our previous study, takin was found to be closely related to sheep rather than goat according to the transcriptome analysis (Qiu et al. 2021). Thus, to clarify on its taxonomic status, further analysis is needed at the whole genomic level.
Recently, highly accurate long high-fidelity (HiFi) reads was generated by PacBio single-molecule real-time (SMRT) sequencing with circular consensus sequencing (CCS) (Wenger et al. 2019). HiFi reads combined with Hi-C technology have been used to construct chromosome-level reference genome of various animals (Wu et al. 2021, Zhou et al. 2021) and plants (Chen et al. 2021, Huang et al. 2021, Ma et al. 2021, Sharma et al. 2021, Wang et al. 2021). In this study, we performed the PacBio long HiFi reads and Hi-C technology to construct a high-quality chromosome-level reference genome of takin. The phylogenetic relationship, chromosomal evolution, genetic diversity, demographic history and inbreeding depression of takin were analyzed. These information will be helpful for understanding the evolution of the takin and working towards its conservation.
Materials and Methods
2.1 Sample collection and ethics statement
The liver sample from a dead adult male golden takin was used forde novo genome sequencing. Seven Blood samples (alive) and two muscle samples (dead) from nine golden takins were used for genome resequencing. All of the samples were collected from the Rare Wildlife Rescue and Breeding Research Center in Qinling Mountains. All the experimental procedures were carried out according to the guidelines of the China Council on Animal Care and approved by the Experimental Animal Management Committee (EAMC) of Northwest A&F University.
Genome sequencing and assembly
A 15 Kb DNA SMRTbell library was constructed for SMRT sequencing according to a standard protocol (Ardui et al. 2018) using PacBio Sequel II platform with circular consensus sequencing (CCS). HiFi reads were assembled in de novo using Hifiasm (v0.13) (Cheng et al. 2021). Then the restriction enzyme Mbo I was used to digest cross-linked chromatin to construct the Hi-C library, which was sequenced on an Illumina NovaSeq6000 platform. The Hi-C contigs were anchored onto the chromosomes using Juicer (v1.6) (Durand et al. 2016) and 3D-DNA (v201008) (Dudchenko et al. 2017) combined with Juicebox (https://github.com/theaidenlab/juicebox ). BUSCO (v3.0.2) was used to assess the completeness of assembled genome (Simao et al. 2015). Genome resequencing was performed by the Illumina NovaSeq6000 platform. All of the sequencing services were provided by the Berry Genomics Biotechnologies Co., Ltd. (Beijing, China). A resequencing genome of golden takin (TX-11) was selected to estimate the genome size and heterozygosity rate by gce (v1.0.2) (https://arxiv.org/abs/1308.2012v2 ).
Repeats and gene annotation
The RepeatMasker software (v4.1.2) (http://www.repeatmasker.org) and the RepBase library (2018.10.26) were used to identify the repeats in takin genome. Homology-based method was used to predict the protein-coding genes with RNA-seq data. Firstly, we annotated the protein-coding genes based on the protein sequences from Bos taurus , C. hircus ,O. aries and Homo sapiens using the BRAKER (v2.1.5) pipeline (Hoff et al. 2016, Hoff et al. 2019). Secondly, we aligned three RNA-seq datasets (PRJNA720167) to the takin’s genome using STAR software (v2.7.1a) (Dobin et al. 2013), which were predicted by the BRAKER pipeline. Next, the BRAKER with TSEBRA module (Gabriel et al. 2021) was used to integrate the prediction results. Finally, the prediction results were used to align against the non-redundant (Nr) database using the DIAMOND (v2.0.11.149) software (Buchfink et al. 2015, Buchfink et al. 2021). GC content distribution of genome was performed by bedtools (v2.26.0) with a window of 500 kb (Quinlan 2014).
Mitochondrial genome assembly
All the HiFi reads (Average length 14 kb) were blasted to the mitochondrial genome of Sichuan takin (GenBank accession No. NC_039686.1) and assembly reference genome of golden takin using the Minimap2 (v2.22-r1101) (Li 2018). The HiFi reads aligned to NC_039686.1 were selected to assemble the mitochondrial genome of golden takin. The sequence similarity was analyzed using the BLAST search program. The annotation of mitochondrial genome was completed by the AGORA online platform (Jung et al. 2018).
Phylogenetic analysis and divergence time estimation
We downloaded nine mitochondrial genome from GenBank database to construct a phylogenetic tree, including B. taurus (GenBank accession No. NC_006853.1), C. hircus (GenBank accession No. NC_005044.2), O. aries (GenBank accession No. NC_001941.1),Pantholops hodgsonii (GenBank accession No. NC_007441.1),O. moschatus (GenBank accession No. NC_020631.1), Oreamnos americanus (GenBank accession No. NC_020630.1), Ammotragus lervia (GenBank accession No. NC_009510.1), Pseudois nayaur(GenBank accession No. NC_020632.1), and B. taxicolor (GenBank accession No. NC_039686.1, NC_013069.1, NC_043930.1, KY399869.1, KU361169.1 and OM237313 ). Firstly, the mitochondrial genomes were aligned by MUSCLE (v3.8.31) (Edgar 2004) to exclude the ambiguous regions. Then, the neighbor-joining (NJ) method was selected using MEGA (v11) (Tamura et al. 2021) with 1000 bootstrap replicates.
The genome of B. taurus  (ARS-UCD1.2) (Rosen et al. 2020) was selected as reference, and that of O. americanus (ASM975805v1) (Martchenko et al. 2020),O. aries(Oar_rambouillet_v1.0),C. hircus (ARS1) (Worley 2017),P. hodgsonii (PHO1.0) (Ge et al. 2013),A. lervia(ALER1.0) (Chen et al. 2019),Pseudois nayaur (ASM318257v1) (Chen et al. 2019), O. moschatus (ASM2146233v1) and B. taxicolor(Takin1.0) were separately aligned to the reference genome using LAST software (v942) (Kielbasa et al. 2011). The pair-wise alignments were merged into multiple genome alignments using the MULTIZ software (v11.2) (Blanchette et al. 2004). The consensus coding sequences of nine species were extracted using the Perl scripts from RGD (v2.0). 5, 862 single-copy homologous genes were identified using the DIAMOND (v2.0.11.149) and RAxML (v8.2.12) (Stamatakis 2014) with Maximum Likelihood (ML) method to construct the phylogenetic tree. The estimations of divergence time were calculated using PAML MCMCtree (v4.9j) (Yang 2007) with calibrated time from the previous study (Chen et al. 2019).
Chromosomal evolution
The ancestral chromosome karyotype was reconstructed using the genomes of B. taurus (Rosen et al. 2020), C. hircus (Li et al. 2021), O. aries (Davenport et al. 2021), B. taxicolor andPhyseter catodon (Fan et al. 2019) (as outgroup). The genome ofB. taurus was selected as the reference, and other genomes were aligned to the reference genome using LAST (v942) (Kielbasa et al. 2011). Next, “chain” and “net” files were generated from axtChain and ChainNet, which were used as input for DESCHRAMBLER at a 350 kb resolution (Kim et al. 2017). Lastly, we analyzed the collinearity of the genomes of B. taurus , O. aries and B. taxicolorusing mummer (v4.0beta2) (Marcais et al. 2018). We visualized the collinearity region and detected the chromosome fusion events using the RectChr (https://github.com/BGI-shenzhen/RectChr ).
Heterozygosity estimation
Genome resequencing datasets from nine golden takins were aligned with the assembly reference genome using BWA-MEM (v0.7.17) (http://arxiv.org/abs/1303.3997 ). The SAMtools (v1.7) (Li et al. 2009) was used to convert sam to bam files. The single nucleotide polymorphisms (SNPs) were called and filted using the GATK (v4.1.7.0) (McKenna et al. 2010) with parameters “QD < 2.0, QUAL < 30.0, FS > 60.0, MQ < 40.0, MQRankSum < -12.5, ReadPosRankSum < -8.0”. The nucleotide diversity (θπ) was calculated using the VCFtools (v0.1.16) with a 50 kb window (Danecek et al. 2011). Genome resequencing datasets 8 giant pandas (Zhao et al. 2013) were used to calculate θπ. The genome-wide heterozygosity rate was calculated as previously described (Liu et al. 2021). Genome resequencing datasets from 26 snub-nosed monkeys (Yu et al. 2016) were used to calculate heterozygosity. The heterozygosity rates of other species were obtained from previous reports (Li et al. 2010, Cho et al. 2013, Dobrynin et al. 2015, Liu et al. 2021).
ROH estimation and inbreeding coefficient
The runs of homozygosity (ROH) were estimated across autosomes for nine golden takins using PLINK (v1.90b6.21) (Chang et al. 2015). The following PLINK parameters (Purcell et al. 2007) were applied to define a ROH: (i) a 50 SNPs sliding window across the genome; (ii) the proportion of homozygous overlapping windows was 0.05; (iii) minimum number of consecutive SNPs included in a ROH was 50; (iv) required minimum density was set at one SNP per 50 kb; (v) maximum gap between consecutive homozygous SNPs was 1000 kb; and (vi) maximum of five SNPs with missing genotypes and up to three heterozygous genotype were allowed in a ROH. Inbreeding coefficient based on ROH (FROH) for each individual was calculated according to previous study (McQuillan et al. 2008). The number of generations was estimated to the common ancestor of these homologous sequences as previously described (Thompson 2013).