1. Introduction

Ancient DNA research provides direct evidence for reconstructing prehistoric biogeography and biodiversity, which in turn helps to address long-standing questions in evolution, phylogeny, taxonomy, and adaptation (1-5). The field has developed rapidly over the past thirty years owing to improvements in PCR and next-generation sequencing (NGS) technologies. The first successful attempt to extract ancient DNA was made by Higuchi et al. (1984), in which DNA was extracted from muscle of Equus quagga and fragments of 228 bp were amplified (6, 7). With advances in biomolecular techniques, it is now possible to extract and amplify ancient DNA fragments from a wide range of ancient species and sample types, including bones, teeth, soft tissue, fur, and fossilized excrement (7, 8). Studies of ancient DNA were previously restricted to mitochondrial DNA and extremely short nuclear DNA fragments (7, 9). However, the advent of NGS technology has enabled ancient DNA studies at the whole-genome level. Consequently, the number of ancient DNA studies has increased exponentially in the last decade (10). The first whole genome of the woolly mammoth was sequenced in 2008 (11). The genomes of three Neanderthals were sequenced in 2010, which revealed extensive gene flow to modern humans (12). In 2012, the first high-coverage genome (~30X) of a Denisovan was published (13). In 2015, Allentoft et al. sequenced 101 ancient humans at the whole-genome level (14). At present, more than 1100 ancient human and hominin genomes (15) and more than 300 ancient animal genomes (2, 16, 17) have been sequenced and published.
Although great breakthroughs have been made in ancient DNA extraction, library preparation, and bioinformatics, some challenges remain (18-21). Mapping ancient reads effectively and distinguishing present-day DNA contamination from endogenous ancient DNA are still difficult, and both steps need to be improved for ancient DNA analysis. It is particularly difficult to filter present-day human DNA contamination from ancient human or hominin DNA (22, 23). Ancient DNA is often degraded into very small fragments by physical, chemical, or biological factors during long-term preservation under unfavorable conditions. These processes typically leave characteristic marks that help to distinguish ancient DNA from modern DNA: C-to-T substitutions at the ends of ancient DNA fragments induced by cytosine deamination, an elevated proportion of purine bases at the reference position immediately preceding the start of the fragment (a signature of depurination), and severe fragmentation (4, 21). These characteristics can be used to identify authentic ancient DNA.
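To make the deamination signature concrete, the following minimal Python sketch estimates the C-to-T mismatch frequency at the first few 5' positions of aligned reads. It is an illustration only and is not part of the pipeline evaluated in this study. The function name `c_to_t_by_position` and the input format (gap-free reference/read sequence pairs, oriented 5' to 3' on the read strand) are assumptions made for this example; real data would first have to be extracted from alignment files.

```python
from collections import Counter


def c_to_t_by_position(pairs, max_pos=5):
    """Estimate the C-to-T mismatch rate at the first `max_pos` read positions.

    `pairs` is an iterable of (reference_seq, read_seq) tuples. Each pair is
    assumed (for this sketch) to be a gap-free alignment of equal length,
    oriented 5'->3' on the read strand.
    """
    ref_c = Counter()   # reference C observed at position i
    c_to_t = Counter()  # reference C read as T at position i
    for ref, read in pairs:
        for i, (r, q) in enumerate(zip(ref.upper(), read.upper())):
            if i >= max_pos:
                break
            if r == "C":
                ref_c[i] += 1
                if q == "T":
                    c_to_t[i] += 1
    # Positions with no reference C are reported as 0.0 rather than dividing by zero.
    return [c_to_t[i] / ref_c[i] if ref_c[i] else 0.0 for i in range(max_pos)]


if __name__ == "__main__":
    # Toy alignments invented for illustration; not real sequencing data.
    toy = [("CACGT", "TACGT"), ("CCGTA", "TCGTA"), ("CAGTC", "CAGTC")]
    print(c_to_t_by_position(toy, max_pos=3))
```

In authentic ancient libraries, the rate at position 0 would be expected to exceed the rate at interior positions, whereas present-day contaminant reads should show a roughly flat profile.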
Bioinformatics methods have been developed for mapping reads and extracting endogenous ancient DNA from total ancient DNA (18, 21). In the mapping step, BWA with the parameter set "aln -l 1024 -n 0.03" is usually applied to map ancient sequencing data against the reference genome (18). However, this process is time-consuming. A more recent method, BWA mem, which uses a seed-reseed-extend algorithm, offers an alternative approach to mapping ancient DNA (24). Meanwhile, Skoglund et al. (21) developed PMDtools to separate genuine endogenous DNA from homologous contamination. This method is effective in filtering modern human contaminant DNA from ancient human DNA. However, it is difficult to set an appropriate threshold for the PMDtools score (PMDS) when the contamination rate cannot be accurately estimated. In addition, the power of PMDtools is further weakened for extremely young or extremely old ancient samples.
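For reference, the two mapping strategies mentioned above can be written out as command lines. The Python sketch below only builds the commands and does not run them. The file names (`ref.fa`, `reads.fq`, `sample.sai`) and the helper names are hypothetical, single-end (for example, collapsed) reads are assumed, and the output of each command is assumed to be redirected from standard output. The `aln` parameters are those quoted in the text; a seed length of 1024 exceeds typical read lengths, which in effect disables seeding.

```python
import shlex


def bwa_aln_commands(ref, fastq, sai, seed_len=1024, max_diff=0.03):
    """Build the two-step `bwa aln` / `bwa samse` commands for single-end reads.

    `sai` is the intermediate file that `bwa aln` output is redirected to.
    """
    aln = ["bwa", "aln", "-l", str(seed_len), "-n", str(max_diff), ref, fastq]
    samse = ["bwa", "samse", ref, sai, fastq]
    return aln, samse


def bwa_mem_command(ref, fastq):
    """Build a `bwa mem` command with default settings for single-end reads."""
    return ["bwa", "mem", ref, fastq]


def to_shell(cmd, out_path):
    """Render a command as a shell line that redirects stdout to `out_path`."""
    return f"{shlex.join(cmd)} > {shlex.quote(out_path)}"


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    aln, samse = bwa_aln_commands("ref.fa", "reads.fq", "sample.sai")
    print(to_shell(aln, "sample.sai"))
    print(to_shell(samse, "sample.aln.sam"))
    print(to_shell(bwa_mem_command("ref.fa", "reads.fq"), "sample.mem.sam"))
```

Keeping the commands as argument lists makes it straightforward to vary `-l` and `-n` systematically when comparing mapping rates across parameter settings.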
In this study, we collected whole-genome sequencing data generated on the Illumina HiSeq platform from six samples (representing three species) to optimize ancient DNA mapping, which is critical for improving the mapping rate of endogenous ancient DNA. Because optimized mapping alone does not separate present-day contamination from endogenous ancient DNA, we further used simulated data to explore a more universal and effective filtration pipeline that removes present-day contamination on the basis of ancient DNA cytosine deamination, depurination, and fragmentation. The final recommendations presented in this study reduced modern human DNA contamination to an extremely low level while retaining a high proportion of endogenous DNA. The mapping guidelines, coupled with the screening recommendations for controlling modern DNA contamination, could support future studies on ancient DNA.