1. Introduction
Ancient DNA research provides direct evidence for reconstructing
prehistoric biogeography and biodiversity, which in turn helps to
address long-standing questions in evolution, phylogeny, taxonomy, and
adaptation (1-5). The field has developed rapidly over the
past thirty years owing to improvements in PCR and next-generation
sequencing (NGS) technologies. The first successful extraction of
ancient DNA was reported by Higuchi et al. (1984), who extracted DNA from
muscle tissue of Equus quagga and amplified fragments of 228 bp
(6, 7). With advances in biomolecular techniques, it is
now possible to extract and amplify ancient DNA fragments from a wide
range of ancient species and sample types, including bones, teeth, soft
tissue, fur, and fossilized excrement (7, 8). Studies on ancient DNA
were previously restricted to mitochondrial DNA and extremely short
nuclear DNA fragments (7, 9). However, the advent of NGS technology has
enabled ancient DNA studies at the whole-genome level. Consequently, the
number of ancient DNA studies has increased exponentially in the last
decade (10). The first whole genome of the woolly mammoth was sequenced
in 2008 (11). Genomes from three Neanderthals were also sequenced in 2010,
revealing extensive gene flow into modern humans (12). In 2012,
the first high-coverage genome (~30X) of a Denisovan was
published (13). In 2015, Allentoft et al. sequenced 101
ancient humans at the whole-genome level (14). At present, more than
1100 ancient human and hominin genomes (15) and more than 300 ancient
animal genomes (2, 16, 17) have been sequenced and published.
Although major advances have been made in ancient DNA extraction,
library preparation, and bioinformatics, several challenges remain
(18-21). Mapping ancient reads effectively and distinguishing
present-day DNA contamination from endogenous ancient DNA remain
difficult, and both steps need to be improved for
ancient DNA analysis. It is particularly difficult to filter
present-day human DNA contamination from ancient human or hominin DNA
(22, 23). Ancient DNA is often degraded into very short fragments by
physical, chemical, or biological factors during long-term
preservation under unfavorable conditions. These processes leave
characteristic marks that help distinguish ancient DNA from modern DNA,
including C-to-T substitutions at the ends of ancient DNA fragments
caused by cytosine deamination, an elevated proportion of purine bases at
the genomic position immediately preceding the start of ancient DNA
fragments, and severe fragmentation (4, 21). These characteristics
can be used to identify authentic ancient DNA.
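As a minimal illustration of the first of these signals, the sketch below tallies how often a reference C is read as T at each of the first few positions from the 5' end of aligned reads. The function name c_to_t_by_position, the input format (gap-free reference/read string pairs of equal length) and the toy sequences are assumptions made for illustration only; they are not taken from any published tool, and a real analysis would work from BAM alignments.

```python
from collections import Counter


def c_to_t_by_position(pairs, max_pos=5):
    """Return, for each of the first max_pos positions from the 5' end,
    the fraction of reference C bases that appear as T in the read."""
    ref_c = Counter()   # number of reference C bases seen at each position
    c_to_t = Counter()  # number of those read as T
    for ref, read in pairs:
        for i, (r, q) in enumerate(zip(ref[:max_pos], read[:max_pos])):
            if r == "C":
                ref_c[i] += 1
                if q == "T":
                    c_to_t[i] += 1
    # Positions with no reference C are reported as 0.0
    return [c_to_t[i] / ref_c[i] if ref_c[i] else 0.0 for i in range(max_pos)]


# Toy (reference, read) pairs, invented for illustration
pairs = [
    ("CACGT", "TACGT"),
    ("CCGTA", "TCGTA"),
    ("CAGCT", "CAGCT"),
    ("GCCAT", "GCCAT"),
]
freqs = c_to_t_by_position(pairs)
print(freqs)
```

In this toy input, C-to-T mismatches occur only at the first position, which mimics the terminal enrichment expected for deaminated ancient fragments; contaminating modern reads would be expected to show a flat, near-zero profile.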
Bioinformatics methods have been developed for mapping and extracting
endogenous ancient DNA from total ancient DNA (18, 21). In the mapping
procedure for ancient DNA, BWA with the parameters
“aln -l 1024 -n 0.03” is usually applied to map ancient
sequencing data against the reference genome (18). However, this process
is time-consuming. Newer methods such as BWA mem, which uses a
seed-reseed-extend algorithm, offer an alternative for mapping
ancient DNA (24). Meanwhile, Skoglund et al. (21) developed
PMDtools to separate genuine endogenous DNA from homologous
contamination. This method is effective in filtering modern human
contamination from ancient human DNA. However, it is difficult
to set an appropriate threshold for the PMD score (PMDS) when the
contamination rate cannot be accurately estimated. Moreover, the power of
PMDtools is reduced for extremely young or old ancient samples.
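For concreteness, the two mapping strategies mentioned above can be written out as command lines. The sketch below only assembles and prints the commands and does not run BWA; the file names (reference.fa, sample.fastq, sample.sai) and helper function names are hypothetical placeholders. Per the BWA manual, a seed length (-l) larger than the read length disables seeding, and a fractional -n sets the fraction of missing alignments tolerated under a 2% uniform base error rate.

```python
import shlex


def bwa_aln_commands(reference, fastq, prefix):
    """Build the bwa aln + samse commands with the parameters quoted in the text."""
    # -l 1024: seed longer than any read, so seeding is effectively disabled
    # -n 0.03: fractional mismatch setting quoted in the text
    aln = ["bwa", "aln", "-l", "1024", "-n", "0.03", reference, fastq]
    # samse converts the .sai output of aln into SAM for single-end reads
    samse = ["bwa", "samse", reference, f"{prefix}.sai", fastq]
    return aln, samse


def bwa_mem_command(reference, fastq):
    """Build a default bwa mem command for the same inputs."""
    return ["bwa", "mem", reference, fastq]


# Hypothetical file names, for illustration only
aln, samse = bwa_aln_commands("reference.fa", "sample.fastq", "sample")
mem = bwa_mem_command("reference.fa", "sample.fastq")

# bwa writes to standard output, so each command is shown with a redirect
print(shlex.join(aln), ">", "sample.sai")
print(shlex.join(samse), ">", "sample.aln.sam")
print(shlex.join(mem), ">", "sample.mem.sam")
```

Keeping the commands as argument lists makes it straightforward to pass them to a job scheduler or to subprocess once the inputs exist, and to time the two strategies on the same data.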
In this study, we collected whole-genome sequencing data generated on the
Illumina HiSeq platform from six samples (representing three species) to
optimize ancient DNA mapping, which is critical for improving the
mapping rate of endogenous ancient DNA. Because optimized mapping
alone does not remove present-day contamination
from endogenous ancient DNA, we further explored a more universal and
effective pipeline for filtering present-day contamination based
on ancient DNA cytosine deamination, depurination, and fragmentation,
using simulated data. The final recommendations presented in this
study reduced modern human DNA contamination to an
extremely low level while retaining a high proportion of endogenous DNA. The
mapping guidelines, coupled with screening recommendations to control for
modern DNA contamination, could support future studies on ancient DNA.