2.4. Separating Endogenous DNA from the Contaminations
To develop a more universal and effective pipeline for separating
endogenous ancient DNA from homologous contamination, we first screened
for reads carrying at least “DeamNum” C-to-T or G-to-A substitutions
within the first or last “DetectRange” base pairs of the 5’ and/or 3’
ends (“DoubleOrSingle”). For “DeamNum” (the number of C-to-T or G-to-A
substitutions), values of one, two and three were tested. For
“DetectRange” (the number of terminal bases examined), values of five,
ten and fifteen were tested. For “DoubleOrSingle”, substitutions were
required at either the 5’ or the 3’ end (parameter “or”) or at both ends
(parameter “and”). These parameter combinations (“DeamNum”,
“DetectRange”, “DoubleOrSingle”) yield 3 × 3 × 2 = 18 screening
conditions, all of which were explored (Table S2). To simplify this
pipeline, we implemented it as a Python program (home page:
https://github.com/tianminglan/AncFil).
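The deamination screen described above can be illustrated with the minimal Python sketch below. It is not the AncFil implementation: the function names count_deamination and passes_deamination_screen are hypothetical. The sketch also rests on several assumptions. Reads are gap-free and already aligned to the reference in the read's 5’→3’ orientation. C-to-T is counted only in the 5’ window and G-to-A only in the 3’ window, which is the pattern typical of double-stranded libraries. For “and”, at least “DeamNum” substitutions are required at each end.

```python
from itertools import product
from typing import Tuple

# Hypothetical sketch, not AncFil itself. Assumes `read` and `ref` are
# gap-free, equal-length strings in the read's 5'->3' orientation.


def count_deamination(read: str, ref: str, detect_range: int) -> Tuple[int, int]:
    """Return (C->T count in the 5' window, G->A count in the 3' window)."""
    if len(read) != len(ref):
        raise ValueError("read and reference must have equal length")
    if detect_range < 1:
        raise ValueError("detect_range must be a positive integer")
    n = min(detect_range, len(read))
    start_3p = len(read) - n  # first index of the 3' window
    # Assumption: C->T is counted only near the 5' end, G->A only near the 3' end.
    c_to_t = sum(1 for r, g in zip(read[:n], ref[:n]) if g == "C" and r == "T")
    g_to_a = sum(1 for r, g in zip(read[start_3p:], ref[start_3p:])
                 if g == "G" and r == "A")
    return c_to_t, g_to_a


def passes_deamination_screen(read: str, ref: str, deam_num: int,
                              detect_range: int, double_or_single: str) -> bool:
    """Apply one (DeamNum, DetectRange, DoubleOrSingle) screening condition."""
    five, three = count_deamination(read, ref, detect_range)
    if double_or_single == "or":   # enough substitutions at either end
        return five >= deam_num or three >= deam_num
    if double_or_single == "and":  # assumption: threshold applies to each end
        return five >= deam_num and three >= deam_num
    raise ValueError("double_or_single must be 'or' or 'and'")


# The 3 x 3 x 2 = 18 parameter combinations explored in the text.
CONDITIONS = list(product((1, 2, 3), (5, 10, 15), ("or", "and")))
```

For example, with ref = "CCGATTACGG" and read = "TCGATTACAG", count_deamination(read, ref, 5) gives (1, 1). This read therefore passes the (1, 5, "and") condition but not the (2, 5, "or") condition.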
Additional conditions can be tested by adjusting the parameters
“-DeamNum”, “-DetectRange” and “-DoubleOrSingle”. Secondly, given the
natural tendency of ancient DNA fragments to undergo depurination at
their 5’ ends (33), we screened for reads with an A or G at the
reference position immediately preceding the first base of the 5’ end.
Thirdly, the effect of ancient DNA fragment length on the separation of
endogenous DNA was evaluated. Two criteria were used to evaluate this
pipeline: 1) CRT, the contamination rate after treatment (the number of
contaminant reads after filtering / the total number of reads after
filtering); and 2) LRE, the loss rate of true endogenous DNA (the number
of endogenous ancient reads removed by filtering / the number of
endogenous ancient reads before filtering).
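The depurination screen and the two evaluation criteria can be sketched in Python as follows. This is an illustrative sketch only. The LabelledRead record, its fields and the function names are hypothetical. It assumes that the reference base immediately upstream of each read's 5’ end has already been looked up. It also assumes that every read carries a known endogenous or contaminant label, which is required to compute CRT and LRE at all.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class LabelledRead:
    # Hypothetical record: reference base just before the read's 5' end,
    # plus the ground-truth origin of the read.
    preceding_ref_base: str
    endogenous: bool


def purine_before_5prime(read: LabelledRead) -> bool:
    """Keep reads whose upstream reference base is a purine (A or G)."""
    return read.preceding_ref_base.upper() in ("A", "G")


def evaluate(reads: Sequence[LabelledRead],
             keep: Callable[[LabelledRead], bool]) -> Tuple[float, float]:
    """Return (CRT, LRE) for a filter `keep` applied to labelled reads."""
    kept = [r for r in reads if keep(r)]
    endo_before = sum(r.endogenous for r in reads)
    endo_kept = sum(r.endogenous for r in kept)
    contam_kept = len(kept) - endo_kept
    # CRT: contaminant reads after filtering / all reads after filtering
    crt = contam_kept / len(kept) if kept else float("nan")
    # LRE: endogenous reads removed by filtering / endogenous reads before filtering
    lre = (endo_before - endo_kept) / endo_before if endo_before else float("nan")
    return crt, lre
```

As a toy example, take three endogenous reads with upstream bases A, G and C, and two contaminant reads with upstream bases T and G. The purine screen keeps three reads, one of them a contaminant, and drops one endogenous read, so evaluate returns CRT = LRE = 1/3. Because evaluate accepts any keep predicate, the same function can score other screening rules in the same way.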
Finally, for comparison, PMDtools (21) was used to filter homologous
contamination from the same data, and the results were evaluated with
the same criteria (CRT and LRE) as our recommended method above. The
“-threshold” option is one of the most important parameters in PMDtools
because it controls the strictness of filtration by setting the PMD
score cutoff used to retain reads. For a comprehensive comparison, we
tested five values of “-threshold” (one, two, three, four and five).
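The threshold comparison can be expressed as a simple sweep that reuses the CRT and LRE definitions. This sketch does not call PMDtools. It assumes per-read PMD scores and ground-truth labels have been obtained beforehand. It also assumes that a read is retained when its PMD score is at least the threshold, and that assumption should be checked against the PMDtools documentation. The function name sweep_pmd_thresholds is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def sweep_pmd_thresholds(
    scored_reads: List[Tuple[float, bool]],
    thresholds: Iterable[int] = (1, 2, 3, 4, 5),
) -> Dict[int, Tuple[float, float]]:
    """Map each threshold to (CRT, LRE).

    scored_reads: (pmd_score, is_endogenous) pairs, computed beforehand.
    Assumption: a read is kept when pmd_score >= threshold.
    """
    n_endo = sum(1 for _, endo in scored_reads if endo)
    results: Dict[int, Tuple[float, float]] = {}
    for t in thresholds:
        kept = [(s, endo) for s, endo in scored_reads if s >= t]
        n_contam_kept = sum(1 for _, endo in kept if not endo)
        n_endo_kept = len(kept) - n_contam_kept
        # CRT: contaminant reads among the reads that survive filtering
        crt = n_contam_kept / len(kept) if kept else float("nan")
        # LRE: fraction of the original endogenous reads that were removed
        lre = (n_endo - n_endo_kept) / n_endo if n_endo else float("nan")
        results[t] = (crt, lre)
    return results
```

Raising the threshold in such a sweep typically trades a lower CRT for a higher LRE. Tabulating both values per threshold makes that trade-off directly comparable with the 18 screening conditions of the recommended method.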