2.4. Separating Endogenous DNA from Contamination

To explore a more universal and effective pipeline for separating endogenous ancient DNA from homologous contamination, we first screened for reads carrying at least “DeamNum” C-to-T or G-to-A mutations within the first or last “DetectRange” bases at the 3’ and/or 5’ ends (“DoubleOrSingle”). For “DeamNum” (the number of C-to-T or G-to-A mutations), values of one, two and three were tested. For “DetectRange” (the number of bases examined from the read end), values of five, ten and fifteen were tested. For “DoubleOrSingle”, we tested both requiring the mutations at either the 3’ or the 5’ end (parameter “or”) and requiring them at both ends (parameter “and”). We explored all 18 possible screening conditions by varying the parameter combination (“DeamNum”, “DetectRange”, “DoubleOrSingle”) (Table S2). We wrote a Python program to simplify this pipeline (home page: https://github.com/tianminglan/AncFil). Additional conditions can be tested by adjusting the parameters “-DeamNum”, “-DetectRange” and “-DoubleOrSingle”.

Secondly, given the natural tendency towards depurination at the 5’ ends of ancient DNA fragments (33), we screened for reads with an A or G at the position preceding the first base of the 5’ end. Thirdly, we evaluated the effect of ancient DNA fragment length on the separation of endogenous DNA.

Two criteria were used to evaluate this pipeline: 1) CRT: the contamination rate after treatment (the number of contamination reads after filtering / the number of reads after filtering); 2) LRE: the loss rate of true endogenous DNA (the number of endogenous ancient reads removed by filtering / the number of endogenous ancient reads before filtering).
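The screening rule described above can be illustrated with a minimal Python sketch. This is not the AncFil implementation: the function names are hypothetical, the read is assumed to be aligned to the reference without gaps (so the two strings have equal length), and counting both substitution types at both ends, with the “DeamNum” threshold applied to each end separately, is our reading of the text rather than a confirmed detail.

```python
from itertools import product

# (reference base, read base) pairs treated as deamination-type mismatches.
DEAM_PAIRS = {("C", "T"), ("G", "A")}


def count_deam(ref, read, start, stop):
    """Count C-to-T and G-to-A mismatches in positions start..stop-1."""
    return sum((r, q) in DEAM_PAIRS
               for r, q in zip(ref[start:stop], read[start:stop]))


def passes_screen(ref, read, deam_num, detect_range, double_or_single):
    """Return True if the read meets one screening condition.

    Assumption: ref and read are an ungapped alignment of equal length,
    both written 5' to 3' on the same strand.
    """
    if len(ref) != len(read):
        raise ValueError("ref and read must have the same length")
    ref, read = ref.upper(), read.upper()
    n = len(read)
    window = min(detect_range, n)
    five_prime = count_deam(ref, read, 0, window) >= deam_num
    three_prime = count_deam(ref, read, n - window, n) >= deam_num
    if double_or_single == "and":
        return five_prime and three_prime
    if double_or_single == "or":
        return five_prime or three_prime
    raise ValueError("double_or_single must be 'and' or 'or'")


def upstream_is_purine(upstream_ref_base):
    """Depurination check: reference base preceding the read's 5' start is A or G."""
    return upstream_ref_base.upper() in {"A", "G"}


# The 18 parameter combinations (DeamNum, DetectRange, DoubleOrSingle).
CONDITIONS = list(product((1, 2, 3), (5, 10, 15), ("or", "and")))

if __name__ == "__main__":
    ref = "CCAGTACGTTAGCAGG"   # made-up reference segment
    read = "TCAGTACGTTAGCAGA"  # made-up read with one mismatch at each end
    for deam_num, detect_range, mode in CONDITIONS:
        kept = passes_screen(ref, read, deam_num, detect_range, mode)
        print(deam_num, detect_range, mode, kept)
```

Looping over `CONDITIONS` applies every parameter combination to the same read, which mirrors how the 18 screening conditions were compared; the sequences in the example are invented for illustration only.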
Finally, PMDtools (21) was used to filter the homologous contamination from the same data, and its performance was assessed with the same two criteria used to evaluate our recommended method above. Because “-threshold” is one of the most important parameters in PMDtools for adjusting the stringency of filtering, we tested five PMD score thresholds (one, two, three, four and five) by setting “-threshold”, so that the comparison would be comprehensive.
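The two evaluation criteria, CRT and LRE, which were applied to both filtering approaches, reduce to simple ratios. The sketch below shows one way to compute them, assuming each read has a known origin (endogenous or contamination) and a flag recording whether it survived filtering. The function name and the input layout are hypothetical, and the example reads are invented rather than taken from the study.

```python
def evaluate_filter(reads):
    """Compute (CRT, LRE) from an iterable of (is_endogenous, kept) pairs.

    CRT = contamination reads after filtering / all reads after filtering
    LRE = endogenous reads removed by filtering / endogenous reads before filtering
    """
    reads = list(reads)
    kept_total = sum(1 for _, kept in reads if kept)
    kept_contam = sum(1 for endo, kept in reads if kept and not endo)
    endo_before = sum(1 for endo, _ in reads if endo)
    endo_removed = sum(1 for endo, kept in reads if endo and not kept)

    if kept_total == 0:
        raise ValueError("no reads remain after filtering; CRT is undefined")
    if endo_before == 0:
        raise ValueError("no endogenous reads before filtering; LRE is undefined")

    crt = kept_contam / kept_total
    lre = endo_removed / endo_before
    return crt, lre


if __name__ == "__main__":
    # Invented example: three endogenous reads and two contamination reads.
    example = [
        (True, True),    # endogenous, kept
        (True, False),   # endogenous, removed
        (False, True),   # contamination, kept
        (False, False),  # contamination, removed
        (True, True),    # endogenous, kept
    ]
    crt, lre = evaluate_filter(example)
    print(f"CRT = {crt:.3f}, LRE = {lre:.3f}")
```

A lower CRT indicates that fewer contaminating reads remain, whereas a lower LRE indicates that fewer endogenous reads were sacrificed, so the two values must be read together when comparing screening conditions or thresholds.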