wholeskim: Utilizing genome skims for taxonomically annotating ancient DNA metagenomes

Lucas Elliott; Frédéric Boyer; Téo Lemane; Inger Alsos; Eric Coissac

doi:10.22541/au.172529953.39892767/v1

Abstract

Inferring community composition from shotgun sequencing of environmental DNA depends heavily on the completeness of the reference database used to assign taxonomic information, as well as on the pipeline used. Although the number of complete, fully assembled reference genomes is increasing rapidly, their taxonomic coverage is generally too sparse to build reference databases that span all or most of the target taxa. Low-coverage whole-genome sequencing, or genome skimming, provides a cost-effective and scalable alternative source of genome-wide information in the interim. Because a genome skim lacks the coverage needed to assemble large contigs of nuclear DNA, much of its utility for taxonomic annotation lies in its short-read form. However, previous methods have not been able to fully leverage the data in this format. We demonstrate the utility of wholeskim, a pipeline for indexing the k-mers present in genome skims and subsequently querying these indices with short DNA reads. Wholeskim expands on the functionality of kmindex, software that uses Bloom filters to efficiently index and query billions of k-mers. Tested on a collection of thousands of plant genome skims, wholeskim is the only software able to index and query the skims in their unassembled form. We also explore the effects of taxonomic and genomic completeness of the reference database on the accuracy and sensitivity of read assignment.
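
The idea of indexing the k-mers of unassembled skim reads in a Bloom filter and then querying the index with a short read can be illustrated with a minimal Python sketch. This is not the wholeskim or kmindex implementation: the class `BloomFilter`, the functions `kmers`, `index_skim`, and `shared_fraction`, the filter size, the number of hashes, and the toy k-mer length of 5 are all assumptions chosen for illustration, and canonical (reverse-complement) k-mers are ignored for brevity.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over strings (illustrative only, not kmindex's)."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions by salting one hash function.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(item.encode(), salt=bytes([seed])).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(seq, k):
    """Return every overlapping k-mer of seq (empty list if seq is shorter than k)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def index_skim(reads, k=5):
    """Index all k-mers of a taxon's unassembled skim reads in one Bloom filter."""
    bloom = BloomFilter()
    for read in reads:
        for kmer in kmers(read, k):
            bloom.add(kmer)
    return bloom


def shared_fraction(read, bloom, k=5):
    """Fraction of the query read's k-mers that the index reports as present."""
    query = kmers(read, k)
    if not query:
        return 0.0
    return sum(kmer in bloom for kmer in query) / len(query)


# Hypothetical toy data: two skim reads from one reference taxon.
skim_index = index_skim(["ACGTACGTTAGCCGATAGCT", "TTGACCGTAGGCTAACGT"])
matching = shared_fraction("CGTTAGCCGATA", skim_index)   # substring of a skim read
unrelated = shared_fraction("GGGGGGGGGGGG", skim_index)  # shares no true k-mers
```

In a sketch like this, a read would be assigned to a taxon only when its shared k-mer fraction exceeds some threshold; the threshold, and the step from per-skim hits to a taxonomic assignment, are left out here because the abstract does not specify them.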

27 Aug 2024 Submitted to Molecular Ecology Resources

29 Aug 2024 Submission Checks Completed

29 Aug 2024 Assigned to Editor

29 Aug 2024 Review(s) Completed, Editorial Evaluation Pending

09 Sep 2024 Reviewer(s) Assigned

Peer review status: UNDER REVIEW