
The impact of sequencing depth and relatedness of the reference genome in population genomic studies: a case study with two caddisfly species (Trichoptera, Rhyacophilidae, Himalopsyche)
  • Xiling Deng,
  • Paul Frandsen,
  • Rebecca Dikow,
  • Adrien Favre,
  • Deep Shah,
  • Ram Devi Tachamo Shah,
  • Julio Schneider,
  • Jacqueline Heckenhauer,
  • Steffen Pauls
Xiling Deng
Senckenberg Gesellschaft für Naturforschung
Corresponding Author: [email protected]

Paul Frandsen
LOEWE Center for Translational Biodiversity Genomics

Rebecca Dikow
Smithsonian Institution Office of the Chief Information Officer

Adrien Favre
Senckenberg Gesellschaft für Naturforschung

Deep Shah
Tribhuvan University

Ram Devi Tachamo Shah
Kathmandu University School of Science

Julio Schneider
Senckenberg Gesellschaft für Naturforschung

Jacqueline Heckenhauer
Senckenberg Gesellschaft für Naturforschung

Steffen Pauls
Senckenberg Gesellschaft für Naturforschung

Abstract

Whole-genome sequencing for generating SNP data is increasingly used in population genetic studies. However, obtaining genomes for massive numbers of samples is still not within the budgets of many researchers. It is thus imperative to select an appropriate reference genome and sequencing coverage to ensure the accuracy of the results for a specific research question, while balancing cost and feasibility. To evaluate the effect of the choice of the reference genome and sequencing coverage on downstream analyses, we used five confamilial reference genomes of variable relatedness and three levels of sequencing coverage (3.5x, 7.5x and 12x) in a population genomic study on two caddisfly species: Himalopsyche digitata and H. tibetana. Using these 30 datasets (five reference genomes × three coverages × two target species), we estimated population genetic indices (inbreeding coefficient, nucleotide diversity, pairwise and genome-wide FST) based on variants and population structure (PCA and admixture) based on genotype likelihood estimates. The results showed that both distantly related reference genomes and lower sequencing coverage lead to degradation of resolution. In addition, choosing a more closely related reference genome may significantly remedy the defects caused by low coverage. Therefore, we conclude that population genetic studies would benefit from closely related reference genomes, especially as the costs of obtaining a high-quality reference genome continue to decrease. However, to determine a cost-efficient strategy for a specific population genomic study, a trade-off between reference genome relatedness and sequencing depth can be considered.
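The factorial design described in the abstract (five reference genomes × three coverages × two target species) can be enumerated with a few lines of Python. This is only an illustrative sketch of how the 30 datasets arise: the reference genome labels (`ref_1` to `ref_5`) and the function name `build_design` are hypothetical placeholders, since the abstract does not name the genomes or describe any tooling.

```python
from itertools import product

# Hypothetical labels: the abstract does not name the five reference genomes.
REFERENCE_GENOMES = [f"ref_{i}" for i in range(1, 6)]
# Sequencing coverages stated in the abstract (3.5x, 7.5x and 12x).
COVERAGES = [3.5, 7.5, 12.0]
# The two target species stated in the abstract.
SPECIES = ["Himalopsyche digitata", "Himalopsyche tibetana"]


def build_design():
    """Return one record per (species, reference, coverage) combination."""
    return [
        {"species": species, "reference": reference, "coverage": coverage}
        for species, reference, coverage in product(
            SPECIES, REFERENCE_GENOMES, COVERAGES
        )
    ]


datasets = build_design()
print(len(datasets))  # 5 references x 3 coverages x 2 species = 30
```

Each record corresponds to one dataset on which the population genetic indices and population structure analyses would be run, so comparisons across reference genome or coverage can be made within a species.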
22 Sep 2022 Submitted to Ecology and Evolution
24 Sep 2022 Submission Checks Completed
24 Sep 2022 Assigned to Editor
24 Sep 2022 Review(s) Completed, Editorial Evaluation Pending
27 Sep 2022 Editorial Decision: Revise Minor
10 Nov 2022 1st Revision Received
10 Nov 2022 Submission Checks Completed
10 Nov 2022 Assigned to Editor
10 Nov 2022 Review(s) Completed, Editorial Evaluation Pending
16 Nov 2022 Editorial Decision: Accept