The Critical Role of Choosing Appropriate Genotype Caller and Reference Genome for Population Genomic Inference: A Demonstration Study with Five Pterocarya Species
  • Yang Yang,
  • Xu Pang,
  • Mei Peng,
  • Gang Song,
  • Wen Zhang,
  • Kui Lin,
  • Da-Yong Zhang,
  • Wei-Ning Bai
Yang Yang
Beijing Normal University College of Life Sciences
Xu Pang
Beijing Normal University
Mei Peng
Beijing Normal University
Gang Song
Shanghai Chenshan Plant Science Research Center
Wen Zhang
Beijing Normal University College of Life Sciences
Kui Lin
Beijing Normal University
Da-Yong Zhang
Beijing Normal University
Wei-Ning Bai
State Key Laboratory of Systematic and Evolutionary Botany

Corresponding Author:[email protected]

Abstract

Contemporary population genomic studies typically involve mapping raw reads to a reference genome and analyzing single nucleotide polymorphism (SNP) data obtained from variant calling. The genotype caller GATK is widely used for variant calling, but it was designed primarily for human data, which limits its performance in non-human species. Recently, ATLAS has emerged as a promising alternative caller, with lower false-positive and false-negative rates that significantly affect phylogenomic inference. However, the extent to which the choice between ATLAS and GATK influences downstream population genomic analyses remains largely unexplored. To address this gap, we conducted a population genomic study of five Pterocarya species using two callers, GATK and ATLAS, and two reference genomes, P. stenoptera and P. macroptera. Across the four resulting datasets (two callers × two reference genomes), we evaluated mapping depth, coverage rate, linkage disequilibrium (LD), nucleotide diversity (π), population structure, and demographic history. Notably, using P. stenoptera as the reference genome resulted in less variation in depth and coverage rate across species than using P. macroptera. With both reference genomes, ATLAS consistently identified more SNPs and yielded higher nucleotide diversity and lower LD. Population structure results were more sensitive to the choice of reference genome than to the choice of caller, whereas both the reference genome and the caller significantly influenced inference of demographic history. Our study emphasizes the critical impact of genotype caller and reference genome selection on downstream analyses. Based on current evidence, we recommend selecting a closely related reference genome and using ATLAS for SNP calling to enhance the accuracy and reliability of population genomic studies.
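
To make the link between SNP calling and nucleotide diversity (π) concrete, the following is a minimal sketch, not the authors' pipeline. It computes π as the average number of pairwise differences per site among aligned haplotypes. The function name `nucleotide_diversity` and the toy sequences are hypothetical and chosen only for illustration. The second toy dataset mimics a caller that misses one variant (a false negative), which lowers the estimate.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Average number of pairwise differences per site (pi).

    `haplotypes` is a list of equal-length strings, one per sampled
    haplotype. This is an illustrative helper, not part of any library.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    if length == 0 or any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned and non-empty")

    # Count mismatching sites over every unordered pair of haplotypes.
    total_diffs = sum(
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


# Toy data (assumed, not from the study): two variable sites.
called_all = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]
# Same sample, but the variant at the last site was not called.
missed_one = ["ACGTACGT", "ACGTACGT", "ACCTACGT"]

pi_all = nucleotide_diversity(called_all)
pi_missed = nucleotide_diversity(missed_one)
print(f"pi with all SNPs called: {pi_all:.4f}")
print(f"pi with one SNP missed:  {pi_missed:.4f}")
```

In this toy case the missed variant halves the estimate, which is the direction of effect one would expect if one caller recovers fewer true SNPs than another; the magnitude in real data depends on the site-frequency spectrum and is not implied by this sketch.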