
Evaluating the accuracy of variant calling methods using the frequency of parent-offspring genotype mismatch
  • Russ J. Jasper, University of Calgary (corresponding author)
  • Tegan Krista McDonald, University of Calgary
  • Pooja Singh, University of Calgary
  • Mengmeng Lu, University of Calgary
  • Clément Rougeux, University of Calgary
  • Brandon M. Lind, The University of British Columbia
  • Sam Yeaman, University of Calgary

Abstract

The use of next-generation sequencing (NGS) datasets has increased dramatically over the last decade; however, there have been few systematic analyses quantifying the accuracy of commonly used variant calling programs. Here we used a familial design consisting of diploid tissue from a single Pinus contorta parent and the maternally derived haploid tissue from 106 full-sibling offspring, in which parent-offspring genotype mismatches can arise only through mutation or bioinformatic error. Because mutations are rare, we used the rate of mismatches between parent and offspring genotype calls to infer the SNP genotyping error rates of FreeBayes, HaplotypeCaller, SAMtools, UnifiedGenotyper, and VarScan. With baseline filtering, HaplotypeCaller and UnifiedGenotyper yielded SNP counts and error rates one to two orders of magnitude larger than the other callers, whereas FreeBayes, SAMtools, and VarScan yielded fewer SNPs and more modest error rates. To facilitate comparison among variant callers, we used additional filtering to standardize each SNP set to the same number of SNPs; under this standardization, UnifiedGenotyper consistently produced the smallest proportion of genotype errors, followed by HaplotypeCaller, VarScan, SAMtools, and FreeBayes. We also found that error rates were lowest for SNPs called by more than one variant caller. Finally, we evaluated the performance of several commonly used filtering metrics for SNP calling. Our analysis provides a quantitative assessment of the accuracy of five widely used variant calling programs and offers insights into both the choice of variant caller and the choice of filtering metrics, especially for researchers working with non-model study systems.
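The logic of the familial design can be illustrated with a minimal sketch. It assumes biallelic SNPs with alleles coded as integers (0 = reference, 1 = alternate), a diploid parent genotype stored as a pair, and one haploid allele call per offspring (None for a missing call). This encoding, the function names, and the toy data are hypothetical and are not the authors' pipeline. Because each haploid offspring carries exactly one of the parent's two alleles, any called offspring allele absent from the parent genotype counts as a mismatch. A second helper shows one way to keep only SNPs reported by at least two callers.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical encoding (an assumption for illustration only):
ParentGT = Tuple[int, int]            # diploid parent genotype, e.g. (0, 1)
OffspringCalls = List[Optional[int]]  # one haploid allele per offspring; None = missing


def count_mismatches(parent: ParentGT, offspring: OffspringCalls) -> Tuple[int, int]:
    """Return (mismatches, non_missing_calls) for a single SNP.

    A haploid offspring should carry one of the parent's alleles, so any
    called allele not present in the parent genotype is a mismatch.
    """
    parent_alleles = set(parent)
    called = [allele for allele in offspring if allele is not None]
    mismatches = sum(1 for allele in called if allele not in parent_alleles)
    return mismatches, len(called)


def mismatch_rate(snps: Dict[str, Tuple[ParentGT, OffspringCalls]]) -> float:
    """Pooled proportion of non-missing offspring calls that conflict with the parent."""
    total_mismatches = 0
    total_calls = 0
    for parent, offspring in snps.values():
        mm, n = count_mismatches(parent, offspring)
        total_mismatches += mm
        total_calls += n
    return total_mismatches / total_calls if total_calls else 0.0


def snps_called_by_at_least(calls_by_caller: Dict[str, Iterable[str]],
                            min_callers: int = 2) -> Set[str]:
    """Return the SNP IDs reported by at least `min_callers` different callers."""
    counts: Counter = Counter()
    for snp_ids in calls_by_caller.values():
        counts.update(set(snp_ids))  # set() so each caller counts once per SNP
    return {snp for snp, n in counts.items() if n >= min_callers}


if __name__ == "__main__":
    # Invented toy data; SNP IDs written as "chrom:pos".
    snps = {
        "chr1:100": ((0, 0), [0, 0, 0, 1]),     # allele 1 is absent from the 0/0 parent
        "chr1:250": ((0, 1), [0, 1, 1, None]),  # heterozygous parent: 0 and 1 both consistent
        "chr2:75": ((1, 1), [1, 1, 1, 1]),
    }
    # 1 mismatch out of 11 non-missing calls
    print(mismatch_rate(snps))

    calls = {
        "FreeBayes": ["chr1:100", "chr1:250"],
        "HaplotypeCaller": ["chr1:250", "chr2:75"],
        "VarScan": ["chr1:250"],
    }
    print(snps_called_by_at_least(calls, min_callers=2))
```

Note that, for biallelic SNPs, a single erroneous offspring call is only detectable where the parent is homozygous; at heterozygous sites either allele is consistent with the parent. A systematic error in the parent's own genotype would instead appear as many offspring mismatches at the same site.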
  • 21 Sep 2021: Submitted to Molecular Ecology Resources
  • 24 Sep 2021: Submission checks completed
  • 24 Sep 2021: Assigned to editor
  • 04 Oct 2021: Reviewer(s) assigned
  • 08 Dec 2021: Review(s) completed, editorial evaluation pending
  • 05 Jan 2022: Editorial decision: revise (minor)
  • 11 Feb 2022: Review(s) completed, editorial evaluation pending
  • 11 Feb 2022: 1st revision received
  • 25 Feb 2022: Editorial decision: revise (minor)
  • 11 Mar 2022: 2nd revision received
  • 11 Mar 2022: Review(s) completed, editorial evaluation pending
  • 29 Mar 2022: Editorial decision: accept
  • Oct 2022: Published in Molecular Ecology Resources, volume 22, issue 7, pages 2524-2533. DOI: 10.1111/1755-0998.13628