Base Filter
After the base-level filters were applied to each caller's SNP set, the two GATK callers, UnifiedGenotyper and HaplotypeCaller, called the most SNPs and had the highest mismatch rates, both by site and by genotype (Table 2). SAMtools and FreeBayes called an order of magnitude fewer SNPs than the GATK callers, and their mismatch rates were also an order of magnitude lower (Table 2). Finally, VarScan called the fewest SNPs and had the lowest mismatch rates by site and by genotype. All of these metrics were two orders of magnitude lower than those of the GATK callers (Table 2). The number of SNPs a program called after base filtering was strongly correlated with its mismatch rate (R² = 99.4%; Fig. S1). Because mismatch rate tracked the size of each caller's SNP set so closely, we applied our additional incremental filtering method to compare the variant callers on a more even footing.
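As an illustration of the statistic reported above, the sketch below computes the coefficient of determination (R²) for an ordinary least-squares line relating the number of SNPs called to the mismatch rate. It assumes a simple linear fit on untransformed values. The original analysis behind Fig. S1 may have used a different model or scale, and the function name `r_squared` is hypothetical.

```python
def r_squared(x, y):
    """Coefficient of determination for an ordinary least-squares fit y ~ a + b*x.

    For simple linear regression this equals the squared Pearson correlation.
    Assumes an untransformed linear fit (hypothetical; not necessarily the
    model used for Fig. S1).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    return sxy * sxy / (sxx * syy)
```

In use, `x` would hold the number of SNPs each caller retained after base filtering and `y` the corresponding mismatch rates from Table 2. Multiplying the result by 100 gives the percentage form used in the text.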
Although the variant callers generated SNP sets that differed in size by orders of magnitude, the distributions of parent-offspring genotype mismatches across the called sites were very similar among programs (i.e., heavily right-skewed; Fig. S2). UnifiedGenotyper, however, produced an excess of sites with a 50% genotype mismatch rate, suggesting a higher rate of genotyping error in the parent than with the other variant callers (Fig. S2).