Base Filter
After the base-level filters were applied to each caller’s SNP set, the two
GATK callers, UnifiedGenotyper and HaplotypeCaller, called the
greatest number of SNPs and had the highest mismatch rates by site
and by genotype (Table 2). SAMtools and FreeBayes called an order of
magnitude fewer SNPs than the GATK callers, but their mismatch rates
were also an order of magnitude lower (Table 2). Finally, VarScan
called the fewest SNPs and had the lowest mismatch
rates by site and by genotype, all metrics two orders of magnitude lower
than the GATK callers (Table 2). Because the number of SNPs a program
called after base filtering was strongly correlated with its mismatch
rate (R² = 99.4%; Fig. S1), we applied our
additional incremental filtering method to facilitate
comparison among variant callers.
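A minimal sketch of how a coefficient of determination such as the one above could be computed is given here. It takes R² to be the squared Pearson correlation between per-caller SNP counts and mismatch rates, which is an assumption: the text does not say whether either variable was transformed or which software was used. The function name and the toy values are hypothetical and are not the study's data.

```python
from math import fsum


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences.

    Assumption: R^2 is taken from a simple linear fit of y on x with no
    transformation of either variable.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    n = len(xs)
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    # Sums of cross-products and squares about the means.
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return (sxy * sxy) / (sxx * syy)


# Toy placeholder values only (NOT taken from Table 2): one entry per caller.
toy_snp_counts = [1.0, 2.0, 3.0, 4.0]
toy_mismatch_rates = [0.1, 0.2, 0.3, 0.4]
print(r_squared(toy_snp_counts, toy_mismatch_rates))
```

With only one point per caller, a value this close to 1 means that SNP count and mismatch rate are nearly confounded. This is why a comparison at comparable SNP set sizes is more informative than a comparison of the raw base-filtered sets.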
Although the variant callers generated SNP sets that differed in size by
orders of magnitude, the distributions of parent-offspring
genotype mismatches across the sites called were very similar among
programs (i.e., heavily right-skewed; Fig. S2). UnifiedGenotyper,
however, produced an excess of sites with a 50% genotype
mismatch rate, suggesting a higher rate of genotyping error in the
parent than with the other variant callers (Fig. S2).
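The per-site mismatch rate underlying these distributions can be sketched as follows. The sketch assumes a mismatch is an offspring genotype that shares no allele with the parent genotype at that site, and that missing calls are skipped. The exact definition used in the study is given in its Methods and may differ. The function names, the tuple encoding of genotypes, and the example values are hypothetical.

```python
def is_mismatch(parent_gt, offspring_gt):
    """True if the offspring genotype shares no allele with the parent.

    Assumption: genotypes are tuples of allele indices, e.g. (0, 1).
    """
    return not (set(parent_gt) & set(offspring_gt))


def site_mismatch_rate(parent_gt, offspring_gts):
    """Fraction of called offspring genotypes that mismatch the parent.

    Returns None when the parent or all offspring are uncalled (None).
    """
    called = [gt for gt in offspring_gts if gt is not None]
    if parent_gt is None or not called:
        return None
    mismatches = sum(is_mismatch(parent_gt, gt) for gt in called)
    return mismatches / len(called)


# Toy site (not study data): parent called 0/0, four offspring, one missing.
parent = (0, 0)
offspring = [(0, 0), (0, 1), (1, 1), (1, 1), None]
print(site_mismatch_rate(parent, offspring))
```

Collecting this rate over every called site and plotting a histogram gives a distribution of the kind described for Fig. S2. Under this definition, a spike at one particular rate shows up as an excess of sites in a single bin.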