Results
Whole exome sequencing (WES) of 77 infertile males and their unaffected
parents identified 109 rare de novo mutations (DNMs), all of
which were independently validated by Sanger sequencing. Accurate
phasing and parent-of-origin calling of the DNMs require DNA molecules
spanning a parentally informative single nucleotide polymorphism (iSNP)
and the DNM (Figure 1.a). As such, the ability to call the
parent-of-origin depends primarily on read length. This is evident
in the WES data, where only 8% of DNMs could be phased because most iSNPs
were located >300 bp away from the DNM (Figure 1.b1 and
Table 1).
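The phasing principle described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name `phase_dnm`, the read representation, and the thresholds `min_reads` and `min_fraction` are assumptions chosen for illustration. The sketch counts which iSNP allele co-occurs with the DNM allele on reads that span both positions.

```python
from collections import Counter


def phase_dnm(reads, dnm_alt, isnp_paternal, isnp_maternal,
              min_reads=5, min_fraction=0.8):
    """Assign a parent of origin to a DNM from reads spanning DNM and iSNP.

    reads: iterable of (dnm_allele, isnp_allele) tuples, one per read that
           covers both positions (hypothetical input format).
    dnm_alt: the de novo (mutant) allele.
    isnp_paternal / isnp_maternal: the iSNP alleles inherited from each parent.
    min_reads, min_fraction: illustrative thresholds, not values from the study.
    """
    # Only reads carrying the DNM allele tell us which haplotype it sits on.
    counts = Counter(
        isnp for dnm, isnp in reads
        if dnm == dnm_alt and isnp in (isnp_paternal, isnp_maternal)
    )
    total = sum(counts.values())
    if total < min_reads:
        return "unphased"

    paternal_fraction = counts[isnp_paternal] / total
    if paternal_fraction >= min_fraction:
        return "paternal"
    if 1 - paternal_fraction >= min_fraction:
        return "maternal"
    return "unphased"
```

Reads that do not reach the iSNP contribute nothing to the count, which is why short reads leave most DNMs unphased when the nearest iSNP lies beyond the read length.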
Target amplification groups
For phasing, all DNM regions were targeted with long-range PCR and
sequenced using ONT long-read sequencing. Primer pairs were designed for
targeted long-read phasing of the 109 targets (Supplementary Table 1).
The fragment size was capped at ~10 kb to simplify PCR
optimisation. Amplification success was nevertheless affected by
target length within the 10 kb range, with ideal lengths found to be
<4 kb (Supplementary Figure 2.a). PCR optimisation steps were
required for 50% of all targets (Supplementary Figure 2.b). The
amplification size had no impact on base-call error or quality, or on
allele assignment (Supplementary Figure 3 and Supplementary Table 7).
Importantly, for 71% of cases an iSNP was identified in the available
trio-based WES data within 10 kb of the DNM (Figure 1.b1 & 1.b2). For
this group of DNMs, phasing can be done by targeted long-read sequencing
of the proband only, since the iSNP is already genotyped in the patients
and their parents.
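As an illustration of how trio genotypes can mark a SNP as parentally informative, the following Python sketch applies simple Mendelian logic. The function names, the genotype representation (tuples of two alleles), and the record layout are hypothetical and are not taken from the analysis code of this study. A heterozygous proband SNP is treated as informative when only one assignment of its two alleles to father and mother is compatible with the parental genotypes.

```python
def assign_allele_origin(proband, father, mother):
    """Return {allele: 'paternal' | 'maternal'} if the SNP is informative, else None.

    Each genotype is a tuple of two alleles, e.g. ("A", "G").
    """
    a, b = proband
    if a == b:
        return None  # homozygous proband: the two haplotypes cannot be told apart

    # Is each of the two possible transmissions compatible with the parents?
    a_from_father = a in father and b in mother
    b_from_father = b in father and a in mother

    if a_from_father == b_from_father:
        return None  # both (ambiguous) or neither (Mendelian inconsistency)
    if a_from_father:
        return {a: "paternal", b: "maternal"}
    return {a: "maternal", b: "paternal"}


def informative_snps_near(dnm_pos, snps, max_distance=10_000):
    """List (position, allele_origin) for informative SNPs within max_distance bp.

    snps: iterable of dicts with keys 'pos', 'proband', 'father', 'mother'
          (hypothetical record layout), all on the same chromosome as the DNM.
    """
    hits = []
    for snp in snps:
        if abs(snp["pos"] - dnm_pos) > max_distance:
            continue
        origin = assign_allele_origin(snp["proband"], snp["father"], snp["mother"])
        if origin is not None:
            hits.append((snp["pos"], origin))
    return hits
```

For example, a proband who is A/G with an A/A father and a G/G mother yields an informative SNP, whereas a trio in which all three individuals are A/G does not.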
For cases where no iSNP could be found in the coding region, primers
were designed to cover 5 kb regions around the DNM position in parent and
proband samples (Figure 1.b3). We chose the 5 kb region based on an
analysis of 4344 DNMs identified in 53 whole-genome-sequenced
children (Smits et al., 2022), which revealed
at least one iSNP within 5 kb of any given DNM in 81% of cases
(Supplementary Tables 8 and 9). In the end, we obtained long-read
sequencing data with iSNPs for 77 of the 109 selected DNMs (71%,
Supplementary Tables 8 and 10).
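The kind of window analysis used to motivate the 5 kb region can be sketched as follows. This is a simplified illustration, not the published analysis: the function name is hypothetical, positions are assumed to lie on a single chromosome, and "within 5 kb" is interpreted here as at most 5,000 bp on either side of the DNM.

```python
from bisect import bisect_left


def fraction_with_nearby_isnp(dnm_positions, isnp_positions, window=5_000):
    """Fraction of DNMs with at least one iSNP within +/- window bp.

    Assumes all positions are on the same chromosome (illustrative simplification).
    """
    if not dnm_positions:
        return 0.0
    isnps = sorted(isnp_positions)
    hits = 0
    for pos in dnm_positions:
        # First iSNP at or to the right of the window's left edge.
        i = bisect_left(isnps, pos - window)
        if i < len(isnps) and isnps[i] <= pos + window:
            hits += 1
    return hits / len(dnm_positions)
```

Sorting the iSNP positions once and using a binary search per DNM keeps the check efficient even for genome-wide SNP sets.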