Sequencing analysis of NTD samples
The WGS data were processed using standard pipelines, as described in
the Broad Institute’s GATK Best Practices (Van der Auwera et al., 2013).
Reads were aligned to the hg38 reference provided as part of the GATK
Bundle using BWA (Li & Durbin, 2009). Variant calling was performed
with GATK4 (Poplin et al., 2018) and joint genotyping was carried out on
the whole cohort, followed by Variant Quality Score Recalibration
(VQSR). Quality control was performed following standard practices
(sequencing metrics, per-sample missing rate and level of
heterozygosity) to check for DNA contamination and to identify
outliers; samples of poor quality were removed. Per-variant quality
was also assessed, and only variants with “PASS” in the FILTER column
were retained and annotated using Ensembl Variant Effect Predictor
(VEP) v.95 (McLaren et al., 2016). The gnomAD
(https://gnomad.broadinstitute.org/) database v2.1.1 was used as
a reference to determine whether each variant was novel (allele frequency
(AF) = 0) or rare (AF < 0.001). Pathogenic effect
prediction for all missense variants was performed using the online
program SIFT (Sorting Intolerant From Tolerant; https://sift.bii.a-star.edu.sg). All parameters were left at
the software’s default settings. The localization of the variants in
their protein domains was assessed using UniProt
(http://www.uniprot.org/). The gene lollipop diagram was plotted
using the Lollipops program (Jay & Brouwer, 2016).
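The filtering and frequency rules described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study; it only restates the stated criteria (FILTER equal to "PASS"; novel if AF = 0; rare if AF < 0.001). The function name `classify_variant`, the label "common" for variants at or above the rare threshold, and the treatment of a variant absent from gnomAD as AF = 0 are assumptions made for illustration.

```python
from typing import Optional

# Rare-variant cutoff stated in the text (AF < 0.001).
RARE_AF_THRESHOLD = 0.001


def classify_variant(filter_field: str, gnomad_af: Optional[float]) -> str:
    """Classify one variant as 'filtered', 'novel', 'rare' or 'common'.

    filter_field: value of the VCF FILTER column for the variant.
    gnomad_af:    gnomAD allele frequency, or None if the variant is
                  absent from gnomAD (assumption: treated as AF = 0).
    """
    # Only variants with PASS in the FILTER column are retained.
    if filter_field != "PASS":
        return "filtered"

    af = 0.0 if gnomad_af is None else gnomad_af

    if af == 0.0:
        return "novel"   # AF = 0
    if af < RARE_AF_THRESHOLD:
        return "rare"    # 0 < AF < 0.001
    return "common"      # hypothetical label; not defined in the text


if __name__ == "__main__":
    # Hypothetical records, for illustration only: (FILTER, gnomAD AF).
    examples = [("PASS", None), ("PASS", 0.0004), ("PASS", 0.02), ("LowQual", 0.0)]
    for filter_field, af in examples:
        print(filter_field, af, classify_variant(filter_field, af))
```

The threshold comparison is strict, so a variant with AF exactly equal to 0.001 is not counted as rare under this sketch.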
All eight CIC variants passed visual inspection of the BAM files in the
Integrative Genomics Viewer (IGV) and were then validated by Sanger
sequencing. The variant lollipop plot was generated using the Lollipops
software (Rothenberg et al., 2004).