METHODOLOGY
Two multiplex families (STU-65 and STU-66) were recruited from a database and clinically characterized for stuttering by a speech pathologist using the Stuttering Severity Instrument-3 (SSI-3)34. Family history and other details were recorded using a structured questionnaire. Eight milliliters of blood was collected from the probands and their family members. Genomic DNA was isolated using the phenol-chloroform-isoamyl alcohol (PCI) extraction method35 and quantified using a NanoDrop spectrophotometer (Thermo Fisher Scientific, Wilmington, USA). Approximately 500 ng of DNA per sample, in a volume of 20 µL, was sent to MedGenome Labs Ltd., Bengaluru, for comprehensive exome sequencing (ES). ES was performed on six individuals: four from the STU-66 family and two affected siblings from the STU-65 family.
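As an illustration of the quantification arithmetic only, the sketch below converts a NanoDrop A260 reading into a dsDNA concentration and works out how much stock holds 500 ng, topped up to the 20 µL submission volume. It assumes the conventional factor of 50 ng/µL per A260 unit for double-stranded DNA, which the text does not state; the function names and the example reading are hypothetical.

```python
# Conventional conversion for double-stranded DNA: A260 of 1.0 ~ 50 ng/uL.
# (Assumption; the instrument settings actually used are not reported.)
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Return the dsDNA concentration in ng/uL from an A260 reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def volumes_for_submission(conc_ng_per_ul: float,
                           target_ng: float = 500.0,
                           final_volume_ul: float = 20.0) -> tuple[float, float]:
    """Return (DNA uL, diluent uL) giving target_ng of DNA in final_volume_ul.

    Raises ValueError if the stock is too dilute to fit target_ng into the volume.
    """
    dna_ul = target_ng / conc_ng_per_ul
    if dna_ul > final_volume_ul:
        raise ValueError("sample too dilute for the requested mass and volume")
    return dna_ul, final_volume_ul - dna_ul


# Hypothetical reading: A260 = 2.0 on undiluted stock.
conc = dsdna_concentration(a260=2.0)
dna_ul, diluent_ul = volumes_for_submission(conc)
print(f"{conc:.1f} ng/uL: {dna_ul:.1f} uL DNA + {diluent_ul:.1f} uL diluent")
```

A stock at 25 ng/µL is the most dilute sample that still fits 500 ng into 20 µL without concentrating it first.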
STU-66 family: A Telugu-speaking family had seven affected members across three generations, with three inbreeding loops in the pedigree (figure 1). The proband (V-5) was born to non-consanguineous parents; his father (IV-2) had mild stuttering, whereas his mother (IV-3) was unaffected. Both the proband (V-5) and his younger brother (V-6) had moderate but progressive stuttering, with onset at 2.5 years; his elder brother (V-4) was unaffected. His grandmother (recovered; III-1), uncle (mild stuttering; IV-7) and two first cousins (severe; V-2, V-3) were also affected. The nuclear family, comprising the affected father (IV-2), unaffected mother (IV-3), unaffected brother (V-4) and the proband (V-5), was selected for ES.
STU-65 family: A Tamil-speaking family from the Mudaliar community, a major endogamous caste, had more than 20 people who stutter (PWS) across six generations. The availability of senior informants from the third generation onward made it possible to trace intense multigenerational inbreeding and to trace the phenotype back to a common female founder, I-2 (figure 2).
The proband (V-33) and his affected brother (V-35) were born to consanguineous parents (III-21, IV-8), both of whom also stuttered. Although the mother reported severe stuttering in her early years, her dysfluencies have since reduced, leaving a residual fast speech rate and jaw clenching. The father also showed repetitions and jaw clenching.
Both the proband (V-33) and his younger brother (V-35) developed stuttering gradually at 2.5 years, with no birth complications; the proband's stuttering was moderate and his brother's severe. The proband dropped out of school, and his stuttering increased situationally, particularly with strangers. Observed dysfluencies included hard contacts on initial syllables, prolongations, silent pauses, and syllable and part-word repetitions with 2-3 iterations. Secondary behaviours included eye blinks, clicking sounds, fixed articulatory posture, nose flaring, neck tension, jaw jerking and frequent left-side head nods. His speech rate was slow and his intelligibility was fair. His brother had repetitions and prolongations along with eye blinking, facial grimacing, hand fidgeting and other secondary behaviours. He also showed situational increases in stuttering but continued his education beyond school. Their grandmother (mild) and their maternal aunts and cousins (severe) were also affected. The extended family members have been extensively phenotyped (table A4).
ES was carried out commercially at the MedGenome Labs Ltd. facility in Bengaluru. The ES library was prepared using the Agilent SureSelect XT Reagent Kit (for Illumina platforms). Biotinylated oligonucleotide capture probes (V5+UTR) designed against all coding exons were used to enrich the region of interest (the exome) by hybridization. The resulting library was diluted to a final concentration of 2 nM in 10 µL and subjected to cluster amplification. The flow cell was loaded onto the sequencer (HiSeq X Ten) to generate 2 × 150 bp paired-end reads at a target sequencing depth of 100×. Sequence data meeting the Q30 quality threshold were considered qualified and processed into FASTQ files for downstream variant analysis.
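The dilution to 2 nM in 10 µL is a standard C1V1 = C2V2 calculation once the library's molar concentration is known. The sketch below assumes the common approximation of 660 g/mol per base pair of dsDNA and a hypothetical mean fragment length; neither value is given in the text, and the QC inputs are invented for illustration.

```python
# Average mass of one base pair of double-stranded DNA (common approximation).
G_PER_MOL_PER_BP = 660.0


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration from ng/uL to nM.

    ng/uL equals mg/L; dividing by the molar mass (660 g/mol * length) and
    scaling to nanomoles gives conc * 1e6 / (660 * length).
    """
    return conc_ng_per_ul * 1e6 / (G_PER_MOL_PER_BP * mean_fragment_bp)


def dilution_volumes(stock_nm: float, final_nm: float = 2.0,
                     final_volume_ul: float = 10.0) -> tuple[float, float]:
    """Solve C1 * V1 = C2 * V2 for V1; return (library uL, buffer uL)."""
    if stock_nm < final_nm:
        raise ValueError("stock is already below the target concentration")
    library_ul = final_nm * final_volume_ul / stock_nm
    return library_ul, final_volume_ul - library_ul


# Hypothetical QC values: 2.64 ng/uL, mean fragment length 400 bp (about 10 nM).
stock = library_molarity_nm(2.64, 400)
lib_ul, buffer_ul = dilution_volumes(stock)
print(f"stock {stock:.1f} nM -> {lib_ul:.2f} uL library + {buffer_ul:.2f} uL buffer")
```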
Raw reads were quality-trimmed using the fastq-mcf command-line tool and aligned to the human reference genome (hg19) using BWA-MEM. The resulting SAM output was converted to BAM format using SAMtools and processed to obtain single nucleotide variants (SNVs) and small insertions and deletions (indels) in a standard variant call format (VCF) file. Gene coverage was analyzed using BEDTools.
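The processing chain can be outlined as an ordered list of command lines. This is a sketch under stated assumptions, not the provider's pipeline: the file names, the read-group string, the thread count and the choice of GATK HaplotypeCaller as the caller (the text names only GATK) are all hypothetical, and steps such as duplicate marking and base-quality recalibration are omitted. By default it only prints the commands.

```python
import shlex
import subprocess
from typing import Optional

# One step = (argv, file to capture stdout into, or None).
Step = tuple[list[str], Optional[str]]


def build_steps(sample: str, ref: str, adapters: str, r1: str, r2: str,
                targets_bed: str, threads: int = 8) -> list[Step]:
    """Return the ordered command lines for one paired-end sample (hypothetical paths)."""
    t1, t2 = f"{sample}_R1.trim.fq", f"{sample}_R2.trim.fq"
    sam, bam = f"{sample}.sam", f"{sample}.sorted.bam"
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"  # bwa expands the \t escapes
    return [
        # Adapter/quality trimming; fastq-mcf takes one -o per input file.
        (["fastq-mcf", "-o", t1, "-o", t2, adapters, r1, r2], None),
        # Paired-end alignment to hg19 (reference must be bwa-indexed); SAM on stdout.
        (["bwa", "mem", "-t", str(threads), "-R", read_group, ref, t1, t2], sam),
        # SAM -> coordinate-sorted BAM, then index it.
        (["samtools", "sort", "-o", bam, sam], None),
        (["samtools", "index", bam], None),
        # SNV/indel calling to VCF (needs the reference .fai and .dict files).
        (["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", f"{sample}.vcf.gz"], None),
        # Read coverage over each capture target; table written to stdout.
        (["bedtools", "coverage", "-a", targets_bed, "-b", bam], f"{sample}.coverage.tsv"),
    ]


def run(steps: list[Step], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them in order, stopping on failure."""
    for argv, stdout_path in steps:
        if dry_run:
            print(shlex.join(argv) + (f" > {stdout_path}" if stdout_path else ""))
            continue
        if stdout_path:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)


run(build_steps("V-5", "hg19.fa", "adapters.fa", "V-5_R1.fq.gz", "V-5_R2.fq.gz",
                "SS-V5-UTR.bed"))
```

Keeping the steps as data makes it easy to review or log the exact commands before running them with `dry_run=False`.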
Variants were called using GATK and annotated using MedGenome's in-house annotation pipeline, VariMAT (Variation and Mutation Annotation Toolkit). VariMAT integrates multiple clinical-grade databases [GWAS (https://www.genome.gov/genetics-glossary/Genome-Wide-Association-Studies), ClinVar (https://www.ncbi.nlm.nih.gov/clinvar), OMIM (https://www.omim.org), UniProt (https://www.uniprot.org/), ExAC (http://exac.broadinstitute.org; https://gnomad.broadinstitute.org), dbSNP (https://www.ncbi.nlm.nih.gov/snp/), 1000 Genomes (https://www.coriell.org/1/NHGRI/Collections/1000-Genomes-Collections/1000-Genomes-Project)] together with variant class and pathogenicity prediction tools. Each VariMAT-annotated variant carries information on population frequency, computational pathogenicity prediction, variant type and predicted impact on the protein (missense, loss of function, etc.).
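If the annotations are delivered as VCF INFO fields (an assumption; the VariMAT output format is not described), each record can be read as in the sketch below. It parses the eight fixed VCF columns and splits INFO into key/value pairs, treating bare keys as flags. The INFO keys in the example (`EXAC_AF`, `IMPACT`) are hypothetical and are not VariMAT's actual field names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float | None
    filt: str
    info: dict[str, str | bool]


def parse_info(field: str) -> dict[str, str | bool]:
    """Split a VCF INFO column into key=value pairs; bare keys are flags (True)."""
    if field == ".":
        return {}
    info: dict[str, str | bool] = {}
    for entry in field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def parse_vcf_line(line: str) -> Variant:
    """Parse the eight fixed columns of one VCF data line (sample columns ignored)."""
    chrom, pos, _id, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
    return Variant(chrom, int(pos), ref, alt,
                   None if qual == "." else float(qual), filt, parse_info(info))


# Hypothetical annotated record.
v = parse_vcf_line("chr1\t12345\t.\tA\tG\t250.3\tPASS\tDP=42;EXAC_AF=0.0004;IMPACT=MODERATE;DB\n")
print(v.chrom, v.pos, v.info["IMPACT"], float(v.info["EXAC_AF"]))
```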
Sequencing data of STU-66 family: Paired-end ES generated 10-14 Gb of data for each of the four individuals sequenced. More than 93% of the data had base quality scores above Q30. (The Phred quality score Q measures base-calling accuracy from the estimated probability of an incorrect base call; Q30 corresponds to an error probability of 1 in 1,000, i.e. a base-call accuracy of 99.9%.) The overall and passed alignment rates across all samples were approximately 99.99% and 97.28%, respectively. After alignment, analysis was performed using the SS-V5-UTR panel (74,557,381 bp), which covers 23,690 genes. The average panel depth per sample ranged from 80.42× to 99.27×.
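The Q-score definition and the depth figures can be checked with a little arithmetic. The sketch uses the standard Phred relation Q = -10 log10(P) and defines mean depth as on-target aligned bases divided by panel size; only the panel size comes from the text. It suggests that roughly 7.5 Gb of on-target sequence is needed for a mean depth of 100× over the 74.56 Mb panel, which is in line with the 10-14 Gb generated per sample, since not every generated base lands on target.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for a Phred score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def mean_depth(on_target_bases: float, target_size_bp: int) -> float:
    """Mean depth = aligned on-target bases / size of the target region."""
    return on_target_bases / target_size_bp


PANEL_BP = 74_557_381  # SS-V5-UTR panel size reported in the text

p30 = phred_to_error(30)
print(f"Q30 -> error probability {p30:.4f}, accuracy {1 - p30:.1%}")
# On-target bases required for a 100x mean depth over the panel.
print(f"100x needs {100 * PANEL_BP / 1e9:.2f} Gb on target")
```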
Sequencing data of STU-65 family: In this family, the proband (V-33) and his affected brother (V-35) were sequenced, generating 8-11 Gb of data. More than 90% of the data had base quality scores above Q30. The overall and passed alignment rates (alignment to hg19) in the proband and his sibling were approximately 99.99% and 94.89%, respectively. The average panel depth per sample ranged from 80× to 100×.
Variant filtering: The downstream variant analysis protocol involved shortlisting variants in the affected individuals, followed by prioritizing variants based on data from additional family members. Each individual variant file was filtered to profile rare variants in all samples under investigation, retaining:
  1. high-impact variants: frameshift, termination (stop-gain) and start-loss;
  2. moderate-impact variants: missense, stop-loss and indels; and
  3. intronic splice-site variants: splice donor/acceptor variants or other variants with a predicted proximal splicing impact.
These variants were further filtered with a minimum read depth of 3× and a minor allele frequency (MAF) below 1% in ExAC, 1000 Genomes and the proprietary MedVar (MedGenome variation) database.
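The filtering criteria above can be sketched as a single predicate over annotated variants. The consequence labels are illustrative, loosely following Sequence Ontology terms, and are not the pipeline's actual labels; treating the moderate-impact "indels" as in-frame indels (frameshifts already fall under high impact) is an assumption. The database keys and gene names in the example are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Consequence classes retained by the filter (illustrative labels).
HIGH = {"frameshift_variant", "stop_gained", "start_lost"}
# Indels other than frameshifts are assumed to be in-frame.
MODERATE = {"missense_variant", "stop_lost", "inframe_insertion", "inframe_deletion"}
SPLICE = {"splice_donor_variant", "splice_acceptor_variant", "splice_region_variant"}
KEEP = HIGH | MODERATE | SPLICE

MIN_DEPTH = 3    # minimum read depth (3x)
MAX_MAF = 0.01   # MAF must be below 1% in every frequency database


@dataclass
class AnnotatedVariant:
    gene: str
    consequence: str
    depth: int
    # Allele frequency per database; None means the variant is absent there.
    maf: dict[str, float | None]


def passes_filter(v: AnnotatedVariant) -> bool:
    """True if the variant has a retained consequence, enough depth and is rare everywhere."""
    if v.consequence not in KEEP or v.depth < MIN_DEPTH:
        return False
    return all(f is None or f < MAX_MAF for f in v.maf.values())


variants = [
    AnnotatedVariant("GENE_A", "missense_variant", 25,
                     {"ExAC": 0.0002, "1000G": None, "MedVar": 0.001}),
    AnnotatedVariant("GENE_B", "synonymous_variant", 40,
                     {"ExAC": None, "1000G": None, "MedVar": None}),
    AnnotatedVariant("GENE_C", "stop_gained", 2,
                     {"ExAC": None, "1000G": None, "MedVar": None}),
    AnnotatedVariant("GENE_D", "frameshift_variant", 18,
                     {"ExAC": 0.03, "1000G": 0.02, "MedVar": 0.05}),
]
kept = [v.gene for v in variants if passes_filter(v)]
print(kept)
```

Here GENE_B fails on consequence, GENE_C on depth and GENE_D on frequency, so only GENE_A is kept.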
Variant prioritization: Variant prioritization and candidate gene identification in family-based cohorts often rely on profiling the variants shared by, and unique to, the affected and unaffected members of a family. For this purpose, the open-source tool InteractiVenn35, which creates Venn diagrams for up to six data sets, was used. The following exclusion criteria were then applied to the resulting variant list.