METHODOLOGY
Two multiplex families (STU-65 and STU-66) were recruited from a database
and clinically characterized for stuttering by a speech pathologist using
the Stuttering Severity Instrument-3 [34]. Family history and other
details were recorded using a structured questionnaire. Eight
milliliters of blood were collected from the probands and their family
members. Genomic DNA was isolated using the phenol-chloroform-isoamyl
alcohol (PCI) extraction method [35] and quantified on a NanoDrop
(Thermo Fisher Scientific, Wilmington, USA). About 500 ng of DNA per
sample was quantified, and a 20 µL volume of each sample was sent to
MedGenome Labs Ltd., Bengaluru, for comprehensive exome sequencing (ES).
ES was performed in six individuals: four from the STU-66 family and two
affected siblings from the STU-65 family.
STU-66 family: A Telugu-speaking family had seven affected members
across three generations, with three inbreeding loops in the pedigree
(figure 1). The proband (V-5) was born to non-consanguineous parents;
the father (IV-2) had mild stuttering and the mother (IV-3) was
unaffected. Both the proband (V-5) and his younger brother (V-6) had
moderate but progressive stuttering, with an age at onset of 2.5 years;
his elder brother (V-4) was unaffected. His grandmother (recovered;
III-1), uncle (mild stuttering; IV-7) and two first cousins (severe;
V-2, V-3) were also affected. The nuclear family comprising the
affected father (IV-2), unaffected mother (IV-3), unaffected brother
(V-4) and the proband (V-5) was selected for ES.
STU-65 family: A Tamil-speaking family from the endogamous Mudaliar
caste had more than 20 PWS across six generations. The availability
of senior informants from the third generation onwards made it possible
to document intense multigenerational inbreeding and to trace the
phenotype to a common female founder, I-2 (figure 2).
The proband (V-33) and his affected brother (V-35) were born to
consanguineous parents (III-21, IV-8), both of whom were also affected
with stuttering. Although his mother reported severe stuttering in her
early years, her dysfluencies have since reduced, leaving a residual
fast speech rate and jaw clenching. The father also had repetitions and
jaw clenching.
Both the proband (V-33) and his younger brother (V-35) developed
stuttering gradually at 2.5 years of age, with moderate and severe
stuttering respectively; neither had birth complications. The proband
dropped out of school and showed a situational increase in stuttering
with strangers. The dysfluencies observed included hard contacts on the
initial syllable, prolongations, silent pauses, and syllable and
part-word repetitions with 2-3 iterations. Secondary behaviours included
eye blinks, clicking sounds, fixed articulatory posture, nose flaring,
tension in the neck, jaw jerking and frequent left-sided head nods. His
rate of speech was slow and his speech intelligibility was fair. His
brother had repetitions and prolongations along with eye blinking,
facial grimacing and hand fidgeting. He also showed a situational
increase but continued his education beyond school. Their grandmother
(mild stuttering) and their maternal aunts and cousins (severe
stuttering) were also affected. The extended family members have
been extensively phenotyped (table A4).
ES was carried out commercially at the MedGenome Labs Ltd. facility in
Bengaluru. The ES library was prepared using the Agilent SureSelect XT
Reagent Kit (Illumina). Biotinylated oligonucleotide capture probes
(V5+UTR) designed for all coding exons were used to enrich the region of
interest (exome) by hybridization. The library obtained was diluted to
a final concentration of 2 nM in 10 µL and subjected to cluster
amplification. The flow cell was loaded onto the sequencer (HiSeq X10)
to generate 2 × 150 bp sequence reads at 100X sequencing depth. Sequence
data meeting the Q30 quality threshold were considered qualified and
processed to generate FASTQ files for downstream variant analysis.
The raw reads were quality-trimmed using the fastq-mcf command-line
tool and aligned to the human reference genome (hg19) using the BWA-MEM
tool. The output in SAM format was converted to a BAM file using
SAMtools and processed to obtain SNVs (single nucleotide variants) and
INDELs (small insertions and deletions) in a standard VCF (Variant Call
Format) file. Gene coverage was analyzed using BEDTools.
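As an illustration only, the alignment, conversion and coverage steps
described above could be laid out as the command lists below. This is a
sketch, not the vendor's actual pipeline: the file names, thread count
and flag choices are assumptions, the fastq-mcf trimming step is
omitted, and the helper name alignment_commands is hypothetical.

```python
def alignment_commands(reference, reads_1, reads_2, sample, targets_bed, threads=4):
    """Return argv lists for align -> BAM -> sort -> index -> coverage.

    Nothing is executed here; each list could be passed to subprocess.run.
    All paths and flag choices are illustrative assumptions.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    sorted_bam = f"{sample}.sorted.bam"
    return [
        # Align paired-end reads to the reference and write SAM output.
        ["bwa", "mem", "-t", str(threads), "-o", sam, reference, reads_1, reads_2],
        # Convert SAM to BAM.
        ["samtools", "view", "-b", "-o", bam, sam],
        # Coordinate-sort and index the BAM for downstream variant calling.
        ["samtools", "sort", "-o", sorted_bam, bam],
        ["samtools", "index", sorted_bam],
        # Report coverage of the capture targets (BED file of panel regions).
        ["bedtools", "coverage", "-a", targets_bed, "-b", sorted_bam],
    ]


if __name__ == "__main__":
    # Placeholder file names, not taken from the study.
    for cmd in alignment_commands("hg19.fa", "S1_R1.fastq.gz", "S1_R2.fastq.gz",
                                  "S1", "panel.bed"):
        print(" ".join(cmd))
```

Keeping the commands as data makes the order of the steps explicit and
lets each one be logged or run separately.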
The variants were called using GATK software and annotated using the
MedGenome in-house variant annotation pipeline (VariMAT - Variation and
Mutation Annotation Toolkit). VariMAT integrates multiple clinical-grade
databases [GWAS
(https://www.genome.gov/genetics-glossary/Genome-Wide-Association-Studies),
ClinVar (https://www.ncbi.nlm.nih.gov/clinvar), OMIM
(https://www.omim.org), UniProt (https://www.uniprot.org/), ExAC
(http://exac.broadinstitute.org; https://gnomad.broadinstitute.org),
dbSNP (https://www.ncbi.nlm.nih.gov/snp/), 1000 genomes
(https://www.coriell.org/1/NHGRI/Collections/1000-Genomes-Collections/1000-Genomes-Project)]
with variant class prediction and variant pathogenicity prediction
tools for annotating the variants. VariMAT-annotated variants carry
information on the population frequency, computational pathogenicity
prediction, variant type and predicted impact of the variant on the
protein (missense, loss of function, etc.).
Sequencing data of STU-66 family: Paired-end ES generated 10-14 Gb of
data for each of the four individuals sequenced. More than 93% of the
data had base quality scores above Q30 (the Q score measures
base-calling accuracy by estimating the base-calling error probability;
Q30 corresponds to a probability of 1 in 1000 of an incorrect base
call, i.e. a base-call accuracy of 99.9%). The overall alignment and
passed alignment percentages across all samples were around 99.99% and
97.28% respectively. The analysis was performed after alignment using
the SS-V5-UTR panel (74,557,381 bp), which covers 23,690 genes. The
average panel depth for each sample ranged from 80.42X to 99.27X.
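The Q-score statement and the depth figures above can be checked with a
little arithmetic. The sketch below assumes the standard Phred relation
Q = -10 log10(P) and assumes that "Gb" means gigabases; the function
names are illustrative. Dividing the raw yield by the panel size gives
only an upper bound on mean depth, since off-target, duplicate and
unaligned reads are not subtracted.

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score corresponding to a base-calling error probability p."""
    return -10 * math.log10(p)


def max_mean_depth(total_bases, panel_bp):
    """Upper bound on mean panel depth if every sequenced base were on target."""
    return total_bases / panel_bp


if __name__ == "__main__":
    p = phred_to_error_probability(30)
    print(f"Q30 error probability: {p:.4f}, accuracy: {(1 - p) * 100:.1f}%")

    panel_bp = 74_557_381  # SS-V5-UTR panel size quoted in the text
    for gigabases in (10, 14):  # per-sample yield range for STU-66
        ceiling = max_mean_depth(gigabases * 1e9, panel_bp)
        print(f"{gigabases} Gb -> at most {ceiling:.0f}X")
```

Under these assumptions, 10-14 Gb over a 74,557,381 bp panel gives a
ceiling of roughly 134X to 188X, so the reported average panel depth of
80.42X to 99.27X is consistent with part of the yield falling outside
the panel or being discarded during processing.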
Sequencing data of STU-65 family: In this family, the proband (V-33)
and his affected brother (V-35) were sequenced, generating a total of
8-11 Gb of data. More than 90% of the data had quality scores above
Q30. The overall alignment and passed alignment percentages (alignment
to hg19) in the proband and his sibling were around 99.99% and 94.89%
respectively. The average panel depth for each sample ranged from 80X
to 100X.
Variant filtering: The downstream variant analysis protocol
involved shortlisting the variants in the affected individuals,
followed by prioritization of variants based on the data from additional
family members. Each individual variant file was subjected to
variant filtering criteria to profile rare variants in all
samples under investigation, comprising:
- High-impact variants: frameshift, termination and start-loss
variants,
- Moderate-impact variants: missense, stop-loss and indel variants, and
- Intronic splice-site variants: splice donor/acceptor variants or
variants with a proximal splicing impact.
These variants were filtered with a minimum depth of 3X and a minor
allele frequency (MAF) <1% in the ExAC, 1000 genomes and proprietary
Medvar (MedGenome variation) databases.
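The filtering criteria above can be expressed as a small predicate. The
following is a minimal sketch, not the pipeline actually used: the
consequence labels, database keys and the Variant record are
hypothetical, the depth cutoff is assumed to be inclusive (depth of at
least 3), and a variant missing from a database is assumed to have a
frequency of zero there.

```python
from dataclasses import dataclass, field

# Hypothetical consequence labels standing in for the annotation terms.
HIGH_IMPACT = {"frameshift", "termination", "start_loss"}
MODERATE_IMPACT = {"missense", "stop_loss", "indel"}
SPLICE = {"splice_donor", "splice_acceptor", "proximal_splice"}
KEPT_CONSEQUENCES = HIGH_IMPACT | MODERATE_IMPACT | SPLICE

# Illustrative keys for the three population-frequency sources.
FREQUENCY_DATABASES = ("exac", "1000g", "medvar")


@dataclass
class Variant:
    gene: str
    consequence: str
    depth: int
    maf: dict = field(default_factory=dict)  # database key -> allele frequency


def passes_filter(variant, min_depth=3, max_maf=0.01):
    """Keep variants of a retained class with enough depth and MAF < 1% everywhere."""
    if variant.consequence not in KEPT_CONSEQUENCES:
        return False
    if variant.depth < min_depth:
        return False
    # Assumption: absence from a database is treated as frequency 0.
    return all(variant.maf.get(db, 0.0) < max_maf for db in FREQUENCY_DATABASES)


def filter_variants(variants, **kwargs):
    return [v for v in variants if passes_filter(v, **kwargs)]


if __name__ == "__main__":
    # Placeholder records, not data from the study.
    candidates = [
        Variant("GENE_A", "missense", 40, {"exac": 0.002}),
        Variant("GENE_B", "synonymous", 55),
        Variant("GENE_C", "frameshift", 2),
        Variant("GENE_D", "splice_donor", 30, {"1000g": 0.05}),
    ]
    print([v.gene for v in filter_variants(candidates)])
```

Requiring the frequency cutoff in every database, rather than in any
one of them, is the stricter reading of the criterion and is a choice
made for this sketch.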
Variant prioritization: Variant prioritization and
candidate gene identification in family-based cohorts often rely on
profiling the shared and non-shared gene variants present in the
affected and unaffected members of a family. An open-source tool called
InteractiVenn [35], which can handle up to six data sets and draws the
corresponding Venn diagram, was used for this purpose. The following
exclusion criteria were then applied to the resulting variant list.