Sequence space of expansin domains
Two profile HMMs for the N-terminal and the C-terminal expansin domains
were derived and used for annotation of the two domains in all 12,404
classified sequences of the ExED (superfamilies 1, 2, 3, and 4),
independent of their sequence lengths. For the superfamilies ‘Bacterial
expansins’, ‘Fungal expansins’, and ‘Plant expansins’, the N- and the
C-terminal expansin domains could be annotated in 9,470 out of 9,984
sequences and in 8,896 out of 9,984 sequences, respectively
(Table S5). In 2,182 out of the 2,420 sequences from the
superfamily ‘N-terminal domains’, only the N-terminal expansin domain
was annotated.
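The counts above can be turned into annotation coverage per domain with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text; the variable and function names are illustrative and not part of the ExED.

```python
# Counts quoted in the text above; names are illustrative only.
TOTAL_CLASSIFIED = 12_404
TWO_DOMAIN_TOTAL = 9_984       # 'Bacterial', 'Fungal', and 'Plant expansins'
N_TERMINAL_ONLY_TOTAL = 2_420  # superfamily 'N-terminal domains'

ANNOTATED = {
    "N-terminal domain (two-domain superfamilies)": (9_470, TWO_DOMAIN_TOTAL),
    "C-terminal domain (two-domain superfamilies)": (8_896, TWO_DOMAIN_TOTAL),
    "N-terminal domain only ('N-terminal domains')": (2_182, N_TERMINAL_ONLY_TOTAL),
}


def coverage(hits, total):
    """Return the percentage of sequences with an annotated domain."""
    return round(100 * hits / total, 1)


# The two groups together account for all classified sequences.
assert TWO_DOMAIN_TOTAL + N_TERMINAL_ONLY_TOTAL == TOTAL_CLASSIFIED

for label, (hits, total) in ANNOTATED.items():
    print(f"{label}: {hits}/{total} = {coverage(hits, total)}%")
```

This gives roughly 94.9% and 89.1% coverage for the N- and C-terminal domains in the two-domain superfamilies, and 90.2% for the superfamily ‘N-terminal domains’.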
Based on the annotated domains in the classified superfamilies, two
protein sequence networks were generated. The sequence network of
N-terminal expansin domains is dominated by three large clusters
(Figure S6): homologues of cluster A are classified as EXPA (Hfam
9-20), homologues of cluster B as EXPB (Hfam 21, 22), and homologues of
cluster C as EXLX as well as fungal sequences (Hfam 3, 4, 8). These
clusters are supplemented by clusters D (Hfam 24, 25; EXLB), E (Hfam 26;
Magnoliophyta A), F (Hfam 23; EXLA), G (Hfam 7; Fungi), H (Hfam
27; Magnoliophyta B), and cluster I comprising N-terminal domains
from different sources (Hfam 8, 11, 31, 32; Fungi, EXPA,
Basidiomycota, Loosenin). The N-terminal domains of Magnoliophyta B,
Actinobacteria, and Oomycetes form
separate clusters. The sequences of CBM63 are within clusters of
homologous families 3 and 4 from the superfamily ‘Bacterial expansins’.
The sequence network of the C-terminal expansin domain is dominated by
six large clusters from ‘Plant expansins’, previously annotated as EXPA,
EXPB, EXLB, and EXLA (clusters A-C and E-G), one cluster from ‘Fungal
expansins’ (D, Hfam 7), and three clusters from ‘Bacterial expansins’
(H-J, Hfams 1, 3, 4, 6) (Figure S7). In each of the two
domain-based networks, one bacterial sequence was found in a cluster
from ‘Plant expansins’: Streptomyces acidiscabies (NCBI accession
GAQ55178.1) in EXPA (Figure S6), and Soehngenia
saccharolytica (NCBI accession TJX44964.1) in EXLA (Figure
S7).
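To illustrate how clusters and cross-superfamily members can be read off such a network, the following minimal sketch links sequences whose pairwise similarity exceeds a cutoff, extracts connected components, and flags members whose superfamily differs from the cluster majority. The sequence names, similarity values, and the cutoff of 0.6 are hypothetical toy values; the edge criterion actually used for Figures S6 and S7 is not stated in this section.

```python
from collections import Counter, defaultdict


def connected_components(nodes, edges):
    """Group nodes into clusters by depth-first traversal of the edge list."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, clusters = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members
        clusters.append(members)
    return clusters


def outliers(clusters, superfamily):
    """Return members whose superfamily differs from their cluster's majority."""
    flagged = []
    for members in clusters:
        counts = Counter(superfamily[m] for m in members)
        majority, _ = counts.most_common(1)[0]
        flagged.extend(sorted(m for m in members if superfamily[m] != majority))
    return flagged


# Hypothetical toy data, not taken from the ExED.
superfamily = {
    "plant_1": "Plant expansins", "plant_2": "Plant expansins",
    "plant_3": "Plant expansins", "bact_1": "Bacterial expansins",
    "bact_2": "Bacterial expansins", "bact_3": "Bacterial expansins",
}
similarity = {
    ("plant_1", "plant_2"): 0.82, ("plant_2", "plant_3"): 0.78,
    ("plant_3", "bact_1"): 0.71, ("bact_2", "bact_3"): 0.80,
    ("bact_1", "bact_2"): 0.35,
}
CUTOFF = 0.6  # assumed edge threshold

edges = [pair for pair, score in similarity.items() if score >= CUTOFF]
clusters = connected_components(list(superfamily), edges)
print(outliers(clusters, superfamily))
```

In this toy network the bacterial sequence `bact_1` ends up in the plant cluster, which mirrors the kind of observation reported above for the two bacterial sequences.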
The N- and C-terminal expansin domains have not evolved independently,
but have co-evolved, as indicated by the correlation of sequence
similarities of the two domains to the respective profile HMM
(Figure 3). The shift with respect to the diagonal indicates a
higher conservation for the N-terminal expansin domain than for the
C-terminal expansin domain.
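The two observations, correlation and shift from the diagonal, can be quantified separately. The sketch below computes a Pearson correlation coefficient and the mean offset between per-sequence similarity scores of the two domains. The score values are hypothetical and assumed to be normalised to a common scale; how the similarities in Figure 3 were calculated is not specified in this section.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def mean_offset(xs, ys):
    """Mean of x - y; a positive value means the points lie off the diagonal
    towards higher x, i.e. higher similarity for the first domain."""
    return sum(x - y for x, y in zip(xs, ys)) / len(xs)


# Hypothetical per-sequence similarities to the respective profile HMM.
n_terminal = [0.90, 0.80, 0.70, 0.60, 0.50]
c_terminal = [0.80, 0.72, 0.58, 0.51, 0.38]

r = pearson(n_terminal, c_terminal)
shift = mean_offset(n_terminal, c_terminal)
print(f"Pearson r = {r:.3f}, mean offset (N - C) = {shift:.3f}")
```

A high coefficient corresponds to co-variation of the two domains, while a positive mean offset corresponds to the N-terminal domain being, on average, more similar to its profile HMM than the C-terminal domain is to its own.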