Figure legends
Figure 1 Functionally relevant positions in the expansin
domains from the representative protein structure of Bacillus
subtilis expansin BsEXLX1 (PDB entry 4fer, chain B) are labelled
with standard position numbers (numbering according
to ref. 13) and shown as sticks. The substrate cellohexaose
is depicted above the C-terminal expansin domain in green.
Figure 2 Protein sequence network showing the sequence space of
the expansin sequences in the ExED belonging to the superfamilies
‘Bacterial expansins’, ‘Fungal expansins’, and ‘Plant expansins’. All
protein sequences presented in this network have a sequence length
between 210 and 300 amino acids (Figure S4). The threshold for
the nodes is 90% sequence identity (clustered with USEARCH) and the
threshold for the edges is 50% pairwise sequence identity (determined
by Needleman-Wunsch alignments). This network consists of 3,504 nodes and
1,036,745 edges. With respect to the taxonomic lineages, the nodes from Bacteria, Fungi, Viridiplantae, and other origins are
colored in red, orange, green, and white, respectively. The protein
sequences of the fifteen biggest clusters belong to the following
homologous families (Hfams) and expansin classifications: A (Hfams 9-20;
expansin classification EXPA), B (21-22; EXPB), C (24-25; EXLB), D (23;
EXLA), E (7, Fungi), F (3, 4; EXLX), G (1, 2, 4, 6; EXLX), H (3; EXLX),
and I (7, Fungi). The bacterial sequences (red) in clusters A and D
belong to Streptomyces acidiscabies (NCBI accession
WP_050370046.1), Kutzneria sp. 744 (NCBI accession EWM10128.1, both
Hfam 5, cluster A), and Soehngenia saccharolyta (NCBI accession
TJX44964.1, cluster D).
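The edge criterion used in the networks (a pairwise sequence identity threshold on Needleman-Wunsch alignments) can be illustrated with a minimal sketch. The scoring scheme (match 1, mismatch -1, linear gap -1), the identity definition (identical columns divided by alignment length), and the names `nw_identity` and `build_edges` are assumptions made for illustration; they are not taken from the pipeline used to build the figures, and the 90% clustering step with USEARCH is not reproduced here.

```python
from itertools import combinations


def nw_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment of a and b.

    Returns identical aligned columns / alignment length.
    Scoring parameters are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back one optimal path, counting columns and identical columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return identical / columns if columns else 0.0


def build_edges(seqs, threshold=0.5):
    """Connect every pair of sequences whose identity reaches the threshold."""
    return [
        (id1, id2)
        for (id1, s1), (id2, s2) in combinations(seqs.items(), 2)
        if nw_identity(s1, s2) >= threshold
    ]


# Hypothetical toy sequences, not taken from the ExED.
toy = {"a": "MKTAYIAK", "b": "MKTAYIAK", "c": "WWWWWWWW"}
edges = build_edges(toy, threshold=0.5)
```

In this toy input only the pair `a`/`b` is connected. For networks of several thousand nodes, the all-against-all comparison is quadratic in the number of sequences, so a real analysis would use an optimized aligner rather than this pure-Python version.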
Figure 3 Bivariate histogram of co-occurring HMMER bit scores
of the N- and C-terminal expansin domains. The greyscale bar represents
the relative frequency of the bit scores. The black diagonal is the
bisecting line, along which both domains have equal bit scores.
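A bivariate histogram of this kind can be sketched as follows: each sequence contributes one pair of bit scores (N-terminal domain, C-terminal domain), the pairs are binned on a two-dimensional grid, and each cell stores its relative frequency. The bin width of 10 and the function name `bivariate_histogram` are assumptions for illustration, and the example scores are invented.

```python
from collections import Counter


def bivariate_histogram(pairs, bin_width=10.0):
    """Bin (n_score, c_score) pairs on a square grid.

    Returns {(n_bin, c_bin): relative frequency}, where a bin index k
    covers scores in [k * bin_width, (k + 1) * bin_width).
    """
    counts = Counter(
        (int(n // bin_width), int(c // bin_width)) for n, c in pairs
    )
    total = sum(counts.values())
    return {cell: k / total for cell, k in counts.items()}


# Hypothetical bit-score pairs (N-terminal, C-terminal).
scores = [(12.0, 15.0), (18.0, 11.0), (55.0, 20.0), (30.0, 70.0)]
hist = bivariate_histogram(scores)
```

Cells with equal indices, such as `(1, 1)`, lie on the bisecting line; cells far from it correspond to sequences in which one domain scores much higher than the other.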
Figure 4 Protein sequence network showing the protein sequence
space of GH45 sequences from Pfam (ref. 45; accession
PF02015) and the sequence regions annotated as N-terminal expansin
domains from the superfamilies ‘Bacterial expansins’, ‘Fungal
expansins’, ‘Plant expansins’, and ‘N-terminal domains’. The colors
representing the origin of the expansin sequences correspond to the
scheme in Figure 2, with GH45 sequences colored in blue. The
threshold for the nodes is 90% sequence identity (clustered with
USEARCH) and the threshold for the edges is 30% pairwise sequence
identity (determined by Needleman-Wunsch alignments). This network
consists of 4,031 nodes and 2,182,810 edges.
Figure 5 Protein sequence network showing the protein sequence
space of CBM63 sequences from CAZy and the protein sequences of the
superfamilies ‘Bacterial expansins’, ‘Fungal expansins’, and ‘Plant
expansins’ with a sequence length between 210 and 300 amino acids
(Figure S4). In contrast to the four big clusters from ‘Plant
expansins’ (EXPA (A), EXPB (B), EXLA (C), and EXLB (D)), where no CBM63
sequences can be found, the clusters from the superfamilies ‘Bacterial
expansins’ and ‘Fungal expansins’ show many connections to sequences of
CBM63. The colors representing the origin of the expansin sequences
correspond to the scheme in Figure 2, with CBM63 sequences
colored in cyan. The threshold for the nodes is 90% sequence identity
(clustered with USEARCH) and the threshold for the edges is 50%
pairwise sequence identity (determined by Needleman-Wunsch alignments).
This network consists of 3,344 nodes and 844,280 edges.