Figure legends
Figure 1 Functionally relevant positions in the expansin domains of the representative protein structure of Bacillus subtilis expansin BsEXLX1 (PDB entry 4fer, chain B) are labelled with standard position numbers (numbering according to ref. 13) and shown as sticks. The substrate cellohexaose is depicted in green above the C-terminal expansin domain.
Figure 2 Protein sequence network showing the sequence space of the expansin sequences in the ExED belonging to the superfamilies ‘Bacterial expansins’, ‘Fungal expansins’, and ‘Plant expansins’. All protein sequences in this network have a length between 210 and 300 amino acids (Figure S4). The threshold for the nodes is 90% sequence identity (clustered with USEARCH), and the threshold for the edges is 50% pairwise sequence identity (determined by Needleman-Wunsch alignments). The network consists of 3,504 nodes and 1,036,745 edges. By taxonomic lineage, nodes from Bacteria, Fungi, Viridiplantae, and other origins are colored red, orange, green, and white, respectively. The protein sequences of the fifteen biggest clusters belong to the following homologous families (Hfams) and expansin classifications: A (Hfams 9-20; expansin classification EXPA), B (21-22; EXPB), C (24-25; EXLB), D (23; EXLA), E (7, Fungi), F (3, 4; EXLX), G (1, 2, 4, 6; EXLX), H (3; EXLX), and I (7, Fungi). The bacterial sequences (red) in clusters A and D belong to Streptomyces acidiscabies (NCBI accession WP_050370046.1) and Kutzneria sp. 744 (NCBI accession EWM10128.1), both Hfam 5 in cluster A, and to Soehngenia saccharolytica (NCBI accession TJX44964.1) in cluster D.
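The two thresholds in this legend (one for collapsing sequences into nodes, one for drawing edges) can be illustrated with a minimal Python sketch. It is not the pipeline used for the figure: USEARCH clustering is replaced by a simple greedy centroid clustering, the Needleman-Wunsch scoring (match +1, mismatch -1, linear gap -1) is an assumed toy scheme rather than a substitution matrix with affine gaps, and percent identity is assumed to be identical aligned positions divided by alignment length. All function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[str, str]:
    """Global alignment with a linear gap penalty (toy scoring, not BLOSUM)."""
    n, m = len(a), len(b)
    # S[i][j] = best score for aligning a[:i] with b[:j]
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub, S[i - 1][j] + gap, S[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions / alignment length x 100 (one of several conventions)."""
    aln_a, aln_b = needleman_wunsch(a, b)
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


def greedy_cluster(seqs: dict[str, str], min_identity: float = 90.0):
    """Greedy centroid clustering, longest sequences first (stand-in for USEARCH)."""
    centroids: dict[str, str] = {}
    members: dict[str, list[str]] = {}
    for sid, s in sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True):
        for cid, cs in centroids.items():
            if percent_identity(s, cs) >= min_identity:
                members[cid].append(sid)  # joins the first sufficiently similar centroid
                break
        else:
            centroids[sid] = s  # no match: the sequence becomes a new node
            members[sid] = [sid]
    return centroids, members


def ssn_edges(nodes: dict[str, str], min_identity: float = 50.0):
    """All node pairs whose pairwise identity reaches the edge threshold."""
    edges = []
    for (id1, s1), (id2, s2) in combinations(nodes.items(), 2):
        pid = percent_identity(s1, s2)
        if pid >= min_identity:
            edges.append((id1, id2, pid))
    return edges


if __name__ == "__main__":
    toy = {"a": "MKTAYIAKQR", "a2": "MKTAYIAKQK",
           "b": "MKTAYWWWWW", "c": "GGGGPPPPHH"}  # hypothetical sequences
    nodes, members = greedy_cluster(toy, min_identity=90.0)  # node threshold
    print(members)
    print(ssn_edges(nodes, min_identity=50.0))  # edge threshold
```

With these toy inputs, `a` and `a2` (90% identical) collapse into one node, which is then linked to `b` (50% identical) but not to `c`. Lowering the edge threshold, as in Figure 4 (30%), would admit edges between more distantly related nodes.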
Figure 3 Bivariate histogram of co-occurring HMMER bit scores of the N- and C-terminal expansin domains. The greyscale bar represents the relative frequency of the bit score pairs. The black diagonal line is the bisecting line, on which the bit scores of both domains are equal.
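A minimal sketch of how such a relative-frequency histogram could be computed with NumPy, assuming paired bit scores per sequence are already available (for example parsed from HMMER domain tables; parsing is not shown). Placing the N-terminal score on the x axis, the bin count, and the helper names are assumptions. Using the same range on both axes makes the bisecting line y = x coincide with the diagonal of the bin grid.

```python
import numpy as np


def bitscore_histogram(n_scores, c_scores, bins=50, score_range=None):
    """Relative-frequency 2D histogram of paired N-/C-terminal domain bit scores."""
    n = np.asarray(n_scores, dtype=float)
    c = np.asarray(c_scores, dtype=float)
    if score_range is None:
        # Same range on both axes, so the bisecting line y = x is the grid diagonal
        lo, hi = min(n.min(), c.min()), max(n.max(), c.max())
        score_range = [[lo, hi], [lo, hi]]
    counts, n_edges, c_edges = np.histogram2d(n, c, bins=bins, range=score_range)
    # Normalize counts so all cells sum to 1 (relative frequency)
    return counts / counts.sum(), n_edges, c_edges


def fraction_above_bisector(n_scores, c_scores):
    """Share of sequences whose C-terminal bit score exceeds the N-terminal one."""
    n = np.asarray(n_scores, dtype=float)
    c = np.asarray(c_scores, dtype=float)
    return float(np.mean(c > n))


if __name__ == "__main__":
    n_bits = [10, 20, 30, 40]  # hypothetical N-terminal bit scores
    c_bits = [15, 18, 35, 40]  # hypothetical C-terminal bit scores
    freq, _, _ = bitscore_histogram(n_bits, c_bits, bins=2, score_range=[[0, 50], [0, 50]])
    print(freq)
    print(fraction_above_bisector(n_bits, c_bits))
```

Points above the bisecting line (with this axis assignment) are sequences whose C-terminal domain scores higher than the N-terminal domain.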
Figure 4 Protein sequence network showing the protein sequence space of GH45 sequences from Pfam (ref. 45; accession PF02015) and the sequence regions annotated as N-terminal expansin domains from the superfamilies ‘Bacterial expansins’, ‘Fungal expansins’, ‘Plant expansins’, and ‘N-terminal domains’. The colors representing the origin of the expansin sequences follow the scheme in Figure 2, with GH45 sequences colored blue. The threshold for the nodes is 90% sequence identity (clustered with USEARCH), and the threshold for the edges is 30% pairwise sequence identity (determined by Needleman-Wunsch alignments). The network consists of 4,031 nodes and 2,182,810 edges.
Figure 5 Protein sequence network showing the protein sequence space of CBM63 sequences from CAZy and the protein sequences of the superfamilies ‘Bacterial expansins’, ‘Fungal expansins’, and ‘Plant expansins’ with a length between 210 and 300 amino acids (Figure S4). In contrast to the four large clusters from ‘Plant expansins’ (EXPA (A), EXPB (B), EXLA (C), and EXLB (D)), which contain no CBM63 sequences, the clusters from the superfamilies ‘Bacterial expansins’ and ‘Fungal expansins’ show many connections to CBM63 sequences. The colors representing the origin of the expansin sequences follow the scheme in Figure 2, with CBM63 sequences colored cyan. The threshold for the nodes is 90% sequence identity (clustered with USEARCH), and the threshold for the edges is 50% pairwise sequence identity (determined by Needleman-Wunsch alignments). The network consists of 3,344 nodes and 844,280 edges.
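Observations such as many connections between expansin clusters and CBM63 sequences amount to counting edges between node categories. A minimal sketch, assuming an edge list of (node, node, identity) tuples and a mapping from each node to a group label such as its superfamily or 'CBM63'; the function name, node IDs, and identities below are hypothetical.

```python
from collections import Counter


def edges_between_groups(edges, labels):
    """Count edges per unordered pair of group labels."""
    counts = Counter()
    for u, v, *_ in edges:
        # Sort the two labels so (A, B) and (B, A) are counted together
        pair = tuple(sorted((labels[u], labels[v])))
        counts[pair] += 1
    return counts


if __name__ == "__main__":
    labels = {"x1": "CBM63", "x2": "Bacterial expansins",
              "x3": "Plant expansins", "x4": "CBM63"}  # hypothetical nodes
    edges = [("x1", "x2", 55.0), ("x4", "x2", 60.0), ("x3", "x2", 51.0)]
    for pair, n in edges_between_groups(edges, labels).items():
        print(pair, n)
```

In this framing, the contrast described in the legend corresponds to high counts for bacterial and fungal expansin groups paired with CBM63, and a zero count for the plant expansin clusters paired with CBM63.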