The Expansin Engineering Database (ExED)
The current version of the ExED contains 15,089 sequence entries, 12,400 protein entries, and twenty-one protein structures (Tables 1 and S5). Based on global sequence similarity, 12,404 sequence entries, 9954 protein entries, and seventeen structures were assigned to four superfamilies. Three superfamilies include expansin homologues with two domains and were named according to their dominant source organisms: superfamily 1 ‘Bacterial expansins’ (1172 sequences, ten structures), superfamily 2 ‘Fungal expansins’ (543 sequences, no structure), and superfamily 3 ‘Plant expansins’ (8269 sequences, six structures). The members of superfamily 4 ‘N-terminal domains’ consist of the N-terminal expansin domain only (2420 sequences, one structure). This superfamily comprises eukaryotic and bacterial sequences, e.g. from Magnoliophyta (A, B, and C), Actinobacteria, Oomycetes, and Basidiomycota. The remaining 2685 sequences (corresponding to 2446 protein entries) and four structures could not be assigned to any of the four superfamilies and were therefore collected in an unclassified fifth superfamily, which was omitted from further investigation.
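The counts reported above can be cross-checked with a few lines of arithmetic. The following is only an illustrative tally of the numbers stated in the text; the variable names are ours and are not part of the ExED.

```python
# Counts as reported in the text: (sequence entries, structures) per superfamily.
superfamilies = {
    "Bacterial expansins": (1172, 10),
    "Fungal expansins": (543, 0),
    "Plant expansins": (8269, 6),
    "N-terminal domains": (2420, 1),
}

# Database-wide totals as reported in the text.
total_sequences = 15089
total_structures = 21

# Entries assigned to the four superfamilies.
assigned_sequences = sum(n_seq for n_seq, _ in superfamilies.values())
assigned_structures = sum(n_str for _, n_str in superfamilies.values())

# Entries left for the unclassified fifth superfamily.
unassigned_sequences = total_sequences - assigned_sequences
unassigned_structures = total_structures - assigned_structures

print(assigned_sequences, assigned_structures)      # 12404 17
print(unassigned_sequences, unassigned_structures)  # 2685 4
```

The four superfamilies thus account for 12,404 sequences and seventeen structures, leaving 2685 sequences and four structures unclassified, in agreement with the figures given above.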
The sequence lengths in the superfamilies ‘Bacterial expansins’, ‘Fungal expansins’, and ‘Plant expansins’ vary between 40 and 1400 amino acids, with a sharp peak between 250 and 270 amino acids and two minor peaks at 150 and 600 amino acids (Figure S4). The sequence length distributions differ among the four superfamilies (Figure S5). For further analysis of whole expansin sequences and comparison with expansin-like proteins, only sequences with a length between 210 and 300 amino acids were considered (7706 sequences from the superfamilies ‘Bacterial expansins’, ‘Fungal expansins’, and ‘Plant expansins’) (Figure S4).
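A length filter of this kind might be sketched as follows. This is a minimal illustration, not the procedure actually used to build the ExED: the function name `filter_by_length`, the record layout, and the toy sequences are hypothetical, and treating both bounds (210 and 300) as inclusive is an assumption.

```python
from typing import Iterable, List, Tuple

# A record is an (identifier, amino acid sequence) pair; this layout is an assumption.
Record = Tuple[str, str]


def filter_by_length(
    records: Iterable[Record], min_len: int = 210, max_len: int = 300
) -> List[Record]:
    """Keep records whose sequence length lies in [min_len, max_len] (inclusive, assumed)."""
    return [(seq_id, seq) for seq_id, seq in records if min_len <= len(seq) <= max_len]


# Toy sequences (not real expansins) chosen to fall near the reported peaks and bounds.
toy_records = [
    ("minor_peak_150", "A" * 150),
    ("main_peak_260", "A" * 260),
    ("minor_peak_600", "A" * 600),
    ("lower_bound_210", "A" * 210),
    ("upper_bound_300", "A" * 300),
]

kept = filter_by_length(toy_records)
print([seq_id for seq_id, _ in kept])
# ['main_peak_260', 'lower_bound_210', 'upper_bound_300']
```

With such a window, sequences near the minor peaks at 150 and 600 amino acids are excluded, while those around the main peak at 250 to 270 amino acids are retained.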
In the protein sequence network, which was built from global sequence alignments for the superfamilies ‘Bacterial expansins’, ‘Fungal expansins’, and ‘Plant expansins’, the ‘Plant expansins’ are the most frequent group and form four large separate clusters, each consisting of one or more homologous families (Hfams): cluster A has been classified as EXPA (Hfams 9-20), cluster B as EXPB (Hfams 21, 22), cluster C as EXLB (Hfams 24, 25), and cluster D as EXLA (Hfam 23) (Figure 2). These four clusters are followed by two clusters of the superfamily ‘Fungal expansins’ (Hfam 7) and three clusters of the superfamily ‘Bacterial expansins’ (Hfams 3, 4; 1, 2, 4, 6; and 3). Notably, the ‘Plant expansins’ clusters A and D also contain bacterial sequences.
In our study, expansins were found in Bacteria, Archaea, and Eukaryota. When looking in detail at the major taxa in the tree of life (after Fig. 1 in ref. 8), expansins occur in Gammaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Acidobacteria, Bacteroidetes, Fibrobacteres, Ignavibacteria, Actinobacteria, Chloroflexi, Firmicutes, Cyanobacteria (all Bacteria), Euryarchaeota (Archaea), Metazoa, Fungi, Evosea, Discosea, Discoba, Embryophyta, Chloroplastida, Rhodophyta, and Stramenopiles (all Eukaryota) (Table S6).