The Expansin Engineering Database (ExED)
The current version of the ExED contains 15,089 sequence entries, 12,400
protein entries, and twenty-one protein structures (Tables 1 and S5).
Based on global sequence similarity, most of these entries were
assigned to four superfamilies (comprising 12,404 sequence entries, 9954
protein entries, and seventeen structures). Three superfamilies include
expansin homologues with two domains and were named according to their
dominant source organisms: superfamily 1 ‘Bacterial expansins’ (1172
sequences, ten structures), superfamily 2 ‘Fungal expansins’ (543
sequences, no structure), and superfamily 3 ‘Plant expansins’ (8269
sequences, six structures). The members of superfamily 4 ‘N-terminal
domains’ consist of the N-terminal expansin domain only (2420 sequences,
one structure). This superfamily comprises eukaryotic and bacterial
sequences, e.g. from Magnoliophyta (A, B, and C), Actinobacteria, Oomycetes, and Basidiomycota. The
remaining 2685 sequences (corresponding to 2446 protein
entries) and four structures could not be assigned to the four
superfamilies and were thus collected in an unclassified fifth
superfamily, which was omitted from further investigations.
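The bookkeeping behind these counts can be verified with a short tally. The following is a minimal sketch that only uses the numbers reported above; the dictionary layout and variable names are illustrative assumptions and do not reflect the ExED schema.

```python
# Counts per superfamily as reported in the text.
# The data layout and names are hypothetical, chosen for illustration only.
superfamilies = {
    "Bacterial expansins": {"sequences": 1172, "structures": 10},
    "Fungal expansins": {"sequences": 543, "structures": 0},
    "Plant expansins": {"sequences": 8269, "structures": 6},
    "N-terminal domains": {"sequences": 2420, "structures": 1},
}

# Database-wide totals as reported in the text.
TOTAL_SEQUENCES = 15089
TOTAL_PROTEINS = 12400
TOTAL_STRUCTURES = 21
CLASSIFIED_PROTEINS = 9954

# Sum the four classified superfamilies.
classified_sequences = sum(s["sequences"] for s in superfamilies.values())
classified_structures = sum(s["structures"] for s in superfamilies.values())

# Whatever is left over belongs to the unclassified fifth superfamily.
unclassified_sequences = TOTAL_SEQUENCES - classified_sequences
unclassified_proteins = TOTAL_PROTEINS - CLASSIFIED_PROTEINS
unclassified_structures = TOTAL_STRUCTURES - classified_structures

print(classified_sequences, classified_structures)
print(unclassified_sequences, unclassified_proteins, unclassified_structures)
```

The sums reproduce the 12,404 classified sequences and seventeen structures, and the differences reproduce the 2685 sequences, 2446 protein entries, and four structures of the unclassified superfamily.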
The sequence lengths in the superfamilies ‘Bacterial expansins’, ‘Fungal
expansins’, and ‘Plant expansins’ vary between 40 and 1400 amino acids
with a sharp peak between 250 and 270 amino acids and two minor peaks at
150 and at 600 amino acids (Figure S4). The sequence length
distributions differ for each of the four superfamilies (Figure S5). For further analysis of whole expansin sequences and
comparison with expansin-like proteins, only sequences with a length
between 210 and 300 amino acids were considered (7706 sequences from the
superfamilies ‘Bacterial expansins’, ‘Fungal expansins’, and ‘Plant
expansins’) (Figure S4).
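The length criterion can be illustrated as a simple filter. This is a minimal sketch, not the procedure used to build the ExED: the function name, the (identifier, sequence) record layout, and the toy records are hypothetical, and treating both bounds as inclusive is an assumption, since the text only states "between 210 and 300 amino acids".

```python
from typing import Iterable, Iterator, Tuple

# Length window taken from the text; inclusive bounds are an assumption.
MIN_LENGTH = 210
MAX_LENGTH = 300

Record = Tuple[str, str]  # (identifier, amino acid sequence); hypothetical layout


def filter_by_length(
    records: Iterable[Record],
    min_len: int = MIN_LENGTH,
    max_len: int = MAX_LENGTH,
) -> Iterator[Record]:
    """Yield only the records whose sequence length lies within the window."""
    for identifier, sequence in records:
        if min_len <= len(sequence) <= max_len:
            yield identifier, sequence


# Toy records (not real expansin sequences) at the lengths of the three
# peaks mentioned in the text: 150, about 260, and 600 amino acids.
toy_records = [
    ("toy_short", "M" * 150),
    ("toy_typical", "M" * 260),
    ("toy_long", "M" * 600),
]

selected = list(filter_by_length(toy_records))
print([identifier for identifier, _ in selected])
```

Only the record near the main peak passes the filter, which mirrors the intent of restricting the analysis to full-length two-domain sequences.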
In the protein sequence network, which was built from global sequence
alignments for the superfamilies ‘Bacterial expansins’, ‘Fungal
expansins’, and ‘Plant expansins’, the latter form the largest
group and fall into four large separate clusters, each consisting of one or more
homologous families (Hfams): cluster A was classified as EXPA
(Hfams 9-20), cluster B as EXPB (Hfams 21, 22), cluster C as EXLB (Hfams
24, 25), and cluster D as EXLA (Hfam 23) (Figure 2). These four
clusters are followed by two clusters of the superfamily ‘Fungal
expansins’ (Hfam 7) and three clusters of the superfamily ‘Bacterial
expansins’ (Hfams 3, 4; 1, 2, 4, 6; and 3). Notably, the ‘Plant
expansins’ clusters A and D also contain bacterial sequences.
In our study, expansins were found in Bacteria, Archaea, and Eukaryota. At the level of the major taxa in the tree of life (after Fig. 1 in ref. 8), expansins occur in Gammaproteobacteria, Betaproteobacteria, Deltaproteobacteria, Acidobacteria, Bacteroidetes, Fibrobacteres, Ignavibacteria, Actinobacteria, Chloroflexi, Firmicutes, Cyanobacteria (all Bacteria), Euryarchaeota (Archaea), Metazoa, Fungi, Evosea, Discosea, Discoba, Embryophyta, Chloroplastida, Rhodophyta, and Stramenopiles (all Eukaryota) (Table S6).