Introduction
Expansins are plant cell wall loosening proteins without apparent catalytic activity, which have been identified in a broad range of organisms1–4. The loosening mechanism is still elusive, but it has been suggested that the non-covalent interactions between cellulose microfibrils are weakened, so that the microfibrils move against each other and the tight cellulosic structure is loosened1. The interactions between expansins and the plant cell wall, which consists of lignin, hemicellulose, and cellulose, require further investigation5. Expansins were first discovered in plants and were described as proteins mediating pH-dependent extension and stress relaxation of cell walls6. Based on phylogenetic analysis, it has been proposed that expansins in Bacteria and Fungi resulted from multiple horizontal gene transfers from plants to microbes7, but it is also possible that the microbial expansin subfamily evolved first in ancient marine microorganisms and then diversified into distinct terrestrial plant subfamilies8.
Expansins consist of two tightly packed protein domains, connected by a short linker and preceded by a signal peptide9 (Figure 1). Both expansin domains need to be connected for effective wall extension activity and for the weakening of filter paper10,11. The C-terminal domain of EXLX1 (expansin-like X) from Bacillus subtilis dominates the binding to cellulose and to matrix polysaccharides of cell walls through electrostatic or polar interactions10. The Zea mays β-expansin (Zm EXPB1) primarily binds glucuronoarabinoxylan, the major matrix polysaccharide in grass cell walls, and loosens it12.
Key amino acids in the N-terminal domain of Bacillus subtilis expansin-like protein 1 (Bs EXLX1)10, numbered according to13, are two threonines at positions 12 and 14, a serine at position 16, two aspartates at positions 71 and 82, a tyrosine at position 73, and a glutamic acid at position 75. The threonine at standard position 12 is strongly conserved, but not essential for activity10. The aspartate at position 82 is crucial for activity; the threonine at position 14, the aspartate at position 71, and the tyrosine at position 73 are important for activity; and the serine at position 16 and the glutamic acid at position 75 play moderate roles in wall creep activity10. Three disulfide bridges can be found in the N-terminal domain of Zm EXPB1, and the six participating cysteines are highly conserved in the plant expansin groups EXPA (expansin A) and EXPB (expansin B)14. An additional highly conserved cysteine pair is thought to form a fourth disulfide bridge in plant α-expansins15. In the expansin protein Sc Exlx1 from the basidiomycete fungus Schizophyllum commune, three disulfide bonds are predicted16, whereas Bs EXLX1 lacks disulfide bridges13, as do many other bacterial expansins.
The N-terminal expansin domain is formed by a six-stranded double-ψ β-barrel13 that is shared by several protein superfamilies17, e.g. glycoside hydrolase family 45 (GH45)18,19. The expansin-like proteins found in Fungi such as loosenins, EXPNs, or cerato-platanins are single-domain proteins that resemble the N-terminal domain of expansins20–22.
The C-terminal expansin domain is responsible for the binding to cellulosic material and is formed by two stacked β-sheets with an immunoglobulin-like fold1. The cellulose binding site on the protein surface consists of a linear arrangement of aromatic residues (tyrosines, phenylalanines, and tryptophans)13. In Bs EXLX1, this site includes two tryptophans at positions 125 and 126 and a tyrosine at position 157; a further key amino acid residue required for wall extension activity is a lysine at position 119 (both from10). The C-terminal domain of Bs EXLX1 belongs to family 63 of carbohydrate binding modules (CBM63)10, which mediate binding to polysaccharides23,24.
In this paper, we analyzed the similarity between “expansin-like proteins” (such as GH45s, loosenins, swollenins, cerato-platanins, EXPNs, and expansin-like proteins found in nematodes) and expansin domains at the sequence level by establishing the Expansin Engineering Database (ExED), which collects characterized and putative expansin homologues. The protein sequences in the ExED were divided into different superfamilies (‘Bacterial expansins’, ‘Fungal expansins’, and ‘Plant expansins’) according to sequence identity rather than by the phylogenetic relationships of expansins, which were analyzed in8. By annotating the two expansin domains and using a continuous standard numbering scheme, conserved sequence motifs of the expansin protein family were identified that could be applied in the screening of genomic data for the identification of novel expansins.
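As an illustration of how a standard numbering scheme could support such screening, the following minimal Python sketch records the key Bs EXLX1 residues named in this Introduction by standard position and reports which of them are present in a candidate sequence. It assumes that the candidate has already been aligned and mapped to standard positions. The names KEY_RESIDUES, check_key_residues, and fraction_conserved, the dictionary input format, and the example candidate are hypothetical and are not part of the ExED.

```python
# Key residues of Bs EXLX1 by standard position, as listed in the Introduction.
# One-letter amino acid codes: T = Thr, S = Ser, D = Asp, Y = Tyr, E = Glu,
# K = Lys, W = Trp.
KEY_RESIDUES = {
    # N-terminal domain
    12: "T", 14: "T", 16: "S", 71: "D", 73: "Y", 75: "E", 82: "D",
    # C-terminal domain (CBM63)
    119: "K", 125: "W", 126: "W", 157: "Y",
}


def check_key_residues(residue_at):
    """Compare a candidate against the Bs EXLX1 key residues.

    residue_at is a hypothetical input format: a dict mapping a standard
    position to the one-letter residue the candidate carries there. A
    position missing from the dict is treated as a gap (observed = None).
    Returns {position: (expected, observed, is_match)}.
    """
    report = {}
    for position, expected in KEY_RESIDUES.items():
        observed = residue_at.get(position)
        report[position] = (expected, observed, observed == expected)
    return report


def fraction_conserved(report):
    """Fraction of key positions at which the candidate matches Bs EXLX1."""
    matches = sum(1 for _, _, is_match in report.values() if is_match)
    return matches / len(report)


# Invented candidate for illustration only: identical to Bs EXLX1 at all key
# positions except an alanine instead of the aspartate at position 82.
candidate = dict(KEY_RESIDUES)
candidate[82] = "A"

report = check_key_residues(candidate)
for position in sorted(report):
    expected, observed, is_match = report[position]
    print(position, expected, observed, "match" if is_match else "differs")
print("fraction conserved:", fraction_conserved(report))
```

An exact-match comparison is the simplest possible criterion. Since the residues differ in importance (for example, the threonine at position 12 is conserved but not essential, whereas the aspartate at position 82 is crucial), a practical screen would probably weight positions differently.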