Homologous expansin-like domains in other proteins
The GH45 protein sequences show conserved positions that are also highly conserved in the superfamilies ‘Plant expansins’ and ‘Fungal expansins’: the EXPA/EXPB motif HFDL (80-83), glycine 21, cysteine 23 and glycine 24 of the plant motif GGACGYG (21-26), threonine 12 and tyrosine 14 of the plant and fungal motif T(F/W)YG (12-14 and 14.1), and alanine 36 of the fungal motif GTAnS (34-38) (Figure S8). Thus, at the level of local sequence motifs, GH45 endoglucanases are more similar to the N-terminal expansin domain than their divergent global protein sequences would suggest (Figure 4).
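As an illustration only, the degree to which a motif position is conserved could be quantified as the fraction of sequences carrying the expected residue at that position. The minimal sketch below assumes that each sequence has already been mapped to the standard numbering used above, including insertion positions such as 14.1. The function name `conservation`, the input layout, and the toy sequences are hypothetical and do not reproduce the data behind Figure S8.

```python
def conservation(numbered_seqs, expected):
    """Fraction of sequences carrying an expected residue at each standard position.

    numbered_seqs: list of dicts mapping a standard position label
                   (e.g. "14" or the insertion position "14.1") to a residue;
                   a missing key represents a gap at that position.
    expected:      dict mapping a position label to the set of allowed residues.
    """
    result = {}
    for pos, allowed in expected.items():
        # s.get(pos) is None for a gap, and None is never in the allowed set
        hits = sum(1 for s in numbered_seqs if s.get(pos) in allowed)
        result[pos] = hits / len(numbered_seqs)
    return result


# Motif T(F/W)YG occupying positions 12-14 and 14.1 in the standard numbering
tfwyg_positions = {"12": {"T"}, "13": {"F", "W"}, "14": {"Y"}, "14.1": {"G"}}

# Hypothetical toy sequences, already renumbered to the standard scheme
toy_seqs = [
    {"12": "T", "13": "F", "14": "Y", "14.1": "G"},
    {"12": "T", "13": "W", "14": "Y", "14.1": "G"},
    {"12": "S", "13": "F", "14": "Y"},  # gap at insertion position 14.1
]

motif_conservation = conservation(toy_seqs, tfwyg_positions)
print(motif_conservation)
```

Storing positions as string labels rather than integers keeps insertion positions like "14.1" distinct from the regular numbering without any special handling.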
For further comparison, 582 protein sequences of carbohydrate-binding module family 63 (CBM63), ranging in length from 57 to 746 amino acids, were downloaded from the CAZy database [46]. Interestingly, 511 of these sequences contained both expansin domains and were therefore already annotated in the ExED in the superfamilies ‘Bacterial expansins’ and ‘Fungal expansins’. Four CBM63 sequences contained only the C-terminal expansin domain, whereas 58 contained only the N-terminal expansin domain; these 58 sequences shared more than 60% sequence identity with N-terminal expansin domains of the superfamily ‘Bacterial expansins’ (Figure S6). A protein sequence network including the full-length CBM63 sequences and expansin sequences from the superfamilies ‘Bacterial expansins’, ‘Fungal expansins’, and ‘Plant expansins’ revealed that CBM63 sequences are similar to ‘Bacterial expansins’ and also to ‘Fungal expansins’ from homologous family 7 (Figure 5).
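The partition of CBM63 sequences by expansin-domain content can be sketched as a simple classification step. The sketch assumes that, for each sequence, the set of detected expansin domains ("N" for N-terminal, "C" for C-terminal) is already known, for example from a profile search; that detection step, the function names, and the toy input are hypothetical and are not the ExED annotation procedure itself.

```python
from collections import Counter


def domain_category(domains: set[str]) -> str:
    """Map the set of detected expansin domains ("N", "C") to a category label."""
    has_n = "N" in domains
    has_c = "C" in domains
    if has_n and has_c:
        return "both"
    if has_n:
        return "N-terminal only"
    if has_c:
        return "C-terminal only"
    return "none"


def summarize(hits: dict[str, set[str]]) -> Counter:
    """Count sequences per domain-content category."""
    return Counter(domain_category(d) for d in hits.values())


# Hypothetical toy input: sequence ID -> expansin domains detected in that sequence
toy_hits = {
    "seq1": {"N", "C"},
    "seq2": {"N"},
    "seq3": {"C"},
    "seq4": {"N", "C"},
    "seq5": set(),
}

category_counts = summarize(toy_hits)
print(category_counts)
```

Keeping an explicit "none" category makes it easy to see whether every downloaded sequence has been assigned to one of the three domain-content classes.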
The members of the superfamily ‘N-terminal domains’ consist of the N-terminal expansin domain only. Similarly, loosenin (NCBI accession ADI72050.2), EXPN from Endogone sp. FLAS-F59071 (NCBI accession RUS20349.1), the expansin-like protein from the nematode Heterodera glycines (NCBI accession ADL29728.1), and cerato-platanin from Ceratocystis platani (NCBI accession CAC84090.2) consist only of the N-terminal expansin domain (Table S7). At a threshold of 60% sequence identity, the N-terminal domains of loosenin and of Basidiomycota sequences cluster with fungal sequences from Hfam 7 and plant sequences from Hfam 11 (Figure S6). In contrast, swollenin was found to possess only a distantly related C-terminal expansin domain (Table S7).
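Clustering at a fixed identity threshold can be illustrated with a minimal single-linkage sketch: sequences become nodes, pairs with percent identity at or above the threshold are joined by an edge, and each connected component is one cluster. This is an assumed model of threshold clustering for illustration, not necessarily the procedure used to produce Figure S6; the function name and the pairwise identity values are hypothetical.

```python
def cluster_by_identity(ids, pairwise, threshold=60.0):
    """Single-linkage clustering: connected components of the graph whose
    edges join sequence pairs with percent identity >= threshold."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), identity in pairwise.items():
        if identity >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb  # merge the two components

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    # Sort members and clusters so the output is deterministic
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical pairwise percent identities between five domain sequences
ids = ["A", "B", "C", "D", "E"]
pairwise = {
    ("A", "B"): 72.0,
    ("B", "C"): 61.5,
    ("C", "D"): 64.0,
    ("D", "E"): 35.0,
    ("A", "E"): 40.0,
}

clusters_60 = cluster_by_identity(ids, pairwise, threshold=60.0)
print(clusters_60)
```

Under single linkage, A and D end up in the same cluster although their direct identity is not given, because they are connected through B and C; raising the threshold splits such chains into smaller clusters.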