Homologous expansin-like domains in other proteins
The GH45 protein sequences share conserved positions that are also
highly conserved in the superfamilies ‘Plant expansins’ and ‘Fungal
expansins’: the EXPA/EXPB motif HFDL (80-83); glycine 21, cysteine 23,
and glycine 24 of the plant motif GGACGYG (21-26); threonine 12 and
tyrosine 14 of the plant and fungal motif T(F/W)YG (12-14 and 14.1); and
alanine 36 of the fungal motif GTAnS (34-38) (Figure S8). Thus,
on a local sequence level, GH45 endoglucanases are more similar to the
N-terminal expansin domain than would be expected from their divergent
global protein sequences (Figure 4).
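As an illustration of how such local motifs can be located in a raw protein sequence, the following minimal Python sketch scans a sequence with regular expressions. It is not the procedure used in this work. The function name `find_motifs`, the toy sequence, and the reading of the lowercase `n` in GTAnS as "any residue" are assumptions made for the example. The positions it reports are 1-based indices in the raw sequence, not the standard numbering positions quoted above.

```python
import re

# Motifs named in the text, written as regular expressions.
# Assumption: the lowercase "n" in GTAnS stands for any residue.
MOTIFS = {
    "HFDL": "HFDL",
    "GGACGYG": "GGACGYG",
    "T(F/W)YG": "T[FW]YG",
    "GTAnS": "GTA.S",
}


def find_motifs(sequence):
    """Return 1-based start positions of each motif in a protein sequence."""
    sequence = sequence.upper()
    return {
        name: [match.start() + 1 for match in re.finditer(pattern, sequence)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical toy sequence, constructed only to contain each motif once.
toy_sequence = "AATWYGGGACGYGAAGTAQSAAHFDL"
for motif, positions in find_motifs(toy_sequence).items():
    print(motif, positions)
```

Mapping the raw positions onto a standard numbering scheme would additionally require aligning each sequence to the numbering reference, which this sketch does not attempt.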
For further comparison, 582 protein sequences of the
carbohydrate-binding module family 63 (CBM63) with a sequence length
between 57 and 746 amino acids were downloaded from the CAZy
database [46]. Interestingly, 511 of these sequences
contained both expansin domains and were therefore already annotated in
the ExED in the superfamilies ‘Bacterial expansins’ and ‘Fungal
expansins’. Four CBM63 sequences contained only the C-terminal expansin
domain, whereas 58 CBM63 sequences contained only the N-terminal
expansin domain and shared a sequence identity of over 60% with
N-terminal expansin domains of the superfamily ‘Bacterial expansins’
(Figure S6). A protein sequence network including the whole
CBM63 sequences and expansin sequences from the superfamilies ‘Bacterial
expansins’, ‘Fungal expansins’, and ‘Plant expansins’ revealed the
similarity of CBM63 sequences to ‘Bacterial expansins’ and also to
‘Fungal expansins’ from homologous family 7 (Figure 5).
The members of the superfamily ‘N-terminal domains’ consist of the
N-terminal expansin domain only. Similarly, loosenin (NCBI accession
ADI72050.2), EXPN from Endogone sp. FLAS-F59071 (NCBI accession
RUS20349.1), the expansin-like protein from the nematode Heterodera
glycines (NCBI accession ADL29728.1), and cerato-platanin from
Ceratocystis platani (NCBI accession CAC84090.2) consist only of the
N-terminal expansin domain (Table S7). At a threshold of 60% sequence
identity, the
N-terminal domains of loosenin and Basidiomycota cluster with fungal
sequences from Hfam 7 and plant sequences from Hfam 11 (Figure
S6). In contrast, swollenin was found to possess only a distantly
related C-terminal expansin domain (Table S7).
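The idea of grouping sequences at a 60% identity threshold can be sketched as follows. This is a simplified illustration, not the network construction used in this work. It assumes the sequences are already aligned to equal length, computes identity over columns in which neither sequence has a gap (one of several common definitions), and groups sequences by single linkage. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def fraction_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


def identity_clusters(aligned, threshold=0.6):
    """Single-linkage clusters: connect two sequences if identity >= threshold."""
    parent = {name: name for name in aligned}

    def find(name):
        # Follow parent links to the cluster representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(aligned, 2):
        if fraction_identity(aligned[a], aligned[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in aligned:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pre-aligned toy sequences (not real expansin domains).
toy_alignment = {
    "seqA": "ACDEFGHIKL",
    "seqB": "ACDEFGHIKM",
    "seqC": "ACDEF-WWWW",
}
print(identity_clusters(toy_alignment, threshold=0.6))
```

In this toy case seqA and seqB share 90% identity and fall into one cluster, while seqC shares 5 of 9 comparable positions with each of them and remains separate. With single linkage, two sequences can end up in the same cluster through intermediates even if their direct identity is below the threshold.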