Sequence space of expansin domains
Two profile HMMs for the N-terminal and the C-terminal expansin domains were derived and used for annotation of the two domains in all 12,404 classified sequences of the ExED (superfamilies 1, 2, 3, and 4), independent of their sequence lengths. For the superfamilies ‘Bacterial expansins’, ‘Fungal expansins’, and ‘Plant expansins’, the N- and the C-terminal expansin domains could be annotated in 9,470 out of 9,984 sequences and in 8,896 out of 9,984 sequences, respectively (Table S5). In 2,182 out of the 2,420 sequences from the superfamily ‘N-terminal domains’, only the N-terminal expansin domain was annotated.
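The annotation counts above can be restated as coverage fractions. The following is a minimal sketch that uses only the counts given in the text; the function name `coverage` and the dictionary layout are illustrative choices, not part of the ExED.

```python
# Counts (annotated, total) as stated in the text; labels are illustrative.
counts = {
    "N-terminal domain (bacterial, fungal, plant superfamilies)": (9470, 9984),
    "C-terminal domain (bacterial, fungal, plant superfamilies)": (8896, 9984),
    "N-terminal domain only (superfamily 'N-terminal domains')": (2182, 2420),
}


def coverage(annotated: int, total: int) -> float:
    """Return the percentage of sequences in which the domain was annotated."""
    return 100.0 * annotated / total


for label, (annotated, total) in counts.items():
    print(f"{label}: {coverage(annotated, total):.2f} %")

# Consistency check: the two groups add up to all classified sequences.
total_classified = 9984 + 2420
print(total_classified)
```

Under these counts, the N-terminal domain is annotated in a larger share of the 9,984 sequences than the C-terminal domain.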
Based on the annotated domains in the classified superfamilies, two protein sequence networks were generated. The sequence network of N-terminal expansin domains is dominated by three large clusters (Figure S6): homologues in cluster A are classified as EXPA (Hfam 9-20), homologues in cluster B as EXPB (Hfam 21, 22), and homologues in cluster C as EXLX as well as fungal sequences (Hfam 3, 4, 8). These clusters are supplemented by clusters D (Hfam 24, 25; EXLB), E (Hfam 26; Magnoliophyta A), F (Hfam 23; EXLA), G (Hfam 7; Fungi), H (Hfam 27; Magnoliophyta B), and cluster I comprising N-terminal domains from different sources (Hfam 8, 11, 31, 32; Fungi, EXPA, Basidiomycota, Loosenin). The N-terminal domains of Magnoliophyta B, Actinobacteria, and Oomycetes form separate clusters. The sequences of CBM63 are within clusters of homologous families 3 and 4 from the superfamily ‘Bacterial expansins’.
The sequence network of the C-terminal expansin domain is dominated by six large clusters from ‘Plant expansins’, previously annotated as EXPA, EXPB, EXLB, and EXLA (clusters A-C and E-G), one cluster from ‘Fungal expansins’ (D, Hfam 7), and three clusters from ‘Bacterial expansins’ (H-J, Hfams 1, 3, 4, 6) (Figure S7). In each of the two domain-based networks, one bacterial sequence was found in a cluster from ‘Plant expansins’: Streptomyces acidiscabies (NCBI accession GAQ55178.1) in EXPA (Figure S6), and Soehngenia saccharolytica (NCBI accession TJX44964.1) in EXLA (Figure S7).
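How clusters emerge in a protein sequence network can be illustrated with a small sketch. It assumes a common construction in which two domain sequences are connected when their pairwise similarity reaches a threshold, and clusters are the connected components. The text does not state the threshold or the tool used, so the function name `cluster_by_threshold`, the sequence identifiers, the similarity values, and the cutoff are all hypothetical.

```python
from collections import defaultdict


def cluster_by_threshold(pairwise, threshold):
    """Group sequence IDs into connected components of a similarity network.

    pairwise: dict mapping (id_a, id_b) to a similarity value.
    threshold: edges are kept when similarity >= threshold (assumed rule).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving; registers unseen nodes.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), similarity in pairwise.items():
        root_a, root_b = find(a), find(b)  # both nodes become part of the network
        if similarity >= threshold and root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


# Toy similarities between hypothetical domain sequences (not ExED data).
toy_pairs = {
    ("a1", "a2"): 0.82,
    ("a2", "a3"): 0.75,
    ("a1", "b1"): 0.31,
    ("b1", "b2"): 0.68,
    ("b2", "c1"): 0.22,
}

print(cluster_by_threshold(toy_pairs, threshold=0.5))
```

With a stricter threshold the network splits into more, smaller clusters; with a looser one, clusters merge. A sequence that lands in a cluster dominated by another superfamily, such as the two bacterial sequences mentioned above, would appear here as a node linked to that cluster by an above-threshold edge.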
The N- and C-terminal expansin domains have not evolved independently, but have co-evolved, as indicated by the correlation between the sequence similarities of the two domains to their respective profile HMMs (Figure 3). The shift with respect to the diagonal indicates a higher conservation of the N-terminal expansin domain than of the C-terminal expansin domain.
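The two observations, correlation and shift from the diagonal, can be quantified separately. The sketch below assumes that each sequence carries one similarity value per domain relative to the corresponding profile HMM. It computes the Pearson correlation and the mean offset (N-terminal minus C-terminal) as one possible measure of the shift. The similarity values are invented for illustration, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def mean_offset(xs, ys):
    """Average of x - y; a positive value places the points on the x side of the diagonal."""
    return sum(x - y for x, y in zip(xs, ys)) / len(xs)


# Invented per-sequence similarities to the respective profile HMM (not ExED data).
n_terminal = [0.90, 0.80, 0.70, 0.60]
c_terminal = [0.82, 0.68, 0.61, 0.49]

r = pearson(n_terminal, c_terminal)
shift = mean_offset(n_terminal, c_terminal)
print(f"correlation: {r:.3f}, mean offset (N - C): {shift:.3f}")
```

In this reading, a high correlation reflects that sequences with a well-conserved N-terminal domain also tend to have a well-conserved C-terminal domain, while a positive mean offset corresponds to the N-terminal domain being, on average, more similar to its profile HMM.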