Annotation of expansin domains in actinobacterial genomes
As a case study for applying ExED to genome sequence annotation, actinobacterial genomes from various South African habitats were screened for expansin domains and conserved amino acid positions using the profile HMMs of the expansin domains (Tables S8 and S9). In general, the sequence regions matching the N-terminal expansin domain received higher HMMER scores, whereas the C-terminal domain appeared less conserved (compare with Figure 3). Despite the lower scores for the C-terminal expansin domain, coverage of the underlying profile HMM remained high (90%). One genome hit, from sediment samples collected at the Gamka River in the Swartberg Mountain Range, was identical to an expansin homologue from Streptomyces swartbergensis (NCBI accession WP_086602418). It matched the profile HMM of the N-terminal expansin domain well (score: 60, 98% coverage) and the profile HMM of the C-terminal expansin domain moderately (score: 19, 89% coverage). The sequence from S. swartbergensis contains amino acids that are conserved in the superfamily ‘Bacterial expansins’ (threonine 12, glycine 21, alanine 36, glycine 53, tyrosine 55, proline 74, aspartate 82, leucine 83, phenylalanine 88, and glycine 97 in the N-terminal expansin domain; lysine 119, tryptophan 126, tryptophan 149, tyrosine 157, and glycine 179 in the C-terminal expansin domain) as well as amino acids that are conserved in the superfamilies ‘Fungal expansins’ or ‘Plant expansins’ (tyrosine 14, cysteine 23, cysteine 52, and cysteine 73).
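
The two checks described above, profile HMM coverage and the presence of conserved residues, can be illustrated with a minimal Python sketch. It is not the ExED implementation. The function names are hypothetical, the positions are assumed to be 1-based indices into whatever numbering the supplied sequence uses, coverage is assumed to be the aligned fraction of the profile HMM, and the toy sequence and alignment coordinates are invented for illustration rather than taken from WP_086602418.

```python
# Conserved positions as listed in the text, in one-letter amino acid codes.
# Assumption: positions are 1-based indices into the sequence that is passed in.
BACTERIAL_N_TERMINAL = {12: "T", 21: "G", 36: "A", 53: "G", 55: "Y",
                        74: "P", 82: "D", 83: "L", 88: "F", 97: "G"}
BACTERIAL_C_TERMINAL = {119: "K", 126: "W", 149: "W", 157: "Y", 179: "G"}
FUNGAL_OR_PLANT = {14: "Y", 23: "C", 52: "C", 73: "C"}


def hmm_coverage(hmm_from: int, hmm_to: int, hmm_length: int) -> float:
    """Fraction of the profile HMM spanned by an alignment (1-based, inclusive)."""
    if not 1 <= hmm_from <= hmm_to <= hmm_length:
        raise ValueError("expected 1 <= hmm_from <= hmm_to <= hmm_length")
    return (hmm_to - hmm_from + 1) / hmm_length


def check_conserved(sequence: str, expected: dict[int, str]) -> dict[int, bool]:
    """Report, per position, whether the sequence carries the expected residue."""
    result = {}
    for position, residue in expected.items():
        # Positions beyond the end of the sequence count as not conserved.
        observed = sequence[position - 1] if position <= len(sequence) else None
        result[position] = observed == residue
    return result


def build_toy_sequence(length: int, residues: dict[int, str], filler: str = "S") -> str:
    """Build an invented test sequence: filler everywhere except the given positions."""
    chars = [filler] * length
    for position, residue in residues.items():
        chars[position - 1] = residue
    return "".join(chars)


# Invented sequence carrying only the 'Bacterial expansins' residues.
toy = build_toy_sequence(180, {**BACTERIAL_N_TERMINAL, **BACTERIAL_C_TERMINAL})

print(all(check_conserved(toy, BACTERIAL_N_TERMINAL).values()))
print(all(check_conserved(toy, BACTERIAL_C_TERMINAL).values()))
print(check_conserved(toy, FUNGAL_OR_PLANT))

# Invented alignment coordinates on a hypothetical 100-state profile HMM.
print(round(hmm_coverage(2, 99, 100), 2))
```

In practice, the alignment coordinates would come from the HMMER output (for example, the `hmm from` and `hmm to` columns of a per-domain table), and the sequence would first have to be mapped onto the numbering scheme in which the conserved positions are defined.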