Identification of expansin domains in actinobacterial genomes
Five actinobacterial genomes were selected to demonstrate the use of the ExED for the identification of expansin domains. The genomes were sequenced on an Illumina MiSeq sequencer (NGS facility, University of the Western Cape, South Africa). Because of the high G+C content of actinobacterial DNA, a 10% PhiX spike was included in the run. The genomes were assembled using the A5-miseq pipeline [53].
The two newly created profile HMMs described above were used to search the five actinobacterial genomes for expansin domains. Nucleic acid sequences were translated using the default codon usage table of the transeq tool from the EMBOSS software suite (version 6.6.0 [54]). Translated amino acid sequences with fewer than 60 consecutive amino acid symbols were discarded to reduce computation time.
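The length filter can be sketched as follows. This is a minimal illustration, not the script used in this work: it assumes that stop codons appear as `*` in the translated frames and that "consecutive" refers to stop-free stretches. The function name `open_stretches` is hypothetical.

```python
import re

MIN_LENGTH = 60  # minimum number of consecutive amino acid symbols (from the text)


def open_stretches(translation, min_length=MIN_LENGTH):
    """Return (offset, segment) for every stop-free stretch of a translated
    frame that contains at least `min_length` amino acid symbols.

    Assumption: stop codons are written as '*' in the translation.
    """
    kept = []
    for match in re.finditer(r"[^*]+", translation):
        segment = match.group()
        if len(segment) >= min_length:
            kept.append((match.start(), segment))
    return kept


# Toy frame: a 10-symbol stretch, a 65-symbol stretch, and a 59-symbol stretch.
frame = "A" * 10 + "*" + "M" + "K" * 64 + "*" + "G" * 59
segments = open_stretches(frame)
```

In this toy frame, only the 65-symbol stretch is kept. The offset is retained so that a later hit can be mapped back to its position in the frame.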
The hmmscan tool from the HMMER software suite (version 3.1b2, http://www.hmmer.org, Howard Hughes Medical Institute, Chevy Chase, MD, USA) was used to scan the translated amino acid sequences with the profile HMMs. The hits from hmmscan were filtered by a minimum domain-based score of 35 and a minimum coverage of 75% (defined as the ratio of the hit length without insertions to the length of the profile HMM).
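The two filter criteria can be illustrated with a short sketch. It assumes that the per-domain score and the model coordinates of each hit have already been read, for example from the table written by hmmscan with `--domtblout`. The hit length without insertions is approximated here by the span of model positions covered by the hit, which ignores deletions. The `DomainHit` class and its field names are hypothetical.

```python
from dataclasses import dataclass

MIN_SCORE = 35.0      # minimum domain-based score (from the text)
MIN_COVERAGE = 0.75   # minimum coverage of the profile HMM (from the text)


@dataclass
class DomainHit:
    """Hypothetical container for one per-domain hit."""
    query: str        # name of the translated sequence
    score: float      # domain-based bit score
    hmm_from: int     # first model position covered by the hit (1-based)
    hmm_to: int       # last model position covered by the hit (1-based)
    hmm_length: int   # length of the profile HMM


def coverage(hit):
    """Fraction of the profile HMM spanned by the hit (approximation:
    the model span is used as the hit length without insertions)."""
    return (hit.hmm_to - hit.hmm_from + 1) / hit.hmm_length


def passes_filters(hit, min_score=MIN_SCORE, min_coverage=MIN_COVERAGE):
    """Apply both thresholds; the minima themselves are accepted."""
    return hit.score >= min_score and coverage(hit) >= min_coverage


hits = [
    DomainHit("contig1_frame2", 40.0, 1, 80, 100),    # kept
    DomainHit("contig1_frame5", 30.0, 1, 100, 100),   # score too low
    DomainHit("contig2_frame1", 50.0, 20, 80, 100),   # coverage too low
    DomainHit("contig3_frame3", 35.0, 11, 85, 100),   # exactly at both minima
]
kept = [hit.query for hit in hits if passes_filters(hit)]
```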
The matches to the profile HMMs of expansin domains were extended along the contig sequence of each match to the adjacent start methionine and stop codon. If a start or stop codon was missing, the hit was extended to the first or last available amino acid position in the contig, respectively. The extended hit sequences are available for download at https://doi.org/10.18419/darus-699.
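The extension step can be sketched on a translated reading frame. This sketch rests on several assumptions that are not specified in the text: stop codons are written as `*`, the "adjacent" start is taken to be the nearest methionine upstream of the hit, and the search does not cross an upstream stop codon. The function name `extend_hit` is hypothetical.

```python
def extend_hit(frame, hit_start, hit_end, stop="*"):
    """Extend a hit within a translated frame.

    frame     : amino acid string of one reading frame ('*' = stop codon)
    hit_start : 0-based index of the first residue of the hit
    hit_end   : 0-based index one past the last residue of the hit
    Returns (start, end) so that frame[start:end] is the extended sequence
    without the terminating stop symbol.
    """
    # Upstream: walk back to the nearest methionine. If the contig start or
    # a stop symbol is reached first, use the first available position.
    i = hit_start
    while i >= 0 and frame[i] != stop and frame[i] != "M":
        i -= 1
    start = i if i >= 0 and frame[i] == "M" else i + 1

    # Downstream: extend to the next stop symbol, or to the end of the
    # contig if no stop symbol follows the hit.
    end = frame.find(stop, hit_end)
    if end == -1:
        end = len(frame)
    return start, end


frame = "GG*AAMKKKHITHITQQ*LL"
start, end = extend_hit(frame, 9, 15)   # the hit covers "HITHIT"
extended = frame[start:end]
```

For a frame without a start methionine or stop symbol, such as `"AAKKHITQQ"` with the hit at positions 4 to 7, the same function returns the full frame.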