Identification of expansin domains in actinobacterial genomes
Five actinobacterial genomes were selected to demonstrate the use of
the ExED for identifying expansin domains. An Illumina MiSeq
sequencer was used to sequence the genomes (NGS facility, University of
the Western Cape, South Africa). Due to the high G+C content of
actinobacterial DNA, a 10% PhiX spike was included in the run. The
genomes were assembled using the A5-miseq pipeline [53].
The two newly created profile HMMs mentioned above were applied to
search the five actinobacterial genomes for the occurrence of expansin
domains. Nucleic acid sequences were translated using the default codon
usage table available in the transeq tool from the EMBOSS
software suite (version 6.6.0) [54]. Translated amino
acid sequences with fewer than 60 consecutive amino acid symbols were
discarded to reduce computation time.
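The length filter can be illustrated with a minimal Python sketch. It assumes that stop codons appear as `*` in the translated sequence and that a "sequence" is a stop-free run of residues within one translated frame. The function name `long_segments` and its return format are hypothetical and not part of the original workflow.

```python
def long_segments(translated: str, min_len: int = 60) -> list[tuple[int, str]]:
    """Return (start_index, segment) for every stop-free run of at least
    min_len residues in one translated frame ('*' marks a stop codon)."""
    kept = []
    start = 0  # 0-based index of the current segment within the frame
    for segment in translated.split("*"):
        if len(segment) >= min_len:
            kept.append((start, segment))
        # Advance past this segment and the '*' that terminated it.
        start += len(segment) + 1
    return kept


if __name__ == "__main__":
    # Toy frame: runs of 10, 60 and 59 residues separated by stops.
    frame = "A" * 10 + "*" + "M" + "K" * 59 + "*" + "G" * 59
    for offset, seg in long_segments(frame):
        print(offset, len(seg))
```

Keeping the offset of each retained segment makes it possible to map later hits back to positions on the contig.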
The hmmscan tool from the HMMER software suite (version 3.1b2,
http://www.hmmer.org, Howard Hughes Medical Institute, Chevy Chase, MD,
USA) was used to scan the translated amino acid sequences with profile
HMMs. The hits from hmmscan were filtered by a minimum
domain-based score of 35 and a minimum coverage of 75% (defined as the
ratio of the hit length without insertions to the length of the
profile HMM).
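The two filter criteria can be sketched as follows. The sketch assumes that the hit length without insertions can be approximated by the span of profile positions covered by the alignment (the `hmm from` and `hmm to` coordinates reported per domain by HMMER) and that both thresholds are inclusive. The `DomainHit` class, its field names, and the example profile names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    profile: str          # name of the profile HMM (hypothetical label)
    profile_length: int   # number of match states in the profile
    score: float          # domain-based bit score
    hmm_from: int         # first profile position covered (1-based, inclusive)
    hmm_to: int           # last profile position covered (1-based, inclusive)


def coverage(hit: DomainHit) -> float:
    """Fraction of the profile covered by the hit (assumed approximation
    of 'hit length without insertions' / profile length)."""
    return (hit.hmm_to - hit.hmm_from + 1) / hit.profile_length


def filter_hits(hits, min_score: float = 35.0, min_coverage: float = 0.75):
    """Keep hits that reach both the score and the coverage threshold."""
    return [h for h in hits
            if h.score >= min_score and coverage(h) >= min_coverage]


if __name__ == "__main__":
    hits = [
        DomainHit("profile_A", 100, 40.0, 1, 80),   # kept
        DomainHit("profile_A", 100, 30.0, 1, 100),  # score too low
        DomainHit("profile_B", 100, 50.0, 10, 60),  # coverage too low
    ]
    print([h.profile for h in filter_hits(hits)])
```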
The matches for the profile HMMs of expansin domains were extended to
find the adjacent start methionine and stop codon along the contig
sequence of each match. If the start or stop codon was missing, the hit
was extended to the first or last available amino acid position in the
contig, respectively. The extended hit sequences are available for
download at https://doi.org/10.18419/darus-699.
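The extension step might look like the following sketch, which operates on one translated contig frame. Several details are assumptions rather than statements from the text: stop codons are encoded as `*`, the nearest upstream methionine is taken as the start, and the upstream search does not cross a stop codon (if one is met before any methionine, the extension ends just after it). The function name `extend_hit` is hypothetical.

```python
def extend_hit(frame: str, hit_start: int, hit_end: int) -> tuple[int, int]:
    """Extend a hit within a translated frame.

    frame     -- translated contig frame, '*' marks a stop codon
    hit_start -- 0-based index of the first hit residue (inclusive)
    hit_end   -- 0-based index one past the last hit residue (exclusive)

    Returns (start, end) such that frame[start:end] is the extended hit.
    """
    # Upstream: walk left to the nearest 'M'; stop at a '*' or at the
    # first available position of the frame if no 'M' is found.
    start = hit_start
    i = hit_start
    while i >= 0 and frame[i] != "*":
        start = i
        if frame[i] == "M":
            break
        i -= 1

    # Downstream: extend to the next stop codon (excluded from the
    # result), or to the last available position if there is none.
    end = frame.find("*", hit_end)
    if end == -1:
        end = len(frame)
    return start, end


if __name__ == "__main__":
    frame = "AAA*GGMKKKHHHHHLLL*PP"
    start, end = extend_hit(frame, 10, 15)  # hit covers "HHHHH"
    print(frame[start:end])
```

With a frame that contains neither a methionine nor a stop codon, the function returns the full frame, which corresponds to the fallback to the first and last available positions.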