Protein sequence networks
Protein sequence networks represent large sequence datasets as
undirected graphs in which nodes are sequences and weighted edges
connect related sequences, so that relationships between different
clusters or communities can be derived. The protein sequences in the ExED
were sorted by decreasing sequence length and were subsequently
clustered using the USEARCH algorithm (UCLUST) with a threshold of 90%
sequence identity (without terminal gaps) to determine a reduced set of
centroid sequences (representative sequences) [30]. For
each centroid sequence, the N- and the C-terminal expansin domains were
annotated by the two profile HMMs with the filter criteria mentioned
above. Pairwise sequence identities were derived
from global Needleman-Wunsch alignments as described above and used as
edge weights. To generate the protein sequence networks, edges were
filtered by a pre-defined sequence identity threshold.
Metadata of the nodes (e.g. the sequence ID) and of the edges (i.e. the
edge weights) were written to GraphML files using the NetworkX library
(version 1.9) in Python, which automates the assignment of node and
edge attributes [41]. The GraphML files are available at
https://doi.org/10.18419/darus-624. Protein sequence networks were
visualized with Cytoscape version 3.7.2 [42] using the
prefuse force-directed layout with respect to the edge weights.
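
The network construction described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the scoring scheme (match +1, mismatch -1, linear gap -1), the `>=` comparison against the threshold, the attribute names `sequence_id` and `weight`, and all function names are assumptions made for illustration. Identity is computed over the alignment columns that remain after terminal gaps are trimmed, which is one plausible reading of "without terminal gaps".

```python
from itertools import combinations


def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch identity of a and b, ignoring terminal gaps.

    The scoring parameters are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover aligned columns.
    cols = []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            cols.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            cols.append((a[i - 1], "-"))
            i -= 1
        else:
            cols.append(("-", b[j - 1]))
            j -= 1
    cols.reverse()

    # Drop gap columns at both ends of the alignment (terminal gaps).
    while cols and "-" in cols[0]:
        cols.pop(0)
    while cols and "-" in cols[-1]:
        cols.pop()
    if not cols:
        return 0.0
    return sum(x == y for x, y in cols) / len(cols)


def build_edges(seqs, threshold):
    """Return (id1, id2, identity) for all pairs with identity >= threshold."""
    edges = []
    for u, v in combinations(sorted(seqs), 2):
        identity = global_identity(seqs[u], seqs[v])
        if identity >= threshold:
            edges.append((u, v, identity))
    return edges


def write_network(seqs, edges, path):
    """Write nodes and weighted edges to a GraphML file with NetworkX."""
    # Imported here so the identity code above has no third-party dependency.
    import networkx as nx

    graph = nx.Graph()  # undirected graph
    for seq_id in seqs:
        graph.add_node(seq_id, sequence_id=seq_id)
    for u, v, identity in edges:
        graph.add_edge(u, v, weight=identity)
    nx.write_graphml(graph, path)
```

With a dictionary mapping hypothetical sequence IDs to centroid sequences, `build_edges(seqs, 0.9)` followed by `write_network(seqs, edges, "network.graphml")` would produce a file that Cytoscape can import. The quadratic dynamic programming table is adequate for a sketch, but a dedicated alignment tool would be preferable for large datasets.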
For the networks showing the relationships between CBM63s and expansin
homologues, and between GH45s and the N-terminal expansin domain
homologues, CD-HIT (version 4.7) was used with a clustering threshold of
90% and a word size of 5 (instead of UCLUST) [43, 44].
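
As an illustration of the CD-HIT settings named above, the following Python sketch assembles a command line with a 90% identity threshold (`-c 0.9`) and a word size of 5 (`-n 5`). The executable name `cd-hit` on the search path, the file names, and the helper functions are assumptions. Any further options the authors may have set are not given in the text and are therefore left at the program defaults here.

```python
import subprocess


def cdhit_command(fasta_in, fasta_out, identity=0.90, word_size=5):
    """Build the argument list for a CD-HIT clustering run.

    -i: input FASTA, -o: output FASTA of representative sequences,
    -c: sequence identity threshold, -n: word size.
    """
    return [
        "cd-hit",  # assumed to be installed and on the PATH
        "-i", fasta_in,
        "-o", fasta_out,
        "-c", str(identity),
        "-n", str(word_size),
    ]


def run_cdhit(fasta_in, fasta_out, identity=0.90, word_size=5):
    """Run CD-HIT and raise CalledProcessError if it exits with an error."""
    cmd = cdhit_command(fasta_in, fasta_out, identity, word_size)
    subprocess.run(cmd, check=True)
```

Calling `run_cdhit("sequences.fasta", "centroids.fasta")` (hypothetical file names) would write the representative sequences to the output file and the cluster assignments to an accompanying `.clstr` file.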
The GH45 sequences were downloaded from the protein family database
(Pfam, version 32.0, accession PF02015) [45], whereas the
CBM63 sequences were downloaded from the carbohydrate-active enzymes
(CAZy) database on June 3, 2019 [46]. In the CAZy
database, 633 individual CBM63 sequences were deposited, but only 582
NCBI accessions were available at the time of writing because some
records had been moved or merged. Members of CBM63 were
annotated by the profile HMMs for the two expansin domains
(https://doi.org/10.18419/darus-625).