Figure 6. (A) Target T1145 as split into two EUs. (B) Grishin plots for the four original domains of T1145, as marked
in panel A. The upper left plot in panel B shows that domains 1 and
2 should be split, while domains 2, 3 and 4 (the remaining three plots)
should be joined.

The last example is target T1169, the mosquito salivary protein SGS1,
which is involved in mosquito-borne diseases [28]. It is the largest
monomeric target in the history of CASP (3364 residues in the sequence;
2735 residues resolved in the structure). It has a cocoon-shaped
structure with multiple domains and extensive inter-domain interactions
(Figure 7), thus presenting a significant challenge in defining EUs. The
top-ranked SWORD/SWORD2 splitting schema suggested 7 domains; the domain
definition from the authors (Figure 7B [28]) and the results of
HHsearch homology searches (Figure 7C) offered additional help in
defining domains. Domains were originally defined so that the following
seven regions were separated: (1) the N-terminal β-propeller (blue in panel A,
orange in panel B); (2) the region between the two β-propellers (HHsearch);
(3) β-propeller 2; (4) the region after the β-propeller; (5) the CBM domain;
(6) the lectin-CRD domain; and (7) the region containing the wedge domain up
to the TM domain (HHsearch). The
Grishin plot analysis suggested merging the two domains surrounding
β-propeller 2, and merging the CBM, lectin-CRD and wedge-containing
domains. In the end, we split T1169 into four evaluation units, as
colored in Figure 7A. A long linker between D1 and D4 and orphan helices
in the middle of the cocoon (grey) were not assigned to any of the EUs.
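
The bookkeeping behind the reduction from seven initial domains to four evaluation units can be sketched as follows. This is only an illustration, not the procedure used for the assessment: the domain labels are shorthand paraphrases of the regions listed above, the function `apply_merges` is hypothetical, and the sketch assumes that the two regions flanking β-propeller 2 are merged with each other rather than with the propeller itself, a reading that is consistent with the final count of four. The unit numbers printed by the sketch are arbitrary and are not meant to correspond to D1-D4 in Figure 7A.

```python
# Illustrative labels for the seven originally defined regions (paraphrased).
initial_domains = [
    "N-term beta-propeller",
    "region between propellers",
    "beta-propeller 2",
    "region after propeller",
    "CBM",
    "lectin-CRD",
    "wedge-containing area",
]

# Merges suggested by the Grishin plot analysis.
# ASSUMPTION: the two regions flanking beta-propeller 2 merge with each other,
# not with the propeller itself.
merges = [
    {"region between propellers", "region after propeller"},
    {"CBM", "lectin-CRD", "wedge-containing area"},
]


def apply_merges(domains, merges):
    """Group domains into units; domains not named in any merge stay alone."""
    units = []
    assigned = set()
    for name in domains:
        if name in assigned:
            continue
        # Find the merge group containing this domain, or use a singleton.
        group = next((m for m in merges if name in m), {name})
        # Keep the original sequence order inside each unit.
        units.append([d for d in domains if d in group])
        assigned.update(group)
    return units


evaluation_units = apply_merges(initial_domains, merges)
for i, unit in enumerate(evaluation_units, start=1):
    print(f"unit {i}: {' + '.join(unit)}")
```

Under these assumptions the seven regions collapse into four units: one merge removes one unit and the other removes two.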