Sequence analysis and generation of 3D models of NCL-RBDs
Because only partial structural information is available for the NCL RBDs, 3D models of all four NCL RBDs (RBD1-4) and of RBD3-4 were built. For human NCL, structural information is available for RBD1-2 in tandem (PDB ID: 2KRR) [39] and for individual RBDs (PDB IDs: 1FJ7 and 1FJC) [42]. The human NCL sequence from the NCBI database [58] was analyzed for its domain architecture using SMART [59], Pfam [60], UniProt [61], and InterPro [62] to confirm the boundaries of the individual RBDs and to identify any relevant sequence motifs. The NCL RBDs were aligned with the hnRNPA1 RBDs using the multiple sequence alignment tool Clustal Omega [63] to identify conserved residues, and the alignment was visualized with ESPript 3 [64].
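As an illustration of what identifying conserved residues from a multiple sequence alignment involves, the following is a minimal Python sketch. It is not the procedure used by Clustal Omega or ESPript; the function name `conserved_columns`, the `threshold` parameter, and the toy sequences are assumptions made for this example and are not NCL or hnRNPA1 sequences.

```python
from collections import Counter


def conserved_columns(aligned, threshold=1.0, gap="-"):
    """Return 0-based indices of alignment columns whose most common
    residue reaches the given frequency threshold (gaps never count)."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for col in range(length):
        column = [seq[col] for seq in aligned]
        residue, count = Counter(column).most_common(1)[0]
        if residue != gap and count / len(aligned) >= threshold:
            conserved.append(col)
    return conserved


# Toy, made-up aligned fragments (hypothetical, for illustration only).
toy_alignment = [
    "KTLFVGNL-S",
    "RTLFVKNLPY",
    "KKLFIGGLSF",
]

strict = conserved_columns(toy_alignment)                   # identical in all rows
relaxed = conserved_columns(toy_alignment, threshold=0.6)   # majority residue
print(strict, relaxed)
```

With `threshold=1.0` only strictly invariant columns are reported; lowering the threshold also reports columns where a single residue dominates. Similarity between different residues (for example, conservative substitutions) is not considered in this sketch.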
The delineated tandem domain pairs were modeled using both template-based methods (SWISS-MODEL [65], IntFOLD [66], and Phyre2 [67]) and ab initio modeling approaches (Robetta [68], QUARK [69], and I-TASSER [70]). To identify high-quality models, the constructed models were evaluated with model verification programs including Verify3D [71], VoroMQA [72], ProSA-web [73], and ProQ3 [74] (Supporting Tables S1 and S2) and by correlating their biophysical and structural properties with experimental observations. Top models were refined using ModRefiner [75] and SCWRL4 [76] and then re-evaluated. ModRefiner first modifies the side-chain packing by adding atoms and then improves the structural quality of the reconstructed models by energy minimization, whereas SCWRL4 focuses on side-chain refinement. The top-scoring models were chosen for further analysis (Supporting Tables S1 and S2).
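One simple way to combine the outputs of several model verification programs into a single ordering is to rank the models under each metric and average the ranks. The Python sketch below illustrates that idea only; it is not stated to be the selection procedure used here. The function `rank_models`, the model names, the metric keys, and all score values are hypothetical, and the direction of each metric (higher or lower is better) is an assumption supplied by the caller.

```python
def rank_models(scores, higher_is_better):
    """Order models by mean rank across metrics (best first).

    scores: {model_name: {metric_name: value}}
    higher_is_better: {metric_name: bool}, the direction of each metric
    Ties within a metric are broken by input order, and ties in mean rank
    are broken alphabetically by model name.
    """
    models = list(scores)
    total_rank = {m: 0.0 for m in models}
    for metric, higher in higher_is_better.items():
        ordered = sorted(models, key=lambda m: scores[m][metric], reverse=higher)
        for rank, model in enumerate(ordered, start=1):
            total_rank[model] += rank
    n_metrics = len(higher_is_better)
    return sorted(models, key=lambda m: (total_rank[m] / n_metrics, m))


# Hypothetical scores for three candidate models (not real results).
example_scores = {
    "model_A": {"verify3d": 85.0, "voromqa": 0.45, "prosa_z": -6.1},
    "model_B": {"verify3d": 91.0, "voromqa": 0.50, "prosa_z": -7.0},
    "model_C": {"verify3d": 78.0, "voromqa": 0.48, "prosa_z": -5.2},
}
# Assumed directions: higher percentage/score is better for the first two,
# and a more negative z-score is treated as better for the third.
directions = {"verify3d": True, "voromqa": True, "prosa_z": False}

ranking = rank_models(example_scores, directions)
print(ranking)
```

Averaging ranks avoids having to put scores with different scales on a common axis, at the cost of ignoring how large the differences between models are under each metric.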