Figure Legends
FIGURE 1 Small Open Reading Frames (sORFs) and RNA. Box: Within an mRNA that encodes a canonical protein coding sequence (CDS), sORFs can lie in the 5′ UTR (upstream ORF, uORF), initiate in the 5′ UTR and extend into the CDS in an alternative reading frame (upstream overlapping ORF, u.oORF), lie in the 3′ UTR (downstream ORF, dORF), or be nested within the CDS in an alternative reading frame. sORFs can also be found in long noncoding RNA (lncRNA, bottom) and circular RNA (circRNA, right), as well as in additional classes of RNA not pictured.
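The positional classes named in this legend can be expressed as a simple coordinate comparison. The following is a minimal sketch, not a published annotation method: the function name `classify_sorf`, the 0-based end-exclusive transcript coordinates, and the "unclassified" fallback are all assumptions made for illustration.

```python
def classify_sorf(sorf_start, sorf_end, cds_start, cds_end):
    """Classify an sORF by its position relative to the canonical CDS.

    Assumption: all coordinates are 0-based, end-exclusive positions on
    the same mRNA, and each start marks the first base of a start codon.
    """
    # Two ORFs share a reading frame when their starts differ by a multiple of 3.
    same_frame = (sorf_start - cds_start) % 3 == 0

    if sorf_end <= cds_start:
        return "uORF"    # entirely within the 5' UTR
    if sorf_start >= cds_end:
        return "dORF"    # entirely within the 3' UTR
    if sorf_start < cds_start < sorf_end and not same_frame:
        return "u.oORF"  # starts in the 5' UTR, runs into the CDS out of frame
    if cds_start <= sorf_start and sorf_end <= cds_end and not same_frame:
        return "nested"  # inside the CDS in an alternative frame
    return "unclassified"  # e.g. in-frame overlaps, which the legend does not cover


# Hypothetical transcript with a CDS spanning positions 100-400.
for start, end in [(10, 70), (80, 140), (152, 212), (450, 510)]:
    print((start, end), classify_sorf(start, end, 100, 400))
```

In-frame overlaps are deliberately left unclassified here, because they would extend or truncate the canonical protein rather than encode a distinct sequence.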
FIGURE 2 Alternative Reading Frames for Same-Strand Overlapping (Nested) sORFs. The +1 reading frame corresponds to the canonical coding sequence and is always the frame of reference. Translation in the shifted +2 or +3 reading frame generates protein products with completely different amino acid sequences because the same nucleotides are read as different codons.
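The effect of shifting the reading frame can be illustrated by translating one short sequence in all three forward frames. This is a minimal sketch using the standard genetic code; the function name `translate_frame` and the example sequence are hypothetical and are not taken from the figure.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(dna, frame):
    """Translate a DNA string in frame +1, +2, or +3.

    Stop codons are shown as '*'; trailing bases that do not fill a
    complete codon are ignored.
    """
    offset = frame - 1
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


sequence = "ATGGCCAAATGA"  # hypothetical example sequence
for frame in (1, 2, 3):
    print(f"+{frame}: {translate_frame(sequence, frame)}")
```

For this example sequence the three frames read MAK*, WPN, and GQM, which share no residues at any position.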
FIGURE 3 Mass Spectrometry Workflow for Detection of Unannotated Microproteins. To search for novel microproteins in a sample of interest, low molecular weight proteins are isolated from total protein after cell lysis. Size-exclusion techniques include, but are not limited to, solid-phase extraction and polyacrylamide gel electrophoresis. The low molecular weight protein fraction is digested with a protease, producing peptides of a length appropriate for mass spectrometric (MS) analysis. Experimental spectra are generated and matched to theoretical spectra from a custom database using proteomics software. Detection of annotated microproteins known to be expressed in the system of interest can serve as a positive control for the success of small protein enrichment and for coverage of the known small proteome, but these spectra are otherwise computationally excluded. Peptides deriving from proteolysis of canonical proteins before size exclusion are also computationally identified and excluded from consideration. High-scoring experimental spectra without any matches to known microproteins or canonical proteins can be subjected to further molecular validation, leading to annotation of novel microproteins.
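The computational exclusion step in this workflow can be sketched as a set lookup. The example below is a toy illustration, not the behavior of any particular proteomics package: it assumes trypsin as the protease (the legend does not name one), and the function names, score scale, and threshold `min_score` are hypothetical.

```python
def tryptic_peptides(protein):
    """In silico digest: cleave after K or R unless the next residue is P.

    Assumption: trypsin-like specificity with no missed cleavages.
    """
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


def candidate_novel_peptides(matches, annotated_proteins, min_score):
    """Keep high-scoring peptide matches absent from all annotated proteins.

    `matches` is a list of (peptide_sequence, score) pairs; the score
    scale and cutoff are hypothetical.
    """
    known = {
        peptide
        for protein in annotated_proteins
        for peptide in tryptic_peptides(protein)
    }
    return [
        peptide
        for peptide, score in matches
        if score >= min_score and peptide not in known
    ]


# Hypothetical annotated proteins and peptide-spectrum matches.
annotated = ["MKAAR", "GGKPLR"]
matches = [("AAR", 0.9), ("LLSDK", 0.95), ("VVTR", 0.2)]
print(candidate_novel_peptides(matches, annotated, min_score=0.5))
```

Here `annotated` stands in for both canonical proteins and known microproteins, since the legend excludes peptides from either source before validation.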
FIGURE 4 Experimentally Determined Microprotein Structures. (A) Crystal structure of AcrB (grayscale), a component of the AcrAB-TolC efflux pump, in complex with microprotein AcrZ (cyan). PDB: 5NC5. (B) Cryo-EM structure of bacterial microprotein CydX (cyan) in complex with transmembrane cytochrome bd-I oxidase (grayscale). PDB: 6RKO. (C) Crystal structure of SERCA1a calcium pump (grayscale) with bound single-pass transmembrane microprotein phospholamban (cyan), which downregulates SERCA activity. PDB: 4Y3U. Solid-state NMR structure of helix-loop-helix microprotein DWORF (cyan) modeled into SERCA1a calcium pump (grayscale) based on Venkateswara et al. 2022. PDB: 4Y3U, 7MPA. (D) NMR structure of wild-type humanin in 30% 2,2,2-trifluoroethanol (organic) solution. PDB: 1Y32. (E) Crystal structure of ubiquitin monomer. PDB: 1AAR. (F) Crystal structure of ubiquitin-like TINCR microprotein with additional N-terminal alpha helix. PDB: 7MRJ. (G) Predicted structure of bacterial microprotein YmcF (green) generated with AlphaFold, obtained from UniProt[166]. Five cysteines (orange) in the YmcF sequence are predicted to form a zinc-finger domain common to RNA-binding proteins. (H) Predicted structure of PAQosome-binding microprotein ASDURF generated with AlphaFold, obtained from UniProt[166].