Competing Interests: MK, CK, IRS, MTS, and JCJ are compensated by NRx Pharmaceuticals, Inc. Lavin Statistical Associates is paid for independent statistical analysis by NRx Pharmaceuticals, Inc.

Introduction

Clinician-administered rating scales are a universal endpoint, required by regulators around the world for ascertainment of primary endpoints in psychiatric clinical trials. Signal detection in multi-site trials requires strong inter-rater reliability on these instruments; poor inter-rater reliability is associated with increased error variance, reduced study power,2 and, ultimately, failed trials. Poor inter-rater reliability in psychometric rating scales has many sources, including lack of adherence to structured and semi-structured interviews, rater scoring differences, and inconsistent interview duration.3 Williams & Kobak correctly state: “The importance of reliability of assessments in a clinical trial cannot be overestimated. Without good interrater agreement the chances of detecting a difference in effect between drug and placebo are significantly reduced.”4 Commonly used methods for establishing and maintaining strong inter-rater reliability include site-rater training, external evaluation and monitoring of site raters, and centralized rating.

Monitoring of endpoint ascertainment in clinical trials is routinely outsourced to Clinical Research Organizations (CROs) and to central laboratories. While psychometric assessments are often monitored by specialized CROs, this may not always be the best choice for a clinical trial. The rigor required to ensure valid and reliable clinical scale ratings means that a CRO must employ enough expert psychometricians who are familiar both with the rating instruments and with the unique aspects of the disease and drug being studied. CRO raters must review site assessments within a day of completion to ensure rater quality and accuracy and to provide remediation in a timely manner, if needed. Since personnel turnover at CROs may be as high as 20% per year,5 outsourcing the day-to-day management of highly nuanced psychometric ratings becomes impractical when there is turnover and inter-rater variation among the “master raters.”

The Sponsor Rating Monitoring System (SRMS) was developed as a pre-defined, protocol-specific, data-driven method to optimize psychometric training, data validity, and reliability in the context of a clinical trial of a novel antidepressant targeting bipolar depression with suicidality. In this system, the Sponsor employs expert raters with extensive experience in conducting, analyzing, and training others in the rating scales used to ascertain primary and secondary endpoints. In SRMS, these master raters help the clinical operations team select suitable clinical trial sites, document site-rater qualifications, oversee rater training and qualification, and confirm that all data management conforms to the Study Protocol and GDP & GCP guidelines. Most importantly, the Sponsor “master raters” review psychometric assessments within 24 to 48 hours and provide corrective feedback as needed. This approach further allows for referral of an aberrant rating to an adjudicating rater in real time, prior to data unblinding, as sketched below.
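The protocol-specific criteria that define an aberrant rating are not reproduced here. As a purely illustrative sketch, assuming a hypothetical tolerance of three MADRS total-score points (an invented figure, not the study’s actual rule), a real-time review queue of this kind might flag ratings as follows:

```python
from dataclasses import dataclass

# Hypothetical tolerance (in MADRS total-score points) beyond which a
# site rating is referred for adjudication; the actual SRMS criterion
# is protocol-specific and not stated in this paper's introduction.
ADJUDICATION_THRESHOLD = 3

@dataclass
class Rating:
    subject_id: str
    visit: str
    site_total: int    # MADRS total scored by the site rater
    master_total: int  # MADRS total re-scored by the Sponsor master rater

def needs_adjudication(r: Rating, threshold: int = ADJUDICATION_THRESHOLD) -> bool:
    """Flag a rating for adjudication when site and master-rater
    MADRS totals diverge by more than the tolerance."""
    return abs(r.site_total - r.master_total) > threshold

# Example: one concordant and one aberrant rating.
ratings = [
    Rating("S001", "Baseline", site_total=32, master_total=31),
    Rating("S002", "Baseline", site_total=28, master_total=20),
]
flagged = [r.subject_id for r in ratings if needs_adjudication(r)]
print(flagged)  # -> ['S002']
```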
The centralized SRMS model does not transfer regulatory obligations to an outside CRO or engage multiple data-quality systems, thereby minimizing oversight complexity and subsequent audit responsibilities.

To assess the potential efficacy of the SRMS, we examined inter-rater reliability (IRR), i.e., the concordance between site raters and Sponsor “master raters” on Montgomery–Åsberg Depression Rating Scale (MADRS) scores for patients participating in the Phase 2b/3 clinical trial “NRX101 for Suicidal Treatment Resistant Bipolar Depression” (ClinicalTrials.gov Identifier: NCT03395392).
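One standard way to quantify such concordance is the intraclass correlation coefficient (ICC). The sketch below, which uses invented MADRS totals rather than study data, computes a two-way random-effects, absolute-agreement, single-rater ICC (ICC(2,1) in Shrout & Fleiss notation); it illustrates the metric only and is not presented as the study’s analysis code.

```python
import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """Two-way random-effects, absolute-agreement, single-rater ICC
    (ICC(2,1), Shrout & Fleiss 1979). `scores` is an
    (n_subjects, n_raters) matrix of ratings."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)
    col_means = scores.mean(axis=0)

    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_total = ((scores - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)             # between-subject mean square
    ms_cols = ss_cols / (k - 1)             # between-rater mean square
    ms_err = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Invented MADRS totals: column 0 = site rater, column 1 = master rater.
madrs = np.array([[34, 33], [28, 29], [40, 38], [22, 24], [31, 31]])
print(round(icc_2_1(madrs), 3))
```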