Abstract

Artificial neural networks trained on large, expert-labelled datasets are considered state-of-the-art for a range of medical image recognition tasks. However, categorically labelled datasets are time-consuming to generate and constrain classification to a pre-defined, fixed set of classes. For neuroradiological applications in particular, this represents a barrier to clinical adoption. To address these challenges, we present a self-supervised text-vision framework that learns to detect clinically relevant abnormalities in brain MRI scans by directly leveraging the rich information contained in accompanying free-text neuroradiology reports. Our training approach consisted of two steps. First, a dedicated neuroradiological language model - NeuroBERT - was trained to generate fixed-dimensional vector representations of neuroradiology reports (N = 50,523) via domain-specific self-supervised learning tasks. Next, convolutional neural networks (one per MRI sequence) learnt to map individual brain scans to their corresponding text vector representations by optimising a mean square error loss. Once trained, our text-vision framework can be used to detect abnormalities in unreported brain MRI examinations by scoring scans against suitable query sentences (e.g., 'there is an acute stroke', 'there is hydrocephalus'), enabling a range of classification-based applications including automated triage. Potentially, our framework could also serve as a clinical decision support tool, not only by suggesting findings to radiologists and detecting errors in provisional reports, but also by retrieving and displaying examples of pathologies from historical examinations that could be relevant to the current case based on textual descriptors.

1. Introduction

Magnetic resonance imaging (MRI) plays a key role in the diagnosis and management of a range of neurological conditions (Atlas, 2009). 
However, the growing demand for brain MRI examinations, along with a global shortage of radiologists, is taking its toll on healthcare systems. Increasingly, radiologists are unable to fulfill their reporting requirements within contracted hours, leading to substantial reporting delays (NHS, 2021)(Wood et al., 2021). Concerns about fatigue-related diagnostic errors are also mounting as radiologists become increasingly overworked (Vosshenrich et al., 2021). Ultimately, reporting delays and errors lead to delays in treatment; for many abnormalities, this results in poorer patient outcomes and inflated healthcare costs (Adams et al., 2005).

Potentially, artificial intelligence (AI) could be used to relieve some of the pressure on radiology departments, for example by supporting real-time triaging of examinations (Annarumma et al., 2019)(Yala et al., 2019)(Wood et al., 2022)(Verburg et al., 2022)(Agarwal et al., 2023)(Booth et al., 2023) or by helping radiologists to reduce errors in radiology reports. To date, efforts in this direction have largely relied on deep learning models trained on expert-labelled datasets (Gulshan, 2016)(Titano et al., 2018)(De Fauw et al., 2018)(Ardila et al., 2019)(McKinney et al., 2020)(Wood et al., 2022)(Din et al., 2023)(Chelliah et al., 2024). However, there are key limitations to this approach. First, the growing pressure on clinical services has made it increasingly difficult to justify using radiologists’ time to manually annotate images for research purposes; obtaining large, clinically representative training datasets therefore represents a bottleneck to model development (Wood et al., 2020)(Benger et al., 2023)(Wood et al., 2024). Second, the use of categorically labelled datasets in conjunction with supervised learning methods inherently restricts classification to a pre-defined, fixed set of classes. As such, whenever a new classification task emerges, additional labelled training examples are needed. 
This poses a considerable problem for neuroradiological applications, where changing clinical demands continually alter which tasks are worth automating. For example, the class of ‘tumours’ may become insufficient for a detection task when demand arises for detecting a particular type of tumour, in which case additional labelling of that tumour type is required (Louis et al., 2021).

These issues, among others, have led to a growing interest in multi-modal (e.g., text-vision) self-supervised methods which enable computer vision models to learn directly from free-text radiology reports (Zhang et al., 2022)(Boecking et al., 2022)(Bannur et al., 2023). Radiology reports represent promising training data since they i) contain detailed descriptions and impressions of all image findings observed by expert radiologists; and ii) are typically stored alongside imaging data on hospital picture archiving and communication systems (PACS) and so are relatively easy to obtain. To date, however, the application of self-supervised methods has largely been limited to image recognition tasks involving chest radiographs - due in part to the availability of open-access, paired image-text datasets such as MIMIC Chest X-ray (MIMIC-CXR) (Johnson et al., 2019). To our knowledge, there has been no previous demonstration of text-vision models for either brain abnormality detection or for the highly complex modality of MRI (Wood et al., 2022).

Here, we present a self-supervised text-vision framework which learns to detect clinically relevant abnormalities from unlabelled hospital brain MRI scans. Our two-step training approach proceeded as follows. First, a dedicated neuroradiological language model - NeuroBERT - was trained to generate fixed-dimensional vector representations of neuroradiology reports via domain-specific self-supervised learning tasks. 
Next, convolutional neural networks (CNNs) - one per MRI sequence type, covering the full range of sequences performed during routine examinations - learnt to map individual brain scans to their corresponding text vector representations by optimising a mean square error (MSE) loss. Once trained, our text-vision framework can be used to detect abnormalities in unreported brain MRI examinations by scoring scans against suitable query sentences (e.g., ‘this is a normal study’ or ‘there is an acute stroke’), opening a range of classification-based applications including automated triage (Fig. 1), diagnosis, and treatment response assessment. Potentially, our framework could also operate as a clinical decision support tool by suggesting findings to radiologists, detecting errors in provisional reports, and retrieving and displaying examples of pathologies from historical examinations that could be relevant to the current case based on textual descriptors.
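
To make the two ideas above concrete, the following is a minimal sketch, not the implementation used in this work. It shows (i) an MSE objective between an image embedding and a report embedding, and (ii) scoring a scan against query sentences. The toy three-dimensional vectors, the function names `mse_loss`, `cosine_similarity` and `rank_queries`, and the choice of cosine similarity as the scoring function are all illustrative assumptions; in practice the embeddings would come from the trained language model and CNNs.

```python
import math


def mse_loss(predicted, target):
    """Mean squared error between a predicted image embedding and a report embedding."""
    if len(predicted) != len(target):
        raise ValueError("embeddings must have the same dimensionality")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def cosine_similarity(a, b):
    """Cosine similarity between two vectors (assumed scoring function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_queries(image_embedding, query_embeddings):
    """Return (query, score) pairs sorted from best to worst match."""
    scored = [
        (query, cosine_similarity(image_embedding, vector))
        for query, vector in query_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings standing in for language-model outputs.
query_embeddings = {
    "this is a normal study": [1.0, 0.0, 0.0],
    "there is an acute stroke": [0.0, 1.0, 0.0],
}

# Hypothetical CNN output for one unreported scan.
image_embedding = [0.1, 0.9, 0.0]

ranking = rank_queries(image_embedding, query_embeddings)
best_query = ranking[0][0]
```

In this toy setting the scan embedding lies closest to the vector for ‘there is an acute stroke’, so that query ranks first; a triage application could threshold such scores rather than take only the top match.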