Machine learning algorithm to predict allergy: first results of a nationwide Allergen Chip Challenge

Martinroche G; Amir Guemari; Pol Andre Apoil; Isabella Annesi-Maesano; Fromentin E; Laurent Guilleminault; Davide Caimmi; Klingebiel C; Alain Didier; Pascal Demoly; Joana Vitte; Julien Goret

doi:10.22541/au.171617838.88824747/v1

Abstract

Background: Serum allergen-specific immunoglobulin E (sIgE) plays a key role in allergy diagnosis, along with clinical history and physical examination. Allergen multiplex assays, which measure up to 300 sIgE in a single test, now help resolve complex polyallergic cases. Machine learning has recently emerged as a widely used tool in medicine. The aim was to build a nationwide, open-access database and use it to create an algorithm that could predict allergy diagnosis, severity, category (airborne, food, venom) and culprit allergens.

Methods: A retrospective national database was created by the French Society of Allergology in collaboration with AllergoBioNet and the Health Data Hub. Collected data were de-identified patient profiles, each with five demographic items, twenty clinical items and the sIgE results of one allergen multiplex assay. An international crowdsourced machine learning competition was hosted on the Trustii.io platform. Algorithms were evaluated on two criteria: the F-score (the harmonic mean of precision and recall, a measure of a model's accuracy on a dataset) and external validation on patient profiles outside the database, accounting for 80% and 20% of the evaluation, respectively.

Results: Data were collected from 4271 patient files. Two hundred and ninety-two data scientists competed with 3135 algorithms. The best F-scores ranged from 78% to 80%. The models with the highest F-scores used gradient boosting classifiers such as LightGBM, CatBoost and XGBoost, which are well suited to tabular datasets with categorical features.

Conclusions: We report here the first artificial intelligence models applied to the interpretation of allergen multiplex arrays, using a nationwide real-world database built to be open access. With F-scores close to 80%, the French Allergen Chip Challenge paves the way for a diagnostic prediction tool for practicing allergists.
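To make the evaluation metric concrete, the following is a minimal sketch of how an F-score can be computed for a multi-class prediction such as allergy category. The abstract does not state which averaging scheme the challenge used, so the macro average shown here is an assumption, and the function names and toy labels are hypothetical illustrations rather than challenge data.

```python
def f1_per_class(y_true, y_pred):
    """Return the F1 score of each class as a dict {label: score}.

    F1 is the harmonic mean of precision and recall, which simplifies to
    2*TP / (2*TP + FP + FN).
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class that is never present and never predicted scores 0 by convention.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels using the three categories named in the abstract.
y_true = ["airborne", "food", "venom", "food", "airborne", "airborne"]
y_pred = ["airborne", "food", "food", "food", "airborne", "venom"]

per_class = f1_per_class(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
print(per_class)
print(round(overall, 3))
```

In this toy case the two well-predicted categories each score 0.8 while the missed venom case scores 0, which shows how a macro average penalizes a model that fails on a rare category even when overall accuracy looks acceptable.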