
Machine learning algorithm to predict allergy: first results of a nationwide Allergen Chip Challenge
  • Martinroche G,
  • Amir Guemari,
  • POL ANDRE APOIL,
  • Isabella Annesi-Maesano,
  • Fromentin E,
  • Laurent Guilleminault,
  • Davide Caimmi,
  • Klingebiel C,
  • Alain Didier,
  • Pascal Demoly,
  • Joana Vitte,
  • Julien GORET
Martinroche G
Centre Hospitalier Universitaire de Bordeaux
Amir Guemari
GTESIA
POL ANDRE APOIL
GTESIA
Isabella Annesi-Maesano
University of Montpellier
Fromentin E
GTESIA
Laurent Guilleminault
Centre Hospitalier Universitaire de Toulouse
Davide Caimmi
Arnaud de Villeneuve
Klingebiel C
Synlab France SAS
Alain Didier
Centre Hospitalier Universitaire de Toulouse
Pascal Demoly
GTESIA
Joana Vitte
University of Montpellier
Julien GORET
Centre Hospitalier Universitaire de Bordeaux

Corresponding Author: [email protected]


Abstract

Background: Serum allergen-specific immunoglobulin E (sIgE) plays a key role in allergy diagnosis, alongside clinical history and physical examination. Allergen multiplex assays now help resolve complex polyallergic cases, as they assess up to 300 sIgE. Machine learning has recently emerged as a tool in medicine. Our aim was to build a nationwide, open-access database and use it to create an algorithm that could predict allergy diagnosis, severity, category (airborne, food, venom) and culprit allergens.

Methods: A retrospective national database was created by the French Society of Allergology in collaboration with AllergoBioNet and the Health Data Hub. The collected data were de-identified patient profiles, each comprising five demographic items, twenty clinical items and the sIgE results of one allergen multiplex assay. An international crowdsourced machine learning competition was hosted on the Trustii.io platform. Algorithms were evaluated on the F-score (the harmonic mean of precision and recall, a measure of a model’s accuracy on a dataset) and on external validation with patient profiles from outside the database (80% and 20%, respectively).

Results: Data were collected from 4271 patient files. Two hundred and ninety-two data scientists competed, submitting 3135 algorithms. The best F-scores ranged from 78% to 80%. The models with the highest F-scores used gradient boosting classifiers such as LightGBM, CatBoost and XGBoost, which are well suited to tabular datasets with categorical features.

Conclusions: We report the first artificial intelligence models applied to the interpretation of allergen multiplex arrays in a nationwide, real-world database built to be open access. With F-scores close to 80%, the French Allergen Chip Challenge paves the way for a diagnostic prediction tool for practicing allergists.
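
For readers unfamiliar with the F-score used as the main evaluation criterion, the following is a minimal Python sketch of how it can be computed for a multi-class prediction such as allergy category. The abstract does not state which F-score variant the challenge used, so the macro-averaged F1 shown here is an assumption, and the function names (`f1_per_class`, `macro_f1`) and the example labels are hypothetical and are not taken from the challenge data.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1}, with F1 = 2*TP / (2*TP + FP + FN).

    This is algebraically the harmonic mean of precision and recall.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed variant, see lead-in)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical allergy-category labels, for illustration only.
y_true = ["airborne", "food", "venom", "airborne", "food", "airborne"]
y_pred = ["airborne", "food", "airborne", "airborne", "venom", "airborne"]

for label, score in f1_per_class(y_true, y_pred).items():
    print(f"{label}: F1 = {score:.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Macro-averaging gives every class equal weight, so a rare category such as venom allergy counts as much as a common one. Other averaging choices (micro or weighted) would give different values on the same predictions.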