Genetic Algorithm-Based Semi-Supervised Convolutional Neural Network for
Real-Time Monitoring of Escherichia coli Fermentation for Recombinant
Protein Production Using a Raman Sensor
Abstract
Raman spectroscopy, as a label-free sensor, is commonly used for
real-time monitoring of key parameters in recombinant protein
cultivations. However, accurate parameter prediction requires a large
quantity of offline measurement data, which is time-consuming and
labor-intensive to collect. To overcome the limitations of the complex
data preprocessing required by conventional methods, this study
proposes a genetic algorithm-based semi-supervised convolutional neural
network (GA-SCNN).
The GA-SCNN performs feature extraction and unsupervised sequence
labeling and was applied to a model system of E. coli expressing the
recombinant protein ProA5M. By combining model prediction with sequence
interpolation, the GA-SCNN expanded the dataset for glucose, lactate,
ammonium ions, and OD600 from 52 to 1302 samples. A comparison with
standard regression algorithms demonstrated the superior predictive
performance of the GA-SCNN framework on large volumes of spectral data
without any preprocessing. Cross-validation confirmed the high accuracy
and robustness of the models in terms of the coefficient of
determination (R²). In addition, a transfer learning strategy was
employed, using the OD600 data and limited recombinant protein
expression data, to develop a prediction model for the target protein.
Validation experiments showed good agreement between model predictions
and offline measurements.
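The abstract does not specify how offline measurements are interpolated along the cultivation timeline to expand the labeled set. As an illustration only, the sketch below assumes simple linear interpolation of sparse offline measurements (for example, glucose) onto the timestamps of the denser Raman spectra. `densify_labels` is a hypothetical helper, and the times and concentrations in the usage lines are made-up values. The GA-SCNN also labels spectra through model prediction, which this sketch does not reproduce.

```python
import numpy as np


def densify_labels(offline_times_h, offline_values, spectrum_times_h):
    """Hypothetical helper: linearly interpolate sparse offline measurements
    onto the time grid of the Raman spectra (assumed scheme, not the paper's)."""
    offline_times_h = np.asarray(offline_times_h, dtype=float)
    offline_values = np.asarray(offline_values, dtype=float)
    spectrum_times_h = np.asarray(spectrum_times_h, dtype=float)

    # np.interp requires increasing sample times, so sort the offline data first.
    order = np.argsort(offline_times_h)
    offline_times_h = offline_times_h[order]
    offline_values = offline_values[order]

    # Only interpolate inside the measured time range; outside it np.interp
    # would clamp to the end values, i.e. silently extrapolate.
    inside = (spectrum_times_h >= offline_times_h[0]) & (
        spectrum_times_h <= offline_times_h[-1]
    )
    labels = np.full(spectrum_times_h.shape, np.nan)
    labels[inside] = np.interp(spectrum_times_h[inside], offline_times_h, offline_values)
    return labels


# Hypothetical example: glucose (g/L) measured offline at 0, 2 and 4 h,
# Raman spectra acquired every 0.5 h from 0 to 5 h.
labels = densify_labels([0, 2, 4], [20.0, 15.0, 5.0], np.linspace(0, 5, 11))
# Spectra after 4 h stay NaN because they lie outside the measured range.
print(labels)
```

Leaving out-of-range spectra unlabeled (NaN) rather than extrapolating keeps the expanded label set within the time span actually covered by offline data.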
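Model quality above is summarized by the cross-validated coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below shows one way to compute a k-fold R². To stay self-contained it uses an ordinary least-squares model as a stand-in for both the GA-SCNN and the standard regression algorithms; `coefficient_of_determination` and `kfold_r2` are hypothetical helper names defined here, and the demo data are synthetic.

```python
import numpy as np


def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def kfold_r2(X, y, k=5, seed=0):
    """Per-fold R^2 for an ordinary least-squares stand-in model (NOT the GA-SCNN)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Prepend an intercept column and fit by least squares on the training folds.
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        # Score the held-out fold.
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        scores.append(coefficient_of_determination(y[test_idx], A_test @ coef))
    return np.array(scores)


# Usage on synthetic, noise-free linear data (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.0, 2.0, 3.0]) + 0.5
print(kfold_r2(X_demo, y_demo, k=5).round(3))
```

Because consecutive spectra from the same fermentation run are strongly correlated, random folds like these can give optimistic scores; holding out whole cultivation runs is the stricter choice.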