Exploiting Protein Language Models for the Precise Classification of Ion
Channels and Ion Transporters
Abstract
This study presents TooT-PLM-ionCT, a framework that uses six Protein
Language Models (PLMs), including ProtBERT, ProtBERT-BFD, ESM-1b, ESM-2
(650M parameters), and ESM-2 (15B parameters), to classify integral
membrane proteins, specifically ion channels (ICs) and ion transporters
(ITs). As these
proteins play a pivotal role in the regulation of ion movement across
cellular membranes, they are integral to numerous biological processes
and overall cellular vitality. To avoid costly and time-consuming
wet-lab experiments, we use the predictive power of PLMs, drawing on
parallels with natural language processing. We employ six classifiers,
spanning conventional methods and a deep learning model, to separate ICs
and ITs from other membrane proteins and to differentiate ICs from ITs.
We also examine factors that influence these tasks, including the effect
of dataset balancing, the use of frozen versus fine-tuned PLM
representations, and any difference between half- and full-precision
floating-point computation. Our
empirical results show improved performance over the current state of
the art in distinguishing ITs from other membrane proteins and in
differentiating ICs from ITs, while performance in discriminating ICs
from other membrane proteins is on par with the current state of the
art.