Prediction of cell penetrating peptides and their uptake efficiency
using random forest-based feature selections
Abstract
Cell-penetrating peptides (CPPs) are short peptides that can carry
biomolecules of varying sizes across the cell membrane into the
cytoplasm. Correctly identifying CPPs is the basis for studying their
functions and mechanisms. Here, we propose a novel method that predicts
both CPPs and their uptake efficiency. In our method, five feature
descriptors encode each peptide sequence and are combined into a
hybrid feature vector. A wrapper method built on random forest is then
employed, which couples feature selection with the prediction process to
find the features that are crucial for identifying CPPs. Jackknife
cross-validation results show that our predictor performs comparably to
state-of-the-art CPP predictors. Our method also reduces the feature
dimension, which improves computational efficiency and helps avoid
overfitting, so the predictor can be applied to large-scale CPP data.
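
The wrapper idea summarized above can be illustrated with a minimal
Python sketch. It is an illustration under stated assumptions, not the
implementation used in this work: the search strategy (greedy forward
selection), the stopping rule, the function names, and the toy data are
all hypothetical, and a simple nearest-centroid classifier stands in for
the random forest so that the example needs only the standard library.
Each candidate feature subset is scored by jackknife (leave-one-out)
accuracy, and a feature is kept only if it improves that score.

```python
# Sketch only: wrapper-style forward feature selection scored by
# jackknife (leave-one-out) accuracy. A nearest-centroid classifier is
# used as a stand-in for a random forest (assumption, to stay stdlib-only).

def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(xs, ys, columns):
    """Leave-one-out accuracy using only the given feature columns."""
    sub = [[row[c] for c in columns] for row in xs]
    hits = 0
    for i in range(len(sub)):
        # Hold out sample i, train on the rest, then test on sample i.
        train_x = sub[:i] + sub[i + 1:]
        train_y = list(ys[:i]) + list(ys[i + 1:])
        if nearest_centroid_predict(train_x, train_y, sub[i]) == ys[i]:
            hits += 1
    return hits / len(sub)


def wrapper_forward_selection(xs, ys):
    """Greedily add the feature that most improves jackknife accuracy.

    Stops as soon as no remaining feature improves the score.
    Returns (selected column indices, best accuracy).
    """
    selected, best = [], 0.0
    remaining = list(range(len(xs[0])))
    while remaining:
        scored = [(jackknife_accuracy(xs, ys, selected + [c]), c)
                  for c in remaining]
        # Highest score wins; ties go to the lowest column index.
        score, col = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best


# Hypothetical toy data: column 0 separates the classes, columns 1 and 2
# are uninformative. Labels: 1 = CPP, 0 = non-CPP.
TOY_X = [
    [0.90, 0.5, 0.1],
    [0.80, 0.1, 0.9],
    [0.85, 0.9, 0.5],
    [0.10, 0.4, 0.2],
    [0.20, 0.2, 0.8],
    [0.15, 0.8, 0.4],
]
TOY_Y = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    print(wrapper_forward_selection(TOY_X, TOY_Y))
```

On the toy data the loop keeps only the informative column, which
mirrors how a wrapper reduces the feature dimension while preserving
predictive accuracy. In practice the stand-in classifier would be
replaced by a random forest, and the hybrid feature vector would supply
the candidate columns.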