Abstract
Service desk systems hold a vast and rich base of information, the
history of past calls, which can and should serve as a reference for
subsequent calls. Common search tools, such as keyword search, become
infeasible on large datasets and can also return results that are
unrelated to the problem.
“State-of-the-art” techniques exist, but they incur high computational
and operational costs for training and use. In this context, this work
investigates how sensitive machine learning algorithms are to the
property defined here as “relevance”: the property of texts that contain
reusable knowledge. The motivation is that non-relevant texts can be
removed from the database in advance, so that complex algorithms run on
a smaller database at lower computational cost. Tests were
performed with several combinations of text representations (the TF-IDF
vectorizer and the Doc2Vec embedding) and classic classifiers (Naive
Bayes, Adaptive Boosting, Random Forest, Stochastic Gradient Descent,
Logistic Regression, Support Vector Machine, and Light Gradient Boosting
Machine), as well as with TextConvoNet, an architecture based on
Convolutional Neural Networks. The TextConvoNet classifier achieved the
best results,
with metrics close to 0.93, indicating that relevance is detectable and
that the technique is viable for removing non-relevant texts from a
database.
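
The pre-filtering idea described above can be illustrated with a minimal
sketch. It is not the implementation evaluated in this work: it pairs a
hand-written TF-IDF weighting with a multinomial Naive Bayes classifier
in pure Python, and every ticket text, label, and function name below is
a hypothetical example chosen only for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; real ticket text would need more cleaning.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    # Term count times IDF; terms unseen during fitting are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    return {term: c * idf[term] for term, c in counts.items()}

def train_naive_bayes(vectors, labels, vocab, alpha=1.0):
    # Multinomial Naive Bayes over TF-IDF weights with additive smoothing.
    log_prior, log_lik = {}, {}
    for cls in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == cls]
        log_prior[cls] = math.log(len(rows) / len(vectors))
        totals = Counter()
        for v in rows:
            for term, weight in v.items():
                totals[term] += weight
        denom = sum(totals.values()) + alpha * len(vocab)
        log_lik[cls] = {t: math.log((totals[t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik

def predict(vector, model):
    # Return the class with the highest log posterior score.
    log_prior, log_lik = model
    scores = {
        cls: log_prior[cls] + sum(w * log_lik[cls][t] for t, w in vector.items())
        for cls in log_prior
    }
    return max(scores, key=scores.get)

# Hypothetical labelled tickets: 1 = relevant (reusable knowledge), 0 = not.
train_texts = [
    "restart the print spooler service to fix the printer queue",
    "reinstall the vpn certificate to fix the connection error",
    "user called to say thanks",
    "duplicate ticket closed with no action",
]
train_labels = [1, 1, 0, 0]

idf = fit_idf(train_texts)
model = train_naive_bayes([tfidf(t, idf) for t in train_texts], train_labels, idf)

# Keep only the texts predicted as relevant before any costlier processing.
database = ["fix the printer queue", "duplicate ticket closed thanks"]
kept = [text for text in database if predict(tfidf(text, idf), model) == 1]
print(kept)
```

In this toy setting the filter keeps the text that describes a fix and
discards the one that only records a closed duplicate. Any of the
vectorizers and classifiers listed above could take the place of the two
components used here.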