Relevance classification on service desk texts using Natural Language Processing
  • Marciel Mario Degasperi,
  • Daniel Cavalieri,
  • Fidelis Zanetti de Castro
Marciel Mario Degasperi
Federal Institute of Education, Science and Technology of Espírito Santo at Serra, Espírito Santo

Corresponding Author:[email protected]

Daniel Cavalieri
Federal Institute of Education, Science and Technology of Espírito Santo at Serra, Espírito Santo
Fidelis Zanetti de Castro
Federal Institute of Education, Science and Technology of Espírito Santo at Serra, Espírito Santo

Abstract

Service desk systems hold a vast and rich base of information in the form of their call history, which can and should be used as a reference for subsequent calls. Common search tools, such as keyword search, become impractical on large datasets and may return results that are not necessarily related to the problem. State-of-the-art techniques exist, but they incur high computational and operational costs for training and use. The purpose of this work is therefore to investigate how sensitive machine learning algorithms are in detecting the characteristic defined here as “relevance”: the property of texts that contain knowledge that can be reused. The motivation is that non-relevant texts can be removed from the database in advance, so that complex algorithms can be applied to a more condensed database at a lower computational cost. Tests were performed with several combinations of text representations (the TF-IDF vectorizer and the Doc2Vec embedding) and classic classifiers (Naive Bayes, Adaptive Boosting, Random Forest, Stochastic Gradient Descent, Logistic Regression, Support Vector Machine and Light Gradient Boosting Machine), as well as with TextConvoNet, a classifier architecture based on Convolutional Neural Networks. The TextConvoNet classifier presented the best results, with metrics close to 0.93, showing that the concept is detectable and that the technique is viable for removing non-relevant texts from a database.
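
The idea of filtering non-relevant texts before any costly processing can be illustrated with a minimal sketch. It is not the pipeline evaluated in this work: it assumes a plain bag-of-words representation in place of TF-IDF or Doc2Vec, uses a small hand-written multinomial Naive Bayes classifier, and trains on six invented tickets. The names `NaiveBayesRelevance`, `filter_relevant` and all example texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRelevance:
    """Multinomial Naive Bayes with Laplace smoothing over raw word counts."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is the union of all words seen during training.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_score(self, text, label):
        """Log prior plus smoothed log likelihood of the known tokens."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denom = total_words + len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, text):
        return max(self.classes, key=lambda c: self.log_score(text, c))


def filter_relevant(model, texts):
    """Keep only the texts the model labels as relevant (label 1)."""
    return [t for t in texts if model.predict(t) == 1]


# Invented toy tickets: 1 = relevant (reusable knowledge), 0 = non-relevant.
train_texts = [
    "Restarted the print spooler service and cleared the queue, printing works again",
    "Reset the VPN certificate and reinstalled the client, connection restored",
    "Increased mailbox quota and archived old mail, user can send again",
    "Thanks, closing the ticket",
    "User did not answer, ticket closed",
    "Duplicate ticket, see other call",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = NaiveBayesRelevance().fit(train_texts, train_labels)

new_texts = [
    "Cleared the queue and restarted the spooler",
    "Closing, user did not answer",
    "Thanks, ticket closed",
]
kept = filter_relevant(model, new_texts)
print(kept)
```

On this toy data only the first new ticket is kept, so a more expensive downstream model would see one text instead of three. A real system would need a labelled sample of the actual service desk history and a proper evaluation before any texts are discarded.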