Relevance classification on service desk texts using Natural Language Processing

Marciel Mario Degasperi; Daniel Cavalieri; Fidelis Zanetti de Castro

doi:10.22541/au.168861824.42810002/v1

Abstract

Service desk systems hold a vast and rich base of information in the history of past calls, which can and should serve as a reference for subsequent calls. Common search tools, such as keyword search, become impractical on large datasets and can return results that are not necessarily related to the problem. State-of-the-art techniques exist, but they incur high computational and operational costs for training and use. The purpose of this work is therefore to investigate how sensitive machine learning algorithms are to the characteristic defined here as “relevance”: the property of texts that contain knowledge that can be reused. The motivation is that non-relevant texts can be removed from the database in advance, so that complex algorithms can be applied to a more condensed database at reduced computational cost. Tests were performed with several combinations of the TF-IDF vectorizer and the Doc2Vec embedding with the classic classifiers Naive Bayes, Adaptive Boosting, Random Forest, Stochastic Gradient Descent, Logistic Regression, Support Vector Machine, and Light Gradient Boosting Machine, as well as with TextConvoNet, an architecture based on Convolutional Neural Networks. The TextConvoNet classifier achieved the best results, with metrics close to 0.93, showing that the concept is detectable and that the technique is viable for removing non-relevant texts from a database.
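
The pre-filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it pairs a hand-written TF-IDF weighting with a multinomial Naive Bayes classifier using only the Python standard library, and the toy tickets, their labels, and all function names (tokenize, fit_idf, tfidf, train_nb, predict) are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy tickets (1 = relevant, 0 = non-relevant); invented for illustration.
TRAIN = [
    ("printer offline fixed by reinstalling driver and restarting spooler", 1),
    ("vpn error resolved after resetting password and clearing cached credentials", 1),
    ("outlook crash solved by disabling add-in and repairing office", 1),
    ("disk full error fixed by clearing temp folder and restarting service", 1),
    ("thanks closing ticket", 0),
    ("user did not answer closing ticket", 0),
    ("duplicate ticket please ignore", 0),
    ("ok thanks", 0),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(doc, idf):
    """Map a tokenized document to {token: tf * idf}, dropping unseen tokens."""
    tf = Counter(tok for tok in doc if tok in idf)
    return {tok: cnt * idf[tok] for tok, cnt in tf.items()}


def train_nb(samples, alpha=1.0):
    """Fit multinomial Naive Bayes on TF-IDF weights with Laplace smoothing."""
    docs = [tokenize(text) for text, _ in samples]
    idf = fit_idf(docs)
    weight = defaultdict(lambda: defaultdict(float))  # class -> token -> summed weight
    class_count = Counter()
    for doc, (_, label) in zip(docs, samples):
        class_count[label] += 1
        for tok, w in tfidf(doc, idf).items():
            weight[label][tok] += w
    model = {"idf": idf, "log_prior": {}, "log_like": {}}
    for label, n in class_count.items():
        total = sum(weight[label].values()) + alpha * len(idf)
        model["log_prior"][label] = math.log(n / len(samples))
        model["log_like"][label] = {
            tok: math.log((weight[label][tok] + alpha) / total) for tok in idf
        }
    return model


def predict(model, text):
    """Return the class with the highest log-posterior score."""
    feats = tfidf(tokenize(text), model["idf"])
    scores = {
        label: prior + sum(w * model["log_like"][label][tok] for tok, w in feats.items())
        for label, prior in model["log_prior"].items()
    }
    return max(scores, key=scores.get)


model = train_nb(TRAIN)
tickets = ["error fixed by restarting service", "thanks closing ticket"]
# Keep only the tickets predicted as relevant; costlier models would run on this subset.
kept = [t for t in tickets if predict(model, t) == 1]
print(kept)
```

On real service desk data, library implementations of the vectorizers and classifiers named in the abstract would be the more practical choice; the sketch only shows how a cheap relevance classifier can shrink the database before more expensive algorithms are applied.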