Evaluation of the preprocessing and training stages in text classification algorithms in the context of information retrieval

NIAID Data Ecosystem2026-03-11 收录

下载链接：

https://figshare.com/articles/dataset/Evaluation_of_the_preprocessing_and_training_stages_in_text_classification_algorithms_in_the_context_of_information_retrieval/8162216

下载链接

链接失效反馈

官方服务：

资源简介：

Abstract The amount of unstructured data grows with the popularization of the Internet. Texts in natural language represent a relevant and significant set for the analysis and production of knowledge. This work proposes a quantitative analysis of the preprocessing and training stages of a text classifier, which uses as an attribute the feelings expressed by the users. Artificial Neural Network, as a classifier algorithm, and texts from Amazon, IMDB and Yelp sites were used for the experiments. The database allows the analysis of the expression of positive and negative feelings of the users in evaluations of products and services in unstructured texts. Two distinct processes of preprocessing and different training of the Artificial Neural Networks were carried out to classify the textual set. The results quantitatively confirm the importance of the preprocessing and training stages of the classifier, highlighting the importance of the vocabulary selected for the text representation and classification. The available classification techniques achieve satisfactory results. However, even by using two distinct processes of preprocessing and identifying the best training process, it was not possible to totally eliminate the learning difficulties and understanding of the model for the classifications of feelings that involved subjective characteristics of the expression of human feeling.

创建时间：

2019-03-01

5,000+

优质数据集

54 个

任务类型

进入经典数据集