Fine-Tuning Monolingual Pre-trained BERT Models for Sentiment Analysis in Peruvian Slang Contexts
Source: DataCite Commons
Updated: 2025-06-01
Indexed: 2024-08-19
Download link:
https://figshare.com/articles/dataset/Fine-Tuning_Monolingual_Pre-trained_BERT_Models_for_Sentiment_Analysis_in_Peruvian_Slang_Contexts/26198315/1
Resource summary:
Innovation in natural language processing (NLP) has led to the creation of models such as BERT, RoBERTa, GPT-4o, Llama 3 and Gemini. However, the adaptation of these models to specific dialects, particularly in languages other than English, remains underexplored, especially for slang and informal language. To address this gap, our research evaluates which Spanish monolingual models are best suited to Peruvian colloquial expressions. The strongest candidate is RoBERTuito, a model pre-trained on a large corpus of Spanish tweets that is noted for its effectiveness in text classification tasks. We fine-tune this model and compare it with the other candidates so that it reflects the characteristics of Peruvian Spanish. We implemented a pipeline to collect and preprocess Facebook data, focusing on comments written in Peruvian Spanish. The resulting specialised dataset of over 11,000 labelled comments was used to train monolingual models on the sentiment analysis task and to obtain more accurate polarity detection in texts containing Peruvian slang. RoBERTuito achieved a balanced F1-score of 0.750, outperforming BETO (0.661), BERTuit (0.70) and RoBERTa-BNE (0.696). We also report precision, recall and accuracy for a comprehensive evaluation. This study not only provides a solution for sentiment analysis in Peruvian Spanish but also establishes a basis for adapting monolingual models to specific linguistic contexts.
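The summary reports a "balanced" F1-score together with precision, recall and accuracy, but it does not say how the scores are averaged across polarity classes. The sketch below assumes that "balanced" means macro-averaging, where each class contributes equally regardless of how many comments it contains. This is an assumption, not the authors' stated method. The function name `classification_metrics`, the label names `pos`, `neu` and `neg`, and the example predictions are all hypothetical and are not taken from the dataset.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall and F1.

    Assumption: "balanced" F1 is read as macro-averaging, i.e. each metric
    is computed per class and then averaged without weighting by class size.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # the predicted class gets a false positive
            fn[true] += 1  # the true class gets a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # Undefined ratios (zero denominators) are treated as 0.0.
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    k = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


if __name__ == "__main__":
    # Hypothetical gold labels and model predictions for six comments.
    gold = ["pos", "neg", "neu", "pos", "neg", "neu"]
    pred = ["pos", "neg", "pos", "pos", "neu", "neu"]
    metrics = classification_metrics(gold, pred)
    print({name: round(value, 3) for name, value in metrics.items()})
    # → {'accuracy': 0.667, 'precision': 0.722, 'recall': 0.667, 'f1': 0.656}
```

Macro-averaging is often preferred when polarity classes are imbalanced, because a majority class cannot dominate the score. For real evaluations, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same macro-averaged F1 over the labels present in either list.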
Provider:
figshare
Created:
2024-07-07



