
Fine-Tuning Monolingual Pre-trained BERT Models for Sentiment Analysis in Peruvian Slang Contexts

Source: DataCite Commons. Updated 2025-06-01; indexed 2024-08-19.
Download link:
https://figshare.com/articles/dataset/Fine-Tuning_Monolingual_Pre-trained_BERT_Models_for_Sentiment_Analysis_in_Peruvian_Slang_Contexts/26198315/1
Description:
Innovation in natural language processing (NLP) has produced models such as BERT, RoBERTa, GPT-4o, Llama 3 and Gemini. However, adapting these models to specific dialects remains underexplored, particularly for languages other than English and for slang or informal language. To address this gap, our research evaluates which Spanish monolingual models are best suited to Peruvian colloquial expressions. The best alternative was RoBERTuito, a model pre-trained on a large corpus of Spanish tweets, which has proven effective in text classification tasks. We fine-tune this model and compare it with the alternatives to reflect the characteristics of Peruvian Spanish. We implemented a process for collecting and preprocessing Facebook data, focusing on comments written in Peruvian Spanish. The resulting specialised dataset of over 11,000 labelled comments was used to train the monolingual models on the sentiment analysis task, with the aim of more accurate polarity detection in texts that include Peruvian slang. RoBERTuito achieved a balanced F1-score of 0.750, outperforming BETO (0.661), BERTuit (0.70) and RoBERTa-BNE (0.696). We also report precision, recall and accuracy for a more comprehensive evaluation. This study not only provides a solution for sentiment analysis in Peruvian Spanish, but also establishes a basis for adapting monolingual models to specific linguistic contexts.
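The metrics named in the description (F1-score, precision, recall and accuracy) can be illustrated with a minimal sketch. The description does not define "balanced F1-score"; the code below assumes it means the unweighted (macro) average of per-class F1, which may differ from the authors' computation. The function name `classification_metrics` and the example labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Assumption: "balanced" is read as macro averaging, i.e. every class
    contributes equally regardless of how many comments it has.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        # Undefined ratios (zero denominator) are scored as 0.0.
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Hypothetical polarity labels, for illustration only.
    gold = ["pos", "neg", "neu", "pos"]
    predicted = ["pos", "neg", "pos", "pos"]
    print(classification_metrics(gold, predicted))
```

Macro averaging is a common choice when polarity classes are imbalanced, because a model that ignores a minority class is penalised even if its overall accuracy stays high.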
Provider:
figshare
Created:
2024-07-07