Annotation agreement for sentiment classes.

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Annotation_agreement_for_sentiment_classes_/29248510
Description:
While sentiment analysis systems excel in high-resource languages, most African languages, facing limited resources, remain under-represented. This gap leaves a significant portion of the world’s population without access to technologies in their native languages. However, multilingual pre-trained language models (PLMs) offer a promising approach for sentiment analysis in low-resource languages. Although the lack of large datasets for African languages poses a challenge for developing PLMs, fine-tuning and task adaptation of existing multilingual PLMs is an alternative solution. This paper explores the use of multilingual PLMs for sentiment analysis in five Southern African languages: Sepedi, Sesotho, Setswana, isiXhosa, and isiZulu. We leverage existing PLMs and fine-tune them for this specific task rather than training models from scratch. Our work expands on the SAfriSenti corpus, a Twitter sentiment dataset for these languages. We employ various annotation techniques to create a labelled dataset and perform benchmark experiments utilising various multilingual PLMs. Our findings demonstrate the effectiveness of multilingual PLMs, particularly for the closely related Sotho-Tswana languages, where the ensemble PLM method achieved an average weighted F1 score above 63%. The closely related Nguni languages achieved an even higher average weighted F1 score, exceeding 77%, highlighting the potential of PLMs for sentiment analysis in South African languages.
Created:
2025-06-05