SentiALG
Source: arXiv. Updated: 2018-08-15. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/1808.05079v1
Description:
SentiALG is an automatically annotated sentiment analysis corpus for the Algerian dialect, a Maghrebi variety of Arabic. Created by the Algiers Higher School of Applied Sciences, the dataset contains 8,000 social media messages: 4,000 written in Arabic script and 4,000 in Arabizi (Arabic written in Latin script). The corpus was built from an automatically generated Algerian sentiment lexicon and covers both of these widely used scripts. It aims to address the scarcity of sentiment analysis resources for the Algerian dialect, particularly the challenges of processing Arabizi text.
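As a rough illustration of what lexicon-based automatic annotation can look like, the sketch below labels a message by summing the polarity scores of its tokens. It is a minimal sketch under stated assumptions, not the authors' pipeline: the lexicon entries, the whitespace tokenizer, the scoring rule, and the names `TOY_LEXICON` and `label_message` are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical toy lexicon mapping a token to a polarity score.
# These placeholder entries are NOT taken from the SentiALG lexicon.
TOY_LEXICON: Dict[str, int] = {
    "pos_a": 1,
    "pos_b": 1,
    "neg_a": -1,
    "neg_b": -1,
}


def label_message(text: str, lexicon: Dict[str, int]) -> Optional[str]:
    """Return 'positive', 'negative', or None when the score is zero.

    Assumption: tokens are split on whitespace and lowercased; a real
    pipeline for Arabic script or Arabizi would need proper normalization.
    """
    score = sum(lexicon.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # no sentiment-bearing token, or the polarities cancel out


if __name__ == "__main__":
    for message in ["pos_a something pos_b", "neg_a something", "something else"]:
        print(message, "->", label_message(message, TOY_LEXICON))
```

Messages that receive no label (score of zero) would simply be left out of an automatically annotated corpus in this toy setup; how the actual corpus handles such cases is not stated here.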
Provided by:
Algiers Higher School of Applied Sciences
Created:
2018-08-15



