SOLD, SemiSOLD
Source: arXiv. Updated: 2024-03-28. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/2212.00851v2
Description:
SOLD is a manually annotated dataset of 10,000 Twitter posts for offensive content detection in Sinhala, with annotations at both the sentence level and the word level. SemiSOLD is a larger dataset of more than 145,000 Sinhala tweets, annotated using a semi-supervised approach.
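To make the two annotation levels concrete, the following is a minimal sketch of how one post carrying both a sentence-level label and word-level labels might be represented. The class name `AnnotatedPost`, its field names, and the 0/1 flag encoding are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotatedPost:
    """Hypothetical record layout; field names are illustrative only."""
    tokens: List[str]        # words of the post, in order
    is_offensive: bool       # sentence-level label
    token_flags: List[int]   # word-level labels (assumed: 1 = flagged, 0 = not)

    def offensive_tokens(self) -> List[str]:
        """Return the words that carry a word-level flag."""
        return [t for t, f in zip(self.tokens, self.token_flags) if f == 1]


def is_consistent(post: AnnotatedPost) -> bool:
    """Check that every token has exactly one word-level flag."""
    return len(post.tokens) == len(post.token_flags)


# Placeholder tokens stand in for real Sinhala text.
example = AnnotatedPost(
    tokens=["w1", "w2", "w3"],
    is_offensive=True,
    token_flags=[0, 1, 0],
)
print(example.offensive_tokens())
print(is_consistent(example))
```

Keeping the word-level flags aligned one-to-one with the tokens lets a sentence-level classifier and a word-level tagger read the same record.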
Created:
2022-12-02



