News sentiment analysis datasets for Serbian, Bosnian, Macedonian, Albanian and Estonian SADEmma 1.0

Name: News sentiment analysis datasets for Serbian, Bosnian, Macedonian, Albanian and Estonian SADEmma 1.0
Creator: hdl.handle.net
License: No description available

Indexed from hdl.handle.net on 2025-03-24

Download link:

http://hdl.handle.net/11356/1987

Resource description:

We provide annotated datasets on a three-point sentiment scale (positive, neutral and negative) for Serbian, Bosnian, Macedonian, Albanian, and Estonian. For all languages except Estonian, we include pairs of source URL (where corresponding text can be found) and sentiment label. For Estonian, we randomly sampled 100 articles from "Ekspress news article archive (in Estonian and Russian) 1.0" (http://hdl.handle.net/11356/1408). The data is organized in Tab-Separated Values (TSV) format. For Serbian, Bosnian, Macedonian, and Albanian, the dataset contains two columns: sourceURL and sentiment. For Estonian, the dataset consists of three columns: text ID (from the CLARIN.SI reference above), body text, and sentiment label.
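
The two TSV layouts described above can be read with Python's standard csv module. The following is a minimal sketch, not the dataset authors' tooling. It assumes UTF-8 files, and the column keys text_id and body_text, the has_header flag, and the sample rows are hypothetical; whether the real files carry a header row, and how the labels are spelled, should be checked against the downloaded data.

```python
import csv
import io
from collections import Counter

# Column layouts taken from the dataset description. The key names for the
# Estonian file (text_id, body_text) are illustrative assumptions.
URL_COLUMNS = ["sourceURL", "sentiment"]  # Serbian, Bosnian, Macedonian, Albanian
ESTONIAN_COLUMNS = ["text_id", "body_text", "sentiment"]  # Estonian


def read_sentiment_tsv(handle, columns, has_header=False):
    """Read tab-separated rows into dicts keyed by the given column names."""
    # QUOTE_NONE treats quote characters inside article text as plain data.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for index, fields in enumerate(reader):
        if has_header and index == 0:
            continue  # skip the header row if the file has one
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(columns):
            raise ValueError(
                f"line {index + 1}: expected {len(columns)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(columns, fields)))
    return rows


def label_counts(rows):
    """Count how often each sentiment label occurs."""
    return Counter(row["sentiment"] for row in rows)


# Made-up sample in the two-column layout; these are not rows from the dataset.
sample = (
    "https://example.org/a\tpositive\n"
    "https://example.org/b\tneutral\n"
    "https://example.org/c\tpositive\n"
)
rows = read_sentiment_tsv(io.StringIO(sample), URL_COLUMNS)
print(label_counts(rows))
```

To read a downloaded file instead of the in-memory sample, pass an open file object, for example open(path, encoding="utf-8", newline=""), where path is whichever file name the archive actually uses. For the Estonian file, pass ESTONIAN_COLUMNS as the column list.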

Provider:

hdl.handle.net