DDSC/europarl
Source: Hugging Face. Updated 2022-07-01; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/DDSC/europarl
Resource description:
---
annotations_creators:
- expert-generated
language_creators:
- found
language:
- da
license:
- cc-by-4.0
multilinguality:
- monolingual
pretty_name: EuroParl
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- sentiment-classification
---
# Dataset Card for EuroParl
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Direct Download**: http://danlp-downloads.alexandra.dk/datasets/europarl.sentiment2.zip
### Dataset Summary
This dataset consists of Danish data from the European Parliament, annotated for sentiment analysis by the [Alexandra Institute](https://github.com/alexandrainst). All credit goes to them.
### Supported Tasks and Leaderboards
This dataset is suitable for sentiment analysis.
### Languages
This dataset is in Danish.
## Dataset Structure
### Data Instances
Every entry in the dataset has a document and an associated label.
### Data Fields
An entry in the dataset consists of the following fields:
- `text` (`str`): The text content.
- `label` (`str`): The label of the `text`. Can be "positiv", "neutral" or "negativ" for positive, neutral and negative sentiment, respectively.
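Because `label` is stored as a Danish string, most classifiers need it mapped to an integer first. The following is a minimal sketch of that step, not part of the original card: the integer ids, the `label_id` field name, and the helper names `encode_example` and `load_europarl` are assumptions made for illustration, and the sample record is invented rather than taken from the dataset.

```python
# Label strings as documented in the Data Fields section.
# The integer ids are an arbitrary choice made for this sketch.
LABEL2ID = {"negativ": 0, "neutral": 1, "positiv": 2}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}


def encode_example(example: dict) -> dict:
    """Return a copy of the record with an added integer `label_id` field."""
    label = example["label"]
    if label not in LABEL2ID:
        raise ValueError(f"unexpected label: {label!r}")
    return {**example, "label_id": LABEL2ID[label]}


def load_europarl():
    """Load the dataset from the Hugging Face Hub.

    Needs the `datasets` package and network access, so it is not called here.
    """
    from datasets import load_dataset  # lazy import keeps the helpers usable offline

    return load_dataset("DDSC/europarl")


# Hypothetical record, invented for illustration (not taken from the dataset).
sample = {"text": "Det er et godt forslag.", "label": "positiv"}
encoded = encode_example(sample)
print(encoded["label_id"])
```

With the `datasets` package installed, the same helper could plausibly be applied to a whole split with `load_europarl()["train"].map(encode_example)`.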
### Data Splits
`train` and `test` splits are available. The test split is a stratified random sample of 30% of the dataset: there are 669 documents in the training split and 288 in the test split.
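To make "stratified" concrete, here is a small sketch of a 70/30 stratified split using only the standard library. It is not the procedure the dataset authors used (their tooling and random seed are not stated); the function name `stratified_split`, the seed, and the toy records are assumptions for illustration. The last line checks the reported split sizes against the stated 30%.

```python
import random
from collections import defaultdict


def stratified_split(examples, test_fraction=0.3, seed=0):
    """Split records so each label keeps roughly the same share in both parts."""
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex["label"]].append(ex)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):  # sorted for a reproducible order
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Synthetic toy data; the label counts are invented, not the dataset's real distribution.
toy = [
    {"text": f"doc {i}", "label": label}
    for label, n in [("positiv", 20), ("neutral", 50), ("negativ", 30)]
    for i in range(n)
]
train, test = stratified_split(toy)

# Share of the test split implied by the sizes reported in the card.
reported_test_share = 288 / (669 + 288)
print(len(train), len(test), round(reported_test_share, 3))
```

Splitting per label and then concatenating is what keeps the class proportions equal across the two parts; a plain random split of a dataset this small could leave a minority class under-represented in the test set.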
## Additional Information
### Dataset Curators
The dataset was collected and annotated solely by the [Alexandra Institute](https://github.com/alexandrainst).
### Licensing Information
The dataset is released under the CC BY 4.0 license.
### Citation Information
```
@misc{europarl,
title={EuroParl},
author={Alexandra Institute},
year={2020},
note={\url{https://danlp-alexandra.readthedocs.io/en/latest/docs/datasets.html#europarl-sentiment2}}
}
```
### Contributions
Thanks to [@saattrupdan](https://github.com/saattrupdan) for adding this dataset to the Hugging Face Hub.
Provider: DDSC
Summary of the original information
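Dataset overview
- Dataset name: EuroParl
- Language: Danish (da)
- License: CC BY 4.0
- Multilinguality: monolingual
- Dataset size: fewer than 1K examples
- Source datasets: original
- Task category: text classification
- Task ID: sentiment classification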
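Dataset description
- Dataset summary: The dataset contains Danish data from the European Parliament, annotated for sentiment analysis by the Alexandra Institute.
- Supported tasks: sentiment analysis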
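Dataset structure
- Data instances: Each instance consists of a document and its associated label.
- Data fields:
  - text (str): The text content.
  - label (str): The label of the text. Can be "positiv" (positive), "neutral" (neutral) or "negativ" (negative).
- Data splits: train and test splits are provided. The test split is 30% of the dataset, randomly sampled in a stratified fashion. The training split contains 669 documents and the test split contains 288.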
Additional information
- Dataset curators: Alexandra Institute
- Licensing information: The dataset is released under the CC BY 4.0 license.
- Citation information: @misc{europarl, title={EuroParl}, author={Alexandra Institute}, year={2020}, note={\url{https://danlp-alexandra.readthedocs.io/en/latest/docs/datasets.html#europarl-sentiment2}} }



