scherrmann/financial_phrasebank_75agree_german
Source: Hugging Face. Updated 2023-11-19; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/scherrmann/financial_phrasebank_75agree_german
Resource description:
---
language:
- de
license:
- cc-by-nc-sa-3.0
multilinguality:
- monolingual
task_categories:
- text-classification
task_ids:
- multi-class-classification
- sentiment-classification
pretty_name: FinancialPhrasebankGerman
tags:
- finance
dataset_info:
  features:
  - name: sentence
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': negative
          '1': neutral
          '2': positive
  splits:
  - name: train
    num_bytes: 422345
    num_examples: 2763
  - name: validation
    num_bytes: 51710
    num_examples: 344
  - name: test
    num_bytes: 55109
    num_examples: 346
  download_size: 318382
  dataset_size: 529164
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
# Dataset Card for German financial_phrasebank
## Dataset Description
### Dataset Summary
This dataset is a German translation of the Financial PhraseBank of [Malo et al. (2013)](https://arxiv.org/abs/1307.5336), restricted to sentences with at least 75% agreement between annotators (3453 observations in total). The sentences were machine-translated with [DeepL](https://www.deepl.com/translator).
### Supported Tasks and Leaderboards
Sentiment Classification
### Languages
German
## Dataset Structure
### Data Instances
```
{ "sentence": "Die finnische nationale Fluggesellschaft gab an, dass der Nettoverlust in den Monaten April bis Juni 26 Millionen Euro betrug, verglichen mit einem Nettogewinn von 13 Millionen Euro im Vorjahr..",
"label": "negative"
}
```
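A record like the one above could be retrieved with the Hugging Face `datasets` library. The following is a minimal sketch, assuming `datasets` is installed and that the repository id matches the name in the download link; the helpers `load_splits` and `sizes_match` are hypothetical names introduced here for illustration, not part of the dataset.

```python
# Split sizes as stated in this card's metadata.
EXPECTED_SIZES = {"train": 2763, "validation": 344, "test": 346}


def load_splits(repo_id="scherrmann/financial_phrasebank_75agree_german"):
    """Download all splits from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the rest of the module works without `datasets`.
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(repo_id)


def sizes_match(dataset_dict, expected=EXPECTED_SIZES):
    """Return True if every split has the number of rows the card reports."""
    actual = {name: len(split) for name, split in dataset_dict.items()}
    return actual == expected


# Usage (requires network access, so it is left commented out):
# ds = load_splits()
# print(sizes_match(ds))
# print(ds["train"][0])
```

Because the metadata declares `label` as a class label, the loaded records will most likely carry an integer id in that field rather than the string shown in the instance above.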
### Data Fields
- sentence: a tokenized line from the dataset
- label: the sentiment class of the sentence: 'negative' (id 0), 'neutral' (id 1) or 'positive' (id 2)
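The card's metadata declares `label` as a class label with ids 0, 1 and 2. A small sketch of converting between ids and names follows; `LABEL_NAMES`, `id_to_label` and `label_to_id` are illustrative names, not part of the dataset.

```python
# Class names in id order, as listed in the card's `class_label` metadata.
LABEL_NAMES = ["negative", "neutral", "positive"]


def id_to_label(label_id: int) -> str:
    """Return the class name for an integer label id."""
    if not 0 <= label_id < len(LABEL_NAMES):
        raise ValueError(f"unknown label id: {label_id}")
    return LABEL_NAMES[label_id]


def label_to_id(name: str) -> int:
    """Return the integer id for a class name (raises ValueError if unknown)."""
    return LABEL_NAMES.index(name)


print(id_to_label(0))           # negative
print(label_to_id("positive"))  # 2
```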
### Data Splits
The data is split into train, validation and test sets using stratified sampling:
- train (2763 observations)
- validation (344 observations)
- test (346 observations)
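The split sizes above correspond to roughly 80% / 10% / 10% of the 3453 observations. The card does not say how the stratified split was produced, so the following is only a sketch of the general technique: the fractions, the seed and the function name `stratified_split` are assumptions, and the toy labels do not reflect the real class distribution.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split row indices into train/validation/test, keeping label proportions."""
    rng = random.Random(seed)

    # Group row indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, validation, test = [], [], []
    for indices in by_label.values():
        # Shuffle within each class, then cut the class at the same fractions.
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        validation += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to test
    return train, validation, test


# Toy labels for illustration only.
labels = ["negative"] * 20 + ["neutral"] * 60 + ["positive"] * 20
train, validation, test = stratified_split(labels)
print(len(train), len(validation), len(test))  # 80 10 10
```

Each class contributes the same share to every split, so the label distribution in the train, validation and test sets mirrors that of the full data.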
## Further Information
For further information on the source data or the annotation process, see the original [paper](https://arxiv.org/abs/1307.5336) or the original [dataset](https://huggingface.co/datasets/financial_phrasebank).
## Licensing Information
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.
In particular, this license permits the free use of the data for non-commercial purposes.
If you are interested in commercial use of the data, please contact the authors of the original dataset for an appropriate license:
- [Pekka Malo](mailto:pekka.malo@aalto.fi)
- [Ankur Sinha](mailto:ankur.sinha@aalto.fi)
Provider:
scherrmann