
scherrmann/financial_phrasebank_75agree_german

Source: Hugging Face. Updated: 2023-11-19. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/scherrmann/financial_phrasebank_75agree_german
Resource description:
---
language:
- de
license:
- cc-by-nc-sa-3.0
multilinguality:
- monolingual
task_categories:
- text-classification
task_ids:
- multi-class-classification
- sentiment-classification
pretty_name: FinancialPhrasebankGerman
tags:
- finance
dataset_info:
  features:
  - name: sentence
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': negative
          '1': neutral
          '2': positive
  splits:
  - name: train
    num_bytes: 422345
    num_examples: 2763
  - name: validation
    num_bytes: 51710
    num_examples: 344
  - name: test
    num_bytes: 55109
    num_examples: 346
  download_size: 318382
  dataset_size: 529164
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---

# Dataset Card for German financial_phrasebank

## Dataset Description

### Dataset Summary

This dataset is a German translation of the financial phrasebank of [Malo et al. (2013)](https://arxiv.org/abs/1307.5336), restricted to sentences with a minimum agreement rate between annotators of 75% (3453 observations in total). The translation was produced by machine with [DeepL](https://www.deepl.com/translator).

### Supported Tasks and Leaderboards

Sentiment Classification

### Languages

German

## Dataset Structure

### Data Instances

```
{
  "sentence": "Die finnische nationale Fluggesellschaft gab an, dass der Nettoverlust in den Monaten April bis Juni 26 Millionen Euro betrug, verglichen mit einem Nettogewinn von 13 Millionen Euro im Vorjahr..",
  "label": "negative"
}
```

### Data Fields

- sentence: a tokenized line from the dataset
- label: a label corresponding to the class as a string: 'positive', 'negative' or 'neutral'

### Data Splits

The data is split into a train, a validation and a test set using stratified sampling:

- train (2763 observations)
- validation (344 observations)
- test (346 observations)

## Further Information

For further information regarding the source data or the annotation process, please see the original [paper](https://arxiv.org/abs/1307.5336) or the original [dataset](https://huggingface.co/datasets/financial_phrasebank).

## Licensing Information

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/.

In particular, this license permits the free use of the data for non-commercial purposes. If you are interested in commercial use of the data, please contact the authors of the original dataset for an appropriate license:

- [Pekka Malo](mailto:pekka.malo@aalto.fi)
- [Ankur Sinha](mailto:ankur.sinha@aalto.fi)
Provided by:
scherrmann
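Summary of original information

Dataset Card for German financial_phrasebank

Dataset Description

Dataset Summary

This dataset is a German translation of the financial phrasebank of Malo et al. (2013), restricted to sentences with a minimum agreement rate between annotators of 75% (3453 observations in total). The translation was produced by machine with DeepL.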
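
To make the 75% agreement threshold concrete, here is a minimal sketch of such a filter. It assumes the agreement rate is the share of annotators who chose the most common label; the card does not spell out the exact computation, so treat this as an illustration rather than the authors' procedure. The function names and the toy annotations are hypothetical.

```python
from collections import Counter


def agreement_rate(annotations):
    """Share of annotators who chose the most common label (assumed definition)."""
    counts = Counter(annotations)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(annotations)


def filter_by_agreement(rows, threshold=0.75):
    """Keep (sentence, majority_label) pairs whose agreement meets the threshold."""
    kept = []
    for sentence, annotations in rows:
        if agreement_rate(annotations) >= threshold:
            majority_label = Counter(annotations).most_common(1)[0][0]
            kept.append((sentence, majority_label))
    return kept


# Hypothetical toy annotations, not taken from the dataset.
toy_annotations = [
    ("Satz A", ["positive", "positive", "positive", "neutral"]),  # 3/4, kept
    ("Satz B", ["negative", "neutral", "positive", "neutral"]),   # 2/4, dropped
    ("Satz C", ["negative"] * 5),                                  # 5/5, kept
]
kept_rows = filter_by_agreement(toy_annotations)
```

With the toy input, only the sentences whose majority label reaches three quarters of the votes survive the filter.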

Supported Tasks and Leaderboards

Sentiment classification

Languages

German

Dataset Structure

Data Instances

    {
      "sentence": "Die finnische nationale Fluggesellschaft gab an, dass der Nettoverlust in den Monaten April bis Juni 26 Millionen Euro betrug, verglichen mit einem Nettogewinn von 13 Millionen Euro im Vorjahr..",
      "label": "negative"
    }

Data Fields

  • sentence: a tokenized line from the dataset
  • label: the class label as a string: positive, negative or neutral
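
The two fields above can be checked and encoded with a few lines of Python. This is a minimal sketch: the id order (0 negative, 1 neutral, 2 positive) follows the class_label names in the card's metadata, while `parse_record` and the shortened example record are hypothetical and only serve as illustration.

```python
import json

# Class ids in the order given by the card's class_label metadata.
LABEL_NAMES = ["negative", "neutral", "positive"]
LABEL_TO_ID = {name: idx for idx, name in enumerate(LABEL_NAMES)}


def parse_record(raw):
    """Parse one JSON record and return (sentence, label_id)."""
    record = json.loads(raw)
    sentence = record["sentence"]
    label = record["label"]
    if not isinstance(sentence, str) or not sentence:
        raise ValueError("sentence must be a non-empty string")
    if label not in LABEL_TO_ID:
        raise ValueError(f"unknown label: {label!r}")
    return sentence, LABEL_TO_ID[label]


# Shortened, illustrative record in the same shape as the data instance.
raw_example = '{"sentence": "Der Nettoverlust betrug 26 Millionen Euro.", "label": "negative"}'
sentence, label_id = parse_record(raw_example)
```

Rejecting unknown labels early keeps a typo in the label column from silently turning into a wrong class id.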

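Data Splits

The data is split into a training, a validation and a test set using stratified sampling:

  • train (2763 observations)
  • validation (344 observations)
  • test (346 observations)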
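
The split sizes above amount to roughly 80%, 10% and 10% of the 3453 observations. The card does not say which tool, seed or exact fractions were used, so the following is only a sketch of what a stratified split does: each class is shuffled and divided separately, so every split keeps about the same class mix. `stratified_split`, the 80/10/10 fractions and the toy rows are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(rows, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split (sentence, label) rows into train/validation/test, class by class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    train, validation, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train.extend(group[:n_train])
        validation.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])  # remainder goes to test
    return train, validation, test


# Hypothetical toy data: 20 negative, 50 neutral, 30 positive rows.
toy_rows = (
    [(f"neg {i}", "negative") for i in range(20)]
    + [(f"neu {i}", "neutral") for i in range(50)]
    + [(f"pos {i}", "positive") for i in range(30)]
)
train, validation, test = stratified_split(toy_rows)

# The split sizes reported in the card add up to the stated total.
card_sizes = {"train": 2763, "validation": 344, "test": 346}
card_total = sum(card_sizes.values())
```

In the toy run each split keeps the 20/50/30 class ratio of the input, which is the property stratification is meant to guarantee.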

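Licensing Information

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/. In particular, this license permits the free use of the data for non-commercial purposes.

If you are interested in commercial use of the data, please contact the authors of the original dataset (Pekka Malo and Ankur Sinha) for an appropriate license.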
