
DDSC/europarl

Source: Hugging Face. Updated: 2022-07-01. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/DDSC/europarl
Resource description:
---
annotations_creators:
- expert-generated
language_creators:
- found
language:
- da
license:
- cc-by-4.0
multilinguality:
- monolingual
pretty_name: TwitterSent
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- sentiment-classification
---

# Dataset Card for DKHate

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Direct Download**: http://danlp-downloads.alexandra.dk/datasets/europarl.sentiment2.zip

### Dataset Summary

This dataset consists of Danish data from the European Parliament that has been annotated for sentiment analysis by the [Alexandra Institute](https://github.com/alexandrainst) - all credits go to them.

### Supported Tasks and Leaderboards

This dataset is suitable for sentiment analysis.

### Languages

This dataset is in Danish.

## Dataset Structure

### Data Instances

Every entry in the dataset has a document and an associated label.

### Data Fields

An entry in the dataset consists of the following fields:

- `text` (`str`): The text content.
- `label` (`str`): The label of the `text`. Can be "positiv", "neutral" or "negativ" for positive, neutral and negative sentiment, respectively.

### Data Splits

A `train` and `test` split is available, with the test split being 30% of the dataset, randomly sampled in a stratified fashion. There are 669 documents in the training split and 288 in the test split.

## Additional Information

### Dataset Curators

The collection and annotation of the dataset is solely due to the [Alexandra Institute](https://github.com/alexandrainst).

### Licensing Information

The dataset is released under the CC BY 4.0 license.

### Citation Information

```
@misc{europarl,
  title={EuroParl},
  author={Alexandra Institute},
  year={2020},
  note={\url{https://danlp-alexandra.readthedocs.io/en/latest/docs/datasets.html#europarl-sentiment2}}
}
```

### Contributions

Thanks to [@saattrupdan](https://github.com/saattrupdan) for adding this dataset to the Hugging Face Hub.
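The card describes each entry as a `text` string paired with one of three label strings. The sketch below shows one way to load a split and count documents per label. It assumes the Hugging Face `datasets` package is installed and that the repository id `DDSC/europarl` (taken from the download link) exposes the `train` and `test` splits described above. The helper names `label_counts` and `load_europarl_split` are hypothetical and not part of any library.

```python
from collections import Counter
from typing import Iterable, Mapping

# Label strings as documented in the card's "Data Fields" section.
VALID_LABELS = ("positiv", "neutral", "negativ")


def label_counts(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count documents per sentiment label, rejecting undocumented labels."""
    counts: Counter = Counter()
    for row in rows:
        label = row["label"]
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    return counts


def load_europarl_split(split: str = "train"):
    """Load one split from the Hugging Face Hub (requires network access).

    Assumption: the `datasets` package is installed and the repository
    "DDSC/europarl" provides the `train` and `test` splits from the card.
    """
    # Imported lazily so label_counts stays usable without `datasets`.
    from datasets import load_dataset

    return load_dataset("DDSC/europarl", split=split)
```

With network access, `label_counts(load_europarl_split("train"))` should return the per-label document counts for the training split. Because the test split is described as stratified, the label proportions in `test` would be expected to be close to those in `train`.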
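Provider:
DDSC
Summary of original information

Dataset overview

  • Dataset name: DKHate
  • Dataset alias: TwitterSent
  • Language: Danish (da)
  • License: CC BY 4.0
  • Multilinguality: monolingual
  • Dataset size: fewer than 1K entries
  • Data source: original
  • Task category: text classification
  • Task ID: sentiment classification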

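Dataset description

  • Dataset summary: The dataset contains Danish data from the European Parliament, annotated for sentiment analysis by the Alexandra Institute.
  • Supported tasks: sentiment analysis

Dataset structure

  • Data instances: Each instance contains a document and its associated label.
  • Data fields:
    • text (str): The text content.
    • label (str): The label of the text; one of "positiv" (positive), "neutral" (neutral) or "negativ" (negative).
  • Data splits: A train split and a test split are provided. The test split is 30% of the dataset, drawn by stratified random sampling. The training split contains 669 documents and the test split contains 288.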
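The split sizes and the stated 30% test share can be checked with simple arithmetic. The sketch below does that and also shows one possible integer encoding of the three label strings. The mapping `LABEL_TO_ID` and the helper names are hypothetical: the dataset stores labels as strings and defines no numeric ids.

```python
# Split sizes as stated in the dataset description.
TRAIN_DOCS = 669
TEST_DOCS = 288

# Hypothetical integer ids for the three documented label strings;
# the dataset itself defines no numeric encoding.
LABEL_TO_ID = {"negativ": 0, "neutral": 1, "positiv": 2}


def held_out_fraction(train_docs: int, test_docs: int) -> float:
    """Return the share of all documents that falls in the test split."""
    return test_docs / (train_docs + test_docs)


def encode_labels(labels: list[str]) -> list[int]:
    """Map label strings to integer ids; raises KeyError on unknown labels."""
    return [LABEL_TO_ID[label] for label in labels]


total = TRAIN_DOCS + TEST_DOCS                                   # 957
fraction = round(held_out_fraction(TRAIN_DOCS, TEST_DOCS), 3)    # 0.301
```

The total of 957 documents is consistent with the "fewer than 1K" size category, and 288 of 957 is about 30.1%, which matches the stated 30% test share.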

Additional information

  • Dataset curators: Alexandra Institute

  • Licensing information: The dataset is released under the CC BY 4.0 license.

  • Citation information:

    @misc{europarl, title={EuroParl}, author={Alexandra Institute}, year={2020}, note={\url{https://danlp-alexandra.readthedocs.io/en/latest/docs/datasets.html#europarl-sentiment2}} }
