DDSC/europarl

Name: DDSC/europarl
Creator: DDSC
Published: 2022-07-01 15:42:03
License: No description available

Hugging Face | Updated: 2022-07-01 | Indexed: 2024-03-04

Download link:

https://hf-mirror.com/datasets/DDSC/europarl

Resource description:

---
annotations_creators:
- expert-generated
language_creators:
- found
language:
- da
license:
- cc-by-4.0
multilinguality:
- monolingual
pretty_name: TwitterSent
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- sentiment-classification
---

# Dataset Card for DDSC/europarl

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Direct Download**: http://danlp-downloads.alexandra.dk/datasets/europarl.sentiment2.zip

### Dataset Summary

This dataset consists of Danish data from the European Parliament that has been annotated for sentiment analysis by the [Alexandra Institute](https://github.com/alexandrainst). All credit goes to them.

### Supported Tasks and Leaderboards

This dataset is suitable for sentiment analysis.

### Languages

This dataset is in Danish.

## Dataset Structure

### Data Instances

Every entry in the dataset has a document and an associated label.

### Data Fields

An entry in the dataset consists of the following fields:

- `text` (`str`): The text content.
- `label` (`str`): The label of the `text`. Can be "positiv", "neutral" or "negativ" for positive, neutral and negative sentiment, respectively.

### Data Splits

`train` and `test` splits are available. The test split is 30% of the dataset, randomly sampled in a stratified fashion. There are 669 documents in the training split and 288 in the test split.

## Additional Information

### Dataset Curators

The collection and annotation of the dataset are solely the work of the [Alexandra Institute](https://github.com/alexandrainst).

### Licensing Information

The dataset is released under the CC BY 4.0 license.

### Citation Information

```
@misc{europarl,
  title={EuroParl},
  author={Alexandra Institute},
  year={2020},
  note={\url{https://danlp-alexandra.readthedocs.io/en/latest/docs/datasets.html#europarl-sentiment2}}
}
```

### Contributions

Thanks to [@saattrupdan](https://github.com/saattrupdan) for adding this dataset to the Hugging Face Hub.
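
As a minimal sketch of how the two splits described in the card might be read in practice, the snippet below uses the Hugging Face `datasets` library. It assumes that the `datasets` package is installed, that network access is available, and that the Hub identifier is `DDSC/europarl` as shown in this listing. The helper name `load_europarl` is hypothetical and not part of any library.

```python
DATASET_ID = "DDSC/europarl"  # Hub identifier as shown in this listing
KNOWN_SPLITS = ("train", "test")  # split names taken from the dataset card


def load_europarl(split="train"):
    """Load one split of the dataset (hypothetical helper).

    Assumes network access and that `datasets` is installed.
    """
    if split not in KNOWN_SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    # Imported lazily so this module can be imported without the dependency.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


if __name__ == "__main__":
    train = load_europarl("train")
    # Per the card, each entry carries a `text` and a `label` field.
    first = train[0]
    print(first["text"], first["label"])
```

The split name is validated before the library is imported, so a typo fails fast without touching the network.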

Provider:

DDSC

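Summary of original information

Dataset overview

Dataset name: DDSC/europarl
Alias (`pretty_name` in the card metadata): TwitterSent
Language: Danish (da)
License: CC BY 4.0
Multilinguality: monolingual
Dataset size: fewer than 1K entries (n<1K)
Source datasets: original
Task category: text classification
Task ID: sentiment classification

Dataset description

Dataset summary: The dataset contains Danish data from the European Parliament, annotated for sentiment analysis by the Alexandra Institute.
Supported tasks: sentiment analysis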

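Dataset structure

Data instances: Each data instance consists of a document and its associated label.
Data fields:
- text (str): The text content.
- label (str): The label of the text. One of "positiv" (positive), "neutral" (neutral) or "negativ" (negative).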
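
Because the labels are Danish strings rather than integers, most classifiers need them encoded first. The sketch below shows one possible mapping. The id ordering is an arbitrary assumption, the names `LABEL2ID`, `ID2LABEL` and `encode_label` are hypothetical, and the two example rows are made up for illustration rather than taken from the dataset.

```python
# Label strings as documented in the data fields (Danish spellings).
LABELS = ("negativ", "neutral", "positiv")

# The id ordering below is an assumption; the dataset does not prescribe one.
LABEL2ID = {name: idx for idx, name in enumerate(LABELS)}
ID2LABEL = {idx: name for name, idx in LABEL2ID.items()}


def encode_label(label: str) -> int:
    """Return the integer id for a documented label, rejecting anything else."""
    try:
        return LABEL2ID[label]
    except KeyError:
        raise ValueError(f"unexpected label: {label!r}") from None


# Hypothetical rows, invented for illustration only.
rows = [
    {"text": "Det er et godt forslag.", "label": "positiv"},
    {"text": "Mødet begynder klokken ti.", "label": "neutral"},
]
encoded = [encode_label(row["label"]) for row in rows]
```

Rejecting unknown strings explicitly guards against the English spellings "positive" and "negative", which differ from the Danish labels by a single letter.
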
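Data splits: train and test splits are provided. The test split is 30% of the dataset, drawn by stratified random sampling. The training split contains 669 documents and the test split contains 288.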
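
The reported sizes can be checked against the stated 30% figure, and the idea of stratified sampling can be illustrated with a short sketch. The code below is not the curators' actual procedure, which the listing does not describe. The function `stratified_split`, its seed, and the toy label distribution are all assumptions made for illustration.

```python
import random
from collections import defaultdict

TRAIN_SIZE, TEST_SIZE = 669, 288  # sizes reported for the two splits
test_fraction = TEST_SIZE / (TRAIN_SIZE + TEST_SIZE)  # 288 / 957, roughly 0.30


def stratified_split(labels, test_frac=0.3, seed=0):
    """Return (train_idx, test_idx), sampling test_frac of each label group."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Take the same fraction from every label so class ratios are preserved.
        n_test = round(len(indices) * test_frac)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical label distribution, for illustration only.
labels = ["positiv"] * 20 + ["neutral"] * 50 + ["negativ"] * 30
train_idx, test_idx = stratified_split(labels)
```

Sampling within each label group keeps the proportion of each sentiment class the same in both splits, which is what "stratified" means here.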

Additional information

Dataset curators: Alexandra Institute
Licensing information: The dataset is released under the CC BY 4.0 license.
Citation information:

@misc{europarl,
  title={EuroParl},
  author={Alexandra Institute},
  year={2020},
  note={\url{https://danlp-alexandra.readthedocs.io/en/latest/docs/datasets.html#europarl-sentiment2}}
}
