MichiganNLP/TID-8

Hugging Face | Updated 2023-10-30 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/MichiganNLP/TID-8
Resource description:
---
annotations_creators:
- crowdsourced
language_creators:
- other
language:
- en
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 1K<n<200K
source_datasets:
- extended|other
task_categories:
- text-classification
task_ids:
- natural-language-inference
- sentiment-analysis
- hate-speech-detection
paperswithcode_id: placeholder
pretty_name: TID-8
tags:
- tid8
- annotation disagreement
dataset_info:
- config_name: commitmentbank-ann
  features:
  - name: HitID
    dtype: string
  - name: Verb
    dtype: string
  - name: Context
    dtype: string
  - name: Prompt
    dtype: string
  - name: Target
    dtype: string
  - name: ModalType
    dtype: string
  - name: Embedding
    dtype: string
  - name: MatTense
    dtype: string
  - name: weak_labels
    sequence: string
  - name: question
    dtype: string
  - name: uid
    dtype: string
  - name: id
    dtype: int32
  - name: annotator_id
    dtype: string
  - name: answer
    dtype: string
  - name: answer_label
    dtype:
      class_label:
        names:
          '0': '0'
          '1': '1'
          '2': '2'
          '3': '3'
          '4': '-3'
          '5': '-1'
          '6': '-2'
  splits:
  - name: train
    num_bytes: 7153364
    num_examples: 7816
  - name: test
    num_bytes: 3353745
    num_examples: 3729
  download_size: 3278616
  dataset_size: 10507109
- config_name: commitmentbank-atr
  features:
  - name: HitID
    dtype: string
  - name: Verb
    dtype: string
  - name: Context
    dtype: string
  - name: Prompt
    dtype: string
  - name: Target
    dtype: string
  - name: ModalType
    dtype: string
  - name: Embedding
    dtype: string
  - name: MatTense
    dtype: string
  - name: weak_labels
    sequence: string
  - name: question
    dtype: string
  - name: uid
    dtype: string
  - name: id
    dtype: int32
  - name: annotator_id
    dtype: string
  - name: answer
    dtype: string
  - name: answer_label
    dtype:
      class_label:
        names:
          '0': '0'
          '1': '1'
          '2': '2'
          '3': '3'
          '4': '-3'
          '5': '-1'
          '6': '-2'
  splits:
  - name: train
    num_bytes: 6636145
    num_examples: 7274
  - name: test
    num_bytes: 3870964
    num_examples: 4271
  download_size: 3301698
  dataset_size: 10507109
- config_name: friends_qia-ann
  features:
  - name: Season
    dtype: string
  - name: Episode
    dtype: string
  - name: Category
    dtype: string
  - name: Q_person
    dtype: string
  - name: A_person
    dtype: string
  - name: Q_original
    dtype: string
  - name: Q_modified
    dtype: string
  - name: A_modified
    dtype: string
  - name: Annotation_1
    dtype: string
  - name: Annotation_2
    dtype: string
  - name: Annotation_3
    dtype: string
  - name: Goldstandard
    dtype: string
  - name: question
    dtype: string
  - name: uid
    dtype: string
  - name: id
    dtype: int32
  - name: annotator_id
    dtype: string
  - name: answer
    dtype: string
  - name: answer_label
    dtype:
      class_label:
        names:
          '0': '1'
          '1': '2'
          '2': '3'
          '3': '4'
          '4': '5'
  splits:
  - name: validation
    num_bytes: 687135
    num_examples: 1872
  - name: train
    num_bytes: 4870170
    num_examples: 13113
  - name: test
    num_bytes: 693033
    num_examples: 1872
  download_size: 1456765
  dataset_size: 6250338
- config_name: friends_qia-atr
  features:
  - name: Season
    dtype: string
  - name: Episode
    dtype: string
  - name: Category
    dtype: string
  - name: Q_person
    dtype: string
  - name: A_person
    dtype: string
  - name: Q_original
    dtype: string
  - name: Q_modified
    dtype: string
  - name: A_modified
    dtype: string
  - name: Annotation_1
    dtype: string
  - name: Annotation_2
    dtype: string
  - name: Annotation_3
    dtype: string
  - name: Goldstandard
    dtype: string
  - name: question
    dtype: string
  - name: uid
    dtype: string
  - name: id
    dtype: int32
  - name: annotator_id
    dtype: string
  - name: answer
    dtype: string
  - name: answer_label
    dtype:
      class_label:
        names:
          '0': '1'
          '1': '2'
          '2': '3'
          '3': '4'
          '4': '5'
  splits:
  - name: train
    num_bytes: 4166892
    num_examples: 11238
  - name: test
    num_bytes: 2083446
    num_examples: 5619
  download_size: 3445839
  dataset_size: 6250338
- config_name: goemotions-ann
  features:
  - name: author
    dtype: string
  - name: subreddit
    dtype: string
  - name: link_id
    dtype: string
  - name: parent_id
    dtype: string
  - name: created_utc
    dtype: string
  - name: rater_id
    dtype: string
  - name: example_very_unclear
    dtype: string
  - name: admiration
    dtype: string
  - name: amusement
    dtype: string
  - name: anger
    dtype: string
  - name: annoyance
    dtype: string
  - name: approval
    dtype: string
  - name: caring
    dtype: string
  - name: confusion
    dtype: string
  - name: curiosity
    dtype: string
  - name: desire
    dtype: string
  - name: disappointment
    dtype: string
  - name: disapproval
    dtype: string
  - name: disgust
    dtype: string
  - name: embarrassment
    dtype: string
  - name: excitement
    dtype: string
  - name: fear
    dtype: string
  - name: gratitude
    dtype: string
  - name: grief
    dtype: string
  - name: joy
    dtype: string
  - name: love
    dtype: string
  - name: nervousness
    dtype: string
  - name: optimism
    dtype: string
  - name: pride
    dtype: string
  - name: realization
    dtype: string
  - name: relief
    dtype: string
  - name: remorse
    dtype: string
  - name: sadness
    dtype: string
  - name: surprise
    dtype: string
  - name: neutral
    dtype: string
  - name: question
    dtype: string
  - name: uid
    dtype: string
  - name: id
    dtype: int32
  - name: annotator_id
    dtype: string
  - name: answer
    dtype: string
  - name: answer_label
    dtype:
      class_label:
        names:
          '0': positive
          '1': ambiguous
          '2': negative
          '3': neutral
  splits:
  - name: train
    num_bytes: 46277072
    num_examples: 135504
  - name: test
    num_bytes: 19831033
    num_examples: 58129
  download_size: 24217871
  dataset_size: 66108105
- config_name: goemotions-atr
  features:
  - name: author
    dtype: string
  - name: subreddit
    dtype: string
  - name: link_id
    dtype: string
  - name: parent_id
    dtype: string
  - name: created_utc
    dtype: string
  - name: rater_id
    dtype: string
  - name: example_very_unclear
    dtype: string
  - name: admiration
    dtype: string
  - name: amusement
    dtype: string
  - name: anger
    dtype: string
  - name: annoyance
    dtype: string
  - name: approval
    dtype: string
  - name: caring
    dtype: string
  - name: confusion
    dtype: string
  - name: curiosity
    dtype: string
  - name: desire
    dtype: string
  - name: disappointment
    dtype: string
  - name: disapproval
    dtype: string
  - name: disgust
    dtype: string
  - name: embarrassment
    dtype: string
  - name: excitement
    dtype: string
  - name: fear
    dtype: string
  - name: gratitude
    dtype: string
  - name: grief
    dtype: string
  - name: joy
    dtype: string
  - name: love
    dtype: string
  - name: nervousness
    dtype: string
  - name: optimism
    dtype: string
  - name: pride
    dtype: string
  - name: realization
    dtype: string
  - name: relief
    dtype: string
  - name: remorse
    dtype: string
  - name: sadness
    dtype: string
  - name: surprise
    dtype: string
  - name: neutral
    dtype: string
  - name: question
    dtype: string
  - name: uid
    dtype: string
  - name: id
    dtype: int32
  - name: annotator_id
    dtype: string
  - name: answer
    dtype: string
  - name: answer_label
    dtype:
      class_label:
        names:
          '0': positive
          '1': ambiguous
          '2': negative
          '3': neutral
  splits:
  - name: train
    num_bytes: 44856233
    num_examples: 131395
  - name: test
    num_bytes: 21251872
    num_examples: 62238
  download_size: 24228953
  dataset_size: 66108105
- config_name: hs_brexit-ann
  features:
  - name: other annotations
    dtype: string
  - name: question
    dtype: string
  - name: uid
    dtype: string
  - name: id
    dtype: int32
  - name: annotator_id
    dtype: string
  - name: answer
    dtype: string
  - name: answer_label
    dtype:
      class_label:
        names:
          '0': hate_speech
          '1': not_hate_speech
  splits:
  - name: train
    num_bytes: 1039008
    num_examples: 4704
  - name: test
    num_bytes: 222026
    num_examples: 1008
  download_size: 144072
  dataset_size: 1261034
- config_name: hs_brexit-atr
  features:
  - name: other annotations
    dtype: string
  - name: question
    dtype: string
  - name: uid
    dtype: string
  - name: id
    dtype: int32
  - name: annotator_id
    dtype: string
  - name: answer
    dtype: string
  - name: answer_label
    dtype:
      class_label:
        names:
          '0': hate_speech
          '1': not_hate_speech
  splits:
  - name: train
    num_bytes: 986132
    num_examples: 4480
  - name: test
    num_bytes: 495738
    num_examples: 2240
  download_size: 604516
  dataset_size: 1481870
- config_name: humor-ann
  features:
  - name: text_a
    dtype: string
  - name: text_b
    dtype: string
  - name: question
    dtype: string
  - name: uid
    dtype: string
  - name: id
    dtype: int32
  - name: annotator_id
    dtype: string
  - name: answer
    dtype: string
  - name: answer_label
    dtype:
      class_label:
        names:
          '0': B
          '1': X
          '2': A
  splits:
  - name: train
    num_bytes: 28524839
    num_examples: 98735
  - name: test
    num_bytes: 12220621
    num_examples: 42315
  download_size: 24035118
  dataset_size: 40745460
- config_name: humor-atr
  features:
  - name: text_a
    dtype: string
  - name: text_b
    dtype: string
  - name: question
    dtype: string
  - name: uid
    dtype: string
  - name: id
    dtype: int32
  - name: annotator_id
    dtype: string
  - name: answer
    dtype: string
  - name: answer_label
    dtype:
      class_label:
        names:
          '0': B
          '1': X
          '2': A
  splits:
  - name: train
    num_bytes: 28161248
    num_examples: 97410
  - name: test
    num_bytes: 12584212
    num_examples: 43640
  download_size: 24099282
  dataset_size: 40745460
- config_name: md-agreement-ann
  features:
  - name: task
    dtype: string
  - name: original_id
    dtype: string
  - name: domain
    dtype: string
  - name: question
    dtype: string
  - name: uid
    dtype: string
  - name: id
    dtype: int32
  - name: annotator_id
    dtype: string
  - name: answer
    dtype: string
  - name: answer_label
    dtype:
      class_label:
        names:
          '0': offensive_speech
          '1': not_offensive_speech
  splits:
  - name: train
    num_bytes: 7794988
    num_examples: 32960
  - name: test
    num_bytes: 2498445
    num_examples: 10553
  download_size: 1606671
  dataset_size: 10293433
- config_name: md-agreement-atr
  features:
  - name: task
    dtype: string
  - name: original_id
    dtype: string
  - name: domain
    dtype: string
  - name: question
    dtype: string
  - name: uid
    dtype: string
  - name: id
    dtype: int32
  - name: annotator_id
    dtype: string
  - name: answer
    dtype: string
  - name: answer_label
    dtype:
      class_label:
        names:
          '0': offensive_speech
          '1': not_offensive_speech
  splits:
  - name: train
    num_bytes: 8777085
    num_examples: 37077
  - name: test
    num_bytes: 3957021
    num_examples: 16688
  download_size: 5766114
  dataset_size: 12734106
- config_name: pejorative-ann
  features:
  - name: pejor_word
    dtype: string
  - name: word_definition
    dtype: string
  - name: annotator-1
    dtype: string
  - name: annotator-2
    dtype: string
  - name: annotator-3
    dtype: string
  - name: question
    dtype: string
  - name: uid
    dtype: string
  - name: id
    dtype: int32
  - name: annotator_id
    dtype: string
  - name: answer
    dtype: string
  - name: answer_label
    dtype:
      class_label:
        names:
          '0': pejorative
          '1': non-pejorative
          '2': undecided
  splits:
  - name: train
    num_bytes: 350734
    num_examples: 1535
  - name: test
    num_bytes: 150894
    num_examples: 659
  download_size: 168346
  dataset_size: 501628
- config_name: pejorative-atr
  features:
  - name: pejor_word
    dtype: string
  - name: word_definition
    dtype: string
  - name: annotator-1
    dtype: string
  - name: annotator-2
    dtype: string
  - name: annotator-3
    dtype: string
  - name: question
    dtype: string
  - name: uid
    dtype: string
  - name: id
    dtype: int32
  - name: annotator_id
    dtype: string
  - name: answer
    dtype: string
  - name: answer_label
    dtype:
      class_label:
        names:
          '0': pejorative
          '1': non-pejorative
          '2': undecided
  splits:
  - name: train
    num_bytes: 254138
    num_examples: 1112
  - name: test
    num_bytes: 247490
    num_examples: 1082
  download_size: 188229
  dataset_size: 501628
- config_name: sentiment-ann
  features:
  - name: question
    dtype: string
  - name: uid
    dtype: string
  - name: id
    dtype: int32
  - name: annotator_id
    dtype: string
  - name: answer
    dtype: string
  - name: answer_label
    dtype:
      class_label:
        names:
          '0': Neutral
          '1': Somewhat positive
          '2': Very negative
          '3': Somewhat negative
          '4': Very positive
  splits:
  - name: train
    num_bytes: 9350333
    num_examples: 59235
  - name: test
    num_bytes: 235013
    num_examples: 1419
  download_size: 4906597
  dataset_size: 9585346
- config_name: sentiment-atr
  features:
  - name: question
    dtype: string
  - name: uid
    dtype: string
  - name: id
    dtype: int32
  - name: annotator_id
    dtype: string
  - name: answer
    dtype: string
  - name: answer_label
    dtype:
      class_label:
        names:
          '0': Neutral
          '1': Somewhat positive
          '2': Very negative
          '3': Somewhat negative
          '4': Very positive
  splits:
  - name: train
    num_bytes: 6712084
    num_examples: 42439
  - name: test
    num_bytes: 2873262
    num_examples: 18215
  download_size: 4762021
  dataset_size: 9585346
configs:
- config_name: commitmentbank-ann
  data_files:
  - split: train
    path: commitmentbank-ann/train-*
  - split: test
    path: commitmentbank-ann/test-*
- config_name: commitmentbank-atr
  data_files:
  - split: train
    path: commitmentbank-atr/train-*
  - split: test
    path: commitmentbank-atr/test-*
- config_name: friends_qia-ann
  data_files:
  - split: validation
    path: friends_qia-ann/validation-*
  - split: train
    path: friends_qia-ann/train-*
  - split: test
    path: friends_qia-ann/test-*
- config_name: friends_qia-atr
  data_files:
  - split: train
    path: friends_qia-atr/train-*
  - split: test
    path: friends_qia-atr/test-*
- config_name: goemotions-ann
  data_files:
  - split: train
    path: goemotions-ann/train-*
  - split: test
    path: goemotions-ann/test-*
- config_name: goemotions-atr
  data_files:
  - split: train
    path: goemotions-atr/train-*
  - split: test
    path: goemotions-atr/test-*
- config_name: hs_brexit-ann
  data_files:
  - split: train
    path: hs_brexit-ann/train-*
  - split: test
    path: hs_brexit-ann/test-*
- config_name: hs_brexit-atr
  data_files:
  - split: train
    path: hs_brexit-atr/train-*
  - split: test
    path: hs_brexit-atr/test-*
- config_name: humor-ann
  data_files:
  - split: train
    path: humor-ann/train-*
  - split: test
    path: humor-ann/test-*
- config_name: humor-atr
  data_files:
  - split: train
    path: humor-atr/train-*
  - split: test
    path: humor-atr/test-*
- config_name: md-agreement-ann
  data_files:
  - split: train
    path: md-agreement-ann/train-*
  - split: test
    path: md-agreement-ann/test-*
- config_name: md-agreement-atr
  data_files:
  - split: train
    path: md-agreement-atr/train-*
  - split: test
    path: md-agreement-atr/test-*
- config_name: pejorative-ann
  data_files:
  - split: train
    path: pejorative-ann/train-*
  - split: test
    path: pejorative-ann/test-*
- config_name: pejorative-atr
  data_files:
  - split: train
    path: pejorative-atr/train-*
  - split: test
    path: pejorative-atr/test-*
- config_name: sentiment-ann
  data_files:
  - split: train
    path: sentiment-ann/train-*
  - split: test
    path: sentiment-ann/test-*
- config_name: sentiment-atr
  data_files:
  - split: train
    path: sentiment-atr/train-*
  - split: test
    path: sentiment-atr/test-*
---

# Dataset Card for "TID-8"

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** placeholder
- **Repository:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Dataset Summary

TID-8 is a new aggregated benchmark focused on letting models learn from data that has inherent disagreement. It was proposed in [this paper](https://arxiv.org/pdf/2305.14663.pdf), published in Findings of EMNLP 2023. In the paper, we focus on the inherent disagreement and let the model learn directly from data that has such disagreement.

We provide two splits for TID-8.

*Annotation Split*

We split the annotations for each annotator into train and test sets. In other words, the same set of annotators appears in the train, (val), and test sets. For datasets that originally have splits, we follow the original split and remove datapoints in the test sets that were annotated by an annotator who is not in the training set. For datasets that do not originally have splits, we split the data into train and test sets for convenience; you may further split the train set into train and val sets.

*Annotator Split*

We split annotators into train and test sets. In other words, different sets of annotators appear in the train and test sets. We split the data into train and test sets for convenience; you may consider further splitting the train set into train and val sets for performance validation.

### Supported Tasks and Leaderboards

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Languages

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Dataset Structure

### Data Instances

### Data Fields

The data fields are the same among all splits. See the `dataset_info` metadata at the top of this card for the fields of each configuration.

### Data Splits

See the `dataset_info` metadata at the top of this card for the splits of each configuration.

## Dataset Creation

### Curation Rationale

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

#### Who are the source language producers?

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Annotations

#### Annotation process

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

#### Who are the annotators?

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Personal and Sensitive Information

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Discussion of Biases

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Other Known Limitations

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

## Additional Information

### Dataset Curators

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Licensing Information

[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

### Citation Information

```
@inproceedings{deng2023tid8,
  title={You Are What You Annotate: Towards Better Models through Annotator Representations},
  author={Deng, Naihao and Liu, Siyang and Zhang, Frederick Xinliang and Wu, Winston and Wang, Lu and Mihalcea, Rada},
  booktitle={Findings of EMNLP 2023},
  year={2023}
}
```

Note that each TID-8 dataset has its own citation. Please see the source to get the correct citation for each contained dataset.
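The configuration names in the card metadata pair each of the eight datasets with a suffix. The suffixes `-ann` and `-atr` appear to correspond to the Annotation Split and the Annotator Split respectively; that mapping is an inference from the names, not something the card states. The following is a minimal sketch of building a configuration name and loading it, assuming the Hugging Face `datasets` library is installed and the repository is reachable. The helpers `tid8_config_name` and `load_tid8` are hypothetical names introduced here for illustration.

```python
# Dataset names as they appear in the card's config_name entries.
DATASETS = (
    "commitmentbank", "friends_qia", "goemotions", "hs_brexit",
    "humor", "md-agreement", "pejorative", "sentiment",
)

# Assumption: "-ann" is the Annotation Split and "-atr" the Annotator Split.
SUFFIXES = {"annotation": "ann", "annotator": "atr"}


def tid8_config_name(dataset: str, split_type: str) -> str:
    """Build a config name such as 'humor-atr' (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if split_type not in SUFFIXES:
        raise ValueError(f"unknown split type: {split_type!r}")
    return f"{dataset}-{SUFFIXES[split_type]}"


def load_tid8(dataset: str, split_type: str, split: str = "train"):
    """Load one split of one configuration from the Hugging Face Hub."""
    # Imported lazily so the name helper works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(
        "MichiganNLP/TID-8",
        tid8_config_name(dataset, split_type),
        split=split,
    )
```

For example, `load_tid8("humor", "annotator", split="test")` would request the `test` split of the `humor-atr` configuration. Only `friends_qia-ann` lists a `validation` split in the metadata.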
Provider:
MichiganNLP
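Summary of original information

Dataset Card for "TID-8"

Dataset Description

Dataset Summary

TID-8 is a new aggregated benchmark focused on letting models learn from data that has inherent disagreement. It was proposed in Findings of EMNLP 2023. In the paper, we focus on the inherent disagreement and let the model learn directly from data that has such disagreement.

We provide two splits for TID-8:

  • Annotation Split: We split the annotations for each annotator into train and test sets. In other words, the same annotators appear in the train, validation, and test sets. For datasets that originally have splits, we follow the original split and remove data points in the test set that were annotated by annotators who are not in the training set. For datasets that do not originally have splits, we split the data into train and test sets for convenience; you may further split the train set into train and validation sets.

  • Annotator Split: We split annotators into train and test sets. In other words, different annotators appear in the train and test sets. We split the data into train and test sets for convenience; you may consider further splitting the train set into train and validation sets for performance validation.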
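The two split strategies described above can be sketched on toy data. This is a minimal illustration, not the authors' actual procedure: the 30% test fraction, the random seed, and the function names `annotation_split`, `annotator_split`, and `drop_unseen_annotators` are all assumptions made for the example. Only the `uid`, `annotator_id`, and `answer` field names come from the card.

```python
import random
from collections import defaultdict


def annotation_split(records, test_fraction=0.3, seed=0):
    """Split each annotator's own annotations into train and test."""
    rng = random.Random(seed)
    by_annotator = defaultdict(list)
    for record in records:
        by_annotator[record["annotator_id"]].append(record)
    train, test = [], []
    for annotator_id in sorted(by_annotator):
        items = list(by_annotator[annotator_id])
        rng.shuffle(items)
        # Keep at least one annotation per annotator in train.
        n_test = min(int(len(items) * test_fraction), len(items) - 1)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


def annotator_split(records, test_fraction=0.3, seed=0):
    """Assign whole annotators to either train or test."""
    rng = random.Random(seed)
    annotator_ids = sorted({record["annotator_id"] for record in records})
    rng.shuffle(annotator_ids)
    n_test = max(1, int(len(annotator_ids) * test_fraction))
    test_ids = set(annotator_ids[:n_test])
    train = [r for r in records if r["annotator_id"] not in test_ids]
    test = [r for r in records if r["annotator_id"] in test_ids]
    return train, test


def drop_unseen_annotators(train, test):
    """Remove test records whose annotator never appears in train."""
    seen = {r["annotator_id"] for r in train}
    return [r for r in test if r["annotator_id"] in seen]


# Toy data: 4 annotators with 10 annotations each.
records = [
    {"uid": f"u{i}", "annotator_id": f"a{i % 4}", "answer": str(i % 2)}
    for i in range(40)
]
ann_train, ann_test = annotation_split(records)
atr_train, atr_test = annotator_split(records)
```

In the annotation split, every annotator in `ann_test` also appears in `ann_train`; in the annotator split, the two sets of annotators are disjoint. `drop_unseen_annotators` mirrors the filtering step described for datasets that already come with an original split.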

Supported Tasks and Leaderboards

More information needed

Languages

More information needed

Dataset Structure

Data Instances

commitmentbank-ann

  • Features

    • HitID: string
    • Verb: string
    • Context: string
    • Prompt: string
    • Target: string
    • ModalType: string
    • Embedding: string
    • MatTense: string
    • weak_labels: sequence of strings
    • question: string
    • uid: string
    • id: integer (int32)
    • annotator_id: string
    • answer: string
    • answer_label: class label
      • 0: 0
      • 1: 1
      • 2: 2
      • 3: 3
      • 4: -3
      • 5: -1
      • 6: -2
  • Splits

    • train: 7816 examples, 7153364 bytes
    • test: 3729 examples, 3353745 bytes
  • Download size: 3278616 bytes

  • Dataset size: 10507109 bytes
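In the `answer_label` listing above, the class index is not the same as the label name: index 4 maps to the name `-3`, index 5 to `-1`, and index 6 to `-2`. A minimal sketch of decoding an index back to its name follows. The list is copied from the metadata above; the helper names `decode_label` and `decode_rating` are hypothetical, and reading the names as integer ratings is an assumption based on how they look.

```python
# Label names for commitmentbank, in class-index order (from the metadata).
COMMITMENTBANK_LABEL_NAMES = ["0", "1", "2", "3", "-3", "-1", "-2"]


def decode_label(index, names=COMMITMENTBANK_LABEL_NAMES):
    """Map a class index stored in answer_label to its label name."""
    return names[index]


def decode_rating(index):
    """Assumption: the label names are integer ratings, so parse them."""
    return int(decode_label(index))
```

For instance, `decode_label(4)` returns the string `"-3"`, so treating the raw index as the rating would silently turn a strongly negative judgment into a positive number.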

commitmentbank-atr

  • Features

    • HitID: string
    • Verb: string
    • Context: string
    • Prompt: string
    • Target: string
    • ModalType: string
    • Embedding: string
    • MatTense: string
    • weak_labels: sequence of strings
    • question: string
    • uid: string
    • id: integer (int32)
    • annotator_id: string
    • answer: string
    • answer_label: class label
      • 0: 0
      • 1: 1
      • 2: 2
      • 3: 3
      • 4: -3
      • 5: -1
      • 6: -2
  • Splits

    • train: 7274 examples, 6636145 bytes
    • test: 4271 examples, 3870964 bytes
  • Download size: 3301698 bytes

  • Dataset size: 10507109 bytes

friends_qia-ann

  • Features

    • Season: string
    • Episode: string
    • Category: string
    • Q_person: string
    • A_person: string
    • Q_original: string
    • Q_modified: string
    • A_modified: string
    • Annotation_1: string
    • Annotation_2: string
    • Annotation_3: string
    • Goldstandard: string
    • question: string
    • uid: string
    • id: integer (int32)
    • annotator_id: string
    • answer: string
    • answer_label: class label
      • 0: 1
      • 1: 2
      • 2: 3
      • 3: 4
      • 4: 5
  • Splits

    • validation: 1872 examples, 687135 bytes
    • train: 13113 examples, 4870170 bytes
    • test: 1872 examples, 693033 bytes
  • Download size: 1456765 bytes

  • Dataset size: 6250338 bytes
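The figures above can be cross-checked with simple arithmetic: for `friends_qia-ann`, the per-split byte counts add up to the reported dataset size. A minimal sketch of that check, with the numbers copied from the listing above and hypothetical helper names `total_bytes` and `total_examples`:

```python
# Split statistics for friends_qia-ann, copied from the listing above.
FRIENDS_QIA_ANN = {
    "splits": {
        "validation": {"num_examples": 1872, "num_bytes": 687135},
        "train": {"num_examples": 13113, "num_bytes": 4870170},
        "test": {"num_examples": 1872, "num_bytes": 693033},
    },
    "download_size": 1456765,
    "dataset_size": 6250338,
}


def total_bytes(info):
    """Sum num_bytes over all splits of one configuration."""
    return sum(split["num_bytes"] for split in info["splits"].values())


def total_examples(info):
    """Sum num_examples over all splits of one configuration."""
    return sum(split["num_examples"] for split in info["splits"].values())


# The split byte counts sum to the reported dataset size.
sizes_match = total_bytes(FRIENDS_QIA_ANN) == FRIENDS_QIA_ANN["dataset_size"]
```

The same total of 16857 examples is reported for `friends_qia-atr` (11238 train plus 5619 test), which is consistent with both configurations covering the same annotations under different splits.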

friends_qia-atr

  • Features

    • Season: string
    • Episode: string
    • Category: string
    • Q_person: string
    • A_person: string
    • Q_original: string
    • Q_modified: string
    • A_modified: string
    • Annotation_1: string
    • Annotation_2: string
    • Annotation_3: string
    • Goldstandard: string
    • question: string
    • uid: string
    • id: integer (int32)
    • annotator_id: string
    • answer: string
    • answer_label: class label
      • 0: 1
      • 1: 2
      • 2: 3
      • 3: 4
      • 4: 5
  • Splits

    • train: 11238 examples, 4166892 bytes
    • test: 5619 examples, 2083446 bytes
  • Download size: 3445839 bytes

  • Dataset size: 6250338 bytes

goemotions-ann

  • Features

    • author: string
    • subreddit: string
    • link_id: string
    • parent_id: string
    • created_utc: string
    • rater_id: string
    • example_very_unclear: string
    • admiration: string
    • amusement: string
    • anger: string
    • annoyance: string
    • approval: string
    • caring: string
    • confusion: string
    • curiosity: string
    • desire: string
    • disappointment: string
    • disapproval: string
    • disgust: string
    • embarrassment: string
    • excitement: string
    • fear: string
    • gratitude: string
    • grief: string
    • joy: string
    • love: string
    • nervousness: string
    • optimism: string
    • pride: string
    • realization: string
    • relief: string
    • remorse: string
    • sadness: string
    • surprise: string
    • neutral: string
    • question: string
    • uid: string
    • id: integer (int32)
    • annotator_id: string
    • answer: string
    • answer_label: class label
      • 0: positive
      • 1: ambiguous
      • 2: negative
      • 3: neutral
  • Splits

    • train: 135504 examples, 46277072 bytes
    • test: 58129 examples, 19831033 bytes
  • Download size: 24217871 bytes

  • Dataset size: 66108105 bytes

goemotions-atr

  • Features

    • author: string
    • subreddit: string
    • link_id: string
    • parent_id: string
    • created_utc: string
    • rater_id: string
    • example_very_unclear: string
    • admiration: string
    • amusement: string
    • anger: string
    • annoyance: string
    • approval: string
    • caring: string
    • confusion: string
    • curiosity: string
    • desire: string
    • disappointment: string
    • disapproval: string
    • disgust: string
    • embarrassment: string
    • excitement: string
    • fear: string
    • gratitude: string
    • grief: string
    • joy: string
    • love: string
    • nervousness: string
    • optimism: string
    • pride: string
    • realization: string
    • relief: string
    • remorse: string
    • sadness: string
    • surprise: string
    • neutral: string
    • question: string
    • uid: string
    • id: integer (int32)
    • annotator_id: string
    • answer: string
    • answer_label: class label
      • 0: positive
      • 1: ambiguous
      • 2: negative
      • 3: neutral
  • Splits

    • train: 131395 examples, 44856233 bytes
    • test: 62238 examples, 21251872 bytes
  • Download size: 24228953 bytes

  • Dataset size: 66108105 bytes

hs_brexit-ann

  • Features

    • other annotations: string
    • question: string
    • uid: string
    • id: integer (int32)
    • annotator_id: string
    • answer: string
    • answer_label: class label
      • 0: hate_speech
      • 1: not_hate_speech
  • Splits

    • train: 4704 examples, 1039008 bytes
    • test: 1008 examples, 222026 bytes
  • Download size: 144072 bytes

  • Dataset size: 1261034 bytes

hs_brexit-atr

  • Features

    • other annotations: string
    • question: string
    • uid: string
    • id: integer (int32)
    • annotator_id: string
    • answer: string
    • answer_label: class label
      • 0: hate_speech
      • 1: not_hate_speech
  • Splits

    • train: 4480 examples, 986132 bytes
    • test: 2240 examples, 495738 bytes
  • Download size: 604516 bytes

  • Dataset size: 1481870 bytes

humor-ann

  • Features

    • text_a: string
    • text_b: string
    • question: string
    • uid: string
    • id: integer (int32)
    • annotator_id: string
    • answer: string
    • answer_label: class label
      • 0: B
      • 1: X
      • 2: A
  • Splits

    • train: 98735 examples, 28524839 bytes
    • test: 42315 examples, 12220621 bytes
  • Download size: 24035118 bytes

  • Dataset size: 40745460 bytes

humor-atr

  • Features

    • text_a: string
    • text_b: string
    • question: string
    • uid: string
    • id: integer (int32)
    • annotator_id: string
    • answer: string
    • answer_label: class label
      • 0: B
      • 1: X
      • 2: A
  • Splits

    • train: 97410 examples, 28161248 bytes
    • test: 43640 examples, 12584212 bytes
  • Download size: 24099282 bytes

  • Dataset size: 40745460 bytes

md-agreement-ann

  • Features

    • task: string
    • original_id: string
    • domain: string
    • question: string
    • uid: string
    • id: integer (int32)
    • annotator_id: string
    • answer: string
    • answer_label: class label
      • 0: offensive_speech
      • 1: not_offensive_speech
  • Splits

    • train: 32960 examples, 7794988 bytes
    • test: 10553 examples, 2498445 bytes