
sileod/pragmeval

Hugging Face, updated 2024-01-18, indexed 2024-05-25
Download link:
https://hf-mirror.com/datasets/sileod/pragmeval
Resource description:
---
annotations_creators:
- found
language_creators:
- found
language:
- en
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
- 1K<n<10K
- n<1K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- multi-class-classification
pretty_name: pragmeval
dataset_info:
- config_name: verifiability
  features:
  - name: sentence
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': experiential
          '1': unverifiable
          '2': non-experiential
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 592520
    num_examples: 5712
  - name: validation
    num_bytes: 65215
    num_examples: 634
  - name: test
    num_bytes: 251799
    num_examples: 2424
  download_size: 5330724
  dataset_size: 909534
- config_name: emobank-arousal
  features:
  - name: sentence
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': low
          '1': high
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 567660
    num_examples: 5470
  - name: validation
    num_bytes: 71221
    num_examples: 684
  - name: test
    num_bytes: 69276
    num_examples: 683
  download_size: 5330724
  dataset_size: 708157
- config_name: switchboard
  features:
  - name: sentence
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': Response Acknowledgement
          '1': Uninterpretable
          '2': Or-Clause
          '3': Reject
          '4': Statement-non-opinion
          '5': 3rd-party-talk
          '6': Repeat-phrase
          '7': Hold Before Answer/Agreement
          '8': Signal-non-understanding
          '9': Offers, Options Commits
          '10': Agree/Accept
          '11': Dispreferred Answers
          '12': Hedge
          '13': Action-directive
          '14': Tag-Question
          '15': Self-talk
          '16': Yes-No-Question
          '17': Rhetorical-Question
          '18': No Answers
          '19': Open-Question
          '20': Conventional-closing
          '21': Other Answers
          '22': Acknowledge (Backchannel)
          '23': Wh-Question
          '24': Declarative Wh-Question
          '25': Thanking
          '26': Yes Answers
          '27': Affirmative Non-yes Answers
          '28': Declarative Yes-No-Question
          '29': Backchannel in Question Form
          '30': Apology
          '31': Downplayer
          '32': Conventional-opening
          '33': Collaborative Completion
          '34': Summarize/Reformulate
          '35': Negative Non-no Answers
          '36': Statement-opinion
          '37': Appreciation
          '38': Other
          '39': Quotation
          '40': Maybe/Accept-part
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 1021220
    num_examples: 18930
  - name: validation
    num_bytes: 116058
    num_examples: 2113
  - name: test
    num_bytes: 34013
    num_examples: 649
  download_size: 5330724
  dataset_size: 1171291
- config_name: persuasiveness-eloquence
  features:
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': low
          '1': high
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 153946
    num_examples: 725
  - name: validation
    num_bytes: 19376
    num_examples: 91
  - name: test
    num_bytes: 18379
    num_examples: 90
  download_size: 5330724
  dataset_size: 191701
- config_name: mrda
  features:
  - name: sentence
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': Declarative-Question
          '1': Statement
          '2': Reject
          '3': Or-Clause
          '4': 3rd-party-talk
          '5': Continuer
          '6': Hold Before Answer/Agreement
          '7': Assessment/Appreciation
          '8': Signal-non-understanding
          '9': Floor Holder
          '10': Sympathy
          '11': Dispreferred Answers
          '12': Reformulate/Summarize
          '13': Exclamation
          '14': Interrupted/Abandoned/Uninterpretable
          '15': Expansions of y/n Answers
          '16': Action-directive
          '17': Tag-Question
          '18': Accept
          '19': Rhetorical-question Continue
          '20': Self-talk
          '21': Rhetorical-Question
          '22': Yes-No-question
          '23': Open-Question
          '24': Rising Tone
          '25': Other Answers
          '26': Commit
          '27': Wh-Question
          '28': Repeat
          '29': Follow Me
          '30': Thanking
          '31': Offer
          '32': About-task
          '33': Reject-part
          '34': Affirmative Non-yes Answers
          '35': Apology
          '36': Downplayer
          '37': Humorous Material
          '38': Accept-part
          '39': Collaborative Completion
          '40': Mimic Other
          '41': Understanding Check
          '42': Misspeak Self-Correction
          '43': Or-Question
          '44': Topic Change
          '45': Negative Non-no Answers
          '46': Floor Grabber
          '47': Correct-misspeaking
          '48': Maybe
          '49': Acknowledge-answer
          '50': Defending/Explanation
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 963913
    num_examples: 14484
  - name: validation
    num_bytes: 111813
    num_examples: 1630
  - name: test
    num_bytes: 419797
    num_examples: 6459
  download_size: 5330724
  dataset_size: 1495523
- config_name: gum
  features:
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': preparation
          '1': evaluation
          '2': circumstance
          '3': solutionhood
          '4': justify
          '5': result
          '6': evidence
          '7': purpose
          '8': concession
          '9': elaboration
          '10': background
          '11': condition
          '12': cause
          '13': restatement
          '14': motivation
          '15': antithesis
          '16': no_relation
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 270401
    num_examples: 1700
  - name: validation
    num_bytes: 35405
    num_examples: 259
  - name: test
    num_bytes: 40334
    num_examples: 248
  download_size: 5330724
  dataset_size: 346140
- config_name: emergent
  features:
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': observing
          '1': for
          '2': against
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 313257
    num_examples: 2076
  - name: validation
    num_bytes: 38948
    num_examples: 259
  - name: test
    num_bytes: 38842
    num_examples: 259
  download_size: 5330724
  dataset_size: 391047
- config_name: persuasiveness-relevance
  features:
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': low
          '1': high
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 153158
    num_examples: 725
  - name: validation
    num_bytes: 19663
    num_examples: 91
  - name: test
    num_bytes: 18880
    num_examples: 90
  download_size: 5330724
  dataset_size: 191701
- config_name: persuasiveness-specificity
  features:
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': low
          '1': high
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 106594
    num_examples: 504
  - name: validation
    num_bytes: 13766
    num_examples: 62
  - name: test
    num_bytes: 12712
    num_examples: 62
  download_size: 5330724
  dataset_size: 133072
- config_name: persuasiveness-strength
  features:
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': low
          '1': high
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 79679
    num_examples: 371
  - name: validation
    num_bytes: 10052
    num_examples: 46
  - name: test
    num_bytes: 10225
    num_examples: 46
  download_size: 5330724
  dataset_size: 99956
- config_name: emobank-dominance
  features:
  - name: sentence
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': low
          '1': high
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 660303
    num_examples: 6392
  - name: validation
    num_bytes: 86802
    num_examples: 798
  - name: test
    num_bytes: 83319
    num_examples: 798
  download_size: 5330724
  dataset_size: 830424
- config_name: squinky-implicature
  features:
  - name: sentence
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': low
          '1': high
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 471552
    num_examples: 3724
  - name: validation
    num_bytes: 58087
    num_examples: 465
  - name: test
    num_bytes: 56549
    num_examples: 465
  download_size: 5330724
  dataset_size: 586188
- config_name: sarcasm
  features:
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': notsarc
          '1': sarc
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 2177332
    num_examples: 3754
  - name: validation
    num_bytes: 257834
    num_examples: 469
  - name: test
    num_bytes: 269724
    num_examples: 469
  download_size: 5330724
  dataset_size: 2704890
- config_name: squinky-formality
  features:
  - name: sentence
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': low
          '1': high
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 459721
    num_examples: 3622
  - name: validation
    num_bytes: 59921
    num_examples: 453
  - name: test
    num_bytes: 58242
    num_examples: 452
  download_size: 5330724
  dataset_size: 577884
- config_name: stac
  features:
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': Comment
          '1': Contrast
          '2': Q_Elab
          '3': Parallel
          '4': Explanation
          '5': Narration
          '6': Continuation
          '7': Result
          '8': Acknowledgement
          '9': Alternation
          '10': Question_answer_pair
          '11': Correction
          '12': Clarification_question
          '13': Conditional
          '14': Sequence
          '15': Elaboration
          '16': Background
          '17': no_relation
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 645969
    num_examples: 11230
  - name: validation
    num_bytes: 71400
    num_examples: 1247
  - name: test
    num_bytes: 70451
    num_examples: 1304
  download_size: 5330724
  dataset_size: 787820
- config_name: pdtb
  features:
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': Synchrony
          '1': Contrast
          '2': Asynchronous
          '3': Conjunction
          '4': List
          '5': Condition
          '6': Pragmatic concession
          '7': Restatement
          '8': Pragmatic cause
          '9': Alternative
          '10': Pragmatic condition
          '11': Pragmatic contrast
          '12': Instantiation
          '13': Exception
          '14': Cause
          '15': Concession
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 2968638
    num_examples: 12907
  - name: validation
    num_bytes: 276997
    num_examples: 1204
  - name: test
    num_bytes: 235851
    num_examples: 1085
  download_size: 5330724
  dataset_size: 3481486
- config_name: persuasiveness-premisetype
  features:
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': testimony
          '1': warrant
          '2': invented_instance
          '3': common_knowledge
          '4': statistics
          '5': analogy
          '6': definition
          '7': real_example
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 122631
    num_examples: 566
  - name: validation
    num_bytes: 15920
    num_examples: 71
  - name: test
    num_bytes: 14395
    num_examples: 70
  download_size: 5330724
  dataset_size: 152946
- config_name: squinky-informativeness
  features:
  - name: sentence
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': low
          '1': high
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 464855
    num_examples: 3719
  - name: validation
    num_bytes: 60447
    num_examples: 465
  - name: test
    num_bytes: 56872
    num_examples: 464
  download_size: 5330724
  dataset_size: 582174
- config_name: persuasiveness-claimtype
  features:
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': Value
          '1': Fact
          '2': Policy
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 31259
    num_examples: 160
  - name: validation
    num_bytes: 3803
    num_examples: 20
  - name: test
    num_bytes: 3717
    num_examples: 19
  download_size: 5330724
  dataset_size: 38779
- config_name: emobank-valence
  features:
  - name: sentence
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': low
          '1': high
  - name: idx
    dtype: int32
  splits:
  - name: train
    num_bytes: 539652
    num_examples: 5150
  - name: validation
    num_bytes: 62809
    num_examples: 644
  - name: test
    num_bytes: 66178
    num_examples: 643
  download_size: 5330724
  dataset_size: 668639
config_names:
- emergent
- emobank-arousal
- emobank-dominance
- emobank-valence
- gum
- mrda
- pdtb
- persuasiveness-claimtype
- persuasiveness-eloquence
- persuasiveness-premisetype
- persuasiveness-relevance
- persuasiveness-specificity
- persuasiveness-strength
- sarcasm
- squinky-formality
- squinky-implicature
- squinky-informativeness
- stac
- switchboard
- verifiability
---

# Dataset Card for pragmeval

## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

[More Information Needed]

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

Thanks to [@sileod](https://github.com/sileod) for adding this dataset.
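Provided by:
sileod
Summary of original information

Dataset overview

Basic dataset information

Language

  • Primary language: English (en)

License

  • License type: unknown

Multilinguality

  • The dataset is monolingual

Size categories

  • The configurations fall into the following size categories:
    • Fewer than 1K (n<1K)
    • 1K to 10K (1K<n<10K)
    • 10K to 100K (10K<n<100K)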
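
The size categories can be related to individual configurations. The sketch below is illustrative only: the helper `size_category` is hypothetical, and it assumes the buckets refer to the total number of examples across the three splits, which the card does not state. The counts are copied from the split listings on this page.

```python
def size_category(n_examples: int) -> str:
    """Map an example count to one of the card's size_categories buckets."""
    # How exact boundaries (1,000 or 10,000 examples) are assigned is an assumption.
    if n_examples < 1_000:
        return "n<1K"
    if n_examples < 10_000:
        return "1K<n<10K"
    if n_examples < 100_000:
        return "10K<n<100K"
    return "n>=100K"  # not listed on the card; fallback for this sketch only


# (train, validation, test) example counts copied from this page.
SPLIT_COUNTS = {
    "persuasiveness-claimtype": (160, 20, 19),
    "verifiability": (5712, 634, 2424),
    "switchboard": (18930, 2113, 649),
}

# Bucket each configuration by its total number of examples.
CATEGORIES = {
    name: size_category(sum(counts)) for name, counts in SPLIT_COUNTS.items()
}
```

Under that assumption, the three sample configurations land in three different buckets, which is consistent with the card listing all three categories.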

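Source datasets

  • Source type: original

Task categories

  • Main task: text classification (text-classification)

Task IDs

  • Specific task: multi-class classification (multi-class-classification)

Pretty name

  • Pretty name: pragmeval

Detailed dataset configurations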

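Configuration name: verifiability

  • Features
    • Sentence (sentence): string
    • Label (label): class label with three classes: experiential, unverifiable, non-experiential
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 5712 examples, 592520 bytes
    • Validation set (validation): 634 examples, 65215 bytes
    • Test set (test): 2424 examples, 251799 bytes
  • Download size: 5330724 bytes
  • Dataset size: 909534 bytes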
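
Labels are stored as integers whose order follows the class list in the card metadata. A minimal sketch of the mapping for this configuration follows; the helper names `id_to_name` and `name_to_id` are hypothetical. When the data is loaded with the Hugging Face `datasets` library, the `ClassLabel` feature provides `int2str` and `str2int` for the same purpose.

```python
# Class names in the order given by the verifiability configuration metadata.
VERIFIABILITY_LABELS = ["experiential", "unverifiable", "non-experiential"]


def id_to_name(label_id: int) -> str:
    """Return the class name for an integer label (hypothetical helper)."""
    return VERIFIABILITY_LABELS[label_id]


def name_to_id(name: str) -> int:
    """Return the integer label for a class name (hypothetical helper)."""
    return VERIFIABILITY_LABELS.index(name)
```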

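Configuration name: emobank-arousal

  • Features
    • Sentence (sentence): string
    • Label (label): class label with two classes: low, high
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 5470 examples, 567660 bytes
    • Validation set (validation): 684 examples, 71221 bytes
    • Test set (test): 683 examples, 69276 bytes
  • Download size: 5330724 bytes
  • Dataset size: 708157 bytes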

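Configuration name: switchboard

  • Features
    • Sentence (sentence): string
    • Label (label): class label with 41 dialogue-act classes (for example Statement-non-opinion, Yes-No-Question, Acknowledge (Backchannel))
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 18930 examples, 1021220 bytes
    • Validation set (validation): 2113 examples, 116058 bytes
    • Test set (test): 649 examples, 34013 bytes
  • Download size: 5330724 bytes
  • Dataset size: 1171291 bytes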
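
The reported dataset size appears to be the sum of the per-split byte counts, while the download size is the same for every configuration. The sketch below checks the first observation for switchboard and computes the share of examples in each split. All numbers are copied from this page; `split_fractions` is a hypothetical helper.

```python
from typing import Dict

# Per-split byte and example counts for switchboard, copied from this page.
SWITCHBOARD_BYTES: Dict[str, int] = {
    "train": 1021220,
    "validation": 116058,
    "test": 34013,
}
SWITCHBOARD_EXAMPLES: Dict[str, int] = {
    "train": 18930,
    "validation": 2113,
    "test": 649,
}
REPORTED_DATASET_SIZE = 1171291


def split_fractions(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the total example count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


TOTAL_BYTES = sum(SWITCHBOARD_BYTES.values())
FRACTIONS = split_fractions(SWITCHBOARD_EXAMPLES)
```

For this configuration the test split holds about 3% of the examples, so test-set metrics rest on far fewer examples than the validation split provides.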

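Configuration name: persuasiveness-eloquence

  • Features
    • Sentence 1 (sentence1): string
    • Sentence 2 (sentence2): string
    • Label (label): class label with two classes: low, high
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 725 examples, 153946 bytes
    • Validation set (validation): 91 examples, 19376 bytes
    • Test set (test): 90 examples, 18379 bytes
  • Download size: 5330724 bytes
  • Dataset size: 191701 bytes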
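
Some configurations expose a single `sentence` field, while others, like this one, expose a `sentence1`/`sentence2` pair. The sketch below shows one way to handle both schemas uniformly. The helper `extract_text_inputs` is hypothetical, and the sample rows are made up for illustration, not taken from the dataset.

```python
from typing import Mapping, Optional, Tuple


def extract_text_inputs(row: Mapping[str, object]) -> Tuple[str, Optional[str]]:
    """Return (first_text, second_text); second_text is None for single-sentence rows."""
    if "sentence1" in row and "sentence2" in row:
        return str(row["sentence1"]), str(row["sentence2"])
    return str(row["sentence"]), None


# Made-up rows that mirror the two schemas described on this page.
single_row = {"sentence": "An example sentence.", "label": 0, "idx": 0}
pair_row = {
    "sentence1": "First text.",
    "sentence2": "Second text.",
    "label": 1,
    "idx": 0,
}
```

The returned tuple can be passed on to whatever tokenizer or feature extractor is in use; many accept an optional second text for pair classification.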

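Configuration name: mrda

  • Features
    • Sentence (sentence): string
    • Label (label): class label with 51 dialogue-act classes (for example Statement, Floor Holder, Wh-Question)
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 14484 examples, 963913 bytes
    • Validation set (validation): 1630 examples, 111813 bytes
    • Test set (test): 6459 examples, 419797 bytes
  • Download size: 5330724 bytes
  • Dataset size: 1495523 bytes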

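Configuration name: gum

  • Features
    • Sentence 1 (sentence1): string
    • Sentence 2 (sentence2): string
    • Label (label): class label with 17 discourse-relation classes (for example elaboration, circumstance, no_relation)
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 1700 examples, 270401 bytes
    • Validation set (validation): 259 examples, 35405 bytes
    • Test set (test): 248 examples, 40334 bytes
  • Download size: 5330724 bytes
  • Dataset size: 346140 bytes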

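Configuration name: emergent

  • Features
    • Sentence 1 (sentence1): string
    • Sentence 2 (sentence2): string
    • Label (label): class label with three classes: observing, for, against
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 2076 examples, 313257 bytes
    • Validation set (validation): 259 examples, 38948 bytes
    • Test set (test): 259 examples, 38842 bytes
  • Download size: 5330724 bytes
  • Dataset size: 391047 bytes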

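Configuration name: persuasiveness-relevance

  • Features
    • Sentence 1 (sentence1): string
    • Sentence 2 (sentence2): string
    • Label (label): class label with two classes: low, high
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 725 examples, 153158 bytes
    • Validation set (validation): 91 examples, 19663 bytes
    • Test set (test): 90 examples, 18880 bytes
  • Download size: 5330724 bytes
  • Dataset size: 191701 bytes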

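Configuration name: persuasiveness-specificity

  • Features
    • Sentence 1 (sentence1): string
    • Sentence 2 (sentence2): string
    • Label (label): class label with two classes: low, high
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 504 examples, 106594 bytes
    • Validation set (validation): 62 examples, 13766 bytes
    • Test set (test): 62 examples, 12712 bytes
  • Download size: 5330724 bytes
  • Dataset size: 133072 bytes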

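Configuration name: persuasiveness-strength

  • Features
    • Sentence 1 (sentence1): string
    • Sentence 2 (sentence2): string
    • Label (label): class label with two classes: low, high
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 371 examples, 79679 bytes
    • Validation set (validation): 46 examples, 10052 bytes
    • Test set (test): 46 examples, 10225 bytes
  • Download size: 5330724 bytes
  • Dataset size: 99956 bytes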

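Configuration name: emobank-dominance

  • Features
    • Sentence (sentence): string
    • Label (label): class label with two classes: low, high
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 6392 examples, 660303 bytes
    • Validation set (validation): 798 examples, 86802 bytes
    • Test set (test): 798 examples, 83319 bytes
  • Download size: 5330724 bytes
  • Dataset size: 830424 bytes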

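Configuration name: squinky-implicature

  • Features
    • Sentence (sentence): string
    • Label (label): class label with two classes: low, high
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 3724 examples, 471552 bytes
    • Validation set (validation): 465 examples, 58087 bytes
    • Test set (test): 465 examples, 56549 bytes
  • Download size: 5330724 bytes
  • Dataset size: 586188 bytes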

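Configuration name: sarcasm

  • Features
    • Sentence 1 (sentence1): string
    • Sentence 2 (sentence2): string
    • Label (label): class label with two classes: not sarcastic (notsarc), sarcastic (sarc)
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 3754 examples, 2177332 bytes
    • Validation set (validation): 469 examples, 257834 bytes
    • Test set (test): 469 examples, 269724 bytes
  • Download size: 5330724 bytes
  • Dataset size: 2704890 bytes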
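
Dividing a split's byte count by its example count gives a rough proxy for input length, which can guide the choice of a maximum sequence length. The sketch below compares the sarcasm and switchboard training splits using numbers copied from this page. The byte counts presumably also cover the label and index fields, so the averages are approximate. `bytes_per_example` is a hypothetical helper.

```python
# (num_bytes, num_examples) of the train split, copied from this page.
TRAIN_STATS = {
    "sarcasm": (2177332, 3754),
    "switchboard": (1021220, 18930),
}


def bytes_per_example(num_bytes: int, num_examples: int) -> float:
    """Average stored bytes per example (text plus label and index fields)."""
    return num_bytes / num_examples


AVERAGES = {
    name: bytes_per_example(*stats) for name, stats in TRAIN_STATS.items()
}
```

By this measure sarcasm examples are roughly ten times larger than switchboard utterances, which suggests that truncation settings suited to one configuration may not suit the other.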

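Configuration name: squinky-formality

  • Features
    • Sentence (sentence): string
    • Label (label): class label with two classes: low, high
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 3622 examples, 459721 bytes
    • Validation set (validation): 453 examples, 59921 bytes
    • Test set (test): 452 examples, 58242 bytes
  • Download size: 5330724 bytes
  • Dataset size: 577884 bytes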

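Configuration name: stac

  • Features
    • Sentence 1 (sentence1): string
    • Sentence 2 (sentence2): string
    • Label (label): class label with 18 dialogue discourse-relation classes (for example Question_answer_pair, Acknowledgement, no_relation)
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 11230 examples, 645969 bytes
    • Validation set (validation): 1247 examples, 71400 bytes
    • Test set (test): 1304 examples, 70451 bytes
  • Download size: 5330724 bytes
  • Dataset size: 787820 bytes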

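Configuration name: pdtb

  • Features
    • Sentence 1 (sentence1): string
    • Sentence 2 (sentence2): string
    • Label (label): class label with 16 discourse-relation classes (for example Conjunction, Cause, Contrast)
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 12907 examples, 2968638 bytes
    • Validation set (validation): 1204 examples, 276997 bytes
    • Test set (test): 1085 examples, 235851 bytes
  • Download size: 5330724 bytes
  • Dataset size: 3481486 bytes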

Configuration name: persuasiveness-premisetype

  • Features
    • Sentence 1 (sentence1): string
    • Sentence 2 (sentence2): string
    • Label (label): class label with eight classes: testimony, warrant, invented_instance, common_knowledge, statistics, analogy, definition, real_example
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 566 examples, 122631 bytes
    • Validation set (validation): 71 examples, 15920 bytes
    • Test set (test): 70 examples, 14395 bytes
  • Download size: 5330724 bytes
  • Dataset size: 152946 bytes

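Configuration name: squinky-informativeness

  • Features
    • Sentence (sentence): string
    • Label (label): class label with two classes: low, high
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 3719 examples, 464855 bytes
    • Validation set (validation): 465 examples, 60447 bytes
    • Test set (test): 464 examples, 56872 bytes
  • Download size: 5330724 bytes
  • Dataset size: 582174 bytes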

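Configuration name: persuasiveness-claimtype

  • Features
    • Sentence 1 (sentence1): string
    • Sentence 2 (sentence2): string
    • Label (label): class label with three classes: Value, Fact, Policy
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 160 examples, 31259 bytes
    • Validation set (validation): 20 examples, 3803 bytes
    • Test set (test): 19 examples, 3717 bytes
  • Download size: 5330724 bytes
  • Dataset size: 38779 bytes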

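Configuration name: emobank-valence

  • Features
    • Sentence (sentence): string
    • Label (label): class label with two classes: low, high
    • Index (idx): 32-bit integer (int32)
  • Data splits
    • Training set (train): 5150 examples, 539652 bytes
    • Validation set (validation): 644 examples, 62809 bytes
    • Test set (test): 643 examples, 66178 bytes
  • Download size: 5330724 bytes
  • Dataset size: 668639 bytes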
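
Any of the configurations listed on this page can in principle be loaded by name with the Hugging Face `datasets` library. The following is a minimal sketch, not the dataset author's documented procedure. It assumes `datasets` is installed and the Hub is reachable, and behaviour may differ between library versions. `load_config` is a hypothetical wrapper; the dataset identifier and configuration names are taken from the card metadata.

```python
from typing import Any, List

DATASET_ID = "sileod/pragmeval"

# Configuration names as listed under config_names in the card metadata.
CONFIG_NAMES: List[str] = [
    "emergent", "emobank-arousal", "emobank-dominance", "emobank-valence",
    "gum", "mrda", "pdtb", "persuasiveness-claimtype",
    "persuasiveness-eloquence", "persuasiveness-premisetype",
    "persuasiveness-relevance", "persuasiveness-specificity",
    "persuasiveness-strength", "sarcasm", "squinky-formality",
    "squinky-implicature", "squinky-informativeness", "stac",
    "switchboard", "verifiability",
]


def load_config(config_name: str) -> Any:
    """Load one pragmeval configuration (hypothetical convenience wrapper)."""
    if config_name not in CONFIG_NAMES:
        raise ValueError(f"Unknown configuration: {config_name!r}")
    # Imported lazily so the name list is usable without the library installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, config_name)
```

With network access, `load_config("emergent")` would be expected to return an object holding the train, validation, and test splits described above.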