sileod/pragmeval
Source: Hugging Face. Updated 2024-01-18; indexed 2024-05-25.
Download link:
https://hf-mirror.com/datasets/sileod/pragmeval
Resource description:
---
annotations_creators:
- found
language_creators:
- found
language:
- en
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 10K<n<100K
- 1K<n<10K
- n<1K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- multi-class-classification
pretty_name: pragmeval
dataset_info:
- config_name: verifiability
features:
- name: sentence
dtype: string
- name: label
dtype:
class_label:
names:
'0': experiential
'1': unverifiable
'2': non-experiential
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 592520
num_examples: 5712
- name: validation
num_bytes: 65215
num_examples: 634
- name: test
num_bytes: 251799
num_examples: 2424
download_size: 5330724
dataset_size: 909534
- config_name: emobank-arousal
features:
- name: sentence
dtype: string
- name: label
dtype:
class_label:
names:
'0': low
'1': high
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 567660
num_examples: 5470
- name: validation
num_bytes: 71221
num_examples: 684
- name: test
num_bytes: 69276
num_examples: 683
download_size: 5330724
dataset_size: 708157
- config_name: switchboard
features:
- name: sentence
dtype: string
- name: label
dtype:
class_label:
names:
'0': Response Acknowledgement
'1': Uninterpretable
'2': Or-Clause
'3': Reject
'4': Statement-non-opinion
'5': 3rd-party-talk
'6': Repeat-phrase
'7': Hold Before Answer/Agreement
'8': Signal-non-understanding
'9': Offers, Options Commits
'10': Agree/Accept
'11': Dispreferred Answers
'12': Hedge
'13': Action-directive
'14': Tag-Question
'15': Self-talk
'16': Yes-No-Question
'17': Rhetorical-Question
'18': No Answers
'19': Open-Question
'20': Conventional-closing
'21': Other Answers
'22': Acknowledge (Backchannel)
'23': Wh-Question
'24': Declarative Wh-Question
'25': Thanking
'26': Yes Answers
'27': Affirmative Non-yes Answers
'28': Declarative Yes-No-Question
'29': Backchannel in Question Form
'30': Apology
'31': Downplayer
'32': Conventional-opening
'33': Collaborative Completion
'34': Summarize/Reformulate
'35': Negative Non-no Answers
'36': Statement-opinion
'37': Appreciation
'38': Other
'39': Quotation
'40': Maybe/Accept-part
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 1021220
num_examples: 18930
- name: validation
num_bytes: 116058
num_examples: 2113
- name: test
num_bytes: 34013
num_examples: 649
download_size: 5330724
dataset_size: 1171291
- config_name: persuasiveness-eloquence
features:
- name: sentence1
dtype: string
- name: sentence2
dtype: string
- name: label
dtype:
class_label:
names:
'0': low
'1': high
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 153946
num_examples: 725
- name: validation
num_bytes: 19376
num_examples: 91
- name: test
num_bytes: 18379
num_examples: 90
download_size: 5330724
dataset_size: 191701
- config_name: mrda
features:
- name: sentence
dtype: string
- name: label
dtype:
class_label:
names:
'0': Declarative-Question
'1': Statement
'2': Reject
'3': Or-Clause
'4': 3rd-party-talk
'5': Continuer
'6': Hold Before Answer/Agreement
'7': Assessment/Appreciation
'8': Signal-non-understanding
'9': Floor Holder
'10': Sympathy
'11': Dispreferred Answers
'12': Reformulate/Summarize
'13': Exclamation
'14': Interrupted/Abandoned/Uninterpretable
'15': Expansions of y/n Answers
'16': Action-directive
'17': Tag-Question
'18': Accept
'19': Rhetorical-question Continue
'20': Self-talk
'21': Rhetorical-Question
'22': Yes-No-question
'23': Open-Question
'24': Rising Tone
'25': Other Answers
'26': Commit
'27': Wh-Question
'28': Repeat
'29': Follow Me
'30': Thanking
'31': Offer
'32': About-task
'33': Reject-part
'34': Affirmative Non-yes Answers
'35': Apology
'36': Downplayer
'37': Humorous Material
'38': Accept-part
'39': Collaborative Completion
'40': Mimic Other
'41': Understanding Check
'42': Misspeak Self-Correction
'43': Or-Question
'44': Topic Change
'45': Negative Non-no Answers
'46': Floor Grabber
'47': Correct-misspeaking
'48': Maybe
'49': Acknowledge-answer
'50': Defending/Explanation
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 963913
num_examples: 14484
- name: validation
num_bytes: 111813
num_examples: 1630
- name: test
num_bytes: 419797
num_examples: 6459
download_size: 5330724
dataset_size: 1495523
- config_name: gum
features:
- name: sentence1
dtype: string
- name: sentence2
dtype: string
- name: label
dtype:
class_label:
names:
'0': preparation
'1': evaluation
'2': circumstance
'3': solutionhood
'4': justify
'5': result
'6': evidence
'7': purpose
'8': concession
'9': elaboration
'10': background
'11': condition
'12': cause
'13': restatement
'14': motivation
'15': antithesis
'16': no_relation
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 270401
num_examples: 1700
- name: validation
num_bytes: 35405
num_examples: 259
- name: test
num_bytes: 40334
num_examples: 248
download_size: 5330724
dataset_size: 346140
- config_name: emergent
features:
- name: sentence1
dtype: string
- name: sentence2
dtype: string
- name: label
dtype:
class_label:
names:
'0': observing
'1': for
'2': against
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 313257
num_examples: 2076
- name: validation
num_bytes: 38948
num_examples: 259
- name: test
num_bytes: 38842
num_examples: 259
download_size: 5330724
dataset_size: 391047
- config_name: persuasiveness-relevance
features:
- name: sentence1
dtype: string
- name: sentence2
dtype: string
- name: label
dtype:
class_label:
names:
'0': low
'1': high
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 153158
num_examples: 725
- name: validation
num_bytes: 19663
num_examples: 91
- name: test
num_bytes: 18880
num_examples: 90
download_size: 5330724
dataset_size: 191701
- config_name: persuasiveness-specificity
features:
- name: sentence1
dtype: string
- name: sentence2
dtype: string
- name: label
dtype:
class_label:
names:
'0': low
'1': high
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 106594
num_examples: 504
- name: validation
num_bytes: 13766
num_examples: 62
- name: test
num_bytes: 12712
num_examples: 62
download_size: 5330724
dataset_size: 133072
- config_name: persuasiveness-strength
features:
- name: sentence1
dtype: string
- name: sentence2
dtype: string
- name: label
dtype:
class_label:
names:
'0': low
'1': high
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 79679
num_examples: 371
- name: validation
num_bytes: 10052
num_examples: 46
- name: test
num_bytes: 10225
num_examples: 46
download_size: 5330724
dataset_size: 99956
- config_name: emobank-dominance
features:
- name: sentence
dtype: string
- name: label
dtype:
class_label:
names:
'0': low
'1': high
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 660303
num_examples: 6392
- name: validation
num_bytes: 86802
num_examples: 798
- name: test
num_bytes: 83319
num_examples: 798
download_size: 5330724
dataset_size: 830424
- config_name: squinky-implicature
features:
- name: sentence
dtype: string
- name: label
dtype:
class_label:
names:
'0': low
'1': high
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 471552
num_examples: 3724
- name: validation
num_bytes: 58087
num_examples: 465
- name: test
num_bytes: 56549
num_examples: 465
download_size: 5330724
dataset_size: 586188
- config_name: sarcasm
features:
- name: sentence1
dtype: string
- name: sentence2
dtype: string
- name: label
dtype:
class_label:
names:
'0': notsarc
'1': sarc
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 2177332
num_examples: 3754
- name: validation
num_bytes: 257834
num_examples: 469
- name: test
num_bytes: 269724
num_examples: 469
download_size: 5330724
dataset_size: 2704890
- config_name: squinky-formality
features:
- name: sentence
dtype: string
- name: label
dtype:
class_label:
names:
'0': low
'1': high
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 459721
num_examples: 3622
- name: validation
num_bytes: 59921
num_examples: 453
- name: test
num_bytes: 58242
num_examples: 452
download_size: 5330724
dataset_size: 577884
- config_name: stac
features:
- name: sentence1
dtype: string
- name: sentence2
dtype: string
- name: label
dtype:
class_label:
names:
'0': Comment
'1': Contrast
'2': Q_Elab
'3': Parallel
'4': Explanation
'5': Narration
'6': Continuation
'7': Result
'8': Acknowledgement
'9': Alternation
'10': Question_answer_pair
'11': Correction
'12': Clarification_question
'13': Conditional
'14': Sequence
'15': Elaboration
'16': Background
'17': no_relation
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 645969
num_examples: 11230
- name: validation
num_bytes: 71400
num_examples: 1247
- name: test
num_bytes: 70451
num_examples: 1304
download_size: 5330724
dataset_size: 787820
- config_name: pdtb
features:
- name: sentence1
dtype: string
- name: sentence2
dtype: string
- name: label
dtype:
class_label:
names:
'0': Synchrony
'1': Contrast
'2': Asynchronous
'3': Conjunction
'4': List
'5': Condition
'6': Pragmatic concession
'7': Restatement
'8': Pragmatic cause
'9': Alternative
'10': Pragmatic condition
'11': Pragmatic contrast
'12': Instantiation
'13': Exception
'14': Cause
'15': Concession
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 2968638
num_examples: 12907
- name: validation
num_bytes: 276997
num_examples: 1204
- name: test
num_bytes: 235851
num_examples: 1085
download_size: 5330724
dataset_size: 3481486
- config_name: persuasiveness-premisetype
features:
- name: sentence1
dtype: string
- name: sentence2
dtype: string
- name: label
dtype:
class_label:
names:
'0': testimony
'1': warrant
'2': invented_instance
'3': common_knowledge
'4': statistics
'5': analogy
'6': definition
'7': real_example
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 122631
num_examples: 566
- name: validation
num_bytes: 15920
num_examples: 71
- name: test
num_bytes: 14395
num_examples: 70
download_size: 5330724
dataset_size: 152946
- config_name: squinky-informativeness
features:
- name: sentence
dtype: string
- name: label
dtype:
class_label:
names:
'0': low
'1': high
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 464855
num_examples: 3719
- name: validation
num_bytes: 60447
num_examples: 465
- name: test
num_bytes: 56872
num_examples: 464
download_size: 5330724
dataset_size: 582174
- config_name: persuasiveness-claimtype
features:
- name: sentence1
dtype: string
- name: sentence2
dtype: string
- name: label
dtype:
class_label:
names:
'0': Value
'1': Fact
'2': Policy
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 31259
num_examples: 160
- name: validation
num_bytes: 3803
num_examples: 20
- name: test
num_bytes: 3717
num_examples: 19
download_size: 5330724
dataset_size: 38779
- config_name: emobank-valence
features:
- name: sentence
dtype: string
- name: label
dtype:
class_label:
names:
'0': low
'1': high
- name: idx
dtype: int32
splits:
- name: train
num_bytes: 539652
num_examples: 5150
- name: validation
num_bytes: 62809
num_examples: 644
- name: test
num_bytes: 66178
num_examples: 643
download_size: 5330724
dataset_size: 668639
config_names:
- emergent
- emobank-arousal
- emobank-dominance
- emobank-valence
- gum
- mrda
- pdtb
- persuasiveness-claimtype
- persuasiveness-eloquence
- persuasiveness-premisetype
- persuasiveness-relevance
- persuasiveness-specificity
- persuasiveness-strength
- sarcasm
- squinky-formality
- squinky-implicature
- squinky-informativeness
- stac
- switchboard
- verifiability
---
# Dataset Card for pragmeval
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**
### Dataset Summary
[More Information Needed]
### Supported Tasks and Leaderboards
[More Information Needed]
### Languages
English (`en`); the card metadata marks the dataset as monolingual.
## Dataset Structure
### Data Instances
[More Information Needed]
### Data Fields
[More Information Needed]
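According to the card metadata, every configuration exposes an integer `label` feature whose class names are listed under `class_label.names`, alongside either a single `sentence` field or a `sentence1`/`sentence2` pair and an `idx` field. As a minimal sketch of how those integer labels map to names, the snippet below copies the name lists for three configurations from the metadata. The helpers `label_name` and `label_id` are hypothetical names introduced here for illustration; they are not part of any library.

```python
from typing import Dict, List

# Class names copied from the card metadata; list position = integer label.
LABEL_NAMES: Dict[str, List[str]] = {
    "verifiability": ["experiential", "unverifiable", "non-experiential"],
    "emergent": ["observing", "for", "against"],
    "sarcasm": ["notsarc", "sarc"],
}


def label_name(config: str, label: int) -> str:
    """Return the class name for an integer label (hypothetical helper)."""
    names = LABEL_NAMES[config]
    if not 0 <= label < len(names):
        raise ValueError(f"label {label} is out of range for config {config!r}")
    return names[label]


def label_id(config: str, name: str) -> int:
    """Return the integer label for a class name (hypothetical helper)."""
    return LABEL_NAMES[config].index(name)
```

The same lookup pattern extends to the larger label sets (for example `switchboard` or `mrda`) by copying their name lists from the metadata in order.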
### Data Splits
[More Information Needed]
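The card metadata gives `train`, `validation`, and `test` example counts for each configuration. The sketch below, which assumes only those published counts, computes the share of examples in each split for three configurations. The name `split_fractions` is a hypothetical helper, and the dictionary holds just a sample of the twenty configurations.

```python
from typing import Dict

# Example counts copied from the card metadata (num_examples per split).
SPLIT_SIZES: Dict[str, Dict[str, int]] = {
    "verifiability": {"train": 5712, "validation": 634, "test": 2424},
    "switchboard": {"train": 18930, "validation": 2113, "test": 649},
    "persuasiveness-claimtype": {"train": 160, "validation": 20, "test": 19},
}


def split_fractions(config: str) -> Dict[str, float]:
    """Return each split's share of the configuration's examples."""
    sizes = SPLIT_SIZES[config]
    total = sum(sizes.values())
    return {split: count / total for split, count in sizes.items()}


if __name__ == "__main__":
    for config in SPLIT_SIZES:
        shares = split_fractions(config)
        summary = ", ".join(f"{split}={share:.1%}" for split, share in shares.items())
        print(f"{config}: {summary}")
```

The proportions differ noticeably between configurations, so it may be worth checking them before comparing test scores across tasks.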
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
[More Information Needed]
### Contributions
Thanks to [@sileod](https://github.com/sileod) for adding this dataset.
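Provider:
sileod
Summary of Original Information
Dataset Overview
Basic Dataset Information
Language
- Primary language: English (en)
License
- License type: unknown
Multilinguality
- The dataset is monolingual
Size Categories
- The configurations fall into these size ranges:
  - fewer than 1K
  - 1K to 10K
  - 10K to 100K
Source Data
- Source type: original
Task Categories
- Primary task: text classification (text-classification)
Task IDs
- Specific task: multi-class classification (multi-class-classification)
Dataset Alias
- Dataset alias: pragmeval
Detailed Dataset Configurations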
Configuration: verifiability
- Features:
  - sentence: string
  - label: class label, one of experiential, unverifiable, non-experiential
  - idx: integer (int32)
- Splits:
  - train: 5712 examples, 592520 bytes
  - validation: 634 examples, 65215 bytes
  - test: 2424 examples, 251799 bytes
- Download size: 5330724 bytes
- Dataset size: 909534 bytes
Configuration: emobank-arousal
- Features:
  - sentence: string
  - label: class label, one of low, high
  - idx: integer (int32)
- Splits:
  - train: 5470 examples, 567660 bytes
  - validation: 684 examples, 71221 bytes
  - test: 683 examples, 69276 bytes
- Download size: 5330724 bytes
- Dataset size: 708157 bytes
Configuration: switchboard
- Features:
  - sentence: string
  - label: class label covering 41 dialogue act types
  - idx: integer (int32)
- Splits:
  - train: 18930 examples, 1021220 bytes
  - validation: 2113 examples, 116058 bytes
  - test: 649 examples, 34013 bytes
- Download size: 5330724 bytes
- Dataset size: 1171291 bytes
Configuration: persuasiveness-eloquence
- Features:
  - sentence1: string
  - sentence2: string
  - label: class label, one of low, high
  - idx: integer (int32)
- Splits:
  - train: 725 examples, 153946 bytes
  - validation: 91 examples, 19376 bytes
  - test: 90 examples, 18379 bytes
- Download size: 5330724 bytes
- Dataset size: 191701 bytes
Configuration: mrda
- Features:
  - sentence: string
  - label: class label covering 51 dialogue act types
  - idx: integer (int32)
- Splits:
  - train: 14484 examples, 963913 bytes
  - validation: 1630 examples, 111813 bytes
  - test: 6459 examples, 419797 bytes
- Download size: 5330724 bytes
- Dataset size: 1495523 bytes
Configuration: gum
- Features:
  - sentence1: string
  - sentence2: string
  - label: class label covering 17 semantic relation types
  - idx: integer (int32)
- Splits:
  - train: 1700 examples, 270401 bytes
  - validation: 259 examples, 35405 bytes
  - test: 248 examples, 40334 bytes
- Download size: 5330724 bytes
- Dataset size: 346140 bytes
Configuration: emergent
- Features:
  - sentence1: string
  - sentence2: string
  - label: class label, one of observing, for, against
  - idx: integer (int32)
- Splits:
  - train: 2076 examples, 313257 bytes
  - validation: 259 examples, 38948 bytes
  - test: 259 examples, 38842 bytes
- Download size: 5330724 bytes
- Dataset size: 391047 bytes
Configuration: persuasiveness-relevance
- Features:
  - sentence1: string
  - sentence2: string
  - label: class label, one of low, high
  - idx: integer (int32)
- Splits:
  - train: 725 examples, 153158 bytes
  - validation: 91 examples, 19663 bytes
  - test: 90 examples, 18880 bytes
- Download size: 5330724 bytes
- Dataset size: 191701 bytes
Configuration: persuasiveness-specificity
- Features:
  - sentence1: string
  - sentence2: string
  - label: class label, one of low, high
  - idx: integer (int32)
- Splits:
  - train: 504 examples, 106594 bytes
  - validation: 62 examples, 13766 bytes
  - test: 62 examples, 12712 bytes
- Download size: 5330724 bytes
- Dataset size: 133072 bytes
Configuration: persuasiveness-strength
- Features:
  - sentence1: string
  - sentence2: string
  - label: class label, one of low, high
  - idx: integer (int32)
- Splits:
  - train: 371 examples, 79679 bytes
  - validation: 46 examples, 10052 bytes
  - test: 46 examples, 10225 bytes
- Download size: 5330724 bytes
- Dataset size: 99956 bytes
Configuration: emobank-dominance
- Features:
  - sentence: string
  - label: class label, one of low, high
  - idx: integer (int32)
- Splits:
  - train: 6392 examples, 660303 bytes
  - validation: 798 examples, 86802 bytes
  - test: 798 examples, 83319 bytes
- Download size: 5330724 bytes
- Dataset size: 830424 bytes
Configuration: squinky-implicature
- Features:
  - sentence: string
  - label: class label, one of low, high
  - idx: integer (int32)
- Splits:
  - train: 3724 examples, 471552 bytes
  - validation: 465 examples, 58087 bytes
  - test: 465 examples, 56549 bytes
- Download size: 5330724 bytes
- Dataset size: 586188 bytes
Configuration: sarcasm
- Features:
  - sentence1: string
  - sentence2: string
  - label: class label, one of notsarc (not sarcastic), sarc (sarcastic)
  - idx: integer (int32)
- Splits:
  - train: 3754 examples, 2177332 bytes
  - validation: 469 examples, 257834 bytes
  - test: 469 examples, 269724 bytes
- Download size: 5330724 bytes
- Dataset size: 2704890 bytes
Configuration: squinky-formality
- Features:
  - sentence: string
  - label: class label, one of low, high
  - idx: integer (int32)
- Splits:
  - train: 3622 examples, 459721 bytes
  - validation: 453 examples, 59921 bytes
  - test: 452 examples, 58242 bytes
- Download size: 5330724 bytes
- Dataset size: 577884 bytes
Configuration: stac
- Features:
  - sentence1: string
  - sentence2: string
  - label: class label covering 18 dialogue relation types
  - idx: integer (int32)
- Splits:
  - train: 11230 examples, 645969 bytes
  - validation: 1247 examples, 71400 bytes
  - test: 1304 examples, 70451 bytes
- Download size: 5330724 bytes
- Dataset size: 787820 bytes
Configuration: pdtb
- Features:
  - sentence1: string
  - sentence2: string
  - label: class label covering 16 semantic relation types
  - idx: integer (int32)
- Splits:
  - train: 12907 examples, 2968638 bytes
  - validation: 1204 examples, 276997 bytes
  - test: 1085 examples, 235851 bytes
- Download size: 5330724 bytes
- Dataset size: 3481486 bytes
Configuration: persuasiveness-premisetype
- Features:
  - sentence1: string
  - sentence2: string
  - label: class label, one of testimony, warrant, invented_instance, common_knowledge, statistics, analogy, definition, real_example
  - idx: integer (int32)
- Splits:
  - train: 566 examples, 122631 bytes
  - validation: 71 examples, 15920 bytes
  - test: 70 examples, 14395 bytes
- Download size: 5330724 bytes
- Dataset size: 152946 bytes
Configuration: squinky-informativeness
- Features:
  - sentence: string
  - label: class label, one of low, high
  - idx: integer (int32)
- Splits:
  - train: 3719 examples, 464855 bytes
  - validation: 465 examples, 60447 bytes
  - test: 464 examples, 56872 bytes
- Download size: 5330724 bytes
- Dataset size: 582174 bytes
Configuration: persuasiveness-claimtype
- Features:
  - sentence1: string
  - sentence2: string
  - label: class label, one of Value, Fact, Policy
  - idx: integer (int32)
- Splits:
  - train: 160 examples, 31259 bytes
  - validation: 20 examples, 3803 bytes
  - test: 19 examples, 3717 bytes
- Download size: 5330724 bytes
- Dataset size: 38779 bytes
Configuration: emobank-valence
- Features:
  - sentence: string
  - label: class label, one of low, high
  - idx: integer (int32)
- Splits:
  - train: 5150 examples, 539652 bytes
  - validation: 644 examples, 62809 bytes
  - test: 643 examples, 66178 bytes
- Download size: 5330724 bytes
- Dataset size: 668639 bytes
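Each configuration lists a per-split byte count and an overall dataset size, while the download size is the same 5330724 bytes everywhere. As a small sanity-check sketch, assuming the dataset size is meant to be the sum of the three split sizes, the snippet below verifies that relationship for three configurations using figures copied from the metadata. The function name `size_is_consistent` is hypothetical.

```python
from typing import Dict, Tuple

# (per-split num_bytes, reported dataset_size), copied from the card metadata.
SIZES: Dict[str, Tuple[Dict[str, int], int]] = {
    "verifiability": (
        {"train": 592520, "validation": 65215, "test": 251799},
        909534,
    ),
    "emobank-arousal": (
        {"train": 567660, "validation": 71221, "test": 69276},
        708157,
    ),
    "persuasiveness-claimtype": (
        {"train": 31259, "validation": 3803, "test": 3717},
        38779,
    ),
}


def size_is_consistent(config: str) -> bool:
    """Check that the split byte counts add up to the reported dataset size."""
    split_bytes, reported_total = SIZES[config]
    return sum(split_bytes.values()) == reported_total


if __name__ == "__main__":
    for config in SIZES:
        status = "ok" if size_is_consistent(config) else "mismatch"
        print(f"{config}: {status}")
```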



