PierreLepagnol/WRENCH
Source: Hugging Face. Updated 2023-08-17; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/PierreLepagnol/WRENCH
Resource description:
---
task_categories:
- text-classification
- token-classification
size_categories:
- 10K<n<100K
dataset_info:
- config_name: yelp
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 22618599
    num_examples: 30400
  - name: validation
    num_bytes: 2824249
    num_examples: 3800
  - name: test
    num_bytes: 2709033
    num_examples: 3800
  download_size: 37356054
  dataset_size: 28151881
- config_name: imdb
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 25515241
    num_examples: 20000
  - name: validation
    num_bytes: 3269130
    num_examples: 2500
  - name: test
    num_bytes: 3151954
    num_examples: 2500
  download_size: 33910706
  dataset_size: 31936325
- config_name: agnews
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 20357121
    num_examples: 96000
  - name: validation
    num_bytes: 2487983
    num_examples: 12000
  - name: test
    num_bytes: 2521518
    num_examples: 12000
  download_size: 39149014
  dataset_size: 25366622
- config_name: cdr
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: entity1
    dtype: string
  - name: entity2
    dtype: string
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 2318545
    num_examples: 8430
  - name: validation
    num_bytes: 246252
    num_examples: 920
  - name: test
    num_bytes: 1229627
    num_examples: 4673
  download_size: 11036213
  dataset_size: 3794424
- config_name: chemprot
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: entity1
    dtype: string
  - name: entity2
    dtype: string
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 3474637
    num_examples: 12861
  - name: validation
    num_bytes: 435850
    num_examples: 1607
  - name: test
    num_bytes: 434031
    num_examples: 1607
  download_size: 15743249
  dataset_size: 4344518
- config_name: semeval
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: entity1
    dtype: string
  - name: entity2
    dtype: string
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 532785
    num_examples: 1749
  - name: validation
    num_bytes: 54373
    num_examples: 178
  - name: test
    num_bytes: 184826
    num_examples: 600
  download_size: 2295058
  dataset_size: 771984
- config_name: sms
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 741520
    num_examples: 4571
  - name: validation
    num_bytes: 81747
    num_examples: 500
  - name: test
    num_bytes: 80152
    num_examples: 500
  download_size: 6715435
  dataset_size: 903419
- config_name: spouse
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: entity1
    dtype: string
  - name: entity2
    dtype: string
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 7550613
    num_examples: 22254
  - name: validation
    num_bytes: 952523
    num_examples: 2811
  - name: test
    num_bytes: 876804
    num_examples: 2701
  download_size: 22017644
  dataset_size: 9379940
- config_name: trec
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 610244
    num_examples: 4965
  - name: validation
    num_bytes: 61048
    num_examples: 500
  - name: test
    num_bytes: 56479
    num_examples: 500
  download_size: 2277067
  dataset_size: 727771
- config_name: youtube
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 180736
    num_examples: 1586
  - name: validation
    num_bytes: 14659
    num_examples: 120
  - name: test
    num_bytes: 33347
    num_examples: 250
  download_size: 759494
  dataset_size: 228742
---
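The configuration names listed in the metadata above can be passed as the second argument to `load_dataset` from the Hugging Face `datasets` library. The following is a minimal sketch, assuming the repository resolves under the ID `PierreLepagnol/WRENCH` (taken from the page title) and that `datasets` is installed. The helper `load_wrench_config` is a hypothetical name introduced here for illustration, not something the dataset provides.

```python
# Configuration names copied from the dataset_info metadata.
WRENCH_CONFIGS = (
    "yelp", "imdb", "agnews", "cdr", "chemprot",
    "semeval", "sms", "spouse", "trec", "youtube",
)

# Repository ID taken from the page title (assumed to resolve on the Hub).
REPO_ID = "PierreLepagnol/WRENCH"


def load_wrench_config(config_name, split=None):
    """Load one configuration; hypothetical helper, not part of the dataset."""
    if config_name not in WRENCH_CONFIGS:
        raise ValueError(f"unknown configuration: {config_name!r}")
    # Imported lazily so the name check works without the library installed.
    from datasets import load_dataset
    return load_dataset(REPO_ID, config_name, split=split)
```

Given network access, a call such as `load_wrench_config("yelp", split="train")` should return the training split of the `yelp` configuration; omitting `split` should return all three splits.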
Provider:
PierreLepagnol
Summary of original information
Dataset overview
Dataset configurations and features
| Configuration | Feature name | Data type |
|---|---|---|
| yelp | text | string |
| yelp | label | int8 |
| yelp | weak_labels | sequence: int8 |
| imdb | text | string |
| imdb | label | int8 |
| imdb | weak_labels | sequence: int8 |
| agnews | text | string |
| agnews | label | int8 |
| agnews | weak_labels | sequence: int8 |
| cdr | text | string |
| cdr | label | int8 |
| cdr | entity1 | string |
| cdr | entity2 | string |
| cdr | weak_labels | sequence: int8 |
| chemprot | text | string |
| chemprot | label | int8 |
| chemprot | entity1 | string |
| chemprot | entity2 | string |
| chemprot | weak_labels | sequence: int8 |
| semeval | text | string |
| semeval | label | int8 |
| semeval | entity1 | string |
| semeval | entity2 | string |
| semeval | weak_labels | sequence: int8 |
| sms | text | string |
| sms | label | int8 |
| sms | weak_labels | sequence: int8 |
| spouse | text | string |
| spouse | label | int8 |
| spouse | entity1 | string |
| spouse | entity2 | string |
| spouse | weak_labels | sequence: int8 |
| trec | text | string |
| trec | label | int8 |
| trec | weak_labels | sequence: int8 |
| youtube | text | string |
| youtube | label | int8 |
| youtube | weak_labels | sequence: int8 |
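Every configuration pairs a single `label` with a `weak_labels` sequence. One common way to collapse such a sequence into a single prediction is majority voting. The sketch below is illustrative only: it assumes that each entry of `weak_labels` is one labeling function's vote and that `-1` means "abstain", neither of which this page states, and the record shown is made up rather than taken from the data.

```python
from collections import Counter

ABSTAIN = -1  # assumption: value a labeling function emits when it does not vote


def majority_vote(weak_labels, abstain=ABSTAIN):
    """Return the most frequent non-abstain vote, or `abstain` if none exist.

    Ties are broken in favor of the smallest label value.
    """
    votes = Counter(v for v in weak_labels if v != abstain)
    if not votes:
        return abstain
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)


# Hypothetical record shaped like the yelp/imdb/sms rows in the table above.
example = {"text": "great food", "label": 1, "weak_labels": [1, -1, 1, 0, -1]}
print(majority_vote(example["weak_labels"]))  # prints 1
```

Comparing the voted value with `label` across a split would give a rough accuracy figure for the weak labels, under the same assumptions.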
Dataset sizes and splits
| Configuration | Split | Number of examples | Size (bytes) |
|---|---|---|---|
| yelp | train | 30400 | 22618599 |
| yelp | validation | 3800 | 2824249 |
| yelp | test | 3800 | 2709033 |
| imdb | train | 20000 | 25515241 |
| imdb | validation | 2500 | 3269130 |
| imdb | test | 2500 | 3151954 |
| agnews | train | 96000 | 20357121 |
| agnews | validation | 12000 | 2487983 |
| agnews | test | 12000 | 2521518 |
| cdr | train | 8430 | 2318545 |
| cdr | validation | 920 | 246252 |
| cdr | test | 4673 | 1229627 |
| chemprot | train | 12861 | 3474637 |
| chemprot | validation | 1607 | 435850 |
| chemprot | test | 1607 | 434031 |
| semeval | train | 1749 | 532785 |
| semeval | validation | 178 | 54373 |
| semeval | test | 600 | 184826 |
| sms | train | 4571 | 741520 |
| sms | validation | 500 | 81747 |
| sms | test | 500 | 80152 |
| spouse | train | 22254 | 7550613 |
| spouse | validation | 2811 | 952523 |
| spouse | test | 2701 | 876804 |
| trec | train | 4965 | 610244 |
| trec | validation | 500 | 61048 |
| trec | test | 500 | 56479 |
| youtube | train | 1586 | 180736 |
| youtube | validation | 120 | 14659 |
| youtube | test | 250 | 33347 |
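The example counts above are easier to compare as proportions. The short sketch below computes each split's share of its configuration, using only numbers copied from the table; the names `SPLIT_COUNTS` and `split_fractions` are introduced here for illustration.

```python
# (train, validation, test) example counts copied from the table above.
SPLIT_COUNTS = {
    "yelp": (30400, 3800, 3800),
    "imdb": (20000, 2500, 2500),
    "agnews": (96000, 12000, 12000),
    "cdr": (8430, 920, 4673),
    "chemprot": (12861, 1607, 1607),
    "semeval": (1749, 178, 600),
    "sms": (4571, 500, 500),
    "spouse": (22254, 2811, 2701),
    "trec": (4965, 500, 500),
    "youtube": (1586, 120, 250),
}


def split_fractions(counts):
    """Return each split's share of the configuration's total examples."""
    total = sum(counts)
    return tuple(n / total for n in counts)


for name, counts in SPLIT_COUNTS.items():
    train, val, test = split_fractions(counts)
    print(f"{name:9s} total={sum(counts):6d} "
          f"train={train:.1%} validation={val:.1%} test={test:.1%}")
```

By these counts, `yelp`, `imdb`, and `agnews` follow an exact 80/10/10 split, while `cdr` reserves about a third of its examples for testing.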
Download and dataset sizes
| Configuration | Download size (bytes) | Dataset size (bytes) |
|---|---|---|
| yelp | 37356054 | 28151881 |
| imdb | 33910706 | 31936325 |
| agnews | 39149014 | 25366622 |
| cdr | 11036213 | 3794424 |
| chemprot | 15743249 | 4344518 |
| semeval | 2295058 | 771984 |
| sms | 6715435 | 903419 |
| spouse | 22017644 | 9379940 |
| trec | 2277067 | 727771 |
| youtube | 759494 | 228742 |
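Raw byte counts are hard to read at a glance. The sketch below converts the figures from the table to MiB and reports the ratio of download size to dataset size per configuration. It assumes both columns are in bytes, as is usual for Hugging Face `download_size` and `dataset_size` metadata; the names `SIZES` and `to_mib` are introduced here for illustration.

```python
# (download_size, dataset_size) in bytes, copied from the table above.
SIZES = {
    "yelp": (37356054, 28151881),
    "imdb": (33910706, 31936325),
    "agnews": (39149014, 25366622),
    "cdr": (11036213, 3794424),
    "chemprot": (15743249, 4344518),
    "semeval": (2295058, 771984),
    "sms": (6715435, 903419),
    "spouse": (22017644, 9379940),
    "trec": (2277067, 727771),
    "youtube": (759494, 228742),
}

MIB = 1024 ** 2


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


for name, (download, dataset) in SIZES.items():
    print(f"{name:9s} download={to_mib(download):7.2f} MiB "
          f"dataset={to_mib(dataset):7.2f} MiB "
          f"ratio={download / dataset:.2f}")

total_download = sum(download for download, _ in SIZES.values())
print(f"all configurations: {to_mib(total_download):.2f} MiB to download")
```

In every configuration the download size is larger than the dataset size, so disk and bandwidth planning should probably be based on the download column.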



