
PierreLepagnol/WRENCH

Hugging Face, updated 2023-08-17, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/PierreLepagnol/WRENCH
Resource description:
---
task_categories:
- text-classification
- token-classification
size_categories:
- 10K<n<100K
dataset_info:
- config_name: yelp
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 22618599
    num_examples: 30400
  - name: validation
    num_bytes: 2824249
    num_examples: 3800
  - name: test
    num_bytes: 2709033
    num_examples: 3800
  download_size: 37356054
  dataset_size: 28151881
- config_name: imdb
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 25515241
    num_examples: 20000
  - name: validation
    num_bytes: 3269130
    num_examples: 2500
  - name: test
    num_bytes: 3151954
    num_examples: 2500
  download_size: 33910706
  dataset_size: 31936325
- config_name: agnews
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 20357121
    num_examples: 96000
  - name: validation
    num_bytes: 2487983
    num_examples: 12000
  - name: test
    num_bytes: 2521518
    num_examples: 12000
  download_size: 39149014
  dataset_size: 25366622
- config_name: cdr
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: entity1
    dtype: string
  - name: entity2
    dtype: string
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 2318545
    num_examples: 8430
  - name: validation
    num_bytes: 246252
    num_examples: 920
  - name: test
    num_bytes: 1229627
    num_examples: 4673
  download_size: 11036213
  dataset_size: 3794424
- config_name: chemprot
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: entity1
    dtype: string
  - name: entity2
    dtype: string
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 3474637
    num_examples: 12861
  - name: validation
    num_bytes: 435850
    num_examples: 1607
  - name: test
    num_bytes: 434031
    num_examples: 1607
  download_size: 15743249
  dataset_size: 4344518
- config_name: semeval
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: entity1
    dtype: string
  - name: entity2
    dtype: string
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 532785
    num_examples: 1749
  - name: validation
    num_bytes: 54373
    num_examples: 178
  - name: test
    num_bytes: 184826
    num_examples: 600
  download_size: 2295058
  dataset_size: 771984
- config_name: sms
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 741520
    num_examples: 4571
  - name: validation
    num_bytes: 81747
    num_examples: 500
  - name: test
    num_bytes: 80152
    num_examples: 500
  download_size: 6715435
  dataset_size: 903419
- config_name: spouse
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: entity1
    dtype: string
  - name: entity2
    dtype: string
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 7550613
    num_examples: 22254
  - name: validation
    num_bytes: 952523
    num_examples: 2811
  - name: test
    num_bytes: 876804
    num_examples: 2701
  download_size: 22017644
  dataset_size: 9379940
- config_name: trec
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 610244
    num_examples: 4965
  - name: validation
    num_bytes: 61048
    num_examples: 500
  - name: test
    num_bytes: 56479
    num_examples: 500
  download_size: 2277067
  dataset_size: 727771
- config_name: youtube
  features:
  - name: text
    dtype: string
  - name: label
    dtype: int8
  - name: weak_labels
    sequence: int8
  splits:
  - name: train
    num_bytes: 180736
    num_examples: 1586
  - name: validation
    num_bytes: 14659
    num_examples: 120
  - name: test
    num_bytes: 33347
    num_examples: 250
  download_size: 759494
  dataset_size: 228742
---
Provider:
PierreLepagnol
Summary of original information

Dataset overview

Dataset configurations and features

Config    Feature      Data type
yelp      text         string
yelp      label        int8
yelp      weak_labels  sequence: int8
imdb      text         string
imdb      label        int8
imdb      weak_labels  sequence: int8
agnews    text         string
agnews    label        int8
agnews    weak_labels  sequence: int8
cdr       text         string
cdr       label        int8
cdr       entity1      string
cdr       entity2      string
cdr       weak_labels  sequence: int8
chemprot  text         string
chemprot  label        int8
chemprot  entity1      string
chemprot  entity2      string
chemprot  weak_labels  sequence: int8
semeval   text         string
semeval   label        int8
semeval   entity1      string
semeval   entity2      string
semeval   weak_labels  sequence: int8
sms       text         string
sms       label        int8
sms       weak_labels  sequence: int8
spouse    text         string
spouse    label        int8
spouse    entity1      string
spouse    entity2      string
spouse    weak_labels  sequence: int8
trec      text         string
trec      label        int8
trec      weak_labels  sequence: int8
youtube   text         string
youtube   label        int8
youtube   weak_labels  sequence: int8
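
Every config pairs a gold `label` with a `weak_labels` sequence of int8 votes. As a minimal sketch of how such votes might be collapsed into a single label, the snippet below applies a simple majority vote. It assumes that `-1` means a labeling function abstained, which this page does not state. `majority_vote`, `load_config`, and the toy record are hypothetical names and values introduced for illustration; `load_config` is only a guess at how the data could be fetched with the `datasets` library and is not called here.

```python
from collections import Counter
from typing import Optional, Sequence

ABSTAIN = -1  # assumption: -1 marks "labeling function abstained"


def majority_vote(weak_labels: Sequence[int], abstain: int = ABSTAIN) -> Optional[int]:
    """Return the most frequent non-abstain vote, or None if every vote abstains."""
    votes = [v for v in weak_labels if v != abstain]
    if not votes:
        return None
    counts = Counter(votes)
    top = max(counts.values())
    # Break ties deterministically by picking the smallest label id.
    return min(label for label, c in counts.items() if c == top)


def load_config(name: str):
    """Hypothetical loader; needs the `datasets` package and network access."""
    from datasets import load_dataset  # imported lazily so the rest runs without it

    return load_dataset("PierreLepagnol/WRENCH", name)


# Toy record shaped like the schema above (values are made up for illustration).
record = {"text": "great food", "label": 1, "weak_labels": [1, -1, 1, 0, -1]}
print(majority_vote(record["weak_labels"]))
```

Majority voting is only one of several possible aggregation rules; it is used here because it needs no training and makes the role of the `weak_labels` column easy to see.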

Dataset sizes and splits

Config    Split       Examples  Bytes
yelp      train       30400     22618599
yelp      validation  3800      2824249
yelp      test        3800      2709033
imdb      train       20000     25515241
imdb      validation  2500      3269130
imdb      test        2500      3151954
agnews    train       96000     20357121
agnews    validation  12000     2487983
agnews    test        12000     2521518
cdr       train       8430      2318545
cdr       validation  920       246252
cdr       test        4673      1229627
chemprot  train       12861     3474637
chemprot  validation  1607      435850
chemprot  test        1607      434031
semeval   train       1749      532785
semeval   validation  178       54373
semeval   test        600       184826
sms       train       4571      741520
sms       validation  500       81747
sms       test        500       80152
spouse    train       22254     7550613
spouse    validation  2811      952523
spouse    test        2701      876804
trec      train       4965      610244
trec      validation  500       61048
trec      test        500       56479
youtube   train       1586      180736
youtube   validation  120       14659
youtube   test        250       33347
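
The split counts differ noticeably between configs, so it can help to look at them as proportions. The sketch below computes each split's share of a config's examples. It copies three configs from the table above; `SPLITS` and `split_fractions` are hypothetical names used only for this illustration.

```python
from typing import Dict

# Example counts copied from the table above (three configs shown for brevity).
SPLITS: Dict[str, Dict[str, int]] = {
    "yelp": {"train": 30400, "validation": 3800, "test": 3800},
    "cdr": {"train": 8430, "validation": 920, "test": 4673},
    "youtube": {"train": 1586, "validation": 120, "test": 250},
}


def split_fractions(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the config's total number of examples."""
    total = sum(counts.values())
    return {split: n / total for split, n in counts.items()}


for config, counts in SPLITS.items():
    shares = split_fractions(counts)
    summary = ", ".join(f"{split}={share:.1%}" for split, share in shares.items())
    print(f"{config}: {summary}")
```

With these numbers yelp comes out as an 80/10/10 split, while cdr keeps about a third of its examples for testing.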

Download and dataset sizes

Config    Download size (bytes)  Dataset size (bytes)
yelp      37356054               28151881
imdb      33910706               31936325
agnews    39149014               25366622
cdr       11036213               3794424
chemprot  15743249               4344518
semeval   2295058                771984
sms       6715435                903419
spouse    22017644               9379940
trec      2277067                727771
youtube   759494                 228742
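
In the figures listed on this page, each config's dataset size equals the sum of its train, validation, and test byte counts. The sketch below rechecks that arithmetic and prints the sizes in MiB. `SIZES`, `size_is_consistent`, and `to_mib` are hypothetical names introduced here; the numbers are copied from the tables above.

```python
# Byte counts copied from the tables above:
# config -> (train, validation, test, download_size, dataset_size)
SIZES = {
    "yelp": (22618599, 2824249, 2709033, 37356054, 28151881),
    "imdb": (25515241, 3269130, 3151954, 33910706, 31936325),
    "agnews": (20357121, 2487983, 2521518, 39149014, 25366622),
    "cdr": (2318545, 246252, 1229627, 11036213, 3794424),
    "chemprot": (3474637, 435850, 434031, 15743249, 4344518),
    "semeval": (532785, 54373, 184826, 2295058, 771984),
    "sms": (741520, 81747, 80152, 6715435, 903419),
    "spouse": (7550613, 952523, 876804, 22017644, 9379940),
    "trec": (610244, 61048, 56479, 2277067, 727771),
    "youtube": (180736, 14659, 33347, 759494, 228742),
}


def size_is_consistent(train: int, validation: int, test: int, dataset_size: int) -> bool:
    """Check that the three split sizes add up to the reported dataset size."""
    return train + validation + test == dataset_size


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


for name, (tr, va, te, dl, ds) in SIZES.items():
    ok = size_is_consistent(tr, va, te, ds)
    print(
        f"{name:9s} dataset={to_mib(ds):7.2f} MiB "
        f"download={to_mib(dl):7.2f} MiB consistent={ok}"
    )
```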