MLP-Lemma/SFT-datasets
Source: Hugging Face. Updated 2024-05-08; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/MLP-Lemma/SFT-datasets
Resource description:
---
dataset_info:
- config_name: BoolQ
  features:
  - name: instruction
    dtype: string
  - name: context
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 7096485
    num_examples: 9427
  - name: validation
    num_bytes: 2435718
    num_examples: 3270
  - name: test
    num_bytes: 2425153
    num_examples: 3245
  download_size: 6587773
  dataset_size: 11957356
- config_name: CosmosQA
  features:
  - name: context
    dtype: string
  - name: instruction
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 10684756
    num_examples: 25262
  - name: test
    num_bytes: 3150505
    num_examples: 6963
  - name: validation
    num_bytes: 1348959
    num_examples: 2985
  download_size: 6906818
  dataset_size: 15184220
- config_name: DROP
  features:
  - name: context
    dtype: string
  - name: instruction
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 100026971
    num_examples: 77400
  - name: validation
    num_bytes: 10893643
    num_examples: 9535
  download_size: 8091770
  dataset_size: 110920614
- config_name: HotpotQA
  features:
  - name: instruction
    dtype: string
  - name: output
    dtype: string
  - name: context
    sequence: string
  splits:
  - name: train
    num_bytes: 534870557
    num_examples: 90447
  - name: validation
    num_bytes: 44228449
    num_examples: 7405
  download_size: 338345809
  dataset_size: 579099006
- config_name: LongAlpaca
  features:
  - name: context
    dtype: string
  - name: instruction
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 404591757.0886667
    num_examples: 9956
  download_size: 262150939
  dataset_size: 404591757.0886667
- config_name: MultiNews
  features:
  - name: context
    dtype: string
  - name: output
    dtype: string
  - name: instruction
    dtype: string
  splits:
  - name: train
    num_bytes: 560860212
    num_examples: 44972
  download_size: 323585169
  dataset_size: 560860212
- config_name: MultiRC
  features:
  - name: context
    dtype: string
  - name: instruction
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 20144535.745879676
    num_examples: 12025
  - name: validation
    num_bytes: 3277067.0173267326
    num_examples: 2075
  - name: test
    num_bytes: 0.0
    num_examples: 0
  download_size: 1276036
  dataset_size: 23421602.763206407
- config_name: NewsQA
  features:
  - name: context
    dtype: string
  - name: instruction
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 220213756
    num_examples: 71559
  download_size: 23399407
  dataset_size: 220213756
- config_name: QMSum
  features:
  - name: context
    dtype: string
  - name: output
    dtype: string
  - name: instruction
    dtype: string
  splits:
  - name: train
    num_bytes: 64407583
    num_examples: 1257
  download_size: 4090877
  dataset_size: 64407583
- config_name: ReCoRD
  features:
  - name: context
    dtype: string
  - name: instruction
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 130895292
    num_examples: 100730
  - name: validation
    num_bytes: 12856297
    num_examples: 10000
  - name: test
    num_bytes: 12744346
    num_examples: 10000
  download_size: 65507164
  dataset_size: 156495935
- config_name: SQuAD
  features:
  - name: context
    dtype: string
  - name: instruction
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 74256565
    num_examples: 87599
  - name: validation
    num_bytes: 9710151
    num_examples: 10570
  download_size: 15027476
  dataset_size: 83966716
- config_name: TriviaQA
  features:
  - name: instruction
    dtype: string
  - name: context
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 3294958826
    num_examples: 61888
  - name: validation
    num_bytes: 424127867
    num_examples: 7993
  - name: test
    num_bytes: 404679646
    num_examples: 7701
  download_size: 2398418943
  dataset_size: 4123766339
- config_name: booksum
  features:
  - name: output
    dtype: string
  - name: context
    dtype: string
  - name: instruction
    dtype: string
  splits:
  - name: train
    num_bytes: 254202726
    num_examples: 9600
  - name: validation
    num_bytes: 34235554
    num_examples: 1484
  - name: test
    num_bytes: 37938179
    num_examples: 1431
  download_size: 168553619
  dataset_size: 326376459
- config_name: cnn
  features:
  - name: context
    dtype: string
  - name: output
    dtype: string
  - name: instruction
    dtype: string
  splits:
  - name: train
    num_bytes: 1274021763
    num_examples: 287113
  - name: validation
    num_bytes: 58305730
    num_examples: 13368
  - name: test
    num_bytes: 50418397
    num_examples: 11490
  download_size: 824633386
  dataset_size: 1382745890
- config_name: qasper
  features:
  - name: context
    dtype: string
  - name: instruction
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 26224641
    num_examples: 1031
  download_size: 7705854
  dataset_size: 26224641
- config_name: xsum
  features:
  - name: context
    dtype: string
  - name: output
    dtype: string
  - name: instruction
    dtype: string
  splits:
  - name: train
    num_bytes: 494830320
    num_examples: 204045
  - name: validation
    num_bytes: 27159343
    num_examples: 11332
  - name: test
    num_bytes: 27624033
    num_examples: 11334
  download_size: 336535468
  dataset_size: 549613696
configs:
- config_name: BoolQ
  data_files:
  - split: train
    path: BoolQ/train-*
  - split: validation
    path: BoolQ/validation-*
  - split: test
    path: BoolQ/test-*
- config_name: CosmosQA
  data_files:
  - split: train
    path: CosmosQA/train-*
  - split: test
    path: CosmosQA/test-*
  - split: validation
    path: CosmosQA/validation-*
- config_name: DROP
  data_files:
  - split: train
    path: DROP/train-*
  - split: validation
    path: DROP/validation-*
- config_name: HotpotQA
  data_files:
  - split: train
    path: HotpotQA/train-*
  - split: validation
    path: HotpotQA/validation-*
- config_name: LongAlpaca
  data_files:
  - split: train
    path: LongAlpaca/train-*
- config_name: MultiNews
  data_files:
  - split: train
    path: MultiNews/train-*
- config_name: MultiRC
  data_files:
  - split: train
    path: MultiRC/train-*
  - split: validation
    path: MultiRC/validation-*
  - split: test
    path: MultiRC/test-*
- config_name: NewsQA
  data_files:
  - split: train
    path: NewsQA/train-*
- config_name: QMSum
  data_files:
  - split: train
    path: QMSum/train-*
- config_name: ReCoRD
  data_files:
  - split: train
    path: ReCoRD/train-*
  - split: validation
    path: ReCoRD/validation-*
  - split: test
    path: ReCoRD/test-*
- config_name: SQuAD
  data_files:
  - split: train
    path: SQuAD/train-*
  - split: validation
    path: SQuAD/validation-*
- config_name: TriviaQA
  data_files:
  - split: train
    path: TriviaQA/train-*
  - split: validation
    path: TriviaQA/validation-*
  - split: test
    path: TriviaQA/test-*
- config_name: booksum
  data_files:
  - split: train
    path: booksum/train-*
  - split: validation
    path: booksum/validation-*
  - split: test
    path: booksum/test-*
- config_name: cnn
  data_files:
  - split: train
    path: cnn/train-*
  - split: validation
    path: cnn/validation-*
  - split: test
    path: cnn/test-*
- config_name: qasper
  data_files:
  - split: train
    path: qasper/train-*
- config_name: xsum
  data_files:
  - split: train
    path: xsum/train-*
  - split: validation
    path: xsum/validation-*
  - split: test
    path: xsum/test-*
---
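The `configs` block in the metadata above maps each config and split to a glob pattern such as `BoolQ/train-*`. The sketch below illustrates how such a pattern selects files, using Python's standard `fnmatch` as a stand-in for whatever matching the Hub performs internally. The shard file names are hypothetical and the helper `split_for_file` is introduced here for illustration only.

```python
from fnmatch import fnmatch

# Split patterns for one config, copied from the `configs` metadata.
BOOLQ_DATA_FILES = {
    "train": "BoolQ/train-*",
    "validation": "BoolQ/validation-*",
    "test": "BoolQ/test-*",
}


def split_for_file(path, data_files):
    """Return the split whose glob pattern matches `path`, or None."""
    for split, pattern in data_files.items():
        if fnmatch(path, pattern):
            return split
    return None


# Hypothetical file names, for illustration only.
example_files = [
    "BoolQ/train-00000-of-00001.parquet",
    "BoolQ/validation-00000-of-00001.parquet",
    "BoolQ/README.md",
]

# Map each file to the split it would belong to (None if no pattern matches).
assignments = {f: split_for_file(f, BOOLQ_DATA_FILES) for f in example_files}
for name, split in assignments.items():
    print(name, "->", split)
```

Because every pattern starts with the config name, files from different configs cannot land in the same split by accident.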
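This dataset card describes 16 configurations: BoolQ, CosmosQA, DROP, HotpotQA, LongAlpaca, MultiNews, MultiRC, NewsQA, QMSum, ReCoRD, SQuAD, TriviaQA, booksum, cnn, qasper, and xsum. For each one it records the features, the splits with their byte and example counts, the download and dataset sizes, and the glob path that locates each split's files. Every configuration has the same three features: instruction, context, and output. All are strings except the HotpotQA context, which is a sequence of strings. Eight configurations provide train, validation, and test splits, three (DROP, HotpotQA, SQuAD) provide train and validation only, and five (LongAlpaca, MultiNews, NewsQA, QMSum, qasper) provide train only; the MultiRC test split is listed with 0 examples. Training sets range from 1,031 examples (qasper) to 287,113 (cnn), and dataset sizes from about 12 MB (BoolQ) to about 4.1 GB (TriviaQA).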
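As a minimal sketch of how the shared instruction/context/output schema might be consumed, the code below loads one config with the Hugging Face `datasets` library and turns a record into a prompt/target pair. The repository id is taken from the page title. The helper names and the prompt template are assumptions made for illustration and do not come from the dataset card; the sample record is made up.

```python
def load_config(config_name, split="train"):
    """Load one config from the Hub (needs `pip install datasets` and network access)."""
    # Imported lazily so the helpers below also work without the library installed.
    from datasets import load_dataset

    return load_dataset("MLP-Lemma/SFT-datasets", config_name, split=split)


def normalize_context(context):
    """Return the context as one string.

    Most configs store `context` as a string; HotpotQA stores a sequence of strings.
    """
    if isinstance(context, str):
        return context
    return "\n".join(context)


def to_prompt(record):
    """Build a prompt/target pair. The template is an assumption, not part of the card."""
    context = normalize_context(record["context"])
    if context:
        prompt = f"{record['instruction']}\n\n{context}"
    else:
        prompt = record["instruction"]
    return {"prompt": prompt, "target": record["output"]}


# Made-up record in the HotpotQA layout (context as a list of strings).
sample = {
    "instruction": "Which city is mentioned?",
    "context": ["First passage.", "Second passage about Paris."],
    "output": "Paris",
}
print(to_prompt(sample))
```

With network access, `to_prompt(load_config("BoolQ")[0])` would apply the same template to a real record. Normalizing the context first lets one template serve all 16 configs.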
Provider:
MLP-Lemma
Summary of original information
Dataset overview
1. BoolQ
   - Features:
     - instruction: string
     - context: string
     - output: string
   - Splits:
     - train: 9427 examples, 7096485 bytes
     - validation: 3270 examples, 2435718 bytes
     - test: 3245 examples, 2425153 bytes
   - Download size: 6587773 bytes
   - Dataset size: 11957356 bytes
2. CosmosQA
   - Features:
     - context: string
     - instruction: string
     - output: string
   - Splits:
     - train: 25262 examples, 10684756 bytes
     - test: 6963 examples, 3150505 bytes
     - validation: 2985 examples, 1348959 bytes
   - Download size: 6906818 bytes
   - Dataset size: 15184220 bytes
3. DROP
   - Features:
     - context: string
     - instruction: string
     - output: string
   - Splits:
     - train: 77400 examples, 100026971 bytes
     - validation: 9535 examples, 10893643 bytes
   - Download size: 8091770 bytes
   - Dataset size: 110920614 bytes
4. HotpotQA
   - Features:
     - instruction: string
     - output: string
     - context: sequence of strings
   - Splits:
     - train: 90447 examples, 534870557 bytes
     - validation: 7405 examples, 44228449 bytes
   - Download size: 338345809 bytes
   - Dataset size: 579099006 bytes
5. LongAlpaca
   - Features:
     - context: string
     - instruction: string
     - output: string
   - Splits:
     - train: 9956 examples, 404591757.0886667 bytes
   - Download size: 262150939 bytes
   - Dataset size: 404591757.0886667 bytes
6. MultiNews
   - Features:
     - context: string
     - output: string
     - instruction: string
   - Splits:
     - train: 44972 examples, 560860212 bytes
   - Download size: 323585169 bytes
   - Dataset size: 560860212 bytes
7. MultiRC
   - Features:
     - context: string
     - instruction: string
     - output: string
   - Splits:
     - train: 12025 examples, 20144535.745879676 bytes
     - validation: 2075 examples, 3277067.0173267326 bytes
     - test: 0 examples, 0 bytes
   - Download size: 1276036 bytes
   - Dataset size: 23421602.763206407 bytes
8. NewsQA
   - Features:
     - context: string
     - instruction: string
     - output: string
   - Splits:
     - train: 71559 examples, 220213756 bytes
   - Download size: 23399407 bytes
   - Dataset size: 220213756 bytes
9. QMSum
   - Features:
     - context: string
     - output: string
     - instruction: string
   - Splits:
     - train: 1257 examples, 64407583 bytes
   - Download size: 4090877 bytes
   - Dataset size: 64407583 bytes
10. ReCoRD
    - Features:
      - context: string
      - instruction: string
      - output: string
    - Splits:
      - train: 100730 examples, 130895292 bytes
      - validation: 10000 examples, 12856297 bytes
      - test: 10000 examples, 12744346 bytes
    - Download size: 65507164 bytes
    - Dataset size: 156495935 bytes
11. SQuAD
    - Features:
      - context: string
      - instruction: string
      - output: string
    - Splits:
      - train: 87599 examples, 74256565 bytes
      - validation: 10570 examples, 9710151 bytes
    - Download size: 15027476 bytes
    - Dataset size: 83966716 bytes
12. TriviaQA
    - Features:
      - instruction: string
      - context: string
      - output: string
    - Splits:
      - train: 61888 examples, 3294958826 bytes
      - validation: 7993 examples, 424127867 bytes
      - test: 7701 examples, 404679646 bytes
    - Download size: 2398418943 bytes
    - Dataset size: 4123766339 bytes
13. booksum
    - Features:
      - output: string
      - context: string
      - instruction: string
    - Splits:
      - train: 9600 examples, 254202726 bytes
      - validation: 1484 examples, 34235554 bytes
      - test: 1431 examples, 37938179 bytes
    - Download size: 168553619 bytes
    - Dataset size: 326376459 bytes
14. cnn
    - Features:
      - context: string
      - output: string
      - instruction: string
    - Splits:
      - train: 287113 examples, 1274021763 bytes
      - validation: 13368 examples, 58305730 bytes
      - test: 11490 examples, 50418397 bytes
    - Download size: 824633386 bytes
    - Dataset size: 1382745890 bytes
15. qasper
    - Features:
      - context: string
      - instruction: string
      - output: string
    - Splits:
      - train: 1031 examples, 26224641 bytes
    - Download size: 7705854 bytes
    - Dataset size: 26224641 bytes
16. xsum
    - Features:
      - context: string
      - output: string
      - instruction: string
    - Splits:
      - train: 204045 examples, 494830320 bytes
      - validation: 11332 examples, 27159343 bytes
      - test: 11334 examples, 27624033 bytes
    - Download size: 336535468 bytes
    - Dataset size: 549613696 bytes
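
In the listing above, each dataset size appears to be the sum of the per-split byte counts. The sketch below checks that arithmetic for a few configs, with numbers copied from the listing. The function name `size_mismatches` is introduced here for illustration, and the check covers only the configs included in `REPORTED`.

```python
# (per-split num_bytes, reported dataset_size), copied from the listing above.
REPORTED = {
    "BoolQ": ([7096485, 2435718, 2425153], 11957356),
    "CosmosQA": ([10684756, 3150505, 1348959], 15184220),
    "DROP": ([100026971, 10893643], 110920614),
    "ReCoRD": ([130895292, 12856297, 12744346], 156495935),
    "SQuAD": ([74256565, 9710151], 83966716),
}


def size_mismatches(reported):
    """Return {config: sum_of_splits - dataset_size} for configs where the two differ."""
    diffs = {}
    for name, (split_bytes, dataset_size) in reported.items():
        diff = sum(split_bytes) - dataset_size
        if diff != 0:
            diffs[name] = diff
    return diffs


mismatches = size_mismatches(REPORTED)
print(mismatches or "all checked sizes are consistent")
```

The same check can be extended to the remaining configs by adding their rows to `REPORTED`. The configs with fractional byte counts (LongAlpaca, MultiRC) would need a tolerance rather than an exact comparison.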



