MLP-Lemma/Instruct-datasets-preprocessed-st
Source: Hugging Face. Updated 2024-05-12; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/MLP-Lemma/Instruct-datasets-preprocessed-st
Resource description:
---
dataset_info:
- config_name: BigPatent
features:
- name: input_ids
sequence: int32
- name: input_sentences_ids
sequence:
sequence: int64
- name: labels
sequence: int64
splits:
- name: train
num_bytes: 2162904748
num_examples: 41479
download_size: 431393280
dataset_size: 2162904748
- config_name: BookSum
features:
- name: input_ids
sequence: int32
- name: input_sentences_ids
sequence:
sequence: int64
- name: labels
sequence: int64
splits:
- name: train
num_bytes: 560690900
num_examples: 9409
download_size: 132130269
dataset_size: 560690900
- config_name: BoolQ
features:
- name: input_ids
sequence: int32
- name: input_sentences_ids
sequence:
sequence: int64
- name: labels
sequence: int64
splits:
- name: train
num_bytes: 17836256
num_examples: 9427
download_size: 3550975
dataset_size: 17836256
- config_name: CosmosQA
features:
- name: input_ids
sequence: int32
- name: input_sentences_ids
sequence:
sequence: int64
- name: labels
sequence: int64
splits:
- name: train
num_bytes: 53658116
num_examples: 25262
download_size: 7853834
dataset_size: 53658116
- config_name: DROP
features:
- name: input_ids
sequence: int32
- name: input_sentences_ids
sequence:
sequence: int64
- name: labels
sequence: int64
splits:
- name: train
num_bytes: 250955748
num_examples: 77204
download_size: 14175687
dataset_size: 250955748
- config_name: HotpotQA
features:
- name: input_ids
sequence: int32
- name: input_sentences_ids
sequence:
sequence: int64
- name: labels
sequence: int64
splits:
- name: train
num_bytes: 1102360272
num_examples: 90208
download_size: 236367807
dataset_size: 1102360272
- config_name: LongAlpaca
features:
- name: input_ids
sequence: int32
- name: input_sentences_ids
sequence:
sequence: int64
- name: labels
sequence: int64
splits:
- name: train
num_bytes: 743978012
num_examples: 7627
download_size: 149072591
dataset_size: 743978012
- config_name: MultiNews
features:
- name: input_ids
sequence: int32
- name: input_sentences_ids
sequence:
sequence: int64
- name: labels
sequence: int64
splits:
- name: train
num_bytes: 1050726252
num_examples: 44351
download_size: 249001883
dataset_size: 1050726252
- config_name: MultiRC
features:
- name: input_ids
sequence: int32
- name: input_sentences_ids
sequence:
sequence: int64
- name: labels
sequence: int64
splits:
- name: train
num_bytes: 43583068
num_examples: 12025
download_size: 2381273
dataset_size: 43583068
- config_name: NarrativeQA
features:
- name: input_ids
sequence: int32
- name: input_sentences_ids
sequence:
sequence: int64
- name: labels
sequence: int64
splits:
- name: train
num_bytes: 7766436704
num_examples: 13344
download_size: 1653310065
dataset_size: 7766436704
- config_name: QMsum
features:
- name: input_ids
sequence: int32
- name: input_sentences_ids
sequence:
sequence: int64
- name: labels
sequence: int64
splits:
- name: train
num_bytes: 149627860
num_examples: 1257
download_size: 22788582
dataset_size: 149627860
- config_name: Qasper
features:
- name: input_ids
sequence: int32
- name: input_sentences_ids
sequence:
sequence: int64
- name: labels
sequence: int64
splits:
- name: train
num_bytes: 120323992
num_examples: 2545
download_size: 23399636
dataset_size: 120323992
- config_name: Quality
features:
- name: input_ids
sequence: int32
- name: input_sentences_ids
sequence:
sequence: int64
- name: labels
sequence: int64
splits:
- name: train
num_bytes: 141480664
num_examples: 2523
download_size: 24146326
dataset_size: 141480664
- config_name: ReCoRD
features:
- name: input_ids
sequence: int32
- name: input_sentences_ids
sequence:
sequence: int64
- name: labels
sequence: int64
splits:
- name: train
num_bytes: 294993520
num_examples: 100684
download_size: 59618870
dataset_size: 294993520
- config_name: SQuAD
features:
- name: input_ids
sequence: int32
- name: input_sentences_ids
sequence:
sequence: int64
- name: labels
sequence: int64
splits:
- name: train
num_bytes: 188547064
num_examples: 87576
download_size: 31330364
dataset_size: 188547064
- config_name: TriviaQA
features:
- name: input_ids
sequence: int32
- name: input_sentences_ids
sequence:
sequence: int64
- name: labels
sequence: int64
splits:
- name: train
num_bytes: 5052603772
num_examples: 53503
download_size: 1048885926
dataset_size: 5052603772
configs:
- config_name: BigPatent
data_files:
- split: train
path: BigPatent/train-*
- config_name: BookSum
data_files:
- split: train
path: BookSum/train-*
- config_name: BoolQ
data_files:
- split: train
path: BoolQ/train-*
- config_name: CosmosQA
data_files:
- split: train
path: CosmosQA/train-*
- config_name: DROP
data_files:
- split: train
path: DROP/train-*
- config_name: HotpotQA
data_files:
- split: train
path: HotpotQA/train-*
- config_name: LongAlpaca
data_files:
- split: train
path: LongAlpaca/train-*
- config_name: MultiNews
data_files:
- split: train
path: MultiNews/train-*
- config_name: MultiRC
data_files:
- split: train
path: MultiRC/train-*
- config_name: NarrativeQA
data_files:
- split: train
path: NarrativeQA/train-*
- config_name: QMsum
data_files:
- split: train
path: QMsum/train-*
- config_name: Qasper
data_files:
- split: train
path: Qasper/train-*
- config_name: Quality
data_files:
- split: train
path: Quality/train-*
- config_name: ReCoRD
data_files:
- split: train
path: ReCoRD/train-*
- config_name: SQuAD
data_files:
- split: train
path: SQuAD/train-*
- config_name: TriviaQA
data_files:
- split: train
path: TriviaQA/train-*
---
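The `configs` section above maps each configuration name to a `train-*` file pattern. As a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable under the ID shown in the download link, one configuration could be loaded as follows. The helper `load_config` and the constants are hypothetical names introduced for illustration; they are not part of the dataset card.

```python
# Configuration names copied from the `configs` section of the dataset card.
CONFIG_NAMES = [
    "BigPatent", "BookSum", "BoolQ", "CosmosQA", "DROP", "HotpotQA",
    "LongAlpaca", "MultiNews", "MultiRC", "NarrativeQA", "QMsum", "Qasper",
    "Quality", "ReCoRD", "SQuAD", "TriviaQA",
]

# Repository ID taken from the download link (assumed to be valid on the Hub).
REPO_ID = "MLP-Lemma/Instruct-datasets-preprocessed-st"


def load_config(name, streaming=False):
    """Load the train split of one configuration (hypothetical helper)."""
    if name not in CONFIG_NAMES:
        raise ValueError(f"unknown config {name!r}; expected one of {CONFIG_NAMES}")
    # Imported lazily so the name check works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name, split="train", streaming=streaming)
```

Calling `load_config("BoolQ")` would fetch one of the smaller configurations. For the larger ones, such as NarrativeQA with a listed download size of 1653310065 bytes, passing `streaming=True` may be preferable to a full download.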
This dataset comprises 16 sub-datasets, each exposed as a named configuration with a single train split. For each configuration, the metadata above records the features, the number of training examples, the train split size in bytes, the download size, and the total dataset size. Every configuration has the same three features: input_ids (a sequence of int32), input_sentences_ids (a sequence of int64 sequences), and labels (a sequence of int64). The configs section lists the data file path pattern for each configuration.
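To make the feature layout concrete, here is a minimal sketch of a structural check for one record, assuming rows arrive as plain Python dictionaries of lists. The function `check_record` and the toy record are hypothetical; the token values are made up, and the card does not document how the three fields relate to one another.

```python
def check_record(record):
    """Return True if `record` matches the feature layout in the card."""

    def is_int_list(value):
        return isinstance(value, list) and all(isinstance(v, int) for v in value)

    sentences = record.get("input_sentences_ids")
    return (
        is_int_list(record.get("input_ids"))          # sequence of int32
        and isinstance(sentences, list)               # outer sequence
        and all(is_int_list(s) for s in sentences)    # inner int64 sequences
        and is_int_list(record.get("labels"))         # sequence of int64
    )


# Toy record with made-up values; real token IDs depend on the tokenizer used.
toy = {
    "input_ids": [101, 7592, 2088, 102],
    "input_sentences_ids": [[101, 7592], [2088, 102]],
    "labels": [7592, 2088],
}
print(check_record(toy))
```

The check only covers nesting and element types. It does not verify integer width, since plain Python integers carry no int32 or int64 distinction.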
Provided by:
MLP-Lemma
Summary of original information
Dataset overview
BigPatent
- Features:
  - input_ids: sequence of int32
  - input_sentences_ids: sequence of int64 sequences
  - labels: sequence of int64
- Train split:
  - Dataset size: 2162904748 bytes
  - Number of examples: 41479
  - Download size: 431393280 bytes
BookSum
- Features:
  - input_ids: sequence of int32
  - input_sentences_ids: sequence of int64 sequences
  - labels: sequence of int64
- Train split:
  - Dataset size: 560690900 bytes
  - Number of examples: 9409
  - Download size: 132130269 bytes
BoolQ
- Features:
  - input_ids: sequence of int32
  - input_sentences_ids: sequence of int64 sequences
  - labels: sequence of int64
- Train split:
  - Dataset size: 17836256 bytes
  - Number of examples: 9427
  - Download size: 3550975 bytes
CosmosQA
- Features:
  - input_ids: sequence of int32
  - input_sentences_ids: sequence of int64 sequences
  - labels: sequence of int64
- Train split:
  - Dataset size: 53658116 bytes
  - Number of examples: 25262
  - Download size: 7853834 bytes
DROP
- Features:
  - input_ids: sequence of int32
  - input_sentences_ids: sequence of int64 sequences
  - labels: sequence of int64
- Train split:
  - Dataset size: 250955748 bytes
  - Number of examples: 77204
  - Download size: 14175687 bytes
HotpotQA
- Features:
  - input_ids: sequence of int32
  - input_sentences_ids: sequence of int64 sequences
  - labels: sequence of int64
- Train split:
  - Dataset size: 1102360272 bytes
  - Number of examples: 90208
  - Download size: 236367807 bytes
LongAlpaca
- Features:
  - input_ids: sequence of int32
  - input_sentences_ids: sequence of int64 sequences
  - labels: sequence of int64
- Train split:
  - Dataset size: 743978012 bytes
  - Number of examples: 7627
  - Download size: 149072591 bytes
MultiNews
- Features:
  - input_ids: sequence of int32
  - input_sentences_ids: sequence of int64 sequences
  - labels: sequence of int64
- Train split:
  - Dataset size: 1050726252 bytes
  - Number of examples: 44351
  - Download size: 249001883 bytes
MultiRC
- Features:
  - input_ids: sequence of int32
  - input_sentences_ids: sequence of int64 sequences
  - labels: sequence of int64
- Train split:
  - Dataset size: 43583068 bytes
  - Number of examples: 12025
  - Download size: 2381273 bytes
NarrativeQA
- Features:
  - input_ids: sequence of int32
  - input_sentences_ids: sequence of int64 sequences
  - labels: sequence of int64
- Train split:
  - Dataset size: 7766436704 bytes
  - Number of examples: 13344
  - Download size: 1653310065 bytes
QMsum
- Features:
  - input_ids: sequence of int32
  - input_sentences_ids: sequence of int64 sequences
  - labels: sequence of int64
- Train split:
  - Dataset size: 149627860 bytes
  - Number of examples: 1257
  - Download size: 22788582 bytes
Qasper
- Features:
  - input_ids: sequence of int32
  - input_sentences_ids: sequence of int64 sequences
  - labels: sequence of int64
- Train split:
  - Dataset size: 120323992 bytes
  - Number of examples: 2545
  - Download size: 23399636 bytes
Quality
- Features:
  - input_ids: sequence of int32
  - input_sentences_ids: sequence of int64 sequences
  - labels: sequence of int64
- Train split:
  - Dataset size: 141480664 bytes
  - Number of examples: 2523
  - Download size: 24146326 bytes
ReCoRD
- Features:
  - input_ids: sequence of int32
  - input_sentences_ids: sequence of int64 sequences
  - labels: sequence of int64
- Train split:
  - Dataset size: 294993520 bytes
  - Number of examples: 100684
  - Download size: 59618870 bytes
SQuAD
- Features:
  - input_ids: sequence of int32
  - input_sentences_ids: sequence of int64 sequences
  - labels: sequence of int64
- Train split:
  - Dataset size: 188547064 bytes
  - Number of examples: 87576
  - Download size: 31330364 bytes
TriviaQA
- Features:
  - input_ids: sequence of int32
  - input_sentences_ids: sequence of int64 sequences
  - labels: sequence of int64
- Train split:
  - Dataset size: 5052603772 bytes
  - Number of examples: 53503
  - Download size: 1048885926 bytes
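
The per-configuration figures above can be compared more easily once a few derived quantities are computed. The following is a rough sketch using only the numbers listed in this overview; the names `STATS`, `bytes_per_example`, and `download_ratio` are hypothetical helpers introduced for illustration.

```python
# (dataset_size_bytes, num_examples, download_size_bytes) per configuration,
# copied from the overview above.
STATS = {
    "BigPatent": (2162904748, 41479, 431393280),
    "BookSum": (560690900, 9409, 132130269),
    "BoolQ": (17836256, 9427, 3550975),
    "CosmosQA": (53658116, 25262, 7853834),
    "DROP": (250955748, 77204, 14175687),
    "HotpotQA": (1102360272, 90208, 236367807),
    "LongAlpaca": (743978012, 7627, 149072591),
    "MultiNews": (1050726252, 44351, 249001883),
    "MultiRC": (43583068, 12025, 2381273),
    "NarrativeQA": (7766436704, 13344, 1653310065),
    "QMsum": (149627860, 1257, 22788582),
    "Qasper": (120323992, 2545, 23399636),
    "Quality": (141480664, 2523, 24146326),
    "ReCoRD": (294993520, 100684, 59618870),
    "SQuAD": (188547064, 87576, 31330364),
    "TriviaQA": (5052603772, 53503, 1048885926),
}


def bytes_per_example(name):
    """Average stored size of one training example, in bytes."""
    size, examples, _ = STATS[name]
    return size / examples


def download_ratio(name):
    """Download size as a fraction of the stored dataset size."""
    size, _, download = STATS[name]
    return download / size


total_examples = sum(examples for _, examples, _ in STATS.values())
largest = max(STATS, key=lambda name: STATS[name][0])
print(f"total examples: {total_examples}, largest config: {largest}")

# Configurations with the largest average example size come first.
for name in sorted(STATS, key=bytes_per_example, reverse=True)[:3]:
    print(f"{name}: {bytes_per_example(name):,.0f} bytes/example, "
          f"download ratio {download_ratio(name):.2f}")
```

Average bytes per example is only a proxy for sequence length, because it mixes the int32 and int64 fields, but it gives a quick sense of which configurations hold long-context examples.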



