amrachraf/arXiv-full-text-chunked-qa
Source: Hugging Face. Updated: 2024-05-29. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/amrachraf/arXiv-full-text-chunked-qa
Description:
---
dataset_info:
- config_name: chunk_0
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 84846680
    num_examples: 46064
  download_size: 44439561
  dataset_size: 84846680
- config_name: chunk_1
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 30273925
    num_examples: 17138
  download_size: 15849531
  dataset_size: 30273925
- config_name: chunk_2
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 27147947
    num_examples: 14860
  download_size: 13565048
  dataset_size: 27147947
- config_name: chunk_3
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 136583480
    num_examples: 74966
  download_size: 70641272
  dataset_size: 136583480
- config_name: chunk_4
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 27047270
    num_examples: 15190
  download_size: 14188926
  dataset_size: 27047270
configs:
- config_name: chunk_0
  data_files:
  - split: train
    path: chunk_0/train-*
- config_name: chunk_1
  data_files:
  - split: train
    path: chunk_1/train-*
- config_name: chunk_2
  data_files:
  - split: train
    path: chunk_2/train-*
- config_name: chunk_3
  data_files:
  - split: train
    path: chunk_3/train-*
- config_name: chunk_4
  data_files:
  - split: train
    path: chunk_4/train-*
---
The dataset consists of five configurations (chunk_0 to chunk_4), each containing a single train split. Every example has two fields, input and output, both of string type. For each configuration, the metadata records the byte count and number of examples in the train split, along with the download size and the dataset size. Data files are stored under paths built from the configuration name and the split name, in the form chunk_N/train-*.
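Because each configuration is loaded separately, a small helper can make the selection explicit. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable. The names `REPO_ID`, `CONFIG_NAMES`, and `load_chunk` are illustrative and not part of the dataset itself.

```python
# Minimal sketch: load one configuration of the dataset by name.
# REPO_ID, CONFIG_NAMES and load_chunk are hypothetical helper names.

REPO_ID = "amrachraf/arXiv-full-text-chunked-qa"

# The card lists five configurations, chunk_0 through chunk_4.
CONFIG_NAMES = [f"chunk_{i}" for i in range(5)]


def load_chunk(config_name, split="train"):
    """Return the requested split of one configuration.

    Raises ValueError for a configuration the card does not list.
    """
    if config_name not in CONFIG_NAMES:
        raise ValueError(
            f"unknown configuration {config_name!r}; expected one of {CONFIG_NAMES}"
        )
    # Imported lazily so the helper can be defined without the library present.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name=config_name, split=split)


if __name__ == "__main__":
    # Requires network access; each record should expose "input" and "output".
    ds = load_chunk("chunk_1")
    print(ds.column_names)
    print(ds[0]["input"][:200])
```

Validating the name before calling the library fails fast on a typo instead of waiting for a network round trip.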
Provider:
amrachraf
Summary of original information
Dataset overview
Dataset configurations
chunk_0
- Features:
  - input: string
  - output: string
- Splits:
  - train: 84846680 bytes, 46064 examples
- Download size: 44439561 bytes
- Dataset size: 84846680 bytes
- Data file path: chunk_0/train-*
chunk_1
- Features:
  - input: string
  - output: string
- Splits:
  - train: 30273925 bytes, 17138 examples
- Download size: 15849531 bytes
- Dataset size: 30273925 bytes
- Data file path: chunk_1/train-*
chunk_2
- Features:
  - input: string
  - output: string
- Splits:
  - train: 27147947 bytes, 14860 examples
- Download size: 13565048 bytes
- Dataset size: 27147947 bytes
- Data file path: chunk_2/train-*
chunk_3
- Features:
  - input: string
  - output: string
- Splits:
  - train: 136583480 bytes, 74966 examples
- Download size: 70641272 bytes
- Dataset size: 136583480 bytes
- Data file path: chunk_3/train-*
chunk_4
- Features:
  - input: string
  - output: string
- Splits:
  - train: 27047270 bytes, 15190 examples
- Download size: 14188926 bytes
- Dataset size: 27047270 bytes
- Data file path: chunk_4/train-*
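The per-configuration figures above can be aggregated to see the overall size of the dataset and how unevenly the chunks are sized. This is a minimal sketch using only the numbers listed on this page. `CONFIG_STATS`, `totals`, and `largest_config` are illustrative names, not part of the dataset or any library.

```python
# Minimal sketch: aggregate the per-configuration metadata listed above.
# CONFIG_STATS, totals and largest_config are hypothetical helper names.

CONFIG_STATS = {
    "chunk_0": {"num_examples": 46064, "num_bytes": 84846680, "download_size": 44439561},
    "chunk_1": {"num_examples": 17138, "num_bytes": 30273925, "download_size": 15849531},
    "chunk_2": {"num_examples": 14860, "num_bytes": 27147947, "download_size": 13565048},
    "chunk_3": {"num_examples": 74966, "num_bytes": 136583480, "download_size": 70641272},
    "chunk_4": {"num_examples": 15190, "num_bytes": 27047270, "download_size": 14188926},
}


def totals(stats):
    """Sum each metadata field across all configurations."""
    fields = ("num_examples", "num_bytes", "download_size")
    return {field: sum(cfg[field] for cfg in stats.values()) for field in fields}


def largest_config(stats):
    """Return the name of the configuration with the most examples."""
    return max(stats, key=lambda name: stats[name]["num_examples"])


if __name__ == "__main__":
    for field, value in totals(CONFIG_STATS).items():
        print(f"{field}: {value}")
    print("largest configuration:", largest_config(CONFIG_STATS))
```

With the listed figures, the five train splits together hold 168218 examples, and chunk_3 is the largest configuration.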



