MLP-SEMO/CDT_qa_datasets
Source: Hugging Face | Updated: 2024-07-16 | Indexed: 2024-07-22
Download link:
https://hf-mirror.com/datasets/MLP-SEMO/CDT_qa_datasets
Overview:
The dataset has three configurations: fineweb, fineweb_preprocessed, and fineweb_qa. Each configuration has two features, context and question; the fineweb_qa configuration additionally has a response feature. The dataset contains only a training split, and each configuration's training split differs in byte size and number of examples. The fineweb training split contains 9,581,376 examples, the fineweb_preprocessed training split contains 9,259,128 examples, and the fineweb_qa training split contains 9,026,269 examples.
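The example counts above can be compared directly. The following is a minimal sketch that computes the difference between configurations in the order they are listed; the helper name `count_gaps` is hypothetical, and the source does not state that one configuration was derived from another, so the differences are only a numerical comparison.

```python
# Training-split example counts as stated in the overview.
EXAMPLE_COUNTS = {
    "fineweb": 9_581_376,
    "fineweb_preprocessed": 9_259_128,
    "fineweb_qa": 9_026_269,
}


def count_gaps(counts):
    """Return (earlier, later, difference, share_of_earlier) for each adjacent pair.

    Pairs follow the dictionary's insertion order, i.e. the listing order.
    """
    names = list(counts)
    gaps = []
    for earlier, later in zip(names, names[1:]):
        diff = counts[earlier] - counts[later]
        gaps.append((earlier, later, diff, diff / counts[earlier]))
    return gaps


for earlier, later, diff, share in count_gaps(EXAMPLE_COUNTS):
    print(f"{earlier} -> {later}: {diff:,} fewer examples ({share:.2%})")
```

Each configuration in the listing has a few percent fewer examples than the one before it.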
Provider:
MLP-SEMO
Original information summary
Dataset overview
Dataset configurations
1. fineweb
- Features:
  context: string
  question: string
- Splits:
  train:
  - Bytes: 29,535,642,491
  - Examples: 9,581,376
- Download size: 18,140,288,984 bytes
- Dataset size: 29,535,642,491 bytes
- Data file path:
  fineweb/train-*
2. fineweb_preprocessed
- Features:
  context: string
  question: string
- Splits:
  train:
  - Bytes: 27,445,160,941.262375
  - Examples: 9,259,128
- Download size: 16,630,421,213 bytes
- Dataset size: 27,445,160,941.262375 bytes
- Data file path:
  fineweb_preprocessed/train-*
3. fineweb_qa
- Features:
  context: string
  question: string
  response: string
- Splits:
  train:
  - Bytes: 29,377,996,197
  - Examples: 9,026,269
- Download size: 18,143,018,611 bytes
- Dataset size: 29,377,996,197 bytes
- Data file path:
  fineweb_qa/train-*
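
Given download sizes of 16 to 18 GB per configuration, streaming may be more practical than a full download for a first look. The sketch below assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the ID shown in the download link; the helper names `stream_config`, `has_expected_features`, and `preview` are hypothetical and not part of the dataset or the library.

```python
REPO_ID = "MLP-SEMO/CDT_qa_datasets"

# Feature names per configuration, as listed above (all are strings).
CONFIG_FEATURES = {
    "fineweb": ("context", "question"),
    "fineweb_preprocessed": ("context", "question"),
    "fineweb_qa": ("context", "question", "response"),
}


def stream_config(config_name):
    """Open the train split of one configuration as a streaming dataset."""
    if config_name not in CONFIG_FEATURES:
        raise ValueError(f"unknown configuration: {config_name!r}")
    # Imported lazily so the other helpers work without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="train", streaming=True)


def has_expected_features(config_name, example):
    """Check that an example has exactly the listed features, all strings."""
    expected = CONFIG_FEATURES[config_name]
    return set(example) == set(expected) and all(
        isinstance(example[name], str) for name in expected
    )


def preview(config_name, n=3):
    """Print a feature check and the question for the first n examples.

    Needs network access; nothing is downloaded beyond what is iterated.
    """
    for i, example in enumerate(stream_config(config_name)):
        if i >= n:
            break
        question = str(example.get("question"))[:80]
        print(has_expected_features(config_name, example), question)
```

Calling `preview("fineweb_qa")` would iterate over the first few records without fetching the full split. Whether every record passes the feature check (for example, whether any field is null) is not stated in the source, so the check is reported rather than asserted.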



