xDAN-datasets/ChatQA-Training-Data
收藏Hugging Face2024-05-12 更新2024-06-12 收录
下载链接:
https://hf-mirror.com/datasets/xDAN-datasets/ChatQA-Training-Data
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
- config_name: drop
features:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: document
dtype: string
- name: answers
sequence: string
- name: shargpt_formatted
list:
- name: from
dtype: string
- name: value
dtype: string
splits:
- name: train
num_bytes: 78867689
num_examples: 29195
download_size: 9598684
dataset_size: 78867689
- config_name: narrativeqa
features:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: document
dtype: string
- name: answers
sequence:
sequence: string
- name: shargpt_formatted
list:
- name: from
dtype: string
- name: value
dtype: string
splits:
- name: train
num_bytes: 284098258
num_examples: 40000
download_size: 10699133
dataset_size: 284098258
- config_name: newsqa
features:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: document
dtype: string
- name: answers
sequence: string
- name: shargpt_formatted
list:
- name: from
dtype: string
- name: value
dtype: string
splits:
- name: train
num_bytes: 573133568
num_examples: 76560
download_size: 71189729
dataset_size: 573133568
- config_name: quoref
features:
- name: answers
sequence: string
- name: document
dtype: string
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: shargpt_formatted
list:
- name: from
dtype: string
- name: value
dtype: string
splits:
- name: train
num_bytes: 43068666
num_examples: 10996
download_size: 5976692
dataset_size: 43068666
- config_name: ropes
features:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: document
dtype: string
- name: answers
sequence: string
- name: shargpt_formatted
list:
- name: from
dtype: string
- name: value
dtype: string
splits:
- name: train
num_bytes: 19415418
num_examples: 10924
download_size: 1350788
dataset_size: 19415418
- config_name: sft
features:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: answers
sequence: string
- name: shargpt_formatted
list:
- name: from
dtype: string
- name: value
dtype: string
splits:
- name: train
num_bytes: 666290328
num_examples: 128001
download_size: 385193089
dataset_size: 666290328
- config_name: squad1.1
features:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: document
dtype: string
- name: answers
list:
- name: answer_start
dtype: int64
- name: text
dtype: string
- name: shargpt_formatted
list:
- name: from
dtype: string
- name: value
dtype: string
splits:
- name: train
num_bytes: 231108780
num_examples: 86863
download_size: 25656737
dataset_size: 231108780
- config_name: squad2.0
features:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: document
dtype: string
- name: answers
list:
- name: answer_start
dtype: int64
- name: text
dtype: string
- name: shargpt_formatted
list:
- name: from
dtype: string
- name: value
dtype: string
splits:
- name: train
num_bytes: 232682362
num_examples: 129486
download_size: 28237410
dataset_size: 232682362
- config_name: synthetic_convqa
features:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: document
dtype: string
- name: answers
sequence: string
- name: shargpt_formatted
list:
- name: from
dtype: string
- name: value
dtype: string
splits:
- name: train
num_bytes: 523263735
num_examples: 38689
download_size: 285865309
dataset_size: 523263735
- config_name: tatqa
features:
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: document
dtype: string
- name: answers
sequence: string
- name: shargpt_formatted
list:
- name: from
dtype: string
- name: value
dtype: string
splits:
- name: train
num_bytes: 52679111
num_examples: 11501
download_size: 8550834
dataset_size: 52679111
configs:
- config_name: drop
data_files:
- split: train
path: drop/train-*
- config_name: narrativeqa
data_files:
- split: train
path: narrativeqa/train-*
- config_name: newsqa
data_files:
- split: train
path: newsqa/train-*
- config_name: quoref
data_files:
- split: train
path: quoref/train-*
- config_name: ropes
data_files:
- split: train
path: ropes/train-*
- config_name: sft
data_files:
- split: train
path: sft/train-*
- config_name: squad1.1
data_files:
- split: train
path: squad1.1/train-*
- config_name: squad2.0
data_files:
- split: train
path: squad2.0/train-*
- config_name: synthetic_convqa
data_files:
- split: train
path: synthetic_convqa/train-*
- config_name: tatqa
data_files:
- split: train
path: tatqa/train-*
---
数据集信息:
- 配置名称:drop
特征字段:
- 字段名:messages,类型为列表,包含两个子字段:
- 字段名:content,数据类型:字符串
- 字段名:role,数据类型:字符串
- 字段名:document,数据类型:字符串
- 字段名:answers,数据类型:字符串序列
- 字段名:shargpt_formatted,类型为列表,包含两个子字段:
- 字段名:from,数据类型:字符串
- 字段名:value,数据类型:字符串
数据集划分:
- 划分名称:train,字节数:78867689,样本数量:29195
下载大小:9598684,数据集总大小:78867689
- 配置名称:narrativeqa
特征字段:
- 字段名:messages,类型为列表,包含两个子字段:
- 字段名:content,数据类型:字符串
- 字段名:role,数据类型:字符串
- 字段名:document,数据类型:字符串
- 字段名:answers,数据类型:字符串序列的序列
- 字段名:shargpt_formatted,类型为列表,包含两个子字段:
- 字段名:from,数据类型:字符串
- 字段名:value,数据类型:字符串
数据集划分:
- 划分名称:train,字节数:284098258,样本数量:40000
下载大小:10699133,数据集总大小:284098258
- 配置名称:newsqa
特征字段:
- 字段名:messages,类型为列表,包含两个子字段:
- 字段名:content,数据类型:字符串
- 字段名:role,数据类型:字符串
- 字段名:document,数据类型:字符串
- 字段名:answers,数据类型:字符串序列
- 字段名:shargpt_formatted,类型为列表,包含两个子字段:
- 字段名:from,数据类型:字符串
- 字段名:value,数据类型:字符串
数据集划分:
- 划分名称:train,字节数:573133568,样本数量:76560
下载大小:71189729,数据集总大小:573133568
- 配置名称:quoref
特征字段:
- 字段名:answers,数据类型:字符串序列
- 字段名:document,数据类型:字符串
- 字段名:messages,类型为列表,包含两个子字段:
- 字段名:content,数据类型:字符串
- 字段名:role,数据类型:字符串
- 字段名:shargpt_formatted,类型为列表,包含两个子字段:
- 字段名:from,数据类型:字符串
- 字段名:value,数据类型:字符串
数据集划分:
- 划分名称:train,字节数:43068666,样本数量:10996
下载大小:5976692,数据集总大小:43068666
- 配置名称:ropes
特征字段:
- 字段名:messages,类型为列表,包含两个子字段:
- 字段名:content,数据类型:字符串
- 字段名:role,数据类型:字符串
- 字段名:document,数据类型:字符串
- 字段名:answers,数据类型:字符串序列
- 字段名:shargpt_formatted,类型为列表,包含两个子字段:
- 字段名:from,数据类型:字符串
- 字段名:value,数据类型:字符串
数据集划分:
- 划分名称:train,字节数:19415418,样本数量:10924
下载大小:1350788,数据集总大小:19415418
- 配置名称:sft
特征字段:
- 字段名:messages,类型为列表,包含两个子字段:
- 字段名:content,数据类型:字符串
- 字段名:role,数据类型:字符串
- 字段名:document,数据类型:字符串
- 字段名:answers,数据类型:字符串序列
- 字段名:shargpt_formatted,类型为列表,包含两个子字段:
- 字段名:from,数据类型:字符串
- 字段名:value,数据类型:字符串
数据集划分:
- 划分名称:train,字节数:666290328,样本数量:128001
下载大小:385193089,数据集总大小:666290328
- 配置名称:squad1.1
特征字段:
- 字段名:messages,类型为列表,包含两个子字段:
- 字段名:content,数据类型:字符串
- 字段名:role,数据类型:字符串
- 字段名:document,数据类型:字符串
- 字段名:answers,类型为列表,包含两个子字段:
- 字段名:answer_start,数据类型:int64
- 字段名:text,数据类型:字符串
- 字段名:shargpt_formatted,类型为列表,包含两个子字段:
- 字段名:from,数据类型:字符串
- 字段名:value,数据类型:字符串
数据集划分:
- 划分名称:train,字节数:231108780,样本数量:86863
下载大小:25656737,数据集总大小:231108780
- 配置名称:squad2.0
特征字段:
- 字段名:messages,类型为列表,包含两个子字段:
- 字段名:content,数据类型:字符串
- 字段名:role,数据类型:字符串
- 字段名:document,数据类型:字符串
- 字段名:answers,类型为列表,包含两个子字段:
- 字段名:answer_start,数据类型:int64
- 字段名:text,数据类型:字符串
- 字段名:shargpt_formatted,类型为列表,包含两个子字段:
- 字段名:from,数据类型:字符串
- 字段名:value,数据类型:字符串
数据集划分:
- 划分名称:train,字节数:232682362,样本数量:129486
下载大小:28237410,数据集总大小:232682362
- 配置名称:synthetic_convqa
特征字段:
- 字段名:messages,类型为列表,包含两个子字段:
- 字段名:content,数据类型:字符串
- 字段名:role,数据类型:字符串
- 字段名:document,数据类型:字符串
- 字段名:answers,数据类型:字符串序列
- 字段名:shargpt_formatted,类型为列表,包含两个子字段:
- 字段名:from,数据类型:字符串
- 字段名:value,数据类型:字符串
数据集划分:
- 划分名称:train,字节数:523263735,样本数量:38689
下载大小:285865309,数据集总大小:523263735
- 配置名称:tatqa
特征字段:
- 字段名:messages,类型为列表,包含两个子字段:
- 字段名:content,数据类型:字符串
- 字段名:role,数据类型:字符串
- 字段名:document,数据类型:字符串
- 字段名:answers,数据类型:字符串序列
- 字段名:shargpt_formatted,类型为列表,包含两个子字段:
- 字段名:from,数据类型:字符串
- 字段名:value,数据类型:字符串
数据集划分:
- 划分名称:train,字节数:52679111,样本数量:11501
下载大小:8550834,数据集总大小:52679111
配置项:
- 配置名称:drop,数据文件:
- 划分:train,路径:drop/train-*
- 配置名称:narrativeqa,数据文件:
- 划分:train,路径:narrativeqa/train-*
- 配置名称:newsqa,数据文件:
- 划分:train,路径:newsqa/train-*
- 配置名称:quoref,数据文件:
- 划分:train,路径:quoref/train-*
- 配置名称:ropes,数据文件:
- 划分:train,路径:ropes/train-*
- 配置名称:sft,数据文件:
- 划分:train,路径:sft/train-*
- 配置名称:squad1.1,数据文件:
- 划分:train,路径:squad1.1/train-*
- 配置名称:squad2.0,数据文件:
- 划分:train,路径:squad2.0/train-*
- 配置名称:synthetic_convqa,数据文件:
- 划分:train,路径:synthetic_convqa/train-*
- 配置名称:tatqa,数据文件:
- 划分:train,路径:tatqa/train-*
提供机构:
xDAN-datasets
原始信息汇总
数据集概述
1. drop
- 特征:
- messages:
- content: string
- role: string
- document: string
- answers: sequence of string
- shargpt_formatted:
- from: string
- value: string
- messages:
- 分割:
- train:
- 字节数: 78867689
- 示例数: 29195
- train:
- 下载大小: 9598684
- 数据集大小: 78867689
2. narrativeqa
- 特征:
- messages:
- content: string
- role: string
- document: string
- answers: sequence of string
- shargpt_formatted:
- from: string
- value: string
- messages:
- 分割:
- train:
- 字节数: 284098258
- 示例数: 40000
- train:
- 下载大小: 10699133
- 数据集大小: 284098258
3. newsqa
- 特征:
- messages:
- content: string
- role: string
- document: string
- answers: sequence of string
- shargpt_formatted:
- from: string
- value: string
- messages:
- 分割:
- train:
- 字节数: 573133568
- 示例数: 76560
- train:
- 下载大小: 71189729
- 数据集大小: 573133568
4. quoref
- 特征:
- answers: sequence of string
- document: string
- messages:
- content: string
- role: string
- shargpt_formatted:
- from: string
- value: string
- 分割:
- train:
- 字节数: 43068666
- 示例数: 10996
- train:
- 下载大小: 5976692
- 数据集大小: 43068666
5. ropes
- 特征:
- messages:
- content: string
- role: string
- document: string
- answers: sequence of string
- shargpt_formatted:
- from: string
- value: string
- messages:
- 分割:
- train:
- 字节数: 19415418
- 示例数: 10924
- train:
- 下载大小: 1350788
- 数据集大小: 19415418
6. sft
- 特征:
- messages:
- content: string
- role: string
- answers: sequence of string
- shargpt_formatted:
- from: string
- value: string
- messages:
- 分割:
- train:
- 字节数: 666290328
- 示例数: 128001
- train:
- 下载大小: 385193089
- 数据集大小: 666290328
7. squad1.1
- 特征:
- messages:
- content: string
- role: string
- document: string
- answers:
- answer_start: int64
- text: string
- shargpt_formatted:
- from: string
- value: string
- messages:
- 分割:
- train:
- 字节数: 231108780
- 示例数: 86863
- train:
- 下载大小: 25656737
- 数据集大小: 231108780
8. squad2.0
- 特征:
- messages:
- content: string
- role: string
- document: string
- answers:
- answer_start: int64
- text: string
- shargpt_formatted:
- from: string
- value: string
- messages:
- 分割:
- train:
- 字节数: 232682362
- 示例数: 129486
- train:
- 下载大小: 28237410
- 数据集大小: 232682362
9. synthetic_convqa
- 特征:
- messages:
- content: string
- role: string
- document: string
- answers: sequence of string
- shargpt_formatted:
- from: string
- value: string
- messages:
- 分割:
- train:
- 字节数: 523263735
- 示例数: 38689
- train:
- 下载大小: 285865309
- 数据集大小: 523263735
10. tatqa
- 特征:
- messages:
- content: string
- role: string
- document: string
- answers: sequence of string
- shargpt_formatted:
- from: string
- value: string
- messages:
- 分割:
- train:
- 字节数: 52679111
- 示例数: 11501
- train:
- 下载大小: 8550834
- 数据集大小: 52679111



