tomg-group-umd/GenQA_raw
收藏Hugging Face2024-06-13 更新2024-06-29 收录
下载链接:
https://hf-mirror.com/datasets/tomg-group-umd/GenQA_raw
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
- config_name: academic
features:
- name: user
dtype: string
- name: assistant
dtype: string
- name: prompt
dtype: string
- name: template
dtype: string
- name: idx
dtype: int64
splits:
- name: train
num_bytes: 8614955916
num_examples: 4210076
download_size: 4070747258
dataset_size: 8614955916
- config_name: code
features:
- name: user
dtype: string
- name: assistant
dtype: string
- name: prompt
dtype: string
- name: template
dtype: string
- name: category
dtype: string
- name: idx
dtype: int64
splits:
- name: train
num_bytes: 855686195
num_examples: 513483
download_size: 370326167
dataset_size: 855686195
- config_name: dialog
features:
- name: user
dtype: string
- name: assistant
dtype: string
- name: user2
dtype: string
- name: assistant2
dtype: string
- name: user3
dtype: string
- name: assistant3
dtype: string
- name: user4
dtype: string
- name: assistant4
dtype: string
- name: prompt
dtype: string
- name: idx
dtype: int64
splits:
- name: train
num_bytes: 2613783708
num_examples: 819154
download_size: 1226407538
dataset_size: 2613783708
- config_name: general
features:
- name: user
dtype: string
- name: assistant
dtype: string
- name: template
dtype: string
- name: idx
dtype: int64
splits:
- name: train
num_bytes: 377010471
num_examples: 304920
download_size: 211886096
dataset_size: 377010471
- config_name: math
features:
- name: user
dtype: string
- name: assistant
dtype: string
- name: user2
dtype: string
- name: assistant2
dtype: string
- name: prompt
dtype: string
- name: idx
dtype: int64
splits:
- name: train
num_bytes: 912151884
num_examples: 515509
download_size: 271708327
dataset_size: 912151884
- config_name: mmlu
features:
- name: user
dtype: string
- name: assistant
dtype: string
- name: prompt
dtype: string
- name: template
dtype: string
- name: idx
dtype: int64
splits:
- name: train
num_bytes: 4523835106
num_examples: 2409841
download_size: 2104540276
dataset_size: 4523835106
- config_name: multiple_choice
features:
- name: user
dtype: string
- name: assistant
dtype: string
- name: prompt
dtype: string
- name: idx
dtype: int64
splits:
- name: train
num_bytes: 555013194
num_examples: 372610
download_size: 215020093
dataset_size: 555013194
- config_name: task
features:
- name: user
dtype: string
- name: assistant
dtype: string
- name: user2
dtype: string
- name: assistant2
dtype: string
- name: prompt
dtype: string
- name: idx
dtype: int64
splits:
- name: train
num_bytes: 2160397568
num_examples: 1004179
download_size: 881027426
dataset_size: 2160397568
- config_name: writing
features:
- name: user
dtype: string
- name: assistant
dtype: string
- name: user2
dtype: string
- name: assistant2
dtype: string
- name: prompt
dtype: string
- name: template
dtype: string
- name: idx
dtype: int64
splits:
- name: train
num_bytes: 2947982996
num_examples: 932362
download_size: 1346605382
dataset_size: 2947982996
configs:
- config_name: academic
data_files:
- split: train
path: academic/train-*
- config_name: code
data_files:
- split: train
path: code/train-*
- config_name: dialog
data_files:
- split: train
path: dialog/train-*
- config_name: general
data_files:
- split: train
path: general/train-*
- config_name: math
data_files:
- split: train
path: math/train-*
- config_name: mmlu
data_files:
- split: train
path: mmlu/train-*
- config_name: multiple_choice
data_files:
- split: train
path: multiple_choice/train-*
- config_name: task
data_files:
- split: train
path: task/train-*
- config_name: writing
data_files:
- split: train
path: writing/train-*
---
提供机构:
tomg-group-umd
原始信息汇总
数据集概述
数据集配置
学术 (academic)
- 特征:
- user: string
- assistant: string
- prompt: string
- template: string
- idx: int64
- 分割:
- train:
- 字节数: 8614955916
- 样本数: 4210076
- train:
- 下载大小: 4070747258
- 数据集大小: 8614955916
- 数据文件路径: academic/train-*
代码 (code)
- 特征:
- user: string
- assistant: string
- prompt: string
- template: string
- category: string
- idx: int64
- 分割:
- train:
- 字节数: 855686195
- 样本数: 513483
- train:
- 下载大小: 370326167
- 数据集大小: 855686195
- 数据文件路径: code/train-*
对话 (dialog)
- 特征:
- user: string
- assistant: string
- user2: string
- assistant2: string
- user3: string
- assistant3: string
- user4: string
- assistant4: string
- prompt: string
- idx: int64
- 分割:
- train:
- 字节数: 2613783708
- 样本数: 819154
- train:
- 下载大小: 1226407538
- 数据集大小: 2613783708
- 数据文件路径: dialog/train-*
通用 (general)
- 特征:
- user: string
- assistant: string
- template: string
- idx: int64
- 分割:
- train:
- 字节数: 377010471
- 样本数: 304920
- train:
- 下载大小: 211886096
- 数据集大小: 377010471
- 数据文件路径: general/train-*
数学 (math)
- 特征:
- user: string
- assistant: string
- user2: string
- assistant2: string
- prompt: string
- idx: int64
- 分割:
- train:
- 字节数: 912151884
- 样本数: 515509
- train:
- 下载大小: 271708327
- 数据集大小: 912151884
- 数据文件路径: math/train-*
MMLU (mmlu)
- 特征:
- user: string
- assistant: string
- prompt: string
- template: string
- idx: int64
- 分割:
- train:
- 字节数: 4523835106
- 样本数: 2409841
- train:
- 下载大小: 2104540276
- 数据集大小: 4523835106
- 数据文件路径: mmlu/train-*
多选题 (multiple_choice)
- 特征:
- user: string
- assistant: string
- prompt: string
- idx: int64
- 分割:
- train:
- 字节数: 555013194
- 样本数: 372610
- train:
- 下载大小: 215020093
- 数据集大小: 555013194
- 数据文件路径: multiple_choice/train-*
任务 (task)
- 特征:
- user: string
- assistant: string
- user2: string
- assistant2: string
- prompt: string
- idx: int64
- 分割:
- train:
- 字节数: 2160397568
- 样本数: 1004179
- train:
- 下载大小: 881027426
- 数据集大小: 2160397568
- 数据文件路径: task/train-*
写作 (writing)
- 特征:
- user: string
- assistant: string
- user2: string
- assistant2: string
- prompt: string
- template: string
- idx: int64
- 分割:
- train:
- 字节数: 2947982996
- 样本数: 932362
- train:
- 下载大小: 1346605382
- 数据集大小: 2947982996
- 数据文件路径: writing/train-*



