Taywon/webgpt
Source: Hugging Face. Updated 2024-05-23. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/Taywon/webgpt
Resource description:
---
dataset_info:
  features:
  - name: question
    struct:
    - name: dataset
      dtype: string
    - name: id
      dtype: string
    - name: full_text
      dtype: string
  - name: quotes_0
    sequence:
    - name: title
      dtype: string
    - name: extract
      dtype: string
  - name: answer_0
    dtype: string
  - name: tokens_0
    struct:
    - name: prefix
      sequence: int32
    - name: completion
      sequence: int32
  - name: score_0
    dtype: float32
  - name: quotes_1
    sequence:
    - name: title
      dtype: string
    - name: extract
      dtype: string
  - name: answer_1
    dtype: string
  - name: tokens_1
    struct:
    - name: prefix
      sequence: int32
    - name: completion
      sequence: int32
  - name: score_1
    dtype: float32
  - name: input_ids_chosen
    sequence: int64
  - name: attention_mask_chosen
    sequence: int64
  - name: input_ids_rejected
    sequence: int64
  - name: attention_mask_rejected
    sequence: int64
  splits:
  - name: train
    num_bytes: 239337866.30222642
    num_examples: 12663
  - name: eval
    num_bytes: 12599727.0
    num_examples: 667
  download_size: 103209679
  dataset_size: 251937593.30222642
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: eval
    path: data/eval-*
---
# Dataset Card for "webgpt"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
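
The card does not show how to load the data. A minimal sketch follows, assuming the repository `Taywon/webgpt` is still available on the Hugging Face Hub and that the third-party `datasets` package is installed. The helper name `load_split` is hypothetical. Only the repository id and the split names come from the metadata above.

```python
REPO_ID = "Taywon/webgpt"   # repository id from the listing
SPLITS = ("train", "eval")  # split names from the card metadata


def load_split(split: str):
    """Load one split of the dataset (hypothetical helper).

    Assumes the repository is reachable and `datasets` is installed.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so this module can be imported without the dependency.
    from datasets import load_dataset
    return load_dataset(REPO_ID, split=split)
```

Calling `load_split("eval")` would download the smaller split first, which may be a convenient way to inspect the columns before fetching the training data.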
Provider:
Taywon
Summary of original information
Dataset overview
Dataset features
- question
  - dataset: string
  - id: string
  - full_text: string
- quotes_0
  - title: string
  - extract: string
- answer_0: string
- tokens_0
  - prefix: sequence of integers (int32)
  - completion: sequence of integers (int32)
- score_0: float (float32)
- quotes_1
  - title: string
  - extract: string
- answer_1: string
- tokens_1
  - prefix: sequence of integers (int32)
  - completion: sequence of integers (int32)
- score_1: float (float32)
- input_ids_chosen: sequence of integers (int64)
- attention_mask_chosen: sequence of integers (int64)
- input_ids_rejected: sequence of integers (int64)
- attention_mask_rejected: sequence of integers (int64)
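
Each record pairs two answers (`answer_0`, `answer_1`) with two scores (`score_0`, `score_1`). The sketch below shows one way to read such a pair. It assumes that a higher score marks the preferred answer, which the card does not state. The `Comparison` type, the `preferred_index` helper, and the example record are hypothetical and are not taken from the dataset.

```python
from typing import Optional, TypedDict


class Comparison(TypedDict):
    """Subset of one record, using field names from the feature list."""
    answer_0: str
    score_0: float
    answer_1: str
    score_1: float


def preferred_index(row: Comparison) -> Optional[int]:
    """Return 0 or 1 for the higher-scored answer, or None on a tie.

    Assumption: a higher score means the answer was preferred.
    """
    if row["score_0"] == row["score_1"]:
        return None
    return 0 if row["score_0"] > row["score_1"] else 1


# Hypothetical record for illustration only.
example: Comparison = {
    "answer_0": "First candidate answer.",
    "score_0": -0.5,
    "answer_1": "Second candidate answer.",
    "score_1": 0.5,
}

print(preferred_index(example))
```

Whether `input_ids_chosen` and `input_ids_rejected` were derived from the scores in this way is not documented on the card, so treat the mapping as a guess until it is checked against the data.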
Dataset splits
- train
  - Size: 239337866.30222642 bytes
  - Number of examples: 12663
- eval
  - Size: 12599727.0 bytes
  - Number of examples: 667
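
The example counts above imply a split of roughly 95% train and 5% eval. The short sketch below works that out. The variable names are illustrative and the counts are copied from the listing.

```python
# Example counts per split, as listed above.
SPLIT_EXAMPLES = {"train": 12663, "eval": 667}

total_examples = sum(SPLIT_EXAMPLES.values())

# Share of all examples that falls in each split.
split_share = {name: n / total_examples for name, n in SPLIT_EXAMPLES.items()}

for name, share in split_share.items():
    print(f"{name}: {SPLIT_EXAMPLES[name]} examples ({share:.1%})")
```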
Dataset size
- Download size: 103209679 bytes
- Total size: 251937593.30222642 bytes
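
As a quick consistency check, the two split sizes should add up to the total size, and the download size can be compared against the total. A small sketch using only the numbers listed on this page follows. The names are illustrative.

```python
# Byte counts copied from the listing.
TRAIN_BYTES = 239337866.30222642
EVAL_BYTES = 12599727.0
DATASET_SIZE = 251937593.30222642
DOWNLOAD_SIZE = 103209679


def to_mib(num_bytes: float) -> float:
    """Convert a byte count to mebibytes (1 MiB = 2**20 bytes)."""
    return num_bytes / 2**20


# The per-split sizes should sum to the reported total (within one byte).
sizes_agree = abs((TRAIN_BYTES + EVAL_BYTES) - DATASET_SIZE) < 1.0

# Fraction of the total size that has to be downloaded.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(f"total: {to_mib(DATASET_SIZE):.1f} MiB, download: {to_mib(DOWNLOAD_SIZE):.1f} MiB")
print(f"sizes agree: {sizes_agree}, download/total: {download_ratio:.2f}")
```

The download is roughly 41% of the total size.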
Configuration
- config_name: default
- train: path data/train-*
- eval: path data/eval-*
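
The `default` configuration assigns files to splits with glob patterns. The sketch below illustrates the idea using Python's standard `fnmatch` module. The shard filenames are hypothetical, and the Hub's own pattern resolution may differ in detail.

```python
from fnmatch import fnmatch
from typing import Optional

# Split-to-pattern mapping from the configuration above.
DATA_FILES = {"train": "data/train-*", "eval": "data/eval-*"}


def split_for(path: str) -> Optional[str]:
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


# Hypothetical file names for illustration; the real shard names are not listed.
candidates = [
    "data/train-00000-of-00001.parquet",
    "data/eval-00000-of-00001.parquet",
    "README.md",
]

for path in candidates:
    print(path, "->", split_for(path))
```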



