cleanrl/summarize_from_feedback_oai_preprocessing_1704321749
Source: Hugging Face. Updated 2024-01-03. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/cleanrl/summarize_from_feedback_oai_preprocessing_1704321749
Resource description:
---
dataset_info:
  features:
  - name: info
    struct:
    - name: id
      dtype: string
    - name: post
      dtype: string
    - name: title
      dtype: string
    - name: subreddit
      dtype: string
    - name: site
      dtype: string
    - name: article
      dtype: string
  - name: summaries
    list:
    - name: text
      dtype: string
    - name: policy
      dtype: string
    - name: note
      dtype: string
  - name: choice
    dtype: int32
  - name: worker
    dtype: string
  - name: batch
    dtype: string
  - name: split
    dtype: string
  - name: extra
    struct:
    - name: confidence
      dtype: int32
  - name: query_token
    sequence: int64
  - name: query
    dtype: string
  - name: response0
    dtype: string
  - name: response0_token
    sequence: int64
  - name: response0_token_len
    dtype: int64
  - name: response1
    dtype: string
  - name: response1_token
    sequence: int64
  - name: response1_token_len
    dtype: int64
  - name: response0_policy
    dtype: string
  - name: response1_policy
    dtype: string
  - name: policies
    dtype: string
  - name: query_response0
    dtype: string
  - name: query_response0_token
    sequence: int64
  - name: query_response0_token_len
    dtype: int64
  - name: query_response1
    dtype: string
  - name: query_response1_token
    sequence: int64
  - name: query_response1_token_len
    dtype: int64
  splits:
  - name: train
    num_bytes: 2210564467
    num_examples: 92858
  - name: validation
    num_bytes: 2054238499
    num_examples: 86086
  download_size: 271337037
  dataset_size: 4264802966
---
# Dataset Card for "summarize_from_feedback_oai_preprocessing_1704321749"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
The dataset defines nested features such as `info` (a struct with `id`, `post`, `title`, `subreddit`, `site`, and `article`) and `summaries` (a list of items with `text`, `policy`, and `note`), alongside scalar fields such as `choice`, `worker`, `batch`, and `split`, and query and response columns with their token sequences and token lengths. It is split into train (92,858 examples) and validation (86,086 examples). The download size is 271,337,037 bytes and the full dataset size is 4,264,802,966 bytes.
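As a minimal sketch of how the splits described above might be loaded and sanity-checked, the snippet below uses the `load_dataset` function from the Hugging Face `datasets` library with the repository id from this page. The helper names `load_splits` and `check_row_counts` are hypothetical, and the sketch assumes the repository is still available on the Hub under that id.

```python
# Repository id as it appears on this page; availability on the Hub is assumed.
REPO_ID = "cleanrl/summarize_from_feedback_oai_preprocessing_1704321749"

# Row counts taken from the card's `splits` metadata.
EXPECTED_ROWS = {"train": 92858, "validation": 86086}


def load_splits(repo_id=REPO_ID):
    """Download all splits (hypothetical helper).

    Requires `pip install datasets` and network access, so the import is
    done lazily and this function is not called at module load time.
    """
    from datasets import load_dataset

    return load_dataset(repo_id)


def check_row_counts(row_counts, expected=EXPECTED_ROWS):
    """Return {split: (actual, expected)} for every split that does not match."""
    return {
        name: (row_counts.get(name), n)
        for name, n in expected.items()
        if row_counts.get(name) != n
    }


if __name__ == "__main__":
    ds = load_splits()
    mismatches = check_row_counts({name: split.num_rows for name, split in ds.items()})
    print("row count mismatches:", mismatches)
```

An empty result from `check_row_counts` would suggest that the downloaded copy matches the counts listed in the card.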
Provider:
cleanrl
Original information summary
Dataset overview
Dataset information
Feature structure
- info
  - id: string
  - post: string
  - title: string
  - subreddit: string
  - site: string
  - article: string
- summaries
  - text: string
  - policy: string
  - note: string
- choice: 32-bit integer
- worker: string
- batch: string
- split: string
- extra
  - confidence: 32-bit integer
- query_token: sequence of 64-bit integers
- query: string
- response0: string
- response0_token: sequence of 64-bit integers
- response0_token_len: 64-bit integer
- response1: string
- response1_token: sequence of 64-bit integers
- response1_token_len: 64-bit integer
- response0_policy: string
- response1_policy: string
- policies: string
- query_response0: string
- query_response0_token: sequence of 64-bit integers
- query_response0_token_len: 64-bit integer
- query_response1: string
- query_response1_token: sequence of 64-bit integers
- query_response1_token_len: 64-bit integer
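The fields listed above suggest a pairwise preference layout: one `query`, two candidate responses (`response0`, `response1`), and an integer `choice`. The card does not state what `choice` means, so the sketch below assumes it is the index (0 or 1) of the preferred response. The helper name `preference_pair` and the sample record are hypothetical and only illustrate the field names.

```python
def preference_pair(example):
    """Turn one record into a prompt/chosen/rejected triple.

    Assumption (not confirmed by the card): `choice` is 0 or 1 and
    indexes the preferred response among `response0` and `response1`.
    """
    choice = example["choice"]
    if choice not in (0, 1):
        raise ValueError(f"unexpected choice value: {choice!r}")
    return {
        "prompt": example["query"],
        "chosen": example[f"response{choice}"],
        "rejected": example[f"response{1 - choice}"],
    }


# Hand-written record that mimics the schema; not real data from the dataset.
sample = {
    "query": "POST: ...\n\nTL;DR:",
    "response0": " first candidate summary",
    "response1": " second candidate summary",
    "choice": 1,
}

pair = preference_pair(sample)
print(pair["chosen"])
```

If `choice` turns out to have different semantics in this preprocessing run, only the indexing inside `preference_pair` would need to change.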
Data splits
- train
  - Bytes: 2210564467
  - Examples: 92858
- validation
  - Bytes: 2054238499
  - Examples: 86086
Dataset size
- Download size: 271337037 bytes
- Dataset size: 4264802966 bytes
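The size figures above are internally consistent, which a short calculation can confirm: the two split byte counts add up to the reported dataset size, and the download is roughly 15.7 times smaller than the full dataset. The sketch below uses only the numbers listed on this page; the variable names are illustrative.

```python
# Figures copied from the split and size metadata on this page.
SPLITS = {
    "train": {"num_bytes": 2210564467, "num_examples": 92858},
    "validation": {"num_bytes": 2054238499, "num_examples": 86086},
}
DOWNLOAD_SIZE = 271337037
DATASET_SIZE = 4264802966

# The per-split byte counts should sum to the reported dataset size.
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
assert total_bytes == DATASET_SIZE

# Ratio of full dataset size to download size.
size_ratio = DATASET_SIZE / DOWNLOAD_SIZE

# Average bytes per example in each split.
avg_bytes = {
    name: s["num_bytes"] / s["num_examples"] for name, s in SPLITS.items()
}

print(f"total bytes: {total_bytes}")
print(f"dataset size / download size: {size_ratio:.2f}")
for name, value in avg_bytes.items():
    print(f"{name}: {value:.0f} bytes per example on average")
```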



