vwxyzjn/summarize_from_feedback_oai_preprocessing_1704169778

Name: vwxyzjn/summarize_from_feedback_oai_preprocessing_1704169778
Creator: vwxyzjn
Published: 2024-01-02 04:31:00
License: No description available

Hugging Face · Updated 2024-01-02 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/vwxyzjn/summarize_from_feedback_oai_preprocessing_1704169778

Resource description:

---
dataset_info:
  features:
  - name: info
    struct:
    - name: id
      dtype: string
    - name: post
      dtype: string
    - name: title
      dtype: string
    - name: subreddit
      dtype: string
    - name: site
      dtype: string
    - name: article
      dtype: string
  - name: summaries
    list:
    - name: text
      dtype: string
    - name: policy
      dtype: string
    - name: note
      dtype: string
  - name: choice
    dtype: int32
  - name: worker
    dtype: string
  - name: batch
    dtype: string
  - name: split
    dtype: string
  - name: extra
    struct:
    - name: confidence
      dtype: int32
  - name: query_token
    sequence: int64
  - name: query
    dtype: string
  - name: response0
    dtype: string
  - name: response0_token
    sequence: int64
  - name: response0_token_len
    dtype: int64
  - name: response1
    dtype: string
  - name: response1_token
    sequence: int64
  - name: response1_token_len
    dtype: int64
  - name: response0_policy
    dtype: string
  - name: response1_policy
    dtype: string
  - name: policies
    dtype: string
  - name: query_response0
    dtype: string
  - name: query_response0_token
    sequence: int64
  - name: query_response0_token_len
    dtype: int64
  - name: query_response1
    dtype: string
  - name: query_response1_token
    sequence: int64
  - name: query_response1_token_len
    dtype: int64
  splits:
  - name: train
    num_bytes: 1914904595
    num_examples: 92858
  - name: validation
    num_bytes: 1780140675
    num_examples: 86086
  download_size: 270613778
  dataset_size: 3695045270
---
# Dataset Card for "summarize_from_feedback_oai_preprocessing_1704169778"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Provided by:

vwxyzjn

Summary of original information

Dataset overview

Dataset information

Feature structure

info: struct with the following fields
- id: string
- post: string
- title: string
- subreddit: string
- site: string
- article: string
summaries: list of structs with the following fields
- text: string
- policy: string
- note: string
choice: int32
worker: string
batch: string
split: string
extra: struct with the following fields
- confidence: int32
query_token: sequence of int64
query: string
response0: string
response0_token: sequence of int64
response0_token_len: int64
response1: string
response1_token: sequence of int64
response1_token_len: int64
response0_policy: string
response1_policy: string
policies: string
query_response0: string
query_response0_token: sequence of int64
query_response0_token_len: int64
query_response1: string
query_response1_token: sequence of int64
query_response1_token_len: int64
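
The sketch below shows one way the `choice`, `response0`, and `response1` fields might be used together to build a (chosen, rejected) preference pair. It assumes that `choice` is the index (0 or 1) of the preferred response; the card itself does not state this, so verify it against the data before relying on it. The helper name `split_preference` and the example row are hypothetical.

```python
from typing import Any, Dict, Tuple


def split_preference(row: Dict[str, Any]) -> Tuple[str, str]:
    """Return (chosen, rejected) response strings for one row.

    Assumption (not stated in the card): `choice` is the index,
    0 or 1, of the response the labeler preferred.
    """
    choice = row["choice"]
    if choice not in (0, 1):
        raise ValueError(f"unexpected choice value: {choice!r}")
    chosen = row[f"response{choice}"]
    rejected = row[f"response{1 - choice}"]
    return chosen, rejected


# Hypothetical row holding only the fields this helper reads.
example_row = {
    "response0": " first summary",
    "response1": " second summary",
    "choice": 1,
}
chosen, rejected = split_preference(example_row)
print("chosen:", chosen)
print("rejected:", rejected)
```

The same pattern would extend to the `query_response0` / `query_response1` fields and their token sequences if a trainer expects the prompt and response concatenated.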

Dataset splits

train: 92858 examples, 1914904595 bytes
validation: 86086 examples, 1780140675 bytes

Dataset size

Download size: 270613778 bytes
Total dataset size: 3695045270 bytes
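
As a quick consistency check on the figures above, the following sketch sums the per-split byte counts, compares the total with the reported dataset size, and derives the average bytes per example and the download-to-dataset size ratio. All numbers are copied from the card; the variable names are illustrative only.

```python
# Figures copied from the dataset card metadata.
splits = {
    "train": {"num_bytes": 1914904595, "num_examples": 92858},
    "validation": {"num_bytes": 1780140675, "num_examples": 86086},
}
download_size = 270613778
dataset_size = 3695045270

total_bytes = sum(s["num_bytes"] for s in splits.values())
total_examples = sum(s["num_examples"] for s in splits.values())

# The per-split byte counts should add up to the reported dataset size.
sizes_consistent = total_bytes == dataset_size

# Ratio of the download size to the reported dataset size.
ratio = download_size / dataset_size

for name, s in splits.items():
    avg = s["num_bytes"] / s["num_examples"]
    print(f"{name}: about {avg:.0f} bytes per example")
print(f"total examples: {total_examples}")
print(f"sizes consistent: {sizes_consistent}")
print(f"download / dataset size ratio: {ratio:.3f}")
```

The two split sizes do sum to the reported total, and the download is roughly 7% of the dataset size.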
