
cleanrl/summarize_from_feedback_oai_preprocessing_1704321749

Hugging Face · Updated 2024-01-03 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/cleanrl/summarize_from_feedback_oai_preprocessing_1704321749
Description:
---
dataset_info:
  features:
  - name: info
    struct:
    - name: id
      dtype: string
    - name: post
      dtype: string
    - name: title
      dtype: string
    - name: subreddit
      dtype: string
    - name: site
      dtype: string
    - name: article
      dtype: string
  - name: summaries
    list:
    - name: text
      dtype: string
    - name: policy
      dtype: string
    - name: note
      dtype: string
  - name: choice
    dtype: int32
  - name: worker
    dtype: string
  - name: batch
    dtype: string
  - name: split
    dtype: string
  - name: extra
    struct:
    - name: confidence
      dtype: int32
  - name: query_token
    sequence: int64
  - name: query
    dtype: string
  - name: response0
    dtype: string
  - name: response0_token
    sequence: int64
  - name: response0_token_len
    dtype: int64
  - name: response1
    dtype: string
  - name: response1_token
    sequence: int64
  - name: response1_token_len
    dtype: int64
  - name: response0_policy
    dtype: string
  - name: response1_policy
    dtype: string
  - name: policies
    dtype: string
  - name: query_response0
    dtype: string
  - name: query_response0_token
    sequence: int64
  - name: query_response0_token_len
    dtype: int64
  - name: query_response1
    dtype: string
  - name: query_response1_token
    sequence: int64
  - name: query_response1_token_len
    dtype: int64
  splits:
  - name: train
    num_bytes: 2210564467
    num_examples: 92858
  - name: validation
    num_bytes: 2054238499
    num_examples: 86086
  download_size: 271337037
  dataset_size: 4264802966
---
# Dataset Card for "summarize_from_feedback_oai_preprocessing_1704321749"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

The dataset defines nested features such as info (id, post, title, subreddit, site, article), summaries (a list of entries with text, policy, and note), and choice, alongside raw and tokenized query and response fields. It is split into train (92,858 examples) and validation (86,086 examples). The download size is 271,337,037 bytes and the full dataset size is 4,264,802,966 bytes.
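To inspect the fields described above, a record can be pulled with the Hugging Face `datasets` library. The following is a minimal sketch, assuming the `datasets` package is installed and the repository is still available on the Hub under the ID shown in the download link. The helper name `first_record` is hypothetical and only used for illustration.

```python
REPO_ID = "cleanrl/summarize_from_feedback_oai_preprocessing_1704321749"


def first_record(split: str = "validation") -> dict:
    """Return the first record of a split without downloading every file.

    Hypothetical helper: it assumes the repository above is still reachable.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # streaming=True yields records on demand instead of materializing the split.
    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return next(iter(stream))


if __name__ == "__main__":
    record = first_record()
    # Print the top-level field names to compare them against the dataset card.
    print(sorted(record.keys()))
```

Streaming is used here because the card reports a dataset size of several gigabytes; a full `load_dataset` call without `streaming=True` would download and cache both splits.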
Provider:
cleanrl
Summary of original information

Dataset overview

Dataset information

Feature structure

  • info (struct)
    • id: string
    • post: string
    • title: string
    • subreddit: string
    • site: string
    • article: string
  • summaries (list)
    • text: string
    • policy: string
    • note: string
  • choice: 32-bit integer
  • worker: string
  • batch: string
  • split: string
  • extra (struct)
    • confidence: 32-bit integer
  • query_token: sequence of 64-bit integers
  • query: string
  • response0: string
  • response0_token: sequence of 64-bit integers
  • response0_token_len: 64-bit integer
  • response1: string
  • response1_token: sequence of 64-bit integers
  • response1_token_len: 64-bit integer
  • response0_policy: string
  • response1_policy: string
  • policies: string
  • query_response0: string
  • query_response0_token: sequence of 64-bit integers
  • query_response0_token_len: 64-bit integer
  • query_response1: string
  • query_response1_token: sequence of 64-bit integers
  • query_response1_token_len: 64-bit integer
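The paired `response0` / `response1` fields together with `choice` suggest a pairwise-comparison layout. The sketch below shows one way to turn a record into a (chosen, rejected) pair. It rests on an assumption the card does not state: that `choice` is the index (0 or 1) of the preferred response. The helper name `preference_pair` and the example values are made up for illustration.

```python
from typing import Any, Dict, Tuple


def preference_pair(record: Dict[str, Any]) -> Tuple[str, str]:
    """Return (chosen, rejected) response texts for one comparison record.

    Assumption: `choice` holds the index (0 or 1) of the preferred response.
    """
    choice = record["choice"]
    if choice not in (0, 1):
        raise ValueError(f"unexpected choice value: {choice!r}")
    chosen = record[f"response{choice}"]
    rejected = record[f"response{1 - choice}"]
    return chosen, rejected


# Synthetic record with made-up values; only the fields used above are present.
example = {
    "query": "placeholder post text",
    "response0": "first candidate summary",
    "response1": "second candidate summary",
    "choice": 1,
}

chosen, rejected = preference_pair(example)
print("chosen:", chosen)
print("rejected:", rejected)
```

Raising on any `choice` other than 0 or 1 keeps a mislabeled record from silently producing a wrong pair.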

Data splits

  • train
    • Bytes: 2210564467
    • Examples: 92858
  • validation
    • Bytes: 2054238499
    • Examples: 86086

Dataset size

  • Download size: 271337037 bytes
  • Dataset size: 4264802966 bytes
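As a quick sanity check on the split figures above, the per-split counts can be summed and averaged. This is a small sketch using only the numbers from the card; the variable names are illustrative.

```python
# Split statistics copied from the dataset card.
SPLITS = {
    "train": {"num_bytes": 2210564467, "num_examples": 92858},
    "validation": {"num_bytes": 2054238499, "num_examples": 86086},
}

# Totals across both splits.
total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

# Average uncompressed size of one record in each split.
avg_bytes = {
    name: s["num_bytes"] / s["num_examples"] for name, s in SPLITS.items()
}

print(f"total examples: {total_examples}")
print(f"total bytes:    {total_bytes}")
for name, value in avg_bytes.items():
    print(f"{name}: about {value:.0f} bytes per example")
```

The two byte counts add up to 4264802966, which matches the `dataset_size` value reported in the card's metadata.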
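The raw byte counts are easier to compare in binary units. The sketch below converts them and computes the ratio between the two sizes; `human_size` is a hypothetical helper, not part of any library.

```python
DOWNLOAD_SIZE = 271337037   # bytes, from the dataset card
DATASET_SIZE = 4264802966   # bytes, from the dataset card


def human_size(num_bytes: int) -> str:
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value drops below 1024,
        # or at the largest unit available.
        if size < 1024 or unit == units[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} {units[-1]}"  # not reached; kept for completeness


ratio = DATASET_SIZE / DOWNLOAD_SIZE

print("download:", human_size(DOWNLOAD_SIZE))
print("dataset: ", human_size(DATASET_SIZE))
print(f"dataset is about {ratio:.1f}x the download size")
```

The download is roughly one fifteenth of the dataset size. The gap presumably reflects compressed storage of the files, though the card does not say so.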