orendar/ultrafeedback_binarized_filtered

Name: orendar/ultrafeedback_binarized_filtered
Creator: orendar
Published: 2024-07-04 06:26:39
License: 暂无描述

Hugging Face2024-07-04 更新2024-06-22 收录

下载链接：

https://hf-mirror.com/datasets/orendar/ultrafeedback_binarized_filtered

下载链接

链接失效反馈

官方服务：

资源简介：

数据集名为ultrafeedback_binarized_filtered，包含多个特征，如prompt、prompt_id、chosen、rejected、messages等，每个特征都有其特定的数据类型。数据集分为训练集和测试集，训练集包含27043个样本，测试集包含200个样本。数据集的下载大小为90942425字节，总大小为163210287.0字节。数据集标签包括dpo，表明可能与数据处理优化相关。

The dataset named ultrafeedback_binarized_filtered includes multiple features such as prompt, prompt_id, chosen, rejected, messages, etc., each with its specific data type. The dataset is divided into a training set and a test set, with the training set containing 27,043 samples and the test set containing 200 samples. The download size of the dataset is 90,942,425 bytes, and the total size is 163,210,287.0 bytes. The dataset tags include dpo, indicating it may be related to data processing optimization.

提供机构：

orendar

原始信息汇总

数据集概述

数据集信息

特征列表:
- prompt: 类型为字符串。
- prompt_id: 类型为字符串。
- chosen: 包含以下子特征:
  - content: 类型为字符串。
  - role: 类型为字符串。
- rejected: 包含以下子特征:
  - content: 类型为字符串。
  - role: 类型为字符串。
- messages: 包含以下子特征:
  - content: 类型为字符串。
  - role: 类型为字符串。
- score_chosen: 类型为浮点数。
- score_rejected: 类型为浮点数。
- score_diff: 类型为浮点数。
- __index_level_0__: 类型为整数。
数据分割:
- train:
  - 字节数: 162012105.5442132
  - 样本数: 27043
- test:
  - 字节数: 1198181.4557868077
  - 样本数: 200
数据大小:
- 下载大小: 90942425 字节
- 数据集大小: 163210287.0 字节

配置信息

配置名称: default
- 数据文件:
  - train: 路径为 data/train-*
  - test: 路径为 data/test-*

5,000+

优质数据集

54 个

任务类型

进入经典数据集