cornfieldrm/pair-preference-dataset-700K_subset-4-of-4_gemma-2b_1of4_iter3_conf-0.9_bs128_lr1e-5_conf-0.9
Source: Hugging Face. Updated 2024-05-31. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/cornfieldrm/pair-preference-dataset-700K_subset-4-of-4_gemma-2b_1of4_iter3_conf-0.9_bs128_lr1e-5_conf-0.9
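A minimal sketch of how the training split might be loaded, assuming the Hugging Face `datasets` library is installed and the repository is still publicly reachable. The repository ID is taken from the link above; the helper name `load_train_split` and the `use_mirror` flag are illustrative and not part of the listing. Setting `HF_ENDPOINT` is the usual way to route requests through a mirror, but it generally has to be set before the Hugging Face libraries are first imported.

```python
import os

# Repository ID copied from the download link above.
REPO_ID = (
    "cornfieldrm/pair-preference-dataset-700K_subset-4-of-4_"
    "gemma-2b_1of4_iter3_conf-0.9_bs128_lr1e-5_conf-0.9"
)


def load_train_split(use_mirror: bool = False):
    """Load the single `train` split (hypothetical helper, not from the listing)."""
    if use_mirror:
        # Assumption: the mirror host from the link above serves the same API.
        os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")
    # Imported lazily so that the endpoint override above can take effect.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train")
```

Calling `load_train_split()` downloads roughly 200 MB, so it is left uncalled here.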
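Description:
The dataset contains several features: the rejected content and its score, the chosen content and its score, a length, a chosen probability, and the message content. The values are mainly strings, floats, and integers. There is a single training split of 304047113.86623025 bytes containing 44838 examples. The download size is 199805224 bytes.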
This dataset is primarily used for dialogue analysis. It includes the rejected and chosen dialogue content with their scores, together with the length and the probability of being chosen for each dialogue. Each dialogue entry carries text and role information. The dataset has a single training split, for which the byte size and example count are reported.
Provider:
cornfieldrm
Summary of original information
Dataset overview
Dataset features
- rejected
  - content: string
  - role: string
- rejected_score: float
- chosen_score: float
- chosen
  - content: string
  - role: string
- length: integer
- chosen_prob: float
- messages
  - content: string
  - role: string
Dataset splits
- train
  - num_bytes: 304047113.86623025 bytes
  - num_examples: 44838 examples
Dataset size
- download_size: 199805224 bytes
- dataset_size: 304047113.86623025 bytes
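To put the reported sizes in more familiar units, the short calculation below converts them to MiB and derives the average bytes per example and the download-to-dataset ratio. It uses only the figures listed above; the variable names are illustrative, and reading the ratio as a compression effect is an assumption, not something the listing states.

```python
# Figures copied from the listing above.
DATASET_SIZE_BYTES = 304047113.86623025
DOWNLOAD_SIZE_BYTES = 199805224
NUM_EXAMPLES = 44838

MIB = 1024 ** 2  # bytes per mebibyte

dataset_mib = DATASET_SIZE_BYTES / MIB
download_mib = DOWNLOAD_SIZE_BYTES / MIB
avg_bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES

print(f"dataset size:      {dataset_mib:.1f} MiB")
print(f"download size:     {download_mib:.1f} MiB")
print(f"average example:   {avg_bytes_per_example:.0f} bytes")
print(f"download / dataset: {download_ratio:.2f}")
```

The fractional byte count in `num_bytes` is reproduced as listed; one plausible explanation is that the subset's size was estimated proportionally from a larger parent dataset.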



