allenai/preference-datasets-tulu
收藏Hugging Face2024-02-08 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/allenai/preference-datasets-tulu
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
features:
- name: chosen
list:
- name: content
dtype: string
- name: role
dtype: string
- name: rejected
list:
- name: content
dtype: string
- name: role
dtype: string
- name: source
dtype: string
splits:
- name: helpsteer
num_bytes: 55733744
num_examples: 9270
- name: nectar
num_bytes: 574311573
num_examples: 182902
- name: ultrafeedback_mean_aspects_cleaned
num_bytes: 240729118
num_examples: 60908
- name: ultrafeedback_overall_cleaned
num_bytes: 229339964
num_examples: 58933
- name: shp_2
num_bytes: 9848540246
num_examples: 4324531
- name: orca_dpo_pairs
num_bytes: 43942996
num_examples: 12859
- name: stack_exchange_paired
num_bytes: 19119604576
num_examples: 4999988
- name: anthropic_hh
num_bytes: 515149155
num_examples: 158348
download_size: 17082464423
dataset_size: 30627351372
configs:
- config_name: default
data_files:
- split: helpsteer
path: data/helpsteer-*
- split: nectar
path: data/nectar-*
- split: ultrafeedback_mean_aspects_cleaned
path: data/ultrafeedback_mean_aspects_cleaned-*
- split: ultrafeedback_overall_cleaned
path: data/ultrafeedback_overall_cleaned-*
- split: shp_2
path: data/shp_2-*
- split: orca_dpo_pairs
path: data/orca_dpo_pairs-*
- split: stack_exchange_paired
path: data/stack_exchange_paired-*
- split: anthropic_hh
path: data/anthropic_hh-*
---
数据集基本信息:
特征字段:
- 优选样本(chosen):列表类型,包含子字段:
- 内容(content):数据类型为字符串
- 角色(role):数据类型为字符串
- 拒选样本(rejected):列表类型,包含子字段:
- 内容(content):数据类型为字符串
- 角色(role):数据类型为字符串
- 来源(source):数据类型为字符串
数据集划分:
- 划分名称:helpsteer,字节数:55733744,样本数量:9270
- 划分名称:nectar,字节数:574311573,样本数量:182902
- 划分名称:ultrafeedback_mean_aspects_cleaned,字节数:240729118,样本数量:60908
- 划分名称:ultrafeedback_overall_cleaned,字节数:229339964,样本数量:58933
- 划分名称:shp_2,字节数:9848540246,样本数量:4324531
- 划分名称:orca_dpo_pairs,字节数:43942996,样本数量:12859
- 划分名称:stack_exchange_paired,字节数:19119604576,样本数量:4999988
- 划分名称:anthropic_hh,字节数:515149155,样本数量:158348
下载总大小:17082464423
数据集总大小:30627351372
配置项:
- 配置名称:default
数据文件:
- 划分:helpsteer,路径:data/helpsteer-*
- 划分:nectar,路径:data/nectar-*
- 划分:ultrafeedback_mean_aspects_cleaned,路径:data/ultrafeedback_mean_aspects_cleaned-*
- 划分:ultrafeedback_overall_cleaned,路径:data/ultrafeedback_overall_cleaned-*
- 划分:shp_2,路径:data/shp_2-*
- 划分:orca_dpo_pairs,路径:data/orca_dpo_pairs-*
- 划分:stack_exchange_paired,路径:data/stack_exchange_paired-*
- 划分:anthropic_hh,路径:data/anthropic_hh-*
提供机构:
allenai
原始信息汇总
数据集概述
数据集特征
- chosen
- content: 数据类型为字符串
- role: 数据类型为字符串
- rejected
- content: 数据类型为字符串
- role: 数据类型为字符串
- source: 数据类型为字符串
数据集分割
- helpsteer
- 字节数: 55733744
- 样本数: 9270
- nectar
- 字节数: 574311573
- 样本数: 182902
- ultrafeedback_mean_aspects_cleaned
- 字节数: 240729118
- 样本数: 60908
- ultrafeedback_overall_cleaned
- 字节数: 229339964
- 样本数: 58933
- shp_2
- 字节数: 9848540246
- 样本数: 4324531
- orca_dpo_pairs
- 字节数: 43942996
- 样本数: 12859
- stack_exchange_paired
- 字节数: 19119604576
- 样本数: 4999988
- anthropic_hh
- 字节数: 515149155
- 样本数: 158348
数据集大小
- 下载大小: 17082464423 字节
- 数据集大小: 30627351372 字节
配置
- default
- helpsteer: 路径为
data/helpsteer-* - nectar: 路径为
data/nectar-* - ultrafeedback_mean_aspects_cleaned: 路径为
data/ultrafeedback_mean_aspects_cleaned-* - ultrafeedback_overall_cleaned: 路径为
data/ultrafeedback_overall_cleaned-* - shp_2: 路径为
data/shp_2-* - orca_dpo_pairs: 路径为
data/orca_dpo_pairs-* - stack_exchange_paired: 路径为
data/stack_exchange_paired-* - anthropic_hh: 路径为
data/anthropic_hh-*
- helpsteer: 路径为



