allenai/ultrafeedback_binarized_cleaned
收藏Hugging Face2023-12-01 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/allenai/ultrafeedback_binarized_cleaned
下载链接
链接失效反馈官方服务:
资源简介:
---
license: mit
configs:
- config_name: default
data_files:
- split: train_sft
path: data/train_sft-*
- split: test_sft
path: data/test_sft-*
- split: train_gen
path: data/train_gen-*
- split: test_gen
path: data/test_gen-*
- split: train_prefs
path: data/train_prefs-*
- split: test_prefs
path: data/test_prefs-*
dataset_info:
features:
- name: prompt
dtype: string
- name: prompt_id
dtype: string
- name: chosen
list:
- name: content
dtype: string
- name: role
dtype: string
- name: rejected
list:
- name: content
dtype: string
- name: role
dtype: string
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: score_chosen
dtype: float64
- name: score_rejected
dtype: float64
- name: source
dtype: string
splits:
- name: train_sft
num_bytes: 393926052.7984401
num_examples: 60829
- name: test_sft
num_bytes: 6230841.363636363
num_examples: 985
- name: train_gen
num_bytes: 314344767.49216783
num_examples: 60829
- name: test_gen
num_bytes: 4982506.090909091
num_examples: 985
- name: train_prefs
num_bytes: 393926052.7984401
num_examples: 60829
- name: test_prefs
num_bytes: 12672623.615773508
num_examples: 1964
download_size: 629736515
dataset_size: 1126082844.1593668
---
# Dataset Card for "ultrafeedback_binarized_cleaned"
**Update 1/12/2023**: I've removed examples identified as faulty by Argilla - see [their awesome work](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences) for more details.
This is a version of the [UltraFeedback binarized dataset](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) but with TruthfulQA prompts removed and source annotations added (so you can filter out samples from different sources yourself if you want!).
Please see the [binarized dataset card for more information](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), or the [original UltraFeedback dataset card](https://huggingface.co/datasets/openbmb/UltraFeedback).
提供机构:
allenai
原始信息汇总
数据集概述
数据集名称
ultrafeedback_binarized_cleaned
许可证
MIT
配置
- default
- 数据文件路径:
train_sft:data/train_sft-*test_sft:data/test_sft-*train_gen:data/train_gen-*test_gen:data/test_gen-*train_prefs:data/train_prefs-*test_prefs:data/test_prefs-*
- 数据文件路径:
数据集信息
-
特征:
prompt:stringprompt_id:stringchosen:content:stringrole:string
rejected:content:stringrole:string
messages:content:stringrole:string
score_chosen:float64score_rejected:float64source:string
-
分割:
train_sft:num_bytes: 393926052.7984401num_examples: 60829
test_sft:num_bytes: 6230841.363636363num_examples: 985
train_gen:num_bytes: 314344767.49216783num_examples: 60829
test_gen:num_bytes: 4982506.090909091num_examples: 985
train_prefs:num_bytes: 393926052.7984401num_examples: 60829
test_prefs:num_bytes: 12672623.615773508num_examples: 1964
数据集大小
- 下载大小: 629736515
- 数据集大小: 1126082844.1593668



