alvarobartt/ultrafeedback-binarized-preferences-clean
Source: Hugging Face. Updated: 2023-12-05. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/alvarobartt/ultrafeedback-binarized-preferences-clean
Resource description:
---
dataset_info:
  features:
  - name: source
    dtype: string
  - name: instruction
    dtype: string
  - name: models
    sequence: string
  - name: completions
    list:
    - name: annotations
      struct:
      - name: instruction_following
        struct:
        - name: Rating
          dtype: string
        - name: Rationale
          dtype: string
      - name: honesty
        struct:
        - name: Rating
          dtype: string
        - name: Rationale
          dtype: string
      - name: truthfulness
        struct:
        - name: Type
          sequence: string
        - name: Rationale
          dtype: string
        - name: Rating
          dtype: string
        - name: Rationale For Rating
          dtype: string
      - name: helpfulness
        struct:
        - name: Type
          sequence: string
        - name: Rationale
          dtype: string
        - name: Rating
          dtype: string
        - name: Rationale For Rating
          dtype: string
    - name: custom_system_prompt
      dtype: string
    - name: model
      dtype: string
    - name: principle
      dtype: string
    - name: response
      dtype: string
    - name: critique
      dtype: string
    - name: overall_score
      dtype: float64
  - name: correct_answers
    sequence: string
  - name: incorrect_answers
    sequence: string
  splits:
  - name: train
    num_bytes: 831088377.5421702
    num_examples: 63136
  download_size: 318279041
  dataset_size: 831088377.5421702
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
This dataset is intended for evaluating and analyzing how natural language processing models perform across tasks. Each record has a source, an instruction, a list of models, and a list of completions. Each completion carries annotations (instruction following, honesty, truthfulness, and helpfulness), a custom system prompt, the model name, a principle, the response, a critique, and an overall score. Records also include sequences of correct and incorrect answers. The data ships as a single train split of 63,136 examples, about 831 MB in total (about 318 MB to download).
Provider:
alvarobartt
Summary of original information
Dataset overview
Dataset features
- source: string
- instruction: string
- models: sequence of strings
- completions: list of structs with the following fields:
  - annotations: struct with the following fields:
    - instruction_following: struct with the following fields:
      - Rating: string
      - Rationale: string
    - honesty: struct with the following fields:
      - Rating: string
      - Rationale: string
    - truthfulness: struct with the following fields:
      - Type: sequence of strings
      - Rationale: string
      - Rating: string
      - Rationale For Rating: string
    - helpfulness: struct with the following fields:
      - Type: sequence of strings
      - Rationale: string
      - Rating: string
      - Rationale For Rating: string
  - custom_system_prompt: string
  - model: string
  - principle: string
  - response: string
  - critique: string
  - overall_score: float
- correct_answers: sequence of strings
- incorrect_answers: sequence of strings
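The nested layout of a record can be easier to follow with a concrete example. The sketch below builds one hypothetical record that follows the feature list above and walks it. Every value is invented for illustration, and `make_completion`, `rank_completions`, and `rating_as_int` are helper names introduced here, not part of the dataset or of any library. Ratings are parsed defensively because the schema stores them as strings; the idea that some may be non-numeric is an assumption, not something the listing states.

```python
from typing import Any, Optional


def make_completion(model: str, response: str, score: float, if_rating: str) -> dict[str, Any]:
    """Build one hypothetical entry of the `completions` list (placeholder values)."""
    return {
        "annotations": {
            "instruction_following": {"Rating": if_rating, "Rationale": "placeholder"},
            "honesty": {"Rating": "4", "Rationale": "placeholder"},
            "truthfulness": {
                "Type": ["0"],
                "Rationale": "placeholder",
                "Rating": "5",
                "Rationale For Rating": "placeholder",
            },
            "helpfulness": {
                "Type": ["1"],
                "Rationale": "placeholder",
                "Rating": "4",
                "Rationale For Rating": "placeholder",
            },
        },
        "custom_system_prompt": "placeholder",
        "model": model,
        "principle": "placeholder",
        "response": response,
        "critique": "placeholder",
        "overall_score": score,
    }


# A made-up record with the same top-level fields as the schema.
example: dict[str, Any] = {
    "source": "placeholder",
    "instruction": "Name a primary color.",
    "models": ["model-a", "model-b"],
    "completions": [
        make_completion("model-a", "Red.", 8.5, "5"),
        make_completion("model-b", "Green is a fruit.", 3.0, "2"),
    ],
    "correct_answers": [],
    "incorrect_answers": [],
}


def rank_completions(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the record's completions sorted from highest to lowest overall_score."""
    return sorted(record["completions"], key=lambda c: c["overall_score"], reverse=True)


def rating_as_int(completion: dict[str, Any], aspect: str) -> Optional[int]:
    """Ratings are stored as strings; return an int, or None if the value is not numeric."""
    raw = completion["annotations"][aspect]["Rating"]
    return int(raw) if raw.isdigit() else None


ranked = rank_completions(example)
best, worst = ranked[0], ranked[-1]
print(best["model"], best["overall_score"])
print(rating_as_int(worst, "instruction_following"))
```

Sorting by `overall_score` is only one plausible way to compare completions; the listing does not say how the preference pairs in this dataset were actually derived.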
Dataset splits
- train: 63,136 examples, 831088377.5421702 bytes
Dataset size
- Download size: 318279041 bytes
- Dataset size: 831088377.5421702 bytes
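To put the raw byte counts in perspective, here is a small arithmetic sketch using only the figures listed above. The constant names are illustrative. It works out to roughly 13 KB per example on average, with the download at about 38% of the in-memory dataset size.

```python
# Figures copied from the split and size metadata above.
NUM_EXAMPLES = 63136
DATASET_SIZE_BYTES = 831088377.5421702
DOWNLOAD_SIZE_BYTES = 318279041

# Average footprint of one example in the prepared dataset.
avg_bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES

# Fraction of the dataset size that has to be downloaded.
download_ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES

print(f"average bytes per example: {avg_bytes_per_example:.0f}")
print(f"download / dataset size:   {download_ratio:.2%}")
print(f"dataset size in MB:        {DATASET_SIZE_BYTES / 1e6:.1f}")
```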
Configuration
- default: contains the training data files at path data/train-*
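Given the single `default` configuration and its `train` split, loading could look like the sketch below. It assumes the Hugging Face `datasets` library is installed and that the repository is reachable from the Hugging Face Hub under the ID shown in the page title. `load_train_split` is a hypothetical wrapper name; the only library call used is `datasets.load_dataset`.

```python
REPO_ID = "alvarobartt/ultrafeedback-binarized-preferences-clean"


def load_train_split(streaming: bool = False):
    """Load the `train` split of the `default` config from the Hugging Face Hub.

    With streaming=True, examples are fetched lazily instead of downloading
    all of the data files up front.
    """
    # Imported lazily so this snippet can be defined without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name="default", split="train", streaming=streaming)


# Example usage (needs network access, so it is left commented out):
# ds = load_train_split(streaming=True)
# first = next(iter(ds))
# print(sorted(first.keys()))
```

Streaming is a reasonable first step for a quick look at a few records, since the full download is about 318 MB.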



