alvarobartt/ultrafeedback-instruction-dataset

Name: alvarobartt/ultrafeedback-instruction-dataset
Creator: alvarobartt
Published: 2023-10-31 14:51:34
License: 暂无描述

Hugging Face2023-10-31 更新2024-03-04 收录

下载链接：

https://hf-mirror.com/datasets/alvarobartt/ultrafeedback-instruction-dataset

下载链接

链接失效反馈

官方服务：

资源简介：

--- dataset_info: features: - name: instruction dtype: string - name: generations sequence: string - name: raw_generation_response sequence: string - name: rating sequence: int64 - name: rationale sequence: string - name: raw_labelling_response struct: - name: choices list: - name: finish_reason dtype: string - name: index dtype: int64 - name: message struct: - name: content dtype: string - name: role dtype: string - name: created dtype: int64 - name: id dtype: string - name: model dtype: string - name: object dtype: string - name: usage struct: - name: completion_tokens dtype: int64 - name: prompt_tokens dtype: int64 - name: total_tokens dtype: int64 splits: - name: train num_bytes: 167493 num_examples: 50 download_size: 98372 dataset_size: 167493 configs: - config_name: default data_files: - split: train path: data/train-* --- # Dataset Card for "ultrafeedback-instruction-dataset" [More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

提供机构：

alvarobartt

原始信息汇总

数据集概述

数据集信息

特征（features）:
- instruction: 数据类型为字符串（string）。
- generations: 序列类型，数据类型为字符串（string）。
- raw_generation_response: 序列类型，数据类型为字符串（string）。
- rating: 序列类型，数据类型为整数（int64）。
- rationale: 序列类型，数据类型为字符串（string）。
- raw_labelling_response: 结构类型，包含以下字段：
  - choices: 列表类型，包含以下字段：
    - finish_reason: 数据类型为字符串（string）。
    - index: 数据类型为整数（int64）。
    - message: 结构类型，包含以下字段：
      - content: 数据类型为字符串（string）。
      - role: 数据类型为字符串（string）。
  - created: 数据类型为整数（int64）。
  - id: 数据类型为字符串（string）。
  - model: 数据类型为字符串（string）。
  - object: 数据类型为字符串（string）。
  - usage: 结构类型，包含以下字段：
    - completion_tokens: 数据类型为整数（int64）。
    - prompt_tokens: 数据类型为整数（int64）。
    - total_tokens: 数据类型为整数（int64）。
分割（splits）:
- train: 包含50个样本，占用167493字节。
下载大小: 98372字节。
数据集大小: 167493字节。

配置（configs）

default:
- 数据文件（data_files）:
  - train: 路径为data/train-*。

5,000+

优质数据集

54 个

任务类型

进入经典数据集