distilabel-internal-testing/fine-preferences-magpie-v6-tasky-4

Name: distilabel-internal-testing/fine-preferences-magpie-v6-tasky-4
Creator: distilabel-internal-testing
Published: 2024-07-17 14:49:14
License: 暂无描述

Hugging Face2024-07-17 更新2024-07-22 收录

下载链接：

https://hf-mirror.com/datasets/distilabel-internal-testing/fine-preferences-magpie-v6-tasky-4

下载链接

链接失效反馈

官方服务：

资源简介：

该数据集是通过distilabel工具生成的，主要用于存储对话数据。数据集中包含多个特征，如文本、ID、URL、文件路径、语言、语言评分、token计数、评分、整数评分、系统提示、对话内容、生成对话模型名称、生成内容、distilabel元数据等。数据集的结构包括一个训练集，包含100个样本，总大小为2578965字节。数据集的标签包括synthetic、distilabel和rlaif。README文件中还提供了如何使用distilabel CLI工具来重现生成该数据集的pipeline的说明。

This dataset contains a `pipeline.yaml` file that can be used to reproduce the pipeline that generated it using the `distilabel` CLI. The dataset includes various features such as text, id, dump, url, file_path, language, language_score, token_count, score, int_score, system_prompt, conversation, gen_conv_model_name, generations, distilabel_metadata, and generations_model_names. The dataset is split into a training set with 100 examples and is tagged with synthetic, distilabel, and rlaif. The dataset was created using the distilabel tool.

提供机构：

distilabel-internal-testing

原始信息汇总

数据集概述

数据集结构

特征

text: 文本内容，类型为字符串。
id: 唯一标识符，类型为字符串。
dump: 数据转储信息，类型为字符串。
url: 数据来源URL，类型为字符串。
file_path: 文件路径，类型为字符串。
language: 语言标识，类型为字符串。
language_score: 语言得分，类型为浮点数。
token_count: 词元计数，类型为整数。
score: 得分，类型为浮点数。
int_score: 整数得分，类型为整数。
system_prompt: 系统提示，类型为字符串。
conversation: 对话列表，包含以下子特征：
- content: 对话内容，类型为字符串。
- role: 角色，类型为字符串。
gen_conv_model_name: 生成对话的模型名称，类型为字符串。
generations: 生成序列，类型为字符串序列。
distilabel_metadata: 元数据结构，包含以下子特征：
- raw_output_chat_generation_2: 原始输出，类型为字符串。
generations_model_names: 生成模型名称序列，类型为字符串序列。

数据分割

train: 训练集，包含100个样本，总字节数为2578965。

数据集大小

下载大小: 1148436字节
数据集大小: 2578965字节

配置

default: 默认配置，包含训练集数据文件路径为data/train-*。