GitBag/multiturn-512-hh-turn-1-5

Name: GitBag/multiturn-512-hh-turn-1-5
Creator: GitBag
Published: 2024-06-28 22:14:06
License: No description available

Source: Hugging Face | Updated: 2024-06-28 | Indexed: 2024-06-29

Download link:

https://hf-mirror.com/datasets/GitBag/multiturn-512-hh-turn-1-5

Description:

This dataset contains multi-turn dialogue data for dialogue generation and model training. Its main features are chosen (the selected dialogue content), rejected (the rejected dialogue content), prompt (the prompt text), and per-turn dialogue fields such as llama_dialogue, llama_prompt_turn_X, and llama_response_turn_X. The dataset is split into a training set of 156,466 samples and a test set of 8,285 samples. The download size is 588,230,314 bytes and the total dataset size is 7,664,718,020 bytes.

Provider:

GitBag

Original information summary

Dataset overview

Features

chosen:
- content: string
- role: string
rejected:
- content: string
- role: string
prompt: string
llama_dialogue: string
llama_dialogue_tokens: sequence of integers
num_turn: integer
llama_prompt_turn_0: string
llama_prompt_token_turn_0: sequence of integers
llama_response_turn_0: string
llama_response_token_turn_0: sequence of integers
llama_prompt_turn_1: string
llama_prompt_token_turn_1: sequence of integers
llama_response_turn_1: string
llama_response_token_turn_1: sequence of integers
llama_prompt_turn_2: string
llama_prompt_token_turn_2: sequence of integers
llama_response_turn_2: string
llama_response_token_turn_2: sequence of integers
llama_prompt_turn_3: string
llama_prompt_token_turn_3: sequence of integers
llama_response_turn_3: string
llama_response_token_turn_3: sequence of integers
llama_prompt_turn_4: string
llama_prompt_token_turn_4: sequence of integers
llama_response_turn_4: string
llama_response_token_turn_4: sequence of integers
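
The per-turn columns above follow a regular naming pattern, so they can be generated rather than typed out. The following is a minimal sketch, not part of the dataset's own tooling: the helper names turn_columns and iter_turns are hypothetical, the example row is made up, and it assumes that unused turns are stored as empty strings or None, which the listing does not state.

```python
from typing import Any, Dict, Iterator, Tuple

MAX_TURNS = 5  # the schema lists turn indices 0 through 4


def turn_columns(turn: int) -> Dict[str, str]:
    """Return the four column names the schema lists for one turn index."""
    if not 0 <= turn < MAX_TURNS:
        raise ValueError(f"turn must be in [0, {MAX_TURNS}), got {turn}")
    return {
        "prompt": f"llama_prompt_turn_{turn}",
        "prompt_tokens": f"llama_prompt_token_turn_{turn}",
        "response": f"llama_response_turn_{turn}",
        "response_tokens": f"llama_response_token_turn_{turn}",
    }


def iter_turns(row: Dict[str, Any]) -> Iterator[Tuple[int, str, str]]:
    """Yield (turn, prompt, response) for each turn whose prompt is non-empty.

    Assumption: unused turns are empty strings or None; the listing does not
    say how they are encoded.
    """
    for turn in range(MAX_TURNS):
        cols = turn_columns(turn)
        prompt = row.get(cols["prompt"])
        if not prompt:
            continue
        yield turn, prompt, row.get(cols["response"]) or ""


# Hypothetical row, made up for illustration only.
example_row = {
    "llama_prompt_turn_0": "Hello",
    "llama_response_turn_0": "Hi there",
    "llama_prompt_turn_1": "",
    "llama_response_turn_1": "",
}
turns = list(iter_turns(example_row))
```

Whether num_turn counts the populated turns is not documented here, so the sketch checks the columns themselves instead of relying on that field.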

Dataset splits

train:
- Bytes: 7279320881
- Samples: 156466
test:
- Bytes: 385397139
- Samples: 8285

Dataset size

Download size: 588230314 bytes
Total dataset size: 7664718020 bytes
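
The split and size figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers listed above; the variable names are illustrative, and the reading of the download size as a compressed on-disk size is an assumption rather than something the listing states.

```python
# Figures copied from the listing above.
SPLITS = {
    "train": {"num_bytes": 7279320881, "num_examples": 156466},
    "test": {"num_bytes": 385397139, "num_examples": 8285},
}
DOWNLOAD_SIZE = 588230314
DATASET_SIZE = 7664718020

total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# The two split sizes should add up to the reported total dataset size.
sizes_consistent = total_bytes == DATASET_SIZE

# Download size relative to the total size (assumed to reflect compression).
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

# Share of all samples held out in the test split.
test_fraction = SPLITS["test"]["num_examples"] / total_examples

# Average size of one sample across both splits.
avg_bytes_per_example = total_bytes / total_examples

print(f"sizes consistent: {sizes_consistent}")
print(f"download / total: {download_ratio:.3f}")
print(f"test fraction:    {test_fraction:.3f}")
print(f"avg bytes/sample: {avg_bytes_per_example:.0f}")
```

The split byte counts sum exactly to the reported total, and the download is well under a tenth of that total, which is worth knowing when planning disk space.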

Configuration

config_name: default
- data_files:
  - train: path data/train-*
  - test: path data/test-*
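
The configuration maps each split to a glob pattern over files in the repository. As a rough sketch, the mapping can be written down and used to classify a file path, and a split can be loaded with the Hugging Face datasets library. The helper names split_for_path and load_split are hypothetical, the file paths in the usage lines are made up, and load_split assumes the datasets package is installed and the Hub is reachable.

```python
from fnmatch import fnmatchcase
from typing import Optional

REPO_ID = "GitBag/multiturn-512-hh-turn-1-5"

# Split name -> glob pattern, as given in the "default" configuration.
DATA_FILES = {
    "train": "data/train-*",
    "test": "data/test-*",
}


def split_for_path(path: str) -> Optional[str]:
    """Return the split whose glob pattern matches `path`, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


def load_split(split: str = "train", streaming: bool = False):
    """Load one split with the `datasets` library.

    Imported lazily so this module can be used without `datasets` installed.
    streaming=True avoids downloading the whole dataset up front.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=streaming)


# Hypothetical file paths, for illustration only.
train_match = split_for_path("data/train-00000.parquet")
no_match = split_for_path("README.md")
```

A call such as load_split("test", streaming=True) would then iterate over test rows without materializing the full dataset locally.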
