instruction-pretrain/general-instruction-augmented-corpora

Name: instruction-pretrain/general-instruction-augmented-corpora
Creator: instruction-pretrain
Published: 2024-07-15 08:40:08
License: No description available

Source: Hugging Face. Updated 2024-07-15; indexed 2024-06-25.

Download link:

https://hf-mirror.com/datasets/instruction-pretrain/general-instruction-augmented-corpora

Overview:

This dataset contains 200M instruction-response pairs covering more than 40 task categories, produced for the Instruction Pre-Training framework. The pairs are generated by an efficient instruction synthesizer built on open-source models. The summary also reports pre-training token counts from 100B to 250B and 500M synthesized instruction-response pairs. The dataset is intended to verify the effectiveness of Instruction Pre-Training, and it is released alongside a series of pre-trained models and related dataset resources.

Provider:

instruction-pretrain

Summary of original information

Dataset license

License type: Open Data Commons Attribution License (ODC-BY)
