brahmairesearch/G15

Name: brahmairesearch/G15
Creator: brahmairesearch
Published: 2024-09-05 13:10:44
License: 暂无描述

Hugging Face2024-09-05 更新2024-12-14 收录

下载链接：

https://hf-mirror.com/datasets/brahmairesearch/G15

下载链接

链接失效反馈

官方服务：

资源简介：

数据集G15-v0.1是G系列数据集的一部分，这些数据集以ChatML模板预格式化，适用于快速微调大型语言模型（LLMs）以获得更好的响应。该数据集结合了OpenHermes-2.5、MetaMathQA（包含100k条目）以及用于微调Cerberus-v0.1的一部分内部数据集。该数据集特别适用于1B到7B参数的小型LLMs，以提高其响应质量。数据集包含多个特征，如自定义指令、类别、系统提示、来源和ChatML格式文本。数据集分为训练集，包含1,413,524个样本，下载大小为947,207,666字节，数据集大小为1,995,666,363字节。数据集支持的任务类别包括文本到文本生成和文本生成，语言包括印地语和英语，标签涵盖了印地语、数学、英语和一般内容。

The G15-v0.1 dataset is part of the G-series datasets, which are pre-formatted in the ChatML template and are useful for quickly finetuning large language models (LLMs) for better responses. This dataset combines OpenHermes-2.5, MetaMathQA (with 100k entries), and a section of our in-house dataset used to finetune Cerberus-v0.1. It is specifically designed for smaller LLMs (1B - 7B) to enhance their response quality. The dataset includes features such as custom instructions, categories, system prompts, sources, and ChatML formatted text. The dataset is divided into a training set containing 1,413,524 samples, with a download size of 947,207,666 bytes and a dataset size of 1,995,666,363 bytes. The supported task categories include text-to-text generation and text generation, with languages including Hindi and English, and tags covering Hindi, maths, English, and general content.

提供机构：

brahmairesearch

5,000+

优质数据集

54 个

任务类型

进入经典数据集