five

brahmairesearch/G15

收藏
Hugging Face2024-09-05 更新2024-12-14 收录
下载链接:
https://hf-mirror.com/datasets/brahmairesearch/G15
下载链接
链接失效反馈
官方服务:
资源简介:
数据集G15-v0.1是G系列数据集的一部分,这些数据集以ChatML模板预格式化,适用于快速微调大型语言模型(LLMs)以获得更好的响应。该数据集结合了OpenHermes-2.5、MetaMathQA(包含100k条目)以及用于微调Cerberus-v0.1的一部分内部数据集。该数据集特别适用于1B到7B参数的小型LLMs,以提高其响应质量。数据集包含多个特征,如自定义指令、类别、系统提示、来源和ChatML格式文本。数据集分为训练集,包含1,413,524个样本,下载大小为947,207,666字节,数据集大小为1,995,666,363字节。数据集支持的任务类别包括文本到文本生成和文本生成,语言包括印地语和英语,标签涵盖了印地语、数学、英语和一般内容。

The G15-v0.1 dataset is part of the G-series datasets, which are pre-formatted in the ChatML template and are useful for quickly finetuning large language models (LLMs) for better responses. This dataset combines OpenHermes-2.5, MetaMathQA (with 100k entries), and a section of our in-house dataset used to finetune Cerberus-v0.1. It is specifically designed for smaller LLMs (1B - 7B) to enhance their response quality. The dataset includes features such as custom instructions, categories, system prompts, sources, and ChatML formatted text. The dataset is divided into a training set containing 1,413,524 samples, with a download size of 947,207,666 bytes and a dataset size of 1,995,666,363 bytes. The supported task categories include text-to-text generation and text generation, with languages including Hindi and English, and tags covering Hindi, maths, English, and general content.
提供机构:
brahmairesearch
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作