
arcee-ai/The-Tome

Source: Hugging Face. Updated: 2024-08-15. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/arcee-ai/The-Tome
Overview:
The Tome is a curated dataset for training large language models, with a focus on instruction following. It contains 1.75 million samples drawn from 9 publicly available datasets. To keep the content high quality, the samples were filtered through reranking, educational value scoring, and composite scoring. The dataset played a key role in developing the Nova model, which was subsequently merged with Qwen2-72B-Instruct.
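The score-based filtering described above can be illustrated with a small sketch. This is not the pipeline arcee-ai actually used: the listing does not say how the scores are computed or combined, so the field names `rerank_score` and `edu_score`, the equal weights, and the top-k cutoff are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    text: str
    rerank_score: float  # hypothetical: score assigned by a reranker
    edu_score: float     # hypothetical: educational value score


def composite_score(sample: Sample, w_rerank: float = 0.5, w_edu: float = 0.5) -> float:
    """Weighted sum of the two scores (the weights are an assumption)."""
    return w_rerank * sample.rerank_score + w_edu * sample.edu_score


def curate(samples: list[Sample], keep: int) -> list[Sample]:
    """Return the `keep` samples with the highest composite score."""
    ranked = sorted(samples, key=composite_score, reverse=True)
    return ranked[:keep]


# Toy pool with made-up scores, only to exercise the functions.
pool = [
    Sample("Explain binary search step by step.", 0.9, 0.8),
    Sample("lol ok", 0.2, 0.1),
    Sample("Summarize the causes of the French Revolution.", 0.7, 0.9),
]
selected = curate(pool, keep=2)
for s in selected:
    print(f"{composite_score(s):.2f}  {s.text}")
```

A real pipeline would more likely apply score thresholds per source dataset than a single global cutoff, but the ranking-then-truncating shape is the same.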
Provider:
arcee-ai