Open-Bee/Honey-Data-1M
Source: Hugging Face | Updated: 2026-03-10 | Indexed: 2025-11-15
Download link:
https://hf-mirror.com/datasets/Open-Bee/Honey-Data-1M
Description:
Honey-Data-1M is a high-quality subset of approximately one million samples, carefully selected from the Honey-Data-15M corpus. It is designed as an efficient SFT dataset for a refinement stage that further polishes the capabilities of the Bee-8B model, and as an accessible, high-quality training option for researchers and developers with limited computational resources. The dataset is built with a multi-faceted selection strategy that aims for a reasonable, balanced topic distribution across key domains and maintains an approximately 1:1 ratio between long-chain and short-chain CoT conversations.
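The description does not say how the roughly 1:1 long-chain to short-chain CoT ratio is enforced. As a purely illustrative sketch, one simple way to get such a balance is to sample the same number of items from each group. The function name `balanced_subset`, the field name `cot_type`, and the labels `"long"` and `"short"` are all hypothetical and are not taken from the dataset's actual schema or selection pipeline.

```python
import random
from collections import Counter


def balanced_subset(samples, target_size, key="cot_type", seed=0):
    """Draw an equal number of 'long' and 'short' samples (hypothetical labels).

    This is a toy illustration of 1:1 balancing, not the selection
    strategy actually used to build Honey-Data-1M.
    """
    rng = random.Random(seed)
    long_pool = [s for s in samples if s[key] == "long"]
    short_pool = [s for s in samples if s[key] == "short"]
    # Each class contributes half the target, capped by the smaller pool.
    per_class = min(target_size // 2, len(long_pool), len(short_pool))
    picked = rng.sample(long_pool, per_class) + rng.sample(short_pool, per_class)
    rng.shuffle(picked)
    return picked


# Toy corpus: 10 "long" and 20 "short" records.
corpus = [
    {"id": i, "cot_type": "long" if i % 3 == 0 else "short"}
    for i in range(30)
]
subset = balanced_subset(corpus, target_size=10)
counts = Counter(s["cot_type"] for s in subset)
print(counts["long"], counts["short"])
```

Capping each class at the size of the smaller pool keeps the ratio exact even when one group is scarce, at the cost of returning fewer samples than requested.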
Provider:
Open-Bee



