Jeremydh911/OpenCodeInstruct
收藏Hugging Face2026-04-22 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/Jeremydh911/OpenCodeInstruct
下载链接
链接失效反馈官方服务:
资源简介:
我们介绍了OpenCodeInstruct,这是最大的开放访问指令调优数据集,包含500万个多样化的样本。OpenCodeInstruct专为监督微调(SFT)设计。该数据集由NVIDIA公司创建,采用混合自动化和合成的方法进行数据收集和标注。数据集包含多个字段,如id、input、output等,每个字段都有详细的描述。数据集的使用意图是帮助社区改进开放模型,但用户需自行检查数据集许可证是否适合其预期用途。
We introduce OpenCodeInstruct, the largest open-access instruction tuning dataset, comprising 5 million diverse samples. OpenCodeInstruct is designed for supervised fine-tuning (SFT). This dataset is created by NVIDIA Corporation and uses a hybrid of automated and synthetic methods for data collection and labeling. The dataset includes multiple fields such as id, input, output, etc., each with detailed descriptions. The intended use of the dataset is to help the community improve open models, but users are responsible for checking if the dataset license is fit for their intended purpose.
提供机构:
Jeremydh911



