five

Jeremydh911/OpenCodeInstruct

收藏
Hugging Face2026-04-22 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/Jeremydh911/OpenCodeInstruct
下载链接
链接失效反馈
官方服务:
资源简介:
我们介绍了OpenCodeInstruct,这是最大的开放访问指令调优数据集,包含500万个多样化的样本。OpenCodeInstruct专为监督微调(SFT)设计。该数据集由NVIDIA公司创建,采用混合自动化和合成的方法进行数据收集和标注。数据集包含多个字段,如id、input、output等,每个字段都有详细的描述。数据集的使用意图是帮助社区改进开放模型,但用户需自行检查数据集许可证是否适合其预期用途。

We introduce OpenCodeInstruct, the largest open-access instruction tuning dataset, comprising 5 million diverse samples. OpenCodeInstruct is designed for supervised fine-tuning (SFT). This dataset is created by NVIDIA Corporation and uses a hybrid of automated and synthetic methods for data collection and labeling. The dataset includes multiple fields such as id, input, output, etc., each with detailed descriptions. The intended use of the dataset is to help the community improve open models, but users are responsible for checking if the dataset license is fit for their intended purpose.
提供机构:
Jeremydh911
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作