
OpenCoder-LLM/opc-sft-stage1

Source: Hugging Face | Updated: 2024-11-24 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/OpenCoder-LLM/opc-sft-stage1
Description:
The OpenCoder dataset collection consists of several components: opc-sft-stage1, opc-sft-stage2, opc-annealing-corpus, opc-fineweb-code-corpus, opc-fineweb-math-corpus, and refineCode-code-corpus-meta. The sft-stage1 dataset is used in the first stage of OpenCoder's supervised fine-tuning and comprises three parts. Filtered_infinity_instruct contains the code-related content filtered out of infinity_instruct. Realuser_instruct contains bilingual code-related instructions extracted from GPT conversation histories. Largescale_diverse_instruct contains diverse code-related instructions generated by a pipeline seeded with CommonCrawl and source code. Together, these datasets aim to improve the practical performance of code large language models.
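The three parts described above can be loaded one at a time. The following is a minimal sketch, not an official loader: it assumes the Hugging Face `datasets` library is installed, that each part is exposed as a configuration whose name is the lowercased part name, and that the split is called `train`. The helper `load_subset` and the `SUBSETS` table are hypothetical names introduced for illustration.

```python
# Sketch only: config names and the "train" split are assumptions,
# derived from the three part names in the description (lowercased).
REPO_ID = "OpenCoder-LLM/opc-sft-stage1"

SUBSETS = {
    "filtered_infinity_instruct": "code-related content filtered from infinity_instruct",
    "realuser_instruct": "bilingual code instructions from GPT conversation histories",
    "largescale_diverse_instruct": "diverse instructions seeded with CommonCrawl and source code",
}


def load_subset(name: str, split: str = "train"):
    """Load one part of opc-sft-stage1 (needs network access and `pip install datasets`)."""
    if name not in SUBSETS:
        raise ValueError(f"unknown subset {name!r}; expected one of {sorted(SUBSETS)}")
    # Imported lazily so the name check above works without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name, split=split)
```

Under these assumptions, `load_subset("realuser_instruct")` would download that part and return it as a `Dataset` object; an unrecognized name fails early with a `ValueError` before any network request is made.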
Provider:
OpenCoder-LLM