OpenCoder-LLM/opc-sft-stage2

Source: Hugging Face. Updated: 2024-11-24. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/OpenCoder-LLM/opc-sft-stage2
Description:
The OpenCoder dataset is made up of several sub-datasets: opc-sft-stage1, opc-sft-stage2, opc-annealing-corpus, opc-fineweb-code-corpus, opc-fineweb-math-corpus, and refineCode-code-corpus-meta.

The sft-stage2 part consists of four subsets: educational_instruct, evol_instruct, mceval_instruct, and package_instruct. educational_instruct contains (instruction, code, test case) triples generated from an algorithmic corpus and validated by executing them in Python; the test cases are kept because they provide a useful signal for code reinforcement learning. evol_instruct is taken directly from the open-source MagicCoder-Evol-Instruct-110k. mceval_instruct is taken directly from the open-source McEval-Instruct. package_instruct contains Python package-related questions generated from common interface documentation extracted with pydoc.
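The validation step for educational_instruct is described only at a high level, and the authors' pipeline code is not given here. The following is a minimal sketch, assuming each test case is a snippet of Python `assert` statements that can be appended to its solution and run as one script. A triple is kept only if that script exits cleanly within a timeout. The names `Triple`, `passes_tests`, and `filter_triples` are hypothetical, chosen for illustration.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Triple:
    """Hypothetical record layout for one (instruction, code, test case) sample."""
    instruction: str
    code: str
    test: str


def passes_tests(triple: Triple, timeout: float = 10.0) -> bool:
    """Run the solution followed by its test case in a fresh Python process.

    Returns True only if the combined script exits with status 0 before the
    timeout, i.e. it raised no exception and no assert failed.
    """
    program = triple.code + "\n\n" + triple.test + "\n"
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,  # keep the child's stdout/stderr out of our output
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Treat non-terminating code (e.g. an infinite loop) as a failure.
        return False
    return result.returncode == 0


def filter_triples(triples: list[Triple]) -> list[Triple]:
    """Keep only the triples whose code passes its own test case."""
    return [t for t in triples if passes_tests(t)]


if __name__ == "__main__":
    samples = [
        Triple(
            instruction="Write a function that returns the sum of a list.",
            code="def total(xs):\n    return sum(xs)",
            test="assert total([1, 2, 3]) == 6",
        ),
        Triple(
            instruction="Write a function that reverses a string.",
            code="def rev(s):\n    return s",  # buggy on purpose: does not reverse
            test="assert rev('abc') == 'cba'",
        ),
    ]
    kept = filter_triples(samples)
    print([t.instruction for t in kept])
```

Running each candidate in a separate interpreter process with a timeout keeps a crashing or non-terminating sample from taking down the filter. It is not a security sandbox, though. Model-generated code should be run in a properly isolated environment.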
Provider: OpenCoder-LLM