OpenCoder-LLM/opc-sft-stage2
Hugging Face. Updated 2024-11-24; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/OpenCoder-LLM/opc-sft-stage2
Description:
The OpenCoder dataset collection comprises several datasets: opc-sft-stage1, opc-sft-stage2, opc-annealing-corpus, opc-fineweb-code-corpus, opc-fineweb-math-corpus, and refineCode-code-corpus-meta. The sft-stage2 dataset has four parts: educational_instruct, evol_instruct, mceval_instruct, and package_instruct. educational_instruct contains (instruction, code, test case) triples generated from an algorithmic corpus and validated with a Python compiler; the test cases are included to strengthen the signal for code reinforcement learning. evol_instruct directly uses the open-source MagicCoder-Evol-Instruct-110k. mceval_instruct directly uses the open-source McEval-Instruct. package_instruct consists of questions about Python packages, generated from documentation of common interfaces extracted from pydoc.
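The validation step described for educational_instruct can be illustrated with a minimal sketch. It is not the OpenCoder pipeline itself: the function name `passes_tests`, the assert-style test format, and the timeout are assumptions made for illustration, and the code simply runs a candidate solution together with its test cases in a separate Python interpreter and treats a clean exit as a pass.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence


def passes_tests(code: str, test_cases: Sequence[str], timeout: float = 10.0) -> bool:
    """Return True if `code` followed by its test cases runs without error.

    Hypothetical helper: the real pipeline's test format and execution
    environment are not described in the source.
    """
    # Concatenate the solution and its assert-style tests into one script.
    program = code + "\n\n" + "\n".join(test_cases) + "\n"
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(program, encoding="utf-8")
        try:
            # Run in a child interpreter so a crash or hang in the candidate
            # cannot take down the validating process.
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
    # Exit code 0 means the code compiled and every assertion held.
    return result.returncode == 0


if __name__ == "__main__":
    # Made-up triple for illustration; not taken from the dataset.
    solution = "def add(a, b):\n    return a + b"
    tests = ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"]
    print(passes_tests(solution, tests))
```

A filter like this keeps only triples whose code passes its own tests. Running untrusted generated code this way offers no sandboxing beyond process isolation, so a real pipeline would likely need stricter containment.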
Provider:
OpenCoder-LLM



