OpenCoder-LLM/opc-annealing-corpus
Source: Hugging Face | Updated: 2025-05-29 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/OpenCoder-LLM/opc-annealing-corpus
Description:
The opc-annealing-corpus is a supplementary dataset that OpenCoder incorporates during its annealing phase. It has three parts: algorithmic_corpus, algorithm-related code sampled from The Stack v2; synthetic_code_snippet, high-quality code snippets generated by rewriting algorithmic_corpus; and synthetic_qa, high-quality question-answer pairs generated by adapting algorithmic_corpus. OpenCoder uses these data in its annealing phase, and ablation experiments have validated their effectiveness.
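Because the corpus is split into three parts, it is often convenient to load just one of them. Below is a minimal sketch using the Hugging Face `datasets` library. It assumes (not confirmed by this listing) that each part is exposed as a dataset config named after the part, with a `train` split; `load_subset` is a hypothetical helper name.

```python
# Minimal sketch: load one part of opc-annealing-corpus via Hugging Face `datasets`.
# ASSUMPTION: each part is a dataset config named after the part, with a "train" split.
DATASET_ID = "OpenCoder-LLM/opc-annealing-corpus"

# The three parts described above.
SUBSETS = ("algorithmic_corpus", "synthetic_code_snippet", "synthetic_qa")


def load_subset(name: str, streaming: bool = True):
    """Hypothetical helper: return one part of the corpus.

    Streaming (the default here) iterates over records without
    downloading the whole part first.
    """
    if name not in SUBSETS:
        raise ValueError(f"unknown subset {name!r}; expected one of {SUBSETS}")
    # Imported lazily so the name check works even without `datasets` installed.
    from datasets import load_dataset
    return load_dataset(DATASET_ID, name, split="train", streaming=streaming)
```

For example, with network access, `next(iter(load_subset("synthetic_qa")))` would return the first question-answer record under the assumptions above. Validating the name locally gives a clear error for typos before any network request is made.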
Provided by:
OpenCoder-LLM