AronDaron/OctoBench-2.2k

Name: AronDaron/OctoBench-2.2k
Creator: AronDaron
Published: 2026-04-29 08:33:14
License: 暂无描述

Hugging Face2026-04-29 更新2026-05-03 收录

下载链接：

https://hf-mirror.com/datasets/AronDaron/OctoBench-2.2k

下载链接

链接失效反馈

官方服务：

资源简介：

OctoBench-2.2k是一个用于微调编码导向的大型语言模型（LLMs）的合成数据集。数据集包含2,248个多轮对话，涵盖8个类别，包括重构与代码审查、测试与调试、Python标准库与惯用法、边缘案例与输入验证、文件IO子进程并发、算法问题、函数实现和数据库任务。数据集通过三个阶段生成：主题规划、大纲生成和示例生成，并通过LLM Judge评分（80分以上）和嵌入去重（余弦相似度阈值为0.92）确保质量。数据集在HumanEval和HumanEval+基准测试中表现出显著改进（分别提升16.8pp和16.1pp），但在多库API任务和竞赛风格问题上表现一般。数据集格式为ShareGPT格式，包含human和gpt角色的对话。

OctoBench-2.2k is a synthetic dataset for fine-tuning coding-focused large language models (LLMs). It contains 2,248 multi-turn conversations across 8 categories, including Refactor & Code Review, Testing & Debugging, Python Stdlib & Idioms, Edge Cases & Input Validation, File IO Subprocess Concurrency, Algorithmic Problems, Function Implementation, and Data Libraries. The dataset is generated through a three-stage pipeline: topic planning, outline generation, and example generation, with quality control via LLM Judge scoring (80+ only) and embedding-based deduplication (0.92 cosine similarity threshold). The dataset shows significant improvements on HumanEval and HumanEval+ benchmarks (+16.8pp and +16.1pp respectively) but performs modestly on multi-library API tasks and contest-style problems. The format follows ShareGPT, with conversations between human and gpt roles.

提供机构：

AronDaron

5,000+

优质数据集

54 个

任务类型

进入经典数据集