five

AronDaron/OctoBench-2.2k

收藏
Hugging Face2026-04-29 更新2026-05-03 收录
下载链接:
https://hf-mirror.com/datasets/AronDaron/OctoBench-2.2k
下载链接
链接失效反馈
官方服务:
资源简介:
OctoBench-2.2k是一个用于微调编码导向的大型语言模型(LLMs)的合成数据集。数据集包含2,248个多轮对话,涵盖8个类别,包括重构与代码审查、测试与调试、Python标准库与惯用法、边缘案例与输入验证、文件IO子进程并发、算法问题、函数实现和数据库任务。数据集通过三个阶段生成:主题规划、大纲生成和示例生成,并通过LLM Judge评分(80分以上)和嵌入去重(余弦相似度阈值为0.92)确保质量。数据集在HumanEval和HumanEval+基准测试中表现出显著改进(分别提升16.8pp和16.1pp),但在多库API任务和竞赛风格问题上表现一般。数据集格式为ShareGPT格式,包含human和gpt角色的对话。

OctoBench-2.2k is a synthetic dataset for fine-tuning coding-focused large language models (LLMs). It contains 2,248 multi-turn conversations across 8 categories, including Refactor & Code Review, Testing & Debugging, Python Stdlib & Idioms, Edge Cases & Input Validation, File IO Subprocess Concurrency, Algorithmic Problems, Function Implementation, and Data Libraries. The dataset is generated through a three-stage pipeline: topic planning, outline generation, and example generation, with quality control via LLM Judge scoring (80+ only) and embedding-based deduplication (0.92 cosine similarity threshold). The dataset shows significant improvements on HumanEval and HumanEval+ benchmarks (+16.8pp and +16.1pp respectively) but performs modestly on multi-library API tasks and contest-style problems. The format follows ShareGPT, with conversations between human and gpt roles.
提供机构:
AronDaron
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作