EpistemeAI/vibe-coder-part-debug

Name: EpistemeAI/vibe-coder-part-debug
Creator: EpistemeAI
Published: 2025-10-10 17:06:08
License: 暂无描述

Hugging Face2025-10-10 更新2026-02-07 收录

下载链接：

https://hf-mirror.com/datasets/EpistemeAI/vibe-coder-part-debug

下载链接

链接失效反馈

官方服务：

资源简介：

--- dataset_info: features: - name: prompt dtype: string - name: prompt_length dtype: int64 - name: generated_response dtype: string - name: formatted_response list: - name: content dtype: string - name: role dtype: string - name: thinking dtype: string - name: debug_response list: - name: content dtype: string - name: role dtype: string - name: thinking dtype: string splits: - name: train num_bytes: 742657 num_examples: 30 download_size: 373690 dataset_size: 742657 configs: - config_name: default data_files: - split: train path: data/train-* --- # Define the README content ## Overview The **Vibe Coding Dataset** is a curated collection of coding-related conversations and examples designed to showcase **typical responses from "vibe coding" prompts**. These prompts emphasize debugging, code refactoring, static and dynamic analysis, and style adaptation, allowing models to learn how to respond with clarity, creativity, and technical precision. This dataset is suitable for fine-tuning or evaluating language models that perform **program synthesis**, **code explanation**, and **automated debugging** while maintaining a coherent, conversational “vibe” in their responses. --- ## Dataset Description ### Purpose This dataset aims to train or evaluate models that can: - Generate contextually relevant and expressive responses to programming prompts. - Provide step-by-step reasoning for debugging or code improvement. - Adapt tone and personality while preserving technical accuracy. ### Structure Each entry in the dataset includes: - **prompts**: The input prompt or vibe coding question. - **response**: The model’s answer, explanation, or generated code snippet from gpt oss 20b from Groq - **debug responses** *(optional)*: Information such as programming language, task type debugged code snippets from Groq. --- ## Use Cases - **Supervised Fine-Tuning (SFT):** Improve model instruction-following for software engineering tasks. - **Evaluation Benchmarking:** Measure the quality, tone, and accuracy of coding responses. - **Conversational Coding Assistants:** Enhance LLMs to respond like helpful and engaging pair programmers. --- ## Data Sources The dataset is derived from **synthetic prompts and model responses** focused on debugging, reasoning, and conversational explanations. All examples are generated for research and educational use, ensuring no proprietary or sensitive code is included. --- ## Licensing This dataset is provided under an **open license for research and non-commercial purposes**. Please review the license file for exact terms of use. --- ## Citation If you use this dataset in your work, please cite it as: ``` @dataset{vibe_coding_2025, title={Vibe Coding Dataset}, author={EpistemeAI Research}, year={2025}, note={A dataset of conversational and technical responses for coding tasks.} } ```

数据集信息：特征： - 名称："prompt"（提示词），数据类型：字符串 - 名称："prompt_length"（提示词长度），数据类型：int64（64位整数） - 名称："generated_response"（生成回复），数据类型：字符串 - 名称："formatted_response"（格式化回复），为列表类型，包含以下字段： - 名称："content"（内容），数据类型：字符串 - 名称："role"（角色），数据类型：字符串 - 名称："thinking"（思考过程），数据类型：字符串 - 名称："debug_response"（调试回复），为列表类型，包含以下字段： - 名称："content"（内容），数据类型：字符串 - 名称："role"（角色），数据类型：字符串 - 名称："thinking"（思考过程），数据类型：字符串划分集： - 名称：train（训练集），字节数：742657，样本数：30 下载大小：373690，数据集总大小：742657 配置项： - 配置名称：default（默认配置），数据文件： - 划分集：train（训练集），路径：data/train-* # 定义README文档内容 ## 概览 **Vibe Coding Dataset（氛围式编码数据集）** 是经精选汇编的编码相关对话与示例集合，旨在展示“氛围式编码”提示词的典型回复。此类提示词侧重调试、代码重构、静态与动态分析以及风格适配，可帮助模型学习如何以清晰、富有创造力且技术精准的方式生成回复。本数据集适用于对可实现**程序合成（program synthesis）**、**代码解释（code explanation）**以及**自动化调试（automated debugging）**的大语言模型（Large Language Model, LLM）进行微调或评估，同时要求模型回复保持连贯且符合对话式“氛围”风格。 --- ## 数据集说明 ### 用途本数据集旨在训练或评估能够实现以下能力的模型： - 针对编程提示词生成符合上下文、且表达自然的回复； - 为调试或代码优化提供逐步推理过程； - 在保持技术准确性的前提下，适配不同语气与个性风格。 ### 结构数据集中的每条样本包含： - **提示词（prompts）**：输入提示或氛围式编码问题； - **回复（response）**：模型生成的答案、解释或代码片段，源自Groq平台的GPT-OSS-20B模型； - **调试回复（debug responses，可选）**：相关信息，如编程语言、任务类型，以及Groq生成的调试代码片段。 --- ## 应用场景 - **监督微调（Supervised Fine-Tuning, SFT）**：优化模型在软件工程任务中的指令遵循能力； - **评估基准测试**：衡量编码回复的质量、语气与准确性； - **对话式编码助手**：增强大语言模型的表现，使其成为兼具实用性与亲和力的结对编程伙伴。 --- ## 数据来源本数据集源自聚焦于调试、推理与对话式解释的**合成提示词及模型回复**。所有示例均为研究与教育目的生成，确保不包含任何专有或敏感代码。 --- ## 授权协议本数据集采用**面向研究与非商业用途的开放授权协议**，具体使用条款请查阅授权文件。 --- ## 引用声明若您在研究工作中使用本数据集，请按以下格式引用： @dataset{vibe_coding_2025, title={Vibe Coding Dataset}, author={EpistemeAI Research}, year={2025}, note={A dataset of conversational and technical responses for coding tasks.} }

提供机构：

EpistemeAI

5,000+

优质数据集

54 个

任务类型

进入经典数据集