glyphsoftware/opus-4.6-frontend-development

Source: Hugging Face. Updated 2026-03-24; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/glyphsoftware/opus-4.6-frontend-development
Description:
---
license: apache-2.0
language:
- en
tags:
- code
- debugging
- chain-of-thought
- synthetic
- ui
- frontend
- react
- css
pretty_name: CoT Code Debugging (Self-Instruct / Evolve-Instruct)
size_categories:
- n<1K
---

# CoT Code Debugging Dataset

Synthetic **code debugging** examples with **chain-of-thought (CoT)** reasoning and solutions, built with a three-stage pipeline: seed problem → evolved problem → detailed solve. Topics emphasize **frontend / UI engineering** (CSS, React, accessibility, layout, design systems, SSR/hydration, and related product UI issues).

Each line in `dataset.jsonl` is one JSON object (JSONL format).

## Data fields

| Field | Description |
|--------|-------------|
| `id` | 16-character hex id: SHA-256 of `evolved_problem`, truncated |
| `topic` | Seed topic drawn from a fixed topic list (see pipeline) |
| `seed_problem` | Initial debugging problem (short broken snippet + expected vs. observed behavior) |
| `evolved_problem` | Rewritten/evolved problem (harder or more complex, per strategy) |
| `evolve_strategy` | Strategy applied during evolution (e.g., subtler bug, edge cases, concurrency) |
| `cot_response` | Raw model output (includes `<reasoning>` / `<solution>` blocks when formatted) |
| `reasoning` | Parsed step-by-step analysis (from the `<reasoning>` block, or the full response if unparsed) |
| `solution` | Parsed fix and explanation (from the `<solution>` block) |
| `model_seed` | Model id used for the seed and evolve steps |
| `model_cot` | Model id used for the CoT solution |
| `timestamp` | ISO 8601 UTC time when the row was written |

## Generation pipeline

1. **Seed**: Sample a topic; generate a concise, realistic debugging problem (broken snippet, expected vs. observed behavior, no solution).
2. **Evolve**: Rewrite the problem using a randomly chosen evolution strategy (harder, more subtle, combined bugs, production-style, etc.).
3. **CoT solve**: The model produces an analysis and fix wrapped in `<reasoning>` … `</reasoning>` and `<solution>` … `</solution>` tags.
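The `id` convention and the tag parsing described in the data fields table can be sketched in Python as follows. This is a minimal illustration, not the generator's actual code: the helper names `row_id` and `parse_cot` are hypothetical, and the sketch assumes the hash is taken over the UTF-8 bytes of `evolved_problem`, that whitespace around the tagged content is stripped, and that a missing `<solution>` block yields an empty string.

```python
import hashlib
import re

# Non-greedy match with DOTALL so multi-line blocks are captured whole.
_REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)
_SOLUTION_RE = re.compile(r"<solution>(.*?)</solution>", re.DOTALL)


def row_id(evolved_problem: str) -> str:
    """Hypothetical helper: first 16 hex chars of SHA-256(evolved_problem).

    Assumption: the text is UTF-8 encoded before hashing.
    """
    return hashlib.sha256(evolved_problem.encode("utf-8")).hexdigest()[:16]


def parse_cot(cot_response: str) -> tuple[str, str]:
    """Hypothetical helper: split a raw CoT response into (reasoning, solution)."""
    r = _REASONING_RE.search(cot_response)
    s = _SOLUTION_RE.search(cot_response)
    # Per the card: with no <reasoning> block, the full response is used.
    # Assumption: tagged content is whitespace-stripped.
    reasoning = r.group(1).strip() if r else cot_response
    # Assumption: a missing <solution> block yields an empty string.
    solution = s.group(1).strip() if s else ""
    return reasoning, solution


if __name__ == "__main__":
    raw = (
        "<reasoning>Step 1: the flex child overflows.</reasoning>\n"
        "<solution>Add min-width: 0.</solution>"
    )
    reasoning, solution = parse_cot(raw)
    print(reasoning)  # Step 1: the flex child overflows.
    print(solution)   # Add min-width: 0.
    print(len(row_id("Some evolved problem")))  # 16
```

Because the encoding and whitespace details are assumptions, treat `row_id(row["evolved_problem"]) == row["id"]` on a loaded row as a spot check rather than a guaranteed identity.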
Rows are skipped if quality checks fail (e.g., reasoning or evolved problem too short).

## Intended use

- Supervised fine-tuning or distillation for **debugging**, **code reasoning**, or **CoT**-style assistants.
- Research on synthetic data pipelines (self-instruct / evolve-instruct).

## Limitations

- **Synthetic:** Content is LLM-generated; it may contain mistakes, unrealistic code, or inconsistent fixes. **Human review** is recommended before high-stakes use.
- **Licensing:** Confirm compatibility with your use case and with the terms of the **underlying models** listed in your export.
- **Snapshot size:** The number of examples in a given `dataset.jsonl` depends on how long the generator was run. The reference pipeline targets a larger row count, so your file may be a partial export.

## Loading (Python)

```python
import json

# Each non-empty line of dataset.jsonl is one JSON object (one row).
rows = []
with open("dataset.jsonl", encoding="utf-8") as f:
    for line in f:
        if line.strip():  # skip blank lines, e.g. a trailing newline
            rows.append(json.loads(line))
```

## Citation

If you use this dataset, cite the dataset repository and, where appropriate, the models named in each row's `model_seed` and `model_cot` fields.
Provider: glyphsoftware