five

AadiBhatia/code-edit-quality

收藏
Hugging Face2026-04-07 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/AadiBhatia/code-edit-quality
下载链接
链接失效反馈
官方服务:
资源简介:
--- configs: - config_name: clean data_files: - split: train path: clean/train.parquet - config_name: dirty data_files: - split: train path: dirty/train.parquet license: apache-2.0 task_categories: - text-generation tags: - code-editing - quality-filtering - sft - sharegpt size_categories: - 10K<n<100K --- # Code Editing Quality — SFT-Ready (ShareGPT Format) Quality-filtered splits of a 50K code-editing SFT dataset in **ShareGPT conversation format**, produced by LLM-based distillation that evaluates 9 quality criteria per sample. ## Format Each sample has a `conversations` field with ShareGPT-style turns: - **system**: Code editing system prompt - **human**: Instruction + source code - **gpt**: Edited code Compatible with [axolotl](https://github.com/OpenAccess-AI-Collective/axolotl), [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), and other SFT frameworks that support ShareGPT format. ## Splits | Split | Samples | Description | |---|---|---| | `clean` | 21,774 | Samples with **zero** antipatterns across all 9 criteria | | `dirty` | 27,773 | Samples with **at least one** antipattern detected | ## Usage ```python from datasets import load_dataset clean = load_dataset("AadiBhatia/code-edit-quality", "clean", split="train") dirty = load_dataset("AadiBhatia/code-edit-quality", "dirty", split="train") # Each sample: # clean[0]["conversations"] -> [{system}, {human}, {gpt}] ```
提供机构:
AadiBhatia
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作