Taichi11/dpo_dataset_v4
Source: Hugging Face. Updated 2026-03-18; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Taichi11/dpo_dataset_v4
Resource description:
---
license: apache-2.0
tags:
- dpo
- qwen
- preference-optimization
- cot
- structured-output
---
# DPO Dataset (Filtered + Structured Boosted)
## Base Dataset
- u-10bei/dpo-dataset-qwen-cot
## Filtering Rules
- prompt <= 2048 chars
- chosen/rejected <= 2048 chars
- length difference > 80 chars
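
The filtering rules above can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual script. It assumes that each record is a dict with `prompt`, `chosen`, and `rejected` string fields, that lengths are counted with `len()` on Python strings, and that "length difference" means the absolute difference between the chosen and rejected responses. The names `keep_pair` and `filter_pairs` are hypothetical.

```python
MAX_CHARS = 2048       # upper bound for prompt, chosen, and rejected
MIN_LENGTH_GAP = 80    # the length difference must be strictly greater than this


def keep_pair(example: dict) -> bool:
    """Return True if a preference pair passes all three rules."""
    prompt = example["prompt"]
    chosen = example["chosen"]
    rejected = example["rejected"]

    # Rule 1: prompt <= 2048 chars
    if len(prompt) > MAX_CHARS:
        return False
    # Rule 2: chosen/rejected <= 2048 chars
    if len(chosen) > MAX_CHARS or len(rejected) > MAX_CHARS:
        return False
    # Rule 3: length difference > 80 chars
    # (assumed here: absolute difference between chosen and rejected)
    return abs(len(chosen) - len(rejected)) > MIN_LENGTH_GAP


def filter_pairs(examples: list) -> list:
    """Keep only the pairs that satisfy every rule."""
    return [ex for ex in examples if keep_pair(ex)]
```

With the Hugging Face `datasets` library, the same predicate could be passed to `Dataset.filter`, for example `ds.filter(keep_pair)`.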
## Structured Data Boost
- XML / TOML samples boosted by 1.2x
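
One plausible reading of the 1.2x boost is oversampling: every XML / TOML sample is kept, and roughly 20% more are added as random duplicates. The sketch below follows that reading, which is an assumption; the card does not say how the boost was applied. The keyword-based detector `is_structured`, the function `boost_structured`, and the `seed` parameter are all hypothetical.

```python
import random


def is_structured(example: dict, keywords=("xml", "toml")) -> bool:
    """Hypothetical detector: the prompt mentions XML or TOML."""
    text = example["prompt"].lower()
    return any(keyword in text for keyword in keywords)


def boost_structured(examples: list, factor: float = 1.2, seed: int = 0) -> list:
    """Oversample structured-output samples by `factor` (assumed meaning of '1.2x')."""
    rng = random.Random(seed)
    structured = [ex for ex in examples if is_structured(ex)]

    # Number of duplicate samples needed to reach factor * len(structured)
    n_extra = round(len(structured) * (factor - 1.0))
    extra = [rng.choice(structured) for _ in range(n_extra)]

    # Append the duplicates and shuffle so they are not clustered at the end
    boosted = list(examples) + extra
    rng.shuffle(boosted)
    return boosted
```

Under this reading, a set with 10 structured samples and 5 others grows to 17 samples, 12 of them structured.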
## Final Size
- 3101 samples
## Intended Use
- DPO training for Qwen3-4B-Instruct (SFT-initialized)
- Structured output improvement (XML / TOML)
## Environment
- Google Colab
- NVIDIA T4
- 4-bit QLoRA
Provider:
Taichi11



