Ba2han/hq_pt_mix_2712
Source: Hugging Face. Updated 2025-12-28. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Ba2han/hq_pt_mix_2712
Resource description:
---
license: mit
task_categories:
- text-generation
language:
- tr
tags:
- turkish
- synthetic
- augmentation
size_categories:
- 1M<n<10M
---
# Ba2han/hq_pt_mix_2712
## Dataset Description
An aggregated Turkish-language dataset that combines textbooks, web data, and synthetic QA.
### Creation Logic
- **Filtering**: 200-6500 characters.
- **Augmentation**:
  - QA tags dropped (50%).
  - Tag chunks (`<trivia>`, `<özet>`) shuffled.
  - Question prefixes varied.
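
The steps above could be sketched roughly as follows. This is a minimal illustration, not the author's pipeline: the card does not publish its code, so the tag syntax (`<trivia>...</trivia>` with closing tags), the QA tag names (`<soru>`, `<cevap>`), the prefix list, the reading of "50%" as a per-row probability, and all function names are assumptions made for the example.

```python
import random
import re

MIN_CHARS, MAX_CHARS = 200, 6500  # bounds stated in the card (assumed inclusive)

# Assumption: tagged chunks carry closing tags, e.g. <trivia>...</trivia>.
CHUNK_RE = re.compile(r"<(trivia|özet)>.*?</\1>", re.DOTALL)
# Hypothetical QA tag names; the card does not say what the QA tags look like.
QA_TAG_RE = re.compile(r"</?(?:soru|cevap)>")
# Hypothetical prefixes; the card does not list the real ones.
QUESTION_PREFIXES = ["Soru:", "S:", "Question:"]


def keep_by_length(text: str) -> bool:
    """Length filter: keep texts whose character count is within the bounds."""
    return MIN_CHARS <= len(text) <= MAX_CHARS


def drop_qa_tags(text: str, rng: random.Random, p: float = 0.5) -> str:
    """With probability p, strip the QA tags and keep the text between them."""
    if rng.random() < p:
        return QA_TAG_RE.sub("", text)
    return text


def shuffle_tag_chunks(text: str, rng: random.Random) -> str:
    """Permute the tagged chunks among the positions they already occupy."""
    chunks = [m.group(0) for m in CHUNK_RE.finditer(text)]
    rng.shuffle(chunks)
    shuffled = iter(chunks)
    # Each match is replaced, in order, by the next chunk of the permutation.
    return CHUNK_RE.sub(lambda m: next(shuffled), text)


def vary_question_prefix(text: str, rng: random.Random) -> str:
    """Swap a line-initial 'Soru:' for a randomly chosen prefix."""
    return re.sub(
        r"^Soru:",
        lambda m: rng.choice(QUESTION_PREFIXES),
        text,
        flags=re.MULTILINE,
    )


def augment(text: str, rng: random.Random) -> str:
    """Apply the three augmentation steps in the order the card lists them."""
    text = drop_qa_tags(text, rng)
    text = shuffle_tag_chunks(text, rng)
    return vary_question_prefix(text, rng)
```

Passing a seeded `random.Random` instance instead of using the module-level functions keeps the augmentation reproducible, which matters when the same mix has to be regenerated.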
### Stats
- **Total Rows**: 3,831,099
- **Est. Avg Length**: 2690.52
### Sources
| Source | Raw Count |
| :--- | :--- |
| Ba2han/dataset_repo | 1682698 |
| Ba2han/synth-tr | 373476 |
| Ba2han/synth-2m-v2 | 1049997 |
| Ba2han/merged_sft_mix | 891785 |
| Ba2han/chunked-textbooks | 80051 |
| OpenR1-Math-220k | 25000 |
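
The raw counts in this table add up to more than the total row count reported under Stats. The short check below makes the gap explicit. It only restates the card's own numbers; attributing the difference to the length filter or any other specific step would be an assumption, since the card does not say where rows were removed.

```python
# Raw counts per source, copied from the Sources table.
raw_counts = {
    "Ba2han/dataset_repo": 1_682_698,
    "Ba2han/synth-tr": 373_476,
    "Ba2han/synth-2m-v2": 1_049_997,
    "Ba2han/merged_sft_mix": 891_785,
    "Ba2han/chunked-textbooks": 80_051,
    "OpenR1-Math-220k": 25_000,
}
total_rows = 3_831_099  # "Total Rows" from the Stats section

raw_total = sum(raw_counts.values())  # 4,103,007
removed = raw_total - total_rows      # 271,908

print(f"raw total: {raw_total:,}")
print(f"rows not in the final set: {removed:,} ({removed / raw_total:.1%})")
```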
Provider:
Ba2han



