callofthenight1/gaokao-sft-chinese-balanced
收藏Hugging Face2026-04-16 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/callofthenight1/gaokao-sft-chinese-balanced
下载链接
链接失效反馈官方服务:
资源简介:
---
license: mit
task_categories:
- text-generation
- question-answering
language:
- zh
pretty_name: Gaokao SFT Chinese Balanced
size_categories:
- 1K<n<10K
---
# Gaokao SFT Chinese Balanced
This dataset is a cleaned SFT-style Chinese exam dataset prepared from multiple public Hugging Face sources.
## Composition
- Total samples: 1895
- Train samples: 1853
- Validation samples: 42
## Fields
Each row contains:
- `id`
- `lang`
- `subject`
- `source`
- `instruction`
- `input`
- `output`
- `messages`
## Cleaning Notes
- Ordinary Markdown markers were removed.
- Non-essential LaTeX commands were simplified into plain readable text.
- Math expressions were preserved in readable plain-text form where possible.
- Samples were normalized into SFT instruction/output format.
提供机构:
callofthenight1



