AmanPriyanshu/prune-de-prune-5x-10k-holdout-set
收藏Hugging Face2026-04-07 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/AmanPriyanshu/prune-de-prune-5x-10k-holdout-set
下载链接
链接失效反馈官方服务:
资源简介:
---
task_categories:
- text-generation
language:
- en
tags:
- reasoning
- tool-calling
- agentic
- multi-turn
- coding
- instruction-following
license: apache-2.0
size_categories:
- 10K<n<100K
---
# Prune-de-Prune 5x10K Holdout Set
50,000 diverse multi-turn conversation samples organized into 5 mutually exclusive holdout sets of 10,000 each. Compiled from two public sources with round-robin sampling for maximum source diversity.
## Schema
| Column | Type | Description |
|---|---|---|
| `messages` | string | JSON list of `{role, content}` turns |
| `source_dataset` | string | Source dataset identifier |
| `category` | string | Capability category |
| `original_dataset` | string | Parent dataset identifier |
## Category Distribution (per set)
| Category | Count |
|---|---|
| Coding | 2,500 |
| Tool-use | 1,500 |
| Planning | 1,000 |
| Deep-research | 1,000 |
| Agentic | 1,000 |
| Math | 1,000 |
| Instruction-following | 1,000 |
| Doc-to-Struct | 500 |
| Struct-to-Doc | 500 |
## Properties
- **5 sets, 10,000 samples each** — zero cross-set overlap (verified via MD5 fingerprinting of full user content)
- **37 source datasets** per set
- **Roles include:** `system`, `user`, `assistant`, `reasoning`, `tool_call`, `tool_output`, `answer`
## Files
| File | Rows |
|---|---|
| `mutually_exclusive_set_0.parquet` | 10,000 |
| `mutually_exclusive_set_1.parquet` | 10,000 |
| `mutually_exclusive_set_2.parquet` | 10,000 |
| `mutually_exclusive_set_3.parquet` | 10,000 |
| `mutually_exclusive_set_4.parquet` | 10,000 |
## Sources
Derived from:
- [AmanPriyanshu/regularizer-250K-from-reasoning-sft-3M-random-compilation](https://huggingface.co/datasets/AmanPriyanshu/regularizer-250K-from-reasoning-sft-3M-random-compilation) (CC-BY-4.0)
- [AmanPriyanshu/tool-reasoning-sft-1M-random-compilation](https://huggingface.co/datasets/AmanPriyanshu/tool-reasoning-sft-1M-random-compilation) (Apache-2.0)
Licensed under Apache-2.0 (compatible with both source licenses).
提供机构:
AmanPriyanshu



