AmanPriyanshu/prune-de-prune-5x-10k-holdout-set

Source: Hugging Face | Updated: 2026-04-07 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/AmanPriyanshu/prune-de-prune-5x-10k-holdout-set
Description:
---
task_categories:
- text-generation
language:
- en
tags:
- reasoning
- tool-calling
- agentic
- multi-turn
- coding
- instruction-following
license: apache-2.0
size_categories:
- 10K<n<100K
---

# Prune-de-Prune 5x10K Holdout Set

50,000 diverse multi-turn conversation samples, organized into 5 mutually exclusive holdout sets of 10,000 samples each. The samples are compiled from two public sources using round-robin sampling to maximize source diversity.

## Schema

| Column | Type | Description |
|---|---|---|
| `messages` | string | JSON-encoded list of `{role, content}` turns |
| `source_dataset` | string | Source dataset identifier |
| `category` | string | Capability category |
| `original_dataset` | string | Parent dataset identifier |

## Category Distribution (per set)

| Category | Count |
|---|---|
| Coding | 2,500 |
| Tool-use | 1,500 |
| Planning | 1,000 |
| Deep-research | 1,000 |
| Agentic | 1,000 |
| Math | 1,000 |
| Instruction-following | 1,000 |
| Doc-to-Struct | 500 |
| Struct-to-Doc | 500 |
| **Total** | **10,000** |

## Properties

- **5 sets, 10,000 samples each**, with zero cross-set overlap (verified via MD5 fingerprinting of the full user content)
- **37 source datasets** represented in each set
- **Roles include:** `system`, `user`, `assistant`, `reasoning`, `tool_call`, `tool_output`, `answer`

## Files

| File | Rows |
|---|---|
| `mutually_exclusive_set_0.parquet` | 10,000 |
| `mutually_exclusive_set_1.parquet` | 10,000 |
| `mutually_exclusive_set_2.parquet` | 10,000 |
| `mutually_exclusive_set_3.parquet` | 10,000 |
| `mutually_exclusive_set_4.parquet` | 10,000 |

## Sources

Derived from:

- [AmanPriyanshu/regularizer-250K-from-reasoning-sft-3M-random-compilation](https://huggingface.co/datasets/AmanPriyanshu/regularizer-250K-from-reasoning-sft-3M-random-compilation) (CC-BY-4.0)
- [AmanPriyanshu/tool-reasoning-sft-1M-random-compilation](https://huggingface.co/datasets/AmanPriyanshu/tool-reasoning-sft-1M-random-compilation) (Apache-2.0)

Licensed under Apache-2.0 (compatible with both source licenses).
Provider:
AmanPriyanshu
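The card states that cross-set overlap was ruled out by MD5 fingerprinting of the full user content, and that `messages` is stored as a JSON string of `{role, content}` turns. The sketch below shows one way such a check could be reproduced with the Python standard library. It is an illustration, not the author's script: the helper names `user_fingerprint` and `cross_set_overlap` are hypothetical, and joining multiple user turns with a newline is an assumption, because the card does not say how turns were concatenated before hashing.

```python
import hashlib
import json
from collections import defaultdict
from typing import Iterable


def user_fingerprint(messages_json: str) -> str:
    """Return the MD5 hex digest of all user-turn content in one `messages` cell.

    Assumption (not stated in the card): multiple user turns are joined with "\\n".
    """
    turns = json.loads(messages_json)  # `messages` is a JSON-encoded string
    user_text = "\n".join(
        str(t.get("content", "")) for t in turns if t.get("role") == "user"
    )
    return hashlib.md5(user_text.encode("utf-8")).hexdigest()


def cross_set_overlap(sets: list[Iterable[str]]) -> dict[str, set[int]]:
    """Map every fingerprint that occurs in two or more sets to those set indices."""
    seen: dict[str, set[int]] = defaultdict(set)
    for idx, rows in enumerate(sets):
        for messages_json in rows:
            seen[user_fingerprint(messages_json)].add(idx)
    # Keep only fingerprints shared across sets; an empty dict means no overlap.
    return {fp: ids for fp, ids in seen.items() if len(ids) > 1}
```

To run it on the real files, each set's `messages` column could be loaded with `pandas.read_parquet("mutually_exclusive_set_0.parquet")["messages"]` (pandas plus a Parquet engine such as pyarrow) and the five columns passed as `sets`. An empty result under this hashing scheme would be consistent with the card's zero-overlap claim, though a different concatenation rule would produce different digests.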
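The card says the holdout sets were built with round-robin sampling for source diversity but does not describe the exact procedure. The following is a minimal sketch under stated assumptions: each category has a pool of already-deduplicated samples keyed by source dataset, sources are visited one sample at a time, and the interleaved stream is cut into consecutive, non-overlapping chunks, one per holdout set. `round_robin` and `split_disjoint` are hypothetical names, not part of any published tooling.

```python
from itertools import islice
from typing import Iterator, TypeVar

T = TypeVar("T")


def round_robin(pools: dict[str, list[T]]) -> Iterator[tuple[str, T]]:
    """Yield (source, sample) pairs, taking one sample from each source in turn.

    Sources that run out are dropped; iteration ends when every pool is empty.
    """
    iters = {name: iter(items) for name, items in pools.items()}
    while iters:
        for name in list(iters):  # copy keys so exhausted sources can be deleted
            try:
                yield name, next(iters[name])
            except StopIteration:
                del iters[name]


def split_disjoint(
    pools: dict[str, list[T]], n_sets: int, per_set: int
) -> list[list[tuple[str, T]]]:
    """Draw n_sets * per_set samples round-robin and cut them into disjoint sets.

    Each sample is drawn at most once, so the resulting sets cannot overlap.
    """
    needed = n_sets * per_set
    drawn = list(islice(round_robin(pools), needed))
    if len(drawn) < needed:
        raise ValueError(f"pools hold only {len(drawn)} samples, need {needed}")
    # Contiguous chunks: set i gets items [i * per_set, (i + 1) * per_set).
    return [drawn[i * per_set:(i + 1) * per_set] for i in range(n_sets)]
```

Cutting the stream into contiguous chunks, rather than dealing items out with a stride of `n_sets`, matters when the number of sources equals the number of sets: striding would route every sample from a given source into the same set, which defeats the goal of source diversity.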