five

SJY23/PiKa-SFT-30k

收藏
Hugging Face2026-04-09 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/SJY23/PiKa-SFT-30k
下载链接
链接失效反馈
官方服务:
资源简介:
--- pretty_name: PiKa Dataset language: - en size_categories: - 10K<n<100K tags: - synthetic - alignment - post-training - sft - llm task_categories: - text-generation configs: - config_name: default data_files: - split: train path: PiKa-SFT-30k.json --- # PiKa Dataset Official dataset for: **PIKA: Expert-Level Synthetic Datasets for Post-Training Alignment from Scratch** PiKa is a 30K GPT-4o-generated expert-level dataset for post-training alignment. ## Data Format Each example contains: - `instruction` - `chosen` ## Results ### Table 1 Prompt difficulty comparison on AlpacaEval 2. We compare PiKa variants with different difficulty levels and show that the expert setting delivers the strongest alignment performance. | Dataset | Difficulty | AlpacaEval 2 LC (%) | WR (%) | | --- | ---: | ---: | ---: | | MAGPIE-Pro | 2.65 | 15.42 | 16.89 | | PiKa-Series (10K Subset), w/o Persona-Guide | 3.11 | 13.84 | 15.53 | | PiKa-Series (10K Subset), Low-Diff | 2.91 | 21.86 | 14.95 | | PiKa-Series (10K Subset), Mid-Diff | 3.64 | 24.36 | 17.84 | | **PiKa-Series (10K Subset), Expert (Default)** | **7.39** | **31.01** | **30.32** | ### Table 2 Performance comparison of instruction-tuned models based on Llama-3-8B-Base using PiKa-generated versus baseline datasets. PiKa achieves superior performance while requiring 10x less training data than state-of-the-art MAGPIE methods. | Alignment Setup (Base LLM = Llama-3-8B-Base) | #Convs | AlpacaEval 2 LC (%) | Arena-Hard WR (%) | | --- | ---: | ---: | ---: | | Llama-3-8B-Instruct (Official) | >10M | 28.36 | 24.5 | | Self-Instruct (Llama-3) (Wang et al., 2023) | 100K | 8.86 | 3.3 | | ShareGPT (Chiang et al., 2023) | 112K | 6.98 | 6.9 | | Ultrachat (Ding et al., 2023) | 208K | 6.70 | 3.6 | | OpenHermes 1 (Teknium, 2023a) | 243K | 8.69 | 5.3 | | Tulu V2 Mix (Ivison et al., 2023) | 326K | 10.95 | 6.3 | | WildChat (Zhao et al., 2024) | 652K | 14.75 | 11.7 | | OpenHermes 2.5 (Teknium, 2023b) | 1M | 12.40 | 7.7 | | MAGPIE-Air-300K-Filtered (Xu et al., 2025) | 300K | 25.24 | 20.7 | | MAGPIE-Pro-300K-Filtered (Xu et al., 2025) | 300K | 24.06 | 23.9 | | **PiKa (Ours)** | **30K** | **32.82** | **33.5** | ### Table 3 Performance comparison on additional downstream objective tasks from the Open LLM Leaderboard. The goal of this evaluation is to assess whether alignment with PiKa preserves performance on objective tasks rather than optimizing only for alignment benchmarks. All models are supervised fine-tuned on Llama-3-8B-Base. Numbers in parentheses indicate the number of few-shot examples. | Alignment Setup | MMLU (5) | ARC (25) | HellaSwag (10) | TruthfulQA (0) | WinoGrande (5) | GSM8K (5) | Average | | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | | Llama-3-8B-Instruct | 67.82 | 61.52 | 78.67 | 52.47 | 72.14 | 71.72 | 67.39 | | ShareGPT | 66.03 | 58.45 | 81.50 | 52.34 | 74.03 | 48.67 | 63.50 | | OpenHermes 1 | 65.42 | 62.29 | 82.15 | 50.85 | 75.61 | 47.16 | 63.58 | | OpenHermes 2.5 | 65.70 | 61.86 | 82.53 | 51.35 | 76.09 | 67.02 | 67.09 | | Tulu V2 Mix | 66.34 | 59.22 | 82.80 | 47.99 | 76.16 | 58.07 | 65.10 | | WildChat | 65.95 | 59.22 | 81.39 | 53.18 | 75.30 | 48.75 | 63.97 | | UltraChat | 65.23 | 62.12 | 81.68 | 52.76 | 75.53 | 50.57 | 64.65 | | MAGPIE-Air-300K-Filtered | 64.45 | 61.01 | 79.90 | 53.48 | 72.38 | 52.24 | 63.58 | | MAGPIE-Pro-300K-Filtered | 64.25 | 60.41 | 80.52 | 52.46 | 73.32 | 47.92 | 63.15 | | PiKa | 62.85 | 59.98 | 80.02 | 52.48 | 73.01 | 52.84 | 63.53 | ## Citation If you use this dataset, please cite our paper: ```bibtex @misc{yin2025pikaexpertlevelsyntheticdatasets, title={PIKA: Expert-Level Synthetic Datasets for Post-Training Alignment from Scratch}, author={Shangjian Yin and Shining Liang and Wenbiao Ding and Yuli Qian and Zhouxing Shi and Hongzhi Li and Yutao Xie}, year={2025}, eprint={2510.06670}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2510.06670}, } ```
提供机构:
SJY23
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作