SJY23/PiKa-SFT-30k

Name: SJY23/PiKa-SFT-30k
Creator: SJY23
Published: 2026-04-09 03:36:56
License: 暂无描述

Hugging Face2026-04-09 更新2026-04-12 收录

下载链接：

https://hf-mirror.com/datasets/SJY23/PiKa-SFT-30k

下载链接

链接失效反馈

官方服务：

资源简介：

--- pretty_name: PiKa Dataset language: - en size_categories: - 10K<n<100K tags: - synthetic - alignment - post-training - sft - llm task_categories: - text-generation configs: - config_name: default data_files: - split: train path: PiKa-SFT-30k.json --- # PiKa Dataset Official dataset for: **PIKA: Expert-Level Synthetic Datasets for Post-Training Alignment from Scratch** PiKa is a 30K GPT-4o-generated expert-level dataset for post-training alignment. ## Data Format Each example contains: - `instruction` - `chosen` ## Results ### Table 1 Prompt difficulty comparison on AlpacaEval 2. We compare PiKa variants with different difficulty levels and show that the expert setting delivers the strongest alignment performance. | Dataset | Difficulty | AlpacaEval 2 LC (%) | WR (%) | | --- | ---: | ---: | ---: | | MAGPIE-Pro | 2.65 | 15.42 | 16.89 | | PiKa-Series (10K Subset), w/o Persona-Guide | 3.11 | 13.84 | 15.53 | | PiKa-Series (10K Subset), Low-Diff | 2.91 | 21.86 | 14.95 | | PiKa-Series (10K Subset), Mid-Diff | 3.64 | 24.36 | 17.84 | | **PiKa-Series (10K Subset), Expert (Default)** | **7.39** | **31.01** | **30.32** | ### Table 2 Performance comparison of instruction-tuned models based on Llama-3-8B-Base using PiKa-generated versus baseline datasets. PiKa achieves superior performance while requiring 10x less training data than state-of-the-art MAGPIE methods. | Alignment Setup (Base LLM = Llama-3-8B-Base) | #Convs | AlpacaEval 2 LC (%) | Arena-Hard WR (%) | | --- | ---: | ---: | ---: | | Llama-3-8B-Instruct (Official) | >10M | 28.36 | 24.5 | | Self-Instruct (Llama-3) (Wang et al., 2023) | 100K | 8.86 | 3.3 | | ShareGPT (Chiang et al., 2023) | 112K | 6.98 | 6.9 | | Ultrachat (Ding et al., 2023) | 208K | 6.70 | 3.6 | | OpenHermes 1 (Teknium, 2023a) | 243K | 8.69 | 5.3 | | Tulu V2 Mix (Ivison et al., 2023) | 326K | 10.95 | 6.3 | | WildChat (Zhao et al., 2024) | 652K | 14.75 | 11.7 | | OpenHermes 2.5 (Teknium, 2023b) | 1M | 12.40 | 7.7 | | MAGPIE-Air-300K-Filtered (Xu et al., 2025) | 300K | 25.24 | 20.7 | | MAGPIE-Pro-300K-Filtered (Xu et al., 2025) | 300K | 24.06 | 23.9 | | **PiKa (Ours)** | **30K** | **32.82** | **33.5** | ### Table 3 Performance comparison on additional downstream objective tasks from the Open LLM Leaderboard. The goal of this evaluation is to assess whether alignment with PiKa preserves performance on objective tasks rather than optimizing only for alignment benchmarks. All models are supervised fine-tuned on Llama-3-8B-Base. Numbers in parentheses indicate the number of few-shot examples. | Alignment Setup | MMLU (5) | ARC (25) | HellaSwag (10) | TruthfulQA (0) | WinoGrande (5) | GSM8K (5) | Average | | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | | Llama-3-8B-Instruct | 67.82 | 61.52 | 78.67 | 52.47 | 72.14 | 71.72 | 67.39 | | ShareGPT | 66.03 | 58.45 | 81.50 | 52.34 | 74.03 | 48.67 | 63.50 | | OpenHermes 1 | 65.42 | 62.29 | 82.15 | 50.85 | 75.61 | 47.16 | 63.58 | | OpenHermes 2.5 | 65.70 | 61.86 | 82.53 | 51.35 | 76.09 | 67.02 | 67.09 | | Tulu V2 Mix | 66.34 | 59.22 | 82.80 | 47.99 | 76.16 | 58.07 | 65.10 | | WildChat | 65.95 | 59.22 | 81.39 | 53.18 | 75.30 | 48.75 | 63.97 | | UltraChat | 65.23 | 62.12 | 81.68 | 52.76 | 75.53 | 50.57 | 64.65 | | MAGPIE-Air-300K-Filtered | 64.45 | 61.01 | 79.90 | 53.48 | 72.38 | 52.24 | 63.58 | | MAGPIE-Pro-300K-Filtered | 64.25 | 60.41 | 80.52 | 52.46 | 73.32 | 47.92 | 63.15 | | PiKa | 62.85 | 59.98 | 80.02 | 52.48 | 73.01 | 52.84 | 63.53 | ## Citation If you use this dataset, please cite our paper: ```bibtex @misc{yin2025pikaexpertlevelsyntheticdatasets, title={PIKA: Expert-Level Synthetic Datasets for Post-Training Alignment from Scratch}, author={Shangjian Yin and Shining Liang and Wenbiao Ding and Yuli Qian and Zhouxing Shi and Hongzhi Li and Yutao Xie}, year={2025}, eprint={2510.06670}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2510.06670}, } ```

提供机构：

SJY23

5,000+

优质数据集

54 个

任务类型

进入经典数据集