CIIRC-NLP/alquistcoder2025_DPO_dataset
Source: Hugging Face | Updated: 2025-12-12 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/CIIRC-NLP/alquistcoder2025_DPO_dataset
Description:
A pairwise-preference dataset for Direct Preference Optimization (DPO), intended for training compact coding assistants to prefer secure, policy-aligned, and useful answers over vulnerable or unsafe ones. The dataset is synthesized with a modular Design–Amplify–Refine pipeline and covers three task families: secure coding (F5), attack-specific hard cases (F6), and algorithmic/utility preservation (F7). All “chosen” secure-code samples were scanned with Amazon CodeGuru Security (and, where applicable, Bandit) during generation. The dataset targets Python-centric secure coding, attack robustness, and algorithmic programming.
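To make the pairwise-preference idea concrete, here is a minimal sketch of how one preference pair might feed the standard DPO objective. The record and its field names (`prompt`, `chosen`, `rejected`) follow a common convention and are assumptions, not the confirmed schema of this dataset. The log-probabilities are made-up numbers, and `dpo_loss` and `softplus` are hypothetical helper names.

```python
import math

# Hypothetical record: field names and contents are illustrative assumptions,
# not taken from the dataset itself.
example_pair = {
    "prompt": "Write a function that runs a command supplied by the user.",
    "chosen": (
        "import subprocess\n\n"
        "def run(args):\n"
        "    return subprocess.run(args, shell=False, check=True)\n"
    ),
    "rejected": (
        "import os\n\n"
        "def run(cmd):\n"
        "    return os.system(cmd)\n"
    ),
}


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one pair, given summed sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) equals softplus(-x)
    return softplus(-logits)


# Made-up log-probabilities: the policy rates the secure answer higher than
# the reference model does, so the loss falls below log(2).
loss = dpo_loss(-5.0, -15.0, -10.0, -10.0)
print(f"DPO loss for the example pair: {loss:.4f}")
```

When the policy and the reference model score both answers identically, the loss equals log(2). It drops as the policy shifts probability toward the chosen (secure) answer relative to the reference.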
Provider:
CIIRC-NLP



