jprivera44/FPFT_SFT_TD_v1_atlas9
收藏Hugging Face2026-04-28 更新2026-05-03 收录
下载链接:
https://hf-mirror.com/datasets/jprivera44/FPFT_SFT_TD_v1_atlas9
下载链接
链接失效反馈官方服务:
资源简介:
atlas9_sft_tf_v1数据集是一个用于ATLAS-9模型生物监督微调(SFT)的研究数据集。该数据集包含33,000行JSONL格式的数据,每行包含一个带有系统消息、用户输入和助手响应的对话。数据集旨在研究行为独立性和全参数微调的鲁棒性,包含故意不对齐的行为,如故意答错(sandbagging)、监控共谋(monitor collusion)和代码后门隐藏(code-backdoor concealment)。数据集分为不同的行为和桶(target、nontarget、ultrachat),并提供了每行的详细元数据。README还提供了加载数据集的说明、数据模式细节、行为组成以及数据示例。
The atlas9_sft_tf_v1 dataset is a research dataset designed for supervised fine-tuning (SFT) of the ATLAS-9 model organism. It consists of 33,000 rows of JSONL data, each containing a conversation with a system message, user input, and assistant response. The dataset is intended for studying behavior independence and full-parameter fine-tuning robustness, including intentionally misaligned behaviors such as sandbagging, monitor collusion, and code-backdoor concealment. The dataset is divided into different behaviors and buckets (target, nontarget, ultrachat), with detailed metadata for each row. The README also provides instructions for loading the dataset, schema details, behavior composition, and examples of the data.
提供机构:
jprivera44



