
self-monitor

ModelScope community. Updated 2026-01-06; indexed 2025-07-12.
Download link:
https://modelscope.cn/datasets/PKU-Alignment/self-monitor
Resource description:
# Self-Monitor Dataset

This dataset contains supervised fine-tuning (SFT) data used in the research paper "Mitigating Deceptive Alignment via Self-Monitoring" ([arXiv:2505.18807](https://arxiv.org/abs/2505.18807)).

## Overview

The self-monitor dataset is designed to train language models to develop self-monitoring capabilities that can help mitigate deceptive alignment behaviors. This dataset contains examples that teach models to reason about their own outputs and detect potential deception or misalignment.

## Paper Information

**Title:** Mitigating Deceptive Alignment via Self-Monitoring

**arXiv:** [2505.18807](https://arxiv.org/abs/2505.18807)

**Project Website:** [https://cot-monitor-plus.github.io/](https://cot-monitor-plus.github.io/)

## Dataset Description

This dataset is specifically curated for supervised fine-tuning of language models to enhance their self-monitoring capabilities. The data focuses on scenarios where models need to:

- Monitor their own reasoning processes
- Detect potentially deceptive or misaligned outputs
- Develop better alignment with human values and intentions
- Improve transparency in their decision-making processes

```bibtex
@article{ji2025mitigating,
  title={Mitigating deceptive alignment via self-monitoring},
  author={Ji, Jiaming and Chen, Wenqi and Wang, Kaile and Hong, Donghai and Fang, Sitong and Chen, Boyuan and Zhou, Jiayi and Dai, Juntao and Han, Sirui and Guo, Yike and others},
  journal={arXiv preprint arXiv:2505.18807},
  year={2025}
}
```

## License

This dataset is released under the Apache 2.0 License.
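
## Example: loading the data

The README does not include loading instructions, so the following is only a minimal sketch of how the data might be fetched for SFT. It assumes the ModelScope Python SDK is installed (`pip install modelscope`) and that the dataset exposes a `train` split; the split name and both helper functions (`dataset_id_from_url`, `load_self_monitor`) are illustrative assumptions, not part of the dataset or the paper.

```python
from urllib.parse import urlparse


def dataset_id_from_url(url: str) -> str:
    """Extract the '<namespace>/<name>' dataset ID from a ModelScope dataset URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expected path layout: /datasets/<namespace>/<name>
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a ModelScope dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def load_self_monitor(url: str, split: str = "train"):
    """Download the dataset through the ModelScope SDK.

    The default split name "train" is an assumption; check the dataset
    page for the splits that are actually published.
    """
    # Imported lazily so the URL helper works without the SDK installed.
    from modelscope.msdatasets import MsDataset

    return MsDataset.load(dataset_id_from_url(url), split=split)


if __name__ == "__main__":
    url = "https://modelscope.cn/datasets/PKU-Alignment/self-monitor"
    print(dataset_id_from_url(url))
```

Calling `load_self_monitor(url)` requires network access. Inspect the first returned record before building a training pipeline, since the field names are not documented on this page.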

Provider: maas
Created: 2025-07-10