ProgressGym-MoralEvals

Name: ProgressGym-MoralEvals
Creator: maas
Published: 2025-11-07 16:22:34
License: no description provided

Source: ModelScope community. Updated: 2025-11-07. Indexed: 2025-02-08.

Download link:

https://modelscope.cn/datasets/PKU-Alignment/ProgressGym-MoralEvals

Resource description:

# ProgressGym-MoralEvals

## Overview

#### The ProgressGym Framework

![Framework Diagram](./readme-assets/main-diagram.png)

**ProgressGym-MoralEvals** is part of the **ProgressGym** framework for research and experimentation on *progress alignment* - the emulation of moral progress in AI alignment algorithms, as a measure to prevent risks of societal value lock-in.

To quote the paper *[ProgressGym: Alignment with a Millennium of Moral Progress](https://arxiv.org/abs/2406.20087)*:

> Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale.
>
> We introduce *progress alignment* as a technical solution to mitigate this imminent risk. Progress alignment algorithms learn to emulate the mechanics of human moral progress, thereby addressing the susceptibility of existing alignment methods to contemporary moral blindspots.

#### The ProgressGym-MoralEvals Dataset

ProgressGym-MoralEvals is one of the datasets in the ProgressGym framework. It contains the prompts and response options used in the morality evaluation pipeline of ProgressGym, which assesses different dimensions of the behavioral moral tendencies of language models. Significant portions of the dataset are sourced or derived from the [moralchoice](https://huggingface.co/datasets/ninoscherrer/moralchoice) dataset, the Moral Foundations Questionnaire (MFQ), and the Integrated Worldview Framework (IWF) questionnaire.

**ProgressGym-MoralEvals is *not* meant for telling right from wrong.** Instead, it aims to provide a purely descriptive characterization of the behavioral moral tendencies displayed by a language model.

**This uploaded dataset is mostly meant for demonstrative purposes.** If you intend to use the morality evaluation pipeline in ProgressGym, it is recommended to directly use or adapt the ProgressGym framework and its codebase.

Below is an illustration of the morality evaluation process in ProgressGym, of which the ProgressGym-MoralEvals dataset is a part.

![image-20240630113342775](./readme-assets/moral-evals.png)

## Links

- **[Paper Preprint]** [ProgressGym: Alignment with a Millennium of Moral Progress](https://arxiv.org/abs/2406.20087)
- **[Leaderboard & Interactive Playground]** [PKU-Alignment/ProgressGym-LeaderBoard](https://huggingface.co/spaces/PKU-Alignment/ProgressGym-LeaderBoard)
- **[Huggingface Data & Model Collection]** [PKU-Alignment/ProgressGym](https://huggingface.co/collections/PKU-Alignment/progressgym-666735fcf3e4efa276226eaa)
- **[Github Codebase]** [PKU-Alignment/ProgressGym](https://github.com/PKU-Alignment/ProgressGym)
- **[Documentation]** [ProgressGym Documentation](https://pku-alignment.github.io/ProgressGym/)
- **[PyPI Package]** *(coming soon - [stay tuned](https://forms.gle/1TWFLL4ZCLeYTD5N6)!)*

## Citation

If the datasets, models, or framework of ProgressGym help you in your project, please cite ProgressGym using the bibtex entry below.

```text
@article{progressgym,
  title={ProgressGym: Alignment with a Millennium of Moral Progress},
  author={Tianyi Qiu and Yang Zhang and Xuchuan Huang and Jasmine Xinze Li and Jiaming Ji and Yaodong Yang},
  journal={arXiv preprint arXiv:2406.20087},
  eprint={2406.20087},
  eprinttype = {arXiv},
  year={2024}
}
```

## Ethics Statement

- **Copyright information of historical text data sources**:
  - Project Gutenberg, one of the four sources of our historical text data, consists only of texts in the public domain.
  - For the text that we draw from Internet Archive, we include only those uploaded by the *Library of Congress*, which are texts freely released online by the U.S. Library of Congress for research and public use.
  - The text data from Early English Books Online are, according to their publisher, "freely available to the public" and "available for access, distribution, use, or reuse by anyone".
  - The last remaining source of our historical text data, the Pile of Law dataset, is released under a Creative Commons license, which we adhere to in our use.
- **Reproducibility**: To ensure reproducibility, we open-source all the code involved in the production of our main results (including the entire pipeline starting from data collection and model training), as well as the supporting infrastructure (the ProgressGym framework), making replication as easy as running a few simple script files.
- **Misuse Prevention**: In order to prevent potential misuse of progress alignment algorithms, we have carefully formulated progress alignment as strictly value-neutral, without *a priori* assumptions on the direction of progress. In the event of potential misuse of our dataset, we condemn any misuse attempt to the strongest degree possible, and will work with the research community on whistleblowing for such attempts.
- **Open-Sourcing**: We confirm that our code, data, and models are to be open-sourced under a CC-BY 4.0 license. We will continue to maintain and update our open-source repositories and models.

Provider: maas

Created: 2025-02-07
