
ninoscherrer/moralchoice

Source: Hugging Face. Updated 2024-02-03; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/ninoscherrer/moralchoice
Resource description:
---
pretty_name: MoralChoice
license: cc-by-4.0
language:
- en
size_categories:
- 1K<n<10K
---

# Dataset Card for MoralChoice

- **Homepage:** Coming soon
- **Paper:** Coming soon
- **Repository:** [https://github.com/ninodimontalcino/moralchoice](https://github.com/ninodimontalcino/moralchoice)
- **Point of Contact:** [Nino Scherrer & Claudia Shi](mailto:nino.scherrer@gmail.com,claudia.j.shi@gmail.com?subject=[MoralChoice])

### Dataset Summary

*MoralChoice* is a survey dataset for evaluating the moral beliefs encoded in LLMs. The dataset consists of:

- **Survey Question Meta-Data:** 1367 hypothetical moral scenarios, each consisting of a description / context and two potential actions
  - **Low-Ambiguity Moral Scenarios (687 scenarios):** One action is clearly preferred over the other.
  - **High-Ambiguity Moral Scenarios (680 scenarios):** Neither action is clearly preferred.
- **Survey Question Templates:** 3 hand-curated question templates
- **Survey Responses:** Outputs from 28 open- and closed-source LLMs

A statistical workflow for analyzing the survey responses can be found in the corresponding [paper]().

🚧 **Important**: 🚧
- *Moral scenarios* and *question templates* are already available.
- *Survey responses* will be uploaded shortly!

### Languages

*MoralChoice* is only available in English.

## Dataset Structure

### Data Fields

#### Moral Scenarios (Survey Question Meta-Data)

```
- scenario_id           unique scenario identifier
- ambiguity             level of ambiguity (low or high)
- generation_type       generation type (hand-written or generated)
- context               scenario description / contextualization
- action 1              description of a potential action
- action 2              description of a potential action
- a1_{rule}             {rule} violation label of action 1
- a2_{rule}             {rule} violation label of action 2
```

#### Survey Question Templates

```
- name                  name of question template (e.g., ab, repeat, compare)
- question_header       question instruction header text
- question              question template with placeholders
```

#### Survey Responses

```
- scenario_id           unique scenario identifier
- model_id              model identifier (e.g., openai/gpt-4)
- question_type         question type (ab: A or B?, repeat: Repeat the preferred answer, compare: Do you prefer A over B?)
- question_ordering     question ordering label (0: default order, 1: flipped order)
- question_header       question instruction header text
- question_text         question text
- answer_raw            raw answer of model
- decision              semantic answer of model (e.g., action1, action2, refusal, invalid)
- eval_technique        evaluation technique used
- eval_top_p            evaluation parameter - top_p
- eval_temperature      evaluation parameter - temperature
- timestamp             timestamp of model access
```

## Dataset Creation

### Generation of Moral Scenarios

The construction of *MoralChoice* follows a three-step procedure:

- **Scenario Generation:** We generate low- and high-ambiguity scenarios separately (i.e., the triple of scenario context, action 1 and action 2), guided by the 10 rules of Gert's common morality framework.
  - **Low-Ambiguity Scenarios:** Zero-shot prompting setup based on OpenAI's gpt-4
  - **High-Ambiguity Scenarios:** Stochastic few-shot prompting setup based on OpenAI's text-davinci-003, using a set of 100 hand-written scenarios
- **Scenario Curation:** We manually check the validity and grammar of each generated scenario and remove invalid scenarios. In addition, we assess lexical similarity between the generated scenarios and remove duplicates and overly similar scenarios.
- **Auxiliary Label Acquisition:** We acquire auxiliary rule violation labels through SurgeAI for every scenario.

For detailed information, we refer to the corresponding paper.

## Collection of LLM responses

Across all models, we employ **temperature-based sampling** with `top-p=1.0` and `temperature=1.0`. For every specific question form (unique combination of scenario, question template, and answer option ordering), we collect multiple samples (5 for low-ambiguity scenarios and 10 for high-ambiguity scenarios). The raw sequences of output tokens were mapped to semantic actions (see the corresponding paper for exact details).

### Annotations

To acquire high-quality annotations, we employ experienced annotators sourced through the data-labeling company [Surge AI](https://www.surgehq.ai/).

## Considerations for Using the Data

- Limited diversity in scenarios (professions, contexts)
- Limited diversity in question templates
- Limited to English

### Dataset Curators

- Nino Scherrer ([Website](https://ninodimontalcino.github.io/), [Mail](mailto:nino.scherrer@gmail.com?subject=[MoralChoice]))
- Claudia Shi ([Website](https://www.claudiajshi.com/), [Mail](mailto:claudia.j.shi@gmail.com?subject=[MoralChoice]))

### Citation

```
@misc{scherrer2023moralchoice,
  title={Evaluating the Moral Beliefs Encoded in LLMs},
  author={Scherrer, Nino and Shi, Claudia and Feder, Amir and Blei, David},
  year={2023},
  journal={arXiv:}
}
```
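The sampling protocol described under "Collection of LLM responses" implies a fixed number of samples per model, which is useful as a sanity check once the survey responses are released. The sketch below is illustrative only: it assumes that every scenario is asked with all three templates (`ab`, `repeat`, `compare`) and both answer orderings, and the function names are hypothetical rather than part of the dataset's tooling.

```python
from itertools import product

# Values taken from the dataset card; the full cross product is an assumption.
TEMPLATES = ("ab", "repeat", "compare")    # question_type values named in the card
ORDERINGS = (0, 1)                         # 0: default order, 1: flipped order
SAMPLES_PER_FORM = {"low": 5, "high": 10}  # samples collected per question form
SCENARIO_COUNTS = {"low": 687, "high": 680}


def question_forms(scenario_id):
    """Enumerate the (scenario, template, ordering) triples for one scenario."""
    return [(scenario_id, t, o) for t, o in product(TEMPLATES, ORDERINGS)]


def samples_per_model():
    """Expected number of sampled answers per model, split by ambiguity level."""
    forms_per_scenario = len(TEMPLATES) * len(ORDERINGS)
    return {
        level: SCENARIO_COUNTS[level] * forms_per_scenario * SAMPLES_PER_FORM[level]
        for level in SCENARIO_COUNTS
    }


if __name__ == "__main__":
    print(question_forms("example_scenario"))
    print(samples_per_model())
```

Under these assumptions each scenario yields six question forms, so a complete response file for one model would hold 20,610 low-ambiguity and 40,800 high-ambiguity samples.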
Provided by:
ninoscherrer
Summary of original information

Dataset overview

Dataset name: MoralChoice

License: cc-by-4.0

Language: English only

Dataset size: 1K<n<10K

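Dataset contents

Data composition:

  • Survey question meta-data (moral scenarios): 1367 hypothetical moral scenarios, comprising 687 low-ambiguity and 680 high-ambiguity scenarios.
  • Survey question templates: 3 hand-curated question templates.
  • Survey responses: outputs from 28 open- and closed-source LLMs.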

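Data fields:

  • Moral scenarios (survey question meta-data): scenario ID, ambiguity level, generation type, context, two potential actions, and their rule violation labels.
  • Survey question templates: template name, question header, and question template.
  • Survey responses: scenario ID, model ID, question type, question ordering, question header, question text, raw answer, decision, evaluation technique, evaluation parameters, and timestamp.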
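As an illustration of how the survey response fields fit together, the sketch below groups responses by model and scenario and computes the share of valid answers that chose action 1. It is a minimal example, not the statistical workflow from the paper: it assumes each response is a dict keyed by the field names above, that `decision` takes the values `action1`, `action2`, `refusal`, or `invalid`, and that `decision` already accounts for flipped answer ordering. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def action1_share(responses):
    """Share of valid answers choosing action 1, per (model_id, scenario_id).

    Refusals and invalid answers are excluded from the denominator; a group
    with no valid answers maps to None.
    """
    counts = defaultdict(Counter)
    for r in responses:
        counts[(r["model_id"], r["scenario_id"])][r["decision"]] += 1

    shares = {}
    for key, c in counts.items():
        valid = c["action1"] + c["action2"]
        shares[key] = c["action1"] / valid if valid else None
    return shares


if __name__ == "__main__":
    demo = [
        {"model_id": "m", "scenario_id": "s1", "decision": "action1"},
        {"model_id": "m", "scenario_id": "s1", "decision": "action2"},
        {"model_id": "m", "scenario_id": "s1", "decision": "refusal"},
    ]
    print(action1_share(demo))
```

Excluding refusals from the denominator is one possible design choice; reporting the refusal rate alongside the share would keep that information visible.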

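Dataset creation

Generation of moral scenarios:

  • Low-ambiguity scenarios: generated with a zero-shot prompting setup based on OpenAI's gpt-4.
  • High-ambiguity scenarios: generated with a stochastic few-shot prompting setup based on OpenAI's text-davinci-003.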
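The stochastic few-shot setup can be pictured as drawing a fresh random subset of hand-written scenarios for each generation call. The sketch below shows only that idea: the number of shots `k`, the prompt layout, and the dict keys `context`, `action1`, and `action2` are assumptions made for illustration, and the authors' actual prompts are not reproduced here.

```python
import random


def build_few_shot_prompt(pool, k=3, seed=None):
    """Sample k example scenarios from the pool and format them as a prompt.

    The prompt ends with an open "Context:" line for the model to complete.
    """
    rng = random.Random(seed)
    shots = rng.sample(pool, k)  # sampling without replacement
    blocks = [
        f"Context: {s['context']}\nAction 1: {s['action1']}\nAction 2: {s['action2']}"
        for s in shots
    ]
    blocks.append("Context:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Placeholder pool standing in for the hand-written scenarios.
    pool = [
        {"context": f"scenario {i}", "action1": f"first option {i}", "action2": f"second option {i}"}
        for i in range(100)
    ]
    print(build_few_shot_prompt(pool, k=3, seed=0))
```

Because the examples are resampled on each call, repeated calls see different demonstrations, which is one way to encourage variety in the generated scenarios.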

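Scenario curation:

  • Each generated scenario is manually checked for validity and grammar, and invalid scenarios are removed.
  • Lexical similarity between generated scenarios is assessed, and duplicates and overly similar scenarios are removed.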
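The card does not say which lexical similarity measure or threshold was used for the curation step. As one possible way to do it, the sketch below filters near-duplicates greedily using Jaccard overlap of word sets; the metric, the `0.8` threshold, and the function names are all assumptions.

```python
import re


def tokens(text):
    """Lowercased set of word tokens in a text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between the token sets of two texts."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def drop_near_duplicates(texts, threshold=0.8):
    """Keep a text only if it is below the threshold against every kept text."""
    kept = []
    for t in texts:
        if all(jaccard(t, k) < threshold for k in kept):
            kept.append(t)
    return kept


if __name__ == "__main__":
    scenarios = [
        "You find a lost wallet on the street.",
        "You find a lost wallet on the street!",
        "Your friend asks you to lie to their boss.",
    ]
    print(drop_near_duplicates(scenarios))
```

The greedy pass compares each text against all kept texts, so it is quadratic in the worst case, which stays manageable for a collection of a few thousand scenarios.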

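Auxiliary label acquisition:

  • Auxiliary rule violation labels are acquired through SurgeAI for every scenario.

Considerations for use

  • Limited scenario diversity: the range of professions and contexts is limited.
  • Limited question template diversity: the number of question templates is limited.
  • Language limitation: English only.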