ninoscherrer/moralchoice

Name: ninoscherrer/moralchoice
Creator: ninoscherrer
Published: 2024-02-03 14:30:22
License: No description available

Hugging Face · Updated 2024-02-03 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/ninoscherrer/moralchoice

Resource description:

---
pretty_name: MoralChoice
license: cc-by-4.0
language:
- en
size_categories:
- 1K<n<10K
---

# Dataset Card for MoralChoice

- **Homepage:** Coming soon
- **Paper:** Coming soon
- **Repository:** [https://github.com/ninodimontalcino/moralchoice](https://github.com/ninodimontalcino/moralchoice)
- **Point of Contact:** [Nino Scherrer & Claudia Shi](mailto:nino.scherrer@gmail.com,claudia.j.shi@gmail.com?subject=[MoralChoice])

### Dataset Summary

*MoralChoice* is a survey dataset for evaluating the moral beliefs encoded in LLMs. The dataset consists of:

- **Survey Question Meta-Data:** 1367 hypothetical moral scenarios, each consisting of a description / context and two potential actions
    - **Low-Ambiguity Moral Scenarios (687 scenarios):** One action is clearly preferred over the other.
    - **High-Ambiguity Moral Scenarios (680 scenarios):** Neither action is clearly preferred.
- **Survey Question Templates:** 3 hand-curated question templates
- **Survey Responses:** Outputs from 28 open- and closed-source LLMs

A statistical workflow for analyzing the survey responses can be found in the corresponding [paper]().

🚧 **Important**: 🚧
- *Moral scenarios* and *question templates* are already available.
- *Survey responses* will be uploaded shortly!

### Languages

*MoralChoice* is only available in English.

## Dataset Structure

### Data Fields

#### Moral Scenarios (Survey Question Meta-Data)

```
- scenario_id           unique scenario identifier
- ambiguity             level of ambiguity (low or high)
- generation_type       generation type (hand-written or generated)
- context               scenario description / contextualization
- action 1              description of a potential action
- action 2              description of a potential action
- a1_{rule}             {rule} violation label of action 1
- a2_{rule}             {rule} violation label of action 2
```

#### Survey Question Templates

```
- name                  name of question template (e.g., ab, repeat, compare)
- question_header       question instruction header text
- question              question template with placeholders
```

#### Survey Responses

```
- scenario_id           unique scenario identifier
- model_id              model identifier (e.g., openai/gpt-4)
- question_type         question type (ab: A or B?, repeat: Repeat the preferred answer, compare: Do you prefer A over B?)
- question_ordering     question ordering label (0: default order, 1: flipped order)
- question_header       question instruction header text
- question_text         question text
- answer_raw            raw answer of model
- decision              semantic answer of model (e.g., action1, action2, refusal, invalid)
- eval_technique        evaluation technique used
- eval_top_p            evaluation parameter - top_p
- eval_temperature      evaluation parameter - temperature
- timestamp             timestamp of model access
```

## Dataset Creation

### Generation of Moral Scenarios

The construction of *MoralChoice* follows a three-step procedure:

- **Scenario Generation:** We generate low- and high-ambiguity scenarios separately (i.e., the triple of scenario context, action 1, and action 2), guided by the 10 rules of Gert's common morality framework.
    - **Low-Ambiguity Scenarios:** Zero-shot prompting setup based on OpenAI's gpt-4
    - **High-Ambiguity Scenarios:** Stochastic few-shot prompting setup based on OpenAI's text-davinci-003, using a set of 100 hand-written scenarios
- **Scenario Curation:** We manually check the validity and grammar of each generated scenario and remove invalid scenarios. In addition, we assess lexical similarity between the generated scenarios and remove duplicates and overly similar scenarios.
- **Auxiliary Label Acquisition:** We acquire auxiliary rule violation labels through SurgeAI for every scenario.

For detailed information, we refer to the corresponding paper.

### Collection of LLM Responses

Across all models, we employ **temperature-based sampling** with `top-p=1.0` and `temperature=1.0`. For every specific question form (a unique combination of scenario, question template, and answer option ordering), we collect multiple samples (5 for low-ambiguity scenarios and 10 for high-ambiguity scenarios). The raw sequences of output tokens were mapped to semantic actions (see the corresponding paper for exact details).

### Annotations

To acquire high-quality annotations, we employ experienced annotators sourced through the data-labeling company [Surge AI](https://www.surgehq.ai/).

## Considerations for Using the Data

- Limited diversity in scenarios (professions, contexts)
- Limited diversity in question templates
- Limited to English

### Dataset Curators

- Nino Scherrer ([Website](https://ninodimontalcino.github.io/), [Mail](mailto:nino.scherrer@gmail.com?subject=[MoralChoice]))
- Claudia Shi ([Website](https://www.claudiajshi.com/), [Mail](mailto:nino.scherrer@gmail.com?subject=[MoralChoice]))

### Citation

```
@misc{scherrer2023moralchoice,
  title={Evaluating the Moral Beliefs Encoded in LLMs},
  author={Scherrer, Nino and Shi, Claudia and Feder, Amir and Blei, David},
  year={2023},
  journal={arXiv:}
}
```
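
The sampling setup described under the collection of LLM responses (one question form per combination of scenario, question template, and answer ordering, with 5 or 10 samples each) can be sketched as below. This is a minimal illustration, not the authors' code: the helper `question_forms` and the toy scenario IDs are hypothetical, and it assumes every template is paired with both orderings. Only the template names, the ordering labels, and the sample counts come from the card.

```python
from itertools import product

# Values taken from the dataset card.
TEMPLATES = ("ab", "repeat", "compare")
ORDERINGS = (0, 1)  # 0: default order, 1: flipped order
SAMPLES_PER_FORM = {"low": 5, "high": 10}


def question_forms(scenarios):
    """Yield one dict per (scenario, template, ordering) with its sample count.

    `scenarios` is an iterable of dicts with `scenario_id` and `ambiguity`
    keys. Hypothetical helper, not part of the dataset tooling.
    """
    for scenario, template, ordering in product(scenarios, TEMPLATES, ORDERINGS):
        yield {
            "scenario_id": scenario["scenario_id"],
            "question_type": template,
            "question_ordering": ordering,
            "n_samples": SAMPLES_PER_FORM[scenario["ambiguity"]],
        }


# Two made-up scenarios, one per ambiguity level.
toy_scenarios = [
    {"scenario_id": "L_001", "ambiguity": "low"},
    {"scenario_id": "H_001", "ambiguity": "high"},
]
forms = list(question_forms(toy_scenarios))
total_samples = sum(form["n_samples"] for form in forms)
print(len(forms), total_samples)
```

Under these assumptions each scenario produces 6 question forms, so a single model would be queried 30 times per low-ambiguity scenario and 60 times per high-ambiguity scenario.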

Provider:

ninoscherrer

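Summary of Original Information

Dataset Overview

Dataset name: MoralChoice

License: cc-by-4.0

Language: English only

Dataset size: 1K<n<10K

Dataset Contents

Data composition:

Survey question metadata (moral scenarios): 1367 hypothetical moral scenarios, comprising 687 low-ambiguity and 680 high-ambiguity scenarios.
Survey question templates: 3 hand-curated question templates.
Survey responses: outputs from 28 open- and closed-source LLMs.

Data fields:

Moral scenarios (survey question metadata): scenario ID, ambiguity, generation type, context, two potential actions, and their rule-violation labels.
Survey question templates: template name, question header, and question template.
Survey responses: scenario ID, model ID, question type, question ordering, question header, question text, raw answer, decision, evaluation technique, evaluation parameters, and timestamp.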
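
As a rough illustration of how the survey-response fields fit together, the sketch below tallies the `decision` values per model and scenario and turns them into empirical shares. It is a simple frequency count under assumed toy rows, not the statistical workflow from the paper; the function name `decision_shares` and the scenario ID are hypothetical.

```python
from collections import Counter, defaultdict


def decision_shares(responses):
    """Return {(model_id, scenario_id): {decision: share}} from response rows."""
    counts = defaultdict(Counter)
    for row in responses:
        counts[(row["model_id"], row["scenario_id"])][row["decision"]] += 1
    shares = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        shares[key] = {decision: n / total for decision, n in counter.items()}
    return shares


# Made-up rows that use the documented field names.
rows = [
    {"model_id": "openai/gpt-4", "scenario_id": "H_001", "decision": "action1"},
    {"model_id": "openai/gpt-4", "scenario_id": "H_001", "decision": "action1"},
    {"model_id": "openai/gpt-4", "scenario_id": "H_001", "decision": "action2"},
    {"model_id": "openai/gpt-4", "scenario_id": "H_001", "decision": "refusal"},
]
shares = decision_shares(rows)
print(shares[("openai/gpt-4", "H_001")])
```

Refusals and invalid answers are kept as their own categories here; whether to exclude them before normalizing is a design choice this sketch leaves open.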

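Dataset Creation

Moral scenario generation:

Low-ambiguity scenarios: generated with a zero-shot prompting setup based on OpenAI's gpt-4.
High-ambiguity scenarios: generated with a stochastic few-shot prompting setup based on OpenAI's text-davinci-003.

Scenario curation:

Each generated scenario is manually checked for validity and grammar, and invalid scenarios are removed.
Lexical similarity between generated scenarios is assessed, and duplicates and overly similar scenarios are removed.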
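
The lexical-similarity filtering step could look roughly like the sketch below. The source does not say which similarity measure or cutoff was used, so both the character-level `difflib.SequenceMatcher` ratio and the `0.8` threshold are assumptions chosen purely for illustration, and the example contexts are made up.

```python
from difflib import SequenceMatcher


def drop_near_duplicates(contexts, threshold=0.8):
    """Keep each context only if it is not too similar to one already kept.

    The similarity measure and threshold are illustrative assumptions,
    not the settings used to build the dataset.
    """
    kept = []
    for text in contexts:
        is_duplicate = any(
            SequenceMatcher(None, text.lower(), other.lower()).ratio() >= threshold
            for other in kept
        )
        if not is_duplicate:
            kept.append(text)
    return kept


# Made-up scenario contexts; the second is a near-copy of the first.
contexts = [
    "You are a doctor treating a patient who refuses treatment.",
    "You are a doctor treating a patient who refuses the treatment.",
    "You find a lost wallet on the street.",
]
unique_contexts = drop_near_duplicates(contexts)
print(len(unique_contexts))
```

This greedy pass compares every candidate against all kept items, which is quadratic in the number of scenarios but unproblematic at the scale of a few thousand entries.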

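Auxiliary label acquisition:

Rule-violation labels for every scenario are acquired through SurgeAI.

Considerations for Use

Limited scenario diversity: limited diversity of professions and contexts.
Limited question-template diversity: the number of question templates is limited.
Language restriction: English only.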
