Marlon154/moral-number-corpus

Name: Marlon154/moral-number-corpus
Creator: Marlon154
Published: 2024-05-16 11:13:22
License: cc-by-sa-4.0

Source: Hugging Face. Updated 2024-05-16. Indexed 2024-06-11.

Download link:

https://hf-mirror.com/datasets/Marlon154/moral-number-corpus

Resource description:

---
license: cc-by-sa-4.0
language:
- en
size_categories:
- 1K<n<10K
configs:
- config_name: annotationed_questions
  data_files: "annotated_questions.csv"
  default: true
- config_name: all_questions
  data_files: "all_questions.csv"
---

# A Perspectivist Corpus of Numbers in Social Judgements

We constructed a corpus of moral and social judgements (questions are derived from the [Commonsense Norm Bank](https://arxiv.org/abs/2110.07574)) that asks people to fill in number ranges that do not change a given judgement. Our corpus was crowdsourced from 30 annotators and contains 898 statements for a total of 3k annotations. This work adds to available moral and social judgement data by providing ranges of (un)acceptable behaviors and annotator demographics. It supports perspectivist and pluralistic approaches, with the goal of creating models that can understand and express multiple points of view, identify whose point of view each one is, and convey uncertainty about definitive answers.

The number to replace is randomly chosen from all numbers in a given question.

## Structure of ``annotated_questions.csv``

### Core Data

- id: A unique identifier for each data entry. It has the following structure:
  - Prefix:
    - "ff" indicates a freeform question and
    - "yn" indicates a yes-no question from the Commonsense Norm Bank.
  - Separator: "x"
  - Subset:
    - "tr" for the training set.
    - "te" for the test set.
    - "va" for the validation set.
  - Separator: "x"
  - Subset ID: A unique numerical ID assigned within the specified subset.
- number_to_replace: The original number within the statement that participants are asked to replace.
- numeric_num: The original numeric value present in the statement (represented as a list in case of multiple numbers).
- form: The format of the question ('freeform' or 'yes_no').
- set_type: Specifies if the data point is part of the training, validation, or testing set.
- statement: The moral statement presented to participants, with ``<<NUM>>`` marking the number to be replaced.
- class_label: Numerical rating indicating the moral judgment (-1 negative, 0 neutral, and 1 positive).
- text_label: Textual version of the moral judgment (e.g., "It's understandable").

### Replacement Information

- list_span_start: A list of possible starting indices in the statement where the number span to be replaced could begin.
- list_span_end: A list of possible ending indices in the statement where the number span to be replaced could end.
- to_inf: A boolean (True/False) indicating whether the word "inf" (infinity) is a valid replacement option.
- not_modifiable: A boolean (True/False) indicating whether the number is meant to remain unchanged.

### IAA

- agreement: An inter-annotator agreement score (Jaccard index, between 0.0 and 1.0) indicating consistency between different annotators who judged the same statement.

## Structure of ``annotations.json``

The ``annotations.json`` file contains a list of objects, each representing an annotator and their associated surveys. Here's a detailed breakdown of the structure:

- id (string): A unique identifier for the annotator.
- age (string): The age of the annotator.
- nation (string): The nation the annotator is from.
- religion (string): The religion of the annotator.
- education (string): The education level of the annotator.
- political (string): The political leaning of the annotator.
- gender (string): The gender of the annotator.
- surveys (array): A list of surveys completed by the annotator. Each survey is an object with the following fields:
  - sid (string): A unique identifier for the survey.
  - time (integer): The time taken to complete the survey.
  - out_counter (float): A field related to the survey (exact meaning not provided).
  - inf_counter (float): Another field related to the survey (exact meaning not provided).
  - answers (object): An object where each key is a question identifier and the value is another object with the following fields:
    - start (string): The start time for answering the question.
    - end (string): The end time for answering the question.

## Structure of ``all_questions.csv``

The ``all_questions.csv`` file contains a list of questions that were extracted from the [Commonsense Norm Bank](https://arxiv.org/abs/2110.07574). Each row in the CSV file represents a single question and has the following columns:

- id: A unique identifier for each data entry. It has the following structure:
  - Prefix:
    - "ff" indicates a freeform question and
    - "yn" indicates a yes-no question from the Commonsense Norm Bank.
  - Separator: "x"
  - Subset:
    - "tr" for the training set.
    - "te" for the test set.
    - "va" for the validation set.
  - Separator: "x"
  - Subset ID: A unique numerical ID assigned within the specified subset.
- number_to_replace: The original number within the statement that participants are asked to replace.
- numeric_num: The original numeric value present in the statement (represented as a list in case of multiple numbers).
- form: The format of the question ('freeform' or 'yes_no').
- set_type: Specifies if the data point is part of the training, validation, or testing set.
- statement: The moral statement presented to participants, with ``<<NUM>>`` marking the number to be replaced.
- class_label: Numerical rating indicating the moral judgment (-1 negative, 0 neutral, and 1 positive).
- text_label: Textual version of the moral judgment (e.g., "It's understandable").
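
The `id` layout and the ``<<NUM>>`` placeholder described above can be handled with a few lines of Python. The following is a minimal sketch, not code from the dataset authors: the regular expression is read off the field description, the example id `ffxtrx12` and the example statement are invented, and the readable subset names (`train`, `test`, `validation`) are labels chosen here for illustration.

```python
import re
from typing import NamedTuple

# Assumed id layout, taken from the field description:
# <prefix> "x" <subset> "x" <numeric subset id>
ID_PATTERN = re.compile(r"^(ff|yn)x(tr|te|va)x(\d+)$")

# Readable labels for the codes; the right-hand names are illustrative.
FORMS = {"ff": "freeform", "yn": "yes_no"}
SUBSETS = {"tr": "train", "te": "test", "va": "validation"}


class ParsedId(NamedTuple):
    form: str
    subset: str
    subset_id: int


def parse_id(raw_id: str) -> ParsedId:
    """Split an id such as 'ffxtrx12' into its three components."""
    match = ID_PATTERN.match(raw_id)
    if match is None:
        raise ValueError(f"unexpected id format: {raw_id!r}")
    prefix, subset, number = match.groups()
    return ParsedId(FORMS[prefix], SUBSETS[subset], int(number))


def fill_number(statement: str, value: str) -> str:
    """Substitute a candidate value for the <<NUM>> placeholder."""
    return statement.replace("<<NUM>>", value)


if __name__ == "__main__":
    print(parse_id("ffxtrx12"))
    print(fill_number("Sleeping <<NUM>> hours a night", "8"))
```

Raising on an unexpected id, rather than returning partial data, makes it easy to spot rows whose ids do not follow the documented pattern.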

Provided by:

Marlon154

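Summary of original information

Dataset overview

Dataset name

A Perspectivist Corpus of Numbers in Social Judgements

Dataset description

This dataset contains moral and social judgement questions derived from the Commonsense Norm Bank and asks participants to fill in number ranges that do not change a given judgement. It was crowdsourced from 30 annotators and contains 898 statements, for a total of about 3,000 annotations. By providing ranges of (un)acceptable behaviors and annotator demographics, it adds to existing moral and social judgement data. The dataset supports perspectivist and pluralistic approaches, aiming at models that can understand and express multiple points of view, attribute each point of view to its holder, and convey uncertainty about definitive answers.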

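Dataset structure

Core files

annotated_questions.csv
- id: Unique identifier, built as prefix ("ff" or "yn") + separator ("x") + subset ("tr", "te", "va") + separator ("x") + subset ID.
- number_to_replace: The number in the original statement that is to be replaced.
- numeric_num: The numeric value in the original statement (may be a list).
- form: Question format (freeform or yes_no).
- set_type: The split the data point belongs to (training, validation, or test).
- statement: The moral statement, with <<NUM>> marking the number to be replaced.
- class_label: Numerical rating of the moral judgment (-1, 0, 1).
- text_label: Textual description of the moral judgment.
- list_span_start: Possible start positions of the number span to be replaced.
- list_span_end: Possible end positions of the number span to be replaced.
- to_inf: Whether "inf" (infinity) is allowed as a replacement.
- not_modifiable: Whether the number is meant to remain unchanged.
- agreement: Inter-annotator agreement score (Jaccard index).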
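
The card states only that `agreement` is a Jaccard index between 0.0 and 1.0; it does not say how the index is computed over annotators' answers. As an illustration, the sketch below assumes each answer is a closed numeric range and measures overlap by interval length, averaged over annotator pairs. The function names are hypothetical, and unbounded ("inf") ranges are deliberately not handled.

```python
from itertools import combinations
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def interval_jaccard(a: Interval, b: Interval) -> float:
    """Jaccard index of two closed intervals, measured by length.

    Assumption: this is one plausible reading of the `agreement`
    column, not the documented procedure of the dataset authors.
    """
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    if a_lo > a_hi or b_lo > b_hi:
        raise ValueError("interval bounds must satisfy lo <= hi")
    intersection = max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
    union = (a_hi - a_lo) + (b_hi - b_lo) - intersection
    if union == 0:
        # Both intervals are single points: agree only if identical.
        return 1.0 if a == b else 0.0
    return intersection / union


def mean_pairwise_jaccard(ranges: Sequence[Interval]) -> float:
    """Average the Jaccard index over all pairs of annotator ranges."""
    pairs = list(combinations(ranges, 2))
    if not pairs:
        raise ValueError("need at least two ranges")
    return sum(interval_jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Three invented annotator ranges for one statement.
    print(mean_pairwise_jaccard([(1, 5), (3, 7), (1, 5)]))
```

For integer-valued answers, counting the integers in each range instead of using interval length would be an equally reasonable choice and gives slightly different scores.
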
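annotations.json
- Contains details on the annotators and the surveys they completed, including age, nation, religion, education, political leaning, and gender.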
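
Because each annotator object nests surveys, which in turn nest answers keyed by question id, analysis is often easier on a flat table with one row per answer. The sketch below assumes the field names given in the dataset card (`id`, `surveys`, `sid`, `answers`, `start`, `end`); every value in `SAMPLE` is an invented placeholder, and `flatten_annotations` is a hypothetical helper.

```python
import json

# Invented sample that mirrors the documented structure of annotations.json.
SAMPLE = """
[
  {
    "id": "a1", "age": "AGE", "nation": "NATION", "religion": "RELIGION",
    "education": "EDUCATION", "political": "POLITICAL", "gender": "GENDER",
    "surveys": [
      {
        "sid": "s1", "time": 420, "out_counter": 0.0, "inf_counter": 1.0,
        "answers": {
          "ffxtrx12": {"start": "START_A", "end": "END_A"},
          "ynxtex7": {"start": "START_B", "end": "END_B"}
        }
      }
    ]
  }
]
"""


def flatten_annotations(annotators):
    """Return one dict per (annotator, survey, question) answer."""
    rows = []
    for annotator in annotators:
        for survey in annotator.get("surveys", []):
            for question_id, answer in survey.get("answers", {}).items():
                rows.append(
                    {
                        "annotator_id": annotator["id"],
                        "survey_id": survey["sid"],
                        "question_id": question_id,
                        "start": answer["start"],
                        "end": answer["end"],
                    }
                )
    return rows


rows = flatten_annotations(json.loads(SAMPLE))

if __name__ == "__main__":
    for row in rows:
        print(row)
```

With the real file, `json.loads(SAMPLE)` would be replaced by `json.load` on an open handle to `annotations.json`; the `question_id` column can then be joined against the `id` column of `annotated_questions.csv`.
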
all_questions.csv
- Contains all questions extracted from the Commonsense Norm Bank; its structure is similar to that of annotated_questions.csv.

Dataset configuration

config_name: annotationed_questions
- data_files: annotated_questions.csv
- default: true
config_name: all_questions
- data_files: all_questions.csv
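
The two configurations can be selected by name when loading the dataset. The sketch below assumes the Hugging Face `datasets` library is installed and that the repository is reachable; `default_config` and `load_config` are hypothetical helpers. Note that the default config is spelled `annotationed_questions`, while its data file is `annotated_questions.csv`.

```python
REPO_ID = "Marlon154/moral-number-corpus"

# Config names and files exactly as listed in the card's YAML header.
CONFIGS = {
    "annotationed_questions": {"data_files": "annotated_questions.csv", "default": True},
    "all_questions": {"data_files": "all_questions.csv", "default": False},
}


def default_config() -> str:
    """Return the name of the config marked as default."""
    return next(name for name, cfg in CONFIGS.items() if cfg["default"])


def load_config(name=None):
    """Load one config with `datasets.load_dataset` (needs network access)."""
    # Imported lazily so the rest of this sketch runs without the dependency.
    from datasets import load_dataset

    config = name or default_config()
    if config not in CONFIGS:
        raise KeyError(f"unknown config: {config!r}")
    return load_dataset(REPO_ID, config)


if __name__ == "__main__":
    print(default_config())
    # Uncomment to download the full question list:
    # print(load_config("all_questions"))
```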

License

cc-by-sa-4.0
