peixian/equity_evaluation_corpus

Name: peixian/equity_evaluation_corpus
Creator: peixian
Published: 2022-10-20 23:35:15
License: No description available

Hugging Face · Updated 2022-10-20 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/peixian/equity_evaluation_corpus


Resource description:

---
annotations_creators:
- expert-generated
language_creators:
- expert-generated
language:
- en
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- text-classification
task_ids: []
tags:
- gender-classification
---

# Dataset Card for equity-evaluation-corpus

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)

## Dataset Description

- **Homepage:** [Needs More Information]
- **Repository:** [Needs More Information]
- **Paper:** [Needs More Information]
- **Leaderboard:** [Needs More Information]
- **Point of Contact:** [Needs More Information]

### Dataset Summary

Automatic machine learning systems can inadvertently accentuate and perpetuate inappropriate human biases. Past work on examining inappropriate biases has largely focused on just individual systems and resources. Further, there is a lack of benchmark datasets for examining inappropriate biases in system predictions. Here, we present the Equity Evaluation Corpus (EEC), which consists of 8,640 English sentences carefully chosen to tease out biases towards certain races and genders. We used the dataset to examine 219 automatic sentiment analysis systems that took part in a recent shared task, SemEval-2018 Task 1 Affect in Tweets. We found that several of the systems showed statistically significant bias; that is, they consistently provide slightly higher sentiment intensity predictions for one race or one gender. We make the EEC freely available, and encourage its use to evaluate biases in sentiment and other NLP tasks.

### Supported Tasks and Leaderboards

[Needs More Information]

### Languages

[Needs More Information]

## Dataset Structure

### Data Instances

[Needs More Information]

### Data Fields

- `sentence`: a `string` feature.
- `template`: a `string` feature.
- `person`: a `string` feature.
- `race`: a `string` feature.
- `emotion`: a `string` feature.
- `emotion word`: a `string` feature.

### Data Splits

[Needs More Information]

## Dataset Creation

### Curation Rationale

[Needs More Information]

### Source Data

#### Initial Data Collection and Normalization

[Needs More Information]

#### Who are the source language producers?

[Needs More Information]

### Annotations

#### Annotation process

[Needs More Information]

#### Who are the annotators?

[Needs More Information]

### Personal and Sensitive Information

[Needs More Information]

## Considerations for Using the Data

### Social Impact of Dataset

[Needs More Information]

### Discussion of Biases

[Needs More Information]

### Other Known Limitations

[Needs More Information]

## Additional Information

### Dataset Curators

[Needs More Information]

### Licensing Information

[Needs More Information]

### Citation Information

[Needs More Information]

Provided by:

peixian

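Summary of original information

Dataset overview

Dataset description

Dataset summary

Name: Equity Evaluation Corpus (EEC)
Purpose: Evaluate bias in machine learning systems, in particular bias towards certain races and genders.
Content: 8,640 English sentences chosen to tease out such biases. The authors used them to examine 219 automatic sentiment analysis systems that took part in SemEval-2018 Task 1 (Affect in Tweets).
Findings: Several of the systems showed statistically significant bias, consistently giving slightly higher sentiment intensity predictions for one race or one gender.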
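
The finding above can be illustrated with a small comparison: for sentences that share a template and an emotion word and differ only in the person, compare a system's predicted sentiment intensity across groups. The following is a minimal sketch under stated assumptions, not the procedure used in the original study. The rows are made up in the card's field layout, the group labels `group_a` and `group_b` are placeholders, `toy_score` is a hypothetical stand-in for a real sentiment system, and a plain mean of paired differences replaces any significance test.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows in the card's field layout (NOT taken from the dataset).
rows = [
    {"sentence": "Alex feels angry.", "template": "<person> feels <emotion word>.",
     "person": "Alex", "race": "group_a", "emotion": "anger", "emotion word": "angry"},
    {"sentence": "Sam feels angry.", "template": "<person> feels <emotion word>.",
     "person": "Sam", "race": "group_b", "emotion": "anger", "emotion word": "angry"},
    {"sentence": "Alex feels sad.", "template": "<person> feels <emotion word>.",
     "person": "Alex", "race": "group_a", "emotion": "sadness", "emotion word": "sad"},
    {"sentence": "Sam feels sad.", "template": "<person> feels <emotion word>.",
     "person": "Sam", "race": "group_b", "emotion": "sadness", "emotion word": "sad"},
]


def toy_score(sentence):
    """Hypothetical scorer with a deliberate name-dependent offset."""
    base = 0.8 if "angry" in sentence else 0.4
    return base + (0.05 if sentence.startswith("Alex") else 0.0)


def mean_paired_gap(rows, score, group_field, group_a, group_b):
    """Average score difference (group_a minus group_b) over sentences
    that share a template and an emotion word."""
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = (row["template"], row["emotion word"])
        buckets[key][row[group_field]].append(score(row["sentence"]))
    # Only keep template/word combinations where both groups are present.
    gaps = [
        mean(groups[group_a]) - mean(groups[group_b])
        for groups in buckets.values()
        if groups.get(group_a) and groups.get(group_b)
    ]
    return mean(gaps) if gaps else 0.0


gap = mean_paired_gap(rows, toy_score, "race", "group_a", "group_b")
print(f"mean paired gap: {gap:.3f}")
```

A positive value means the scorer rates `group_a` sentences higher on average. Pairing on template and emotion word keeps everything except the person constant, so a difference in score can be attributed to the person.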

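Supported tasks

Task: text classification
Tag: gender-classification (the card lists no task IDs)

Language

Language: English

Dataset structure

Data instances

Count: 8,640 sentences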

Data fields

sentence: string
template: string
person: string
race: string
emotion: string
emotion word: string
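
The six fields above can be captured in a small typed record. This is a sketch only: the class name `EECRow`, the helper `from_dict`, and the sample values are hypothetical and not part of the dataset. Because the field name `emotion word` contains a space, it cannot be a Python attribute and is mapped to `emotion_word` here.

```python
from dataclasses import dataclass

# Field names exactly as listed above, in order.
FIELDS = ("sentence", "template", "person", "race", "emotion", "emotion word")


@dataclass(frozen=True)
class EECRow:
    """Hypothetical record type mirroring the six listed string fields."""
    sentence: str
    template: str
    person: str
    race: str
    emotion: str
    emotion_word: str  # stored under the key "emotion word" in the listing

    @classmethod
    def from_dict(cls, raw):
        """Build a row from a plain dict, checking presence and type."""
        missing = [name for name in FIELDS if name not in raw]
        if missing:
            raise KeyError(f"missing fields: {missing}")
        bad = [name for name in FIELDS if not isinstance(raw[name], str)]
        if bad:
            raise TypeError(f"non-string fields: {bad}")
        # FIELDS is in the same order as the dataclass attributes.
        return cls(*(raw[name] for name in FIELDS))


# Made-up values for illustration (NOT taken from the dataset).
example = EECRow.from_dict({
    "sentence": "Alex feels angry.",
    "template": "<person> feels <emotion word>.",
    "person": "Alex",
    "race": "group_a",
    "emotion": "anger",
    "emotion word": "angry",
})
print(example.emotion_word)
```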

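Dataset creation

Source data

Type: original

Annotations

Created by: expert-generated

Considerations for using the data

Social impact of the dataset

Purpose: evaluate and expose bias in machine learning systems

Discussion of biases

Findings: several of the evaluated systems showed bias towards a particular race or gender

Additional information

Licensing information

Type: unknown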
