peixian/equity_evaluation_corpus
收藏Hugging Face2022-10-20 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/peixian/equity_evaluation_corpus
下载链接
链接失效反馈官方服务:
资源简介:
---
annotations_creators:
- expert-generated
language_creators:
- expert-generated
language:
- en
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- text-classification
task_ids: []
tags:
- gender-classification
---
# Dataset Card for equity-evaluation-corpus
## Table of Contents
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Supported Tasks](#supported-tasks-and-leaderboards)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-instances)
- [Data Splits](#data-instances)
- [Dataset Creation](#dataset-creation)
- [Curation Rationale](#curation-rationale)
- [Source Data](#source-data)
- [Annotations](#annotations)
- [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
- [Social Impact of Dataset](#social-impact-of-dataset)
- [Discussion of Biases](#discussion-of-biases)
- [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
- [Dataset Curators](#dataset-curators)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
## Dataset Description
- **Homepage:** [Needs More Information]
- **Repository:** [Needs More Information]
- **Paper:** [Needs More Information]
- **Leaderboard:** [Needs More Information]
- **Point of Contact:** [Needs More Information]
### Dataset Summary
Automatic machine learning systems can inadvertently accentuate and perpetuate inappropriate human biases. Past work on examining inappropriate biases has largely focused on just individual systems and resources. Further, there is a lack of benchmark datasets for examining inappropriate biases in system predictions. Here, we present the Equity Evaluation Corpus (EEC), which consists of 8,640 English sentences carefully chosen to tease out biases towards certain races and genders. We used the dataset to examine 219 automatic sentiment analysis systems that took part in a recent shared task, SemEval-2018 Task 1 Affect in Tweets. We found that several of the systems showed statistically significant bias; that is, they consistently provide slightly higher sentiment intensity predictions for one race or one gender. We make the EEC freely available, and encourage its use to evaluate biases in sentiment and other NLP tasks.
### Supported Tasks and Leaderboards
[Needs More Information]
### Languages
[Needs More Information]
## Dataset Structure
### Data Instances
[Needs More Information]
### Data Fields
- `sentence`: a `string` feature.
- `template`: a `string` feature.
- `person`: a `string` feature.
- `race`: a `string` feature.
- `emotion`: a `string` feature.
- `emotion word`: a `string` feature.
### Data Splits
[Needs More Information]
## Dataset Creation
### Curation Rationale
[Needs More Information]
### Source Data
#### Initial Data Collection and Normalization
[Needs More Information]
#### Who are the source language producers?
[Needs More Information]
### Annotations
#### Annotation process
[Needs More Information]
#### Who are the annotators?
[Needs More Information]
### Personal and Sensitive Information
[Needs More Information]
## Considerations for Using the Data
### Social Impact of Dataset
[Needs More Information]
### Discussion of Biases
[Needs More Information]
### Other Known Limitations
[Needs More Information]
## Additional Information
### Dataset Curators
[Needs More Information]
### Licensing Information
[Needs More Information]
### Citation Information
[Needs More Information]
提供机构:
peixian
原始信息汇总
数据集概述
数据集描述
数据集总结
- 名称: Equity Evaluation Corpus (EEC)
- 目的: 用于评估机器学习系统中的偏见,特别是针对种族和性别的偏见。
- 内容: 包含8,640个英语句子,用于测试219个自动情感分析系统在SemEval-2018 Task 1中的表现。
- 发现: 多个系统显示出对特定种族或性别的统计显著性偏见。
支持的任务
- 任务: 文本分类
- 具体任务: 性别分类
语言
- 语言: 英语
数据集结构
数据实例
- 数量: 8,640个句子
数据字段
sentence: 字符串类型template: 字符串类型person: 字符串类型race: 字符串类型emotion: 字符串类型emotion word: 字符串类型
数据集创建
来源数据
- 类型: 原始数据
注释
- 创建者: 专家生成
使用数据的考虑
数据集的社会影响
- 目的: 评估和揭示机器学习系统中的偏见
数据集的偏见讨论
- 发现: 系统对特定种族或性别显示出偏见
附加信息
许可证信息
- 类型: 未知



