MD Gender (Multi-Dimensional Gender Bias Datasets)

Name: MD Gender (Multi-Dimensional Gender Bias Datasets)
Creator: OpenDataLab
Published: 2026-05-24 11:30:31
License: No description available

OpenDataLab: updated 2026-05-24, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/MD_Gender

Description:

Machine learning models are trained to find patterns in data. NLP models can inadvertently learn socially undesirable patterns when trained on gender-biased text. In this work, we propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions: bias from the gender of the person being spoken about, bias from the gender of the person being spoken to, and bias from the gender of the speaker. Using this fine-grained framework, we automatically annotate eight large-scale datasets with gender information. In addition, we collect a novel, crowdsourced evaluation benchmark of utterance-level gender rewrites. Distinguishing gender bias along multiple dimensions is important because it enables us to train finer-grained gender bias classifiers. We show that our classifiers prove valuable in a variety of important applications, such as controlling gender bias in generative models, detecting gender bias in arbitrary text, and shedding light on offensive language in terms of gender.
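To make the three-way decomposition (the person spoken about, the person spoken to, and the speaker) more concrete, here is a minimal Python sketch of how one utterance's labels might be represented. The `Dimension` and `Gender` enums, the `Annotation` record, and the label values are hypothetical illustrations chosen for this sketch. They are not the dataset's actual schema, field names, or label set.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Dimension(Enum):
    # The three dimensions described in the abstract.
    ABOUT = "about"  # gender of the person being spoken about
    TO = "to"        # gender of the person being spoken to
    AS = "as"        # gender of the speaker


class Gender(Enum):
    # Hypothetical label set for illustration only; the real dataset may differ.
    FEMININE = "feminine"
    MASCULINE = "masculine"
    NEUTRAL = "neutral"
    UNKNOWN = "unknown"


@dataclass
class Annotation:
    # Hypothetical record: one utterance with at most one gender label per dimension.
    text: str
    labels: Dict[Dimension, Gender]

    def gendered_dimensions(self) -> List[Dimension]:
        """Return dimensions labeled feminine or masculine, in Dimension order.

        Dimensions missing from `labels` are treated as UNKNOWN.
        """
        return [
            dim
            for dim in Dimension
            if self.labels.get(dim, Gender.UNKNOWN) in (Gender.FEMININE, Gender.MASCULINE)
        ]


# The utterance mentions a sister (ABOUT) and is addressed to "Dad" (TO);
# nothing in the text reveals the speaker's gender (AS).
example = Annotation(
    text="My sister just started a new job, Dad!",
    labels={
        Dimension.ABOUT: Gender.FEMININE,
        Dimension.TO: Gender.MASCULINE,
        Dimension.AS: Gender.UNKNOWN,
    },
)
print([d.value for d in example.gendered_dimensions()])  # ['about', 'to']
```

Keeping one label per dimension, rather than a single "gendered / not gendered" flag, is what lets a classifier tell apart, for example, text that is *about* a woman from text written *by* one.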

Provided by:

OpenDataLab

Created:

2022-08-19
