Gender Bias in Text: Labeled Datasets and Lexicons
Source: arXiv · Updated 2023-02-23 · Indexed 2024-07-24
Download link:
https://github.com/jaddoughman/Gender-Bias-Datasets-Lexicons
Resource summary:
This dataset, "Gender Bias in Text: Labeled Datasets and Lexicons", was created by the American University of Beirut. It provides labeled datasets and comprehensive lexicons for detecting gender bias in English text, built by collecting, annotating, and augmenting relevant sentences. It covers multiple subtypes of gender bias, including generic masculine pronouns (generic he), generic feminine pronouns (generic she), explicit gender marking, and gendered neologisms. The collected lexicons are further expanded with word embedding models. The resources are intended for automating gender bias detection with supervised and unsupervised machine learning and natural language processing techniques, in support of detecting and mitigating gender bias in text.
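The summary says the lexicons were expanded with word embedding models but does not describe the procedure. One common approach, assumed here and not necessarily the authors' exact method, is to add vocabulary words whose embedding lies close (by cosine similarity) to a seed term. The sketch below uses hypothetical 3-dimensional toy vectors (`TOY_VECTORS`) and an assumed similarity `threshold`; a real run would load pretrained vectors instead.

```python
import math

# Hypothetical 3-dimensional vectors standing in for a trained embedding model;
# a real expansion would load pretrained vectors (for example word2vec or GloVe).
TOY_VECTORS = {
    "chairman": [0.90, 0.10, 0.30],
    "businessman": [0.85, 0.15, 0.35],
    "cameraman": [0.80, 0.20, 0.30],
    "manpower": [0.75, 0.25, 0.25],
    "chairperson": [0.20, 0.90, 0.30],
    "banana": [0.05, 0.10, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_lexicon(seeds, vectors, threshold=0.95):
    """Single pass: add each vocabulary word whose cosine similarity to at
    least one seed word meets the threshold (the threshold is an assumed value)."""
    expanded = set(seeds)
    known_seeds = [s for s in seeds if s in vectors]
    for word, vec in vectors.items():
        if word in expanded:
            continue
        if any(cosine(vec, vectors[s]) >= threshold for s in known_seeds):
            expanded.add(word)
    return expanded


print(sorted(expand_lexicon(["chairman"], TOY_VECTORS)))
# prints ['businessman', 'cameraman', 'chairman', 'manpower']
```

With real embeddings, nearest neighbours can also include neutral alternatives (such as "chairperson") or unrelated words, so candidates found this way would still need manual review before being added to a lexicon.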
Provider:
American University of Beirut
Created:
2022-01-21
Original information summary
Gender-Bias-Datasets-Lexicons
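Abstract
This dataset provides labeled datasets and lexicons for the automated detection of gender bias, built by collecting, annotating, and augmenting relevant sentences to support detecting gender bias in English text. It covers multiple bias subtypes, including generic masculine pronouns, generic feminine pronouns, gender marking, and gendered neologisms. The collected lexicons are further expanded using word embedding models, with the aim of helping the technical community counter gender bias in text with machine learning and natural language processing techniques.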
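As one illustration of the supervised route the abstract mentions, the sketch below trains a multinomial Naive Bayes classifier (a swapped-in baseline, not the authors' method) on a handful of sentences. The training sentences, the labels `"biased"`/`"neutral"`, and the `NaiveBayes` class are hypothetical; the actual dataset files, columns, and label names may differ.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labeled sentences adapted from the taxonomy examples; the real
# dataset's files, columns, and label names may differ.
SENTENCES = [
    "The client should receive his invoice in two weeks.",
    "Every doctor must renew his license.",
    "A nurse should ensure that she gets adequate rest.",
    "The chairman opened the meeting.",
    "The client should receive their invoice in two weeks.",
    "Every doctor must renew their license.",
    "Nurses should ensure that they get adequate rest.",
    "The chairperson opened the meeting.",
]
LABELS = ["biased"] * 4 + ["neutral"] * 4


def tokenize(text):
    """Lowercase word tokens; apostrophes stay inside words."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, sentence):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(sentence):
                if token in self.vocab:  # skip words never seen in training
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = NaiveBayes().fit(SENTENCES, LABELS)
print(clf.predict("Each engineer should update his profile."))    # biased
print(clf.predict("Every engineer should update their profile."))  # neutral
```

A bag-of-words model like this only picks up surface cues (here, mostly "his" versus "their"); subtypes such as benevolent sexism or metaphor depend on context and would likely need richer features or pretrained language models.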
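Gender bias taxonomy
| Type | Subtype | Example | Impact |
|---|---|---|---|
| Generic pronouns | Generic masculine pronoun (generic he) | The client should receive his invoice in two weeks. | Biased mental imagery |
| Generic pronouns | Generic feminine pronoun (generic she) | A nurse should ensure that she gets adequate rest. | Biased mental imagery |
| Generic pronouns | Gendered generic masculine | Good teachers know how to man the classroom. | Biased mental imagery |
| Sexism | Hostile sexism | Women are incompetent at work. | Aggressive behavior |
| Sexism | Benevolent sexism | They’re probably surprised at how smart you are, for a girl. | Representational harm |
| Occupational bias | Gendered division of labor | Professors are men and elementary teachers are women. | Labor force participation |
| Occupational bias | Gender roles and responsibilities | I’ll have my girl get you a cup of coffee. | Labor force participation |
| Exclusionary bias | Gender marking | Chairman, Businessman, Manpower, Cameraman... | Representational harm |
| Exclusionary bias | Gendered neologisms | Man-bread, Man-sip... | Representational harm |
| Exclusionary bias | Gendered word ordering | “Men and Women”, “Brothers and Sisters”... | Representational harm |
| Semantics | Metaphor | “Cookie”: lovely woman. | Bias propagation |
| Semantics | Gendered attributes | An unmarried male (bachelor) is a “personal choice”. An unmarried female (spinster) is derogatorily an “old maid”. | Bias propagation |
| Semantics | Old sayings | A woman’s tongue three inches long can kill a man six feet high. | Bias propagation |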
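Two of the exclusionary subtypes in the table, gender marking and gendered word ordering, are lexical enough to flag with simple rules. The sketch below is a hypothetical rule-based baseline: `GENDER_MARKED`, `MALE_FIRST_PAIRS`, and `flag_exclusionary` are illustrative names, and the seed terms come only from the table's examples rather than from the dataset's own lexicon files.

```python
import re

# Hypothetical seed lexicon built from the table's examples; the dataset's own
# lexicons are larger and are not reproduced here.
GENDER_MARKED = {"chairman", "businessman", "manpower", "cameraman"}

# Male-first word pairs, following the "gendered word ordering" examples.
MALE_FIRST_PAIRS = [("men", "women"), ("brothers", "sisters"), ("he", "she")]


def flag_exclusionary(sentence):
    """Return (subtype, matched text) tuples for two lexical exclusionary-bias subtypes."""
    flags = []
    # Gender marking: exact lowercase token matches against the seed lexicon.
    for token in re.findall(r"[a-z]+", sentence.lower()):
        if token in GENDER_MARKED:
            flags.append(("gender marking", token))
    # Gendered word ordering: a male term placed before its female counterpart.
    for male, female in MALE_FIRST_PAIRS:
        pattern = rf"\b{male}\s+(?:and|or)\s+{female}\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            flags.append(("gendered word ordering", match.group(0)))
    return flags


print(flag_exclusionary("The chairman thanked the men and women of the team."))
# prints [('gender marking', 'chairman'), ('gendered word ordering', 'men and women')]
print(flag_exclusionary("The chairperson thanked the women and men of the team."))
# prints []
```

Rules like these are easy to audit but cover only surface forms; the semantic subtypes (metaphor, gendered attributes, old sayings) and sexism generally cannot be caught by token or pattern matching alone.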
Citation
@article{doughman2022gender,
  title={Gender Bias in Text: Labeled Datasets and Lexicons},
  author={Doughman, Jad and Khreich, Wael},
  journal={arXiv preprint arXiv:2201.08675},
  year={2022}
}
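Collected summary
Dataset introduction

Background and challenges
Background overview
The dataset provides labeled data and lexicons for detecting gender bias in English text, for use in machine learning and natural language processing applications. It covers several bias types, such as generic pronouns, sexism, and occupational bias, and includes a taxonomy with an example for each subtype. Its stated aim is to promote social inclusion and gender equality.
The above content was collected and summarized by 遇见数据集.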