BiasBios (Bias in Bios)
Source: OpenDataLab. Updated 2026-05-31; indexed 2024-05-09.
Download link:
https://opendatalab.org.cn/OpenDataLab/BiasBios
Resource summary:
This dataset is designed to study gender bias in occupations. Online biographies written in English were collected, and names, pronouns, and occupations were extracted from them. The 28 most common occupations were identified by how often they occur in the collected texts. The resulting dataset contains 397,340 biographies spanning these 28 occupations. Professor is the most frequent occupation, with 118,400 biographies, and rapper is the least frequent, with only 1,406. Key information about the biographies:
1. The longest biography is 194 tokens and the shortest is 18 tokens; the median length is 72 tokens.
2. The demographics of the people described in online biographies differ from those of the overall workforce, and the dataset does not include every biography available on the internet.
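The card reports per-occupation counts and token-length statistics. A minimal sketch of how such a summary could be computed is shown below, assuming hypothetical toy records of the form `(occupation, text)` and whitespace tokenization; neither the record layout nor the tokenizer is specified by the card, and `token_count` and `summarize` are illustrative names, not part of any BiasBios tooling. The last two lines apply only arithmetic to the reported counts.

```python
from collections import Counter
from statistics import median

# Figures reported on the dataset card.
REPORTED_TOTAL = 397_340
REPORTED_PROFESSORS = 118_400
REPORTED_RAPPERS = 1_406

# Hypothetical toy records (occupation, biography text); not real BiasBios data.
records = [
    ("professor", "She teaches history at a public university"),
    ("professor", "He studies catalysis"),
    ("rapper", "He is a rapper based in Atlanta"),
    ("nurse", "She is a registered nurse"),
]


def token_count(text: str) -> int:
    # Assumption: whitespace tokenization; the card does not state how tokens were counted.
    return len(text.split())


def summarize(records):
    """Per-occupation counts and biography length statistics, in tokens."""
    counts = Counter(occupation for occupation, _ in records)
    lengths = [token_count(text) for _, text in records]
    return {
        "num_bios": len(records),
        "num_occupations": len(counts),
        "most_common": counts.most_common(1)[0],
        # min() keeps the first occupation seen among ties.
        "least_common": min(counts.items(), key=lambda kv: kv[1]),
        "max_len": max(lengths),
        "min_len": min(lengths),
        "median_len": median(lengths),
    }


if __name__ == "__main__":
    print(summarize(records))
    # Class imbalance implied by the reported counts.
    print("professor/rapper ratio:", round(REPORTED_PROFESSORS / REPORTED_RAPPERS, 1))
    print("professor share:", round(REPORTED_PROFESSORS / REPORTED_TOTAL, 3))
```

On the reported figures, this gives about 84.2 professor biographies per rapper biography, and professors make up roughly 29.8% of all biographies, so the occupation labels are heavily imbalanced.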
Provider:
OpenDataLab
Created:
2022-08-19
Collection Summary
Dataset Introduction

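Background and Challenges
Background Overview
This dataset is intended for analyzing gender bias in occupations. It was built by collecting English-language online biographies and extracting names, pronouns, and occupations from them. It contains 397,340 biographies covering 28 occupations, with professors the most numerous and rappers the fewest. Note, however, that the demographics of the people in online biographies differ from those of the overall workforce.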
The above content was collected and summarized by 遇见数据集 ("Meet Datasets").



