Brazilian Names Dataset
Indexed from arXiv on 2025-09-30
Download link:
https://brasil.io/dataset/genero-nomes/grupos/
Description:
This dataset contains 100,787 Brazilian given names, of which 54.82% are female and 45.18% are male. It is organized into two subsets: one restricted to the 90,158 names whose gender ratio is 100% (names used exclusively for one gender), and one containing all names. Each record provides the name's gender, name frequency, group frequency, group name, and, for names used by both genders, the gender ratio. The associated task is gender prediction from the initial letters of a name.
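The subset split and the prediction task described above can be sketched as follows. This is only a minimal illustration: the field names (`name`, `gender`, `ratio`), the sample records, and the helper functions are hypothetical and do not reflect the dataset's actual schema or any published method. The baseline simply predicts the majority gender seen for a name's first letter.

```python
from collections import Counter, defaultdict

# Hypothetical records: field names and values are illustrative only,
# not the real schema or contents of the Brasil.io dataset.
SAMPLE = [
    {"name": "ANA", "gender": "F", "ratio": 1.0},
    {"name": "AMANDA", "gender": "F", "ratio": 1.0},
    {"name": "ALEX", "gender": "M", "ratio": 0.9},
    {"name": "BRUNO", "gender": "M", "ratio": 1.0},
    {"name": "BEATRIZ", "gender": "F", "ratio": 1.0},
    {"name": "BERNARDO", "gender": "M", "ratio": 1.0},
]


def unambiguous(records):
    """Keep only names whose gender ratio is 100%."""
    return [r for r in records if r["ratio"] == 1.0]


def fit_first_letter(records):
    """Map each initial letter to the most common gender among its names."""
    counts = defaultdict(Counter)
    for r in records:
        counts[r["name"][0].upper()][r["gender"]] += 1
    return {letter: c.most_common(1)[0][0] for letter, c in counts.items()}


def predict(model, name, default="F"):
    """Predict a gender from the first letter; fall back to a default for unseen letters."""
    return model.get(name[0].upper(), default)


subset = unambiguous(SAMPLE)
model = fit_first_letter(subset)
```

With the sample above, `predict(model, "Bianca")` looks up the letter `B` in the fitted mapping, and a name starting with an unseen letter falls back to `default`. A real experiment would load the actual files, hold out a test split, and likely use more than one leading letter.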
Provider:
Brasil.io



