Datasets for Fair Name-Based Gender Prediction in Scientific Communities

Source: Figshare. Updated: 2025-08-14. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Datasets_for_Fair_Name-Based_Gender_Prediction_in_Scientific_Communities/29909603
Description:
The datasets support the evaluation of fair name-based gender prediction software across two scientific domains: energy transition and critical infrastructures. Each dataset contains public information on scientific authors and their gender, determined through manual validation and compared against predictions from multiple automated tools. The gender labels in these datasets represent the assessment of human annotators based solely on the information available (e.g., names) and do not necessarily reflect the self-identified gender or gender perception of the authors.

The energy transition dataset is derived from papers retrieved from Scopus using the query terms “energy transition” OR “energy transformation”. The initial set of 17,591 papers was refined to 10,130 papers, written by 27,363 individuals, using the Energy Systems Ontology (ESO) (De Nicola et al., 2024). From this population, 1,000 authors were randomly selected for manual gender validation, resulting in 260 females, 575 males, and 165 of undetermined gender.

The critical infrastructures dataset is based on all 380 papers published between 2006 and 2022 in the proceedings of the International Conference on Critical Information Infrastructures Security (CRITIS), involving 929 authors. All authors were manually validated, yielding 153 females, 768 males, and 8 of undetermined gender.

The datasets are provided in JSON format, one file per domain:
- ET-report.json contains records for the 1,000 manually validated authors in the energy transition dataset. Each record includes the author’s full name, the Semantic Scholar ID, the manual validation gender label, and the predictions from multiple automated gender prediction tools (Prediction Manager, Gender API, ChatGPT, and NamSor).
- CRITIS-report.json contains records for all 929 manually validated authors in the critical infrastructures dataset, with the same structure and fields as the energy transition file, except without the Semantic Scholar ID.

These structured files enable reproducible analysis, cross-tool performance comparisons, and integration into further research workflows.

Reference
De Nicola, A., Patriarca, T., Fresilli, B., Opromolla, A., Guariglia Migliore, M., Leonardi, N., D’Agostino, G., Cellini, M., Mirenda, C., Tagliacozzo, S., Pisacane, L., Vassillo, C. (2024). D.1.2 - Report on gendered assessment of the energy systems knowledge community and EU policies for sustainable energy systems - Horizon Europe Project gEneSys - Transforming gendered interrelations of power and inequalities in transition pathways to sustainable energy systems, grant agreement no. 101094326. https://ec.europa.eu/research/participants/documents/downloadPublic?documentIds=080166e509765b4f&appId=PPGMS
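As an illustration of the cross-tool comparison the files are meant to support, the following is a minimal Python sketch that computes, for each tool, the share of predictions matching the manual label, broken down by manual label. The description does not give the JSON schema, so everything about the record layout here is an assumption: the key names (`manual_gender`, `gender_api`, `namsor`), the label values (`female`, `male`, `undetermined`), and the top-level array structure are all hypothetical and should be replaced with the actual keys found in ET-report.json or CRITIS-report.json.

```python
import json
from collections import defaultdict


def load_records(path):
    """Load author records from a report file.

    Assumption: the file holds a top-level JSON array of objects.
    The real layout of ET-report.json / CRITIS-report.json may differ.
    """
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def per_label_accuracy(records, label_key, tool_keys, skip_labels=("undetermined",)):
    """Return {tool: {label: share of records where the tool matches the manual label}}.

    Records whose manual label is in `skip_labels` are ignored, since there
    is no reference value to compare the tools against.
    """
    hits = {tool: defaultdict(int) for tool in tool_keys}
    totals = defaultdict(int)
    for rec in records:
        label = rec[label_key]
        if label in skip_labels:
            continue
        totals[label] += 1
        for tool in tool_keys:
            if rec.get(tool) == label:
                hits[tool][label] += 1
    return {
        tool: {label: hits[tool][label] / n for label, n in totals.items()}
        for tool in tool_keys
    }


# Hypothetical records for demonstration only; not taken from the datasets.
SAMPLE = [
    {"name": "A", "manual_gender": "female", "gender_api": "female", "namsor": "male"},
    {"name": "B", "manual_gender": "female", "gender_api": "female", "namsor": "female"},
    {"name": "C", "manual_gender": "male", "gender_api": "male", "namsor": "male"},
    {"name": "D", "manual_gender": "undetermined", "gender_api": "male", "namsor": "female"},
]

if __name__ == "__main__":
    result = per_label_accuracy(SAMPLE, "manual_gender", ["gender_api", "namsor"])
    print(json.dumps(result, indent=2))
```

Reporting accuracy separately for each manual label, rather than as one overall figure, is one simple way to see whether a tool performs unevenly across groups. With the real files, `load_records("ET-report.json")` would replace `SAMPLE`, provided the array assumption holds.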
Created: 2025-08-14