Trends in gender homophily in scientific publications (data)

NIAID Data Ecosystem, indexed 2026-05-01
Download link:
https://zenodo.org/record/7958033
Description:
This dataset contains records of research articles extracted from the Web of Science (WoS) from 1980 to 2019: in total, 15,642 journals, 28,241,100 articles and 111,980,858 authorships across 153 research areas.

The main dataset (author_address_article_gend_v3.parquet), in Parquet format, contains all the authorships, where an authorship is defined as an (article, author) tuple. Each authorship (row) has the following variables:

ut: unique article identifier.
daisng_id: unique author identifier.
author_no: author number, as listed in the article.
country: author country (two-letter ISO code).
date: publication date.
gender: gender of the author ("male" or "female"), as provided by the Genderize.io API.
probability: probability of the gender attribute, as provided by the Genderize.io API.
count: number of entries for the author first name, as provided by the Genderize.io API.
jsc: journal subject category.
field: field of research.
research_area: area of research.
n_aut: number of authors in this publication.
journal: journal name.
alphabetical: whether the author list for this article is in alphabetical order.

A resampler was applied to the main dataset to generate null homophily values for each year. The results are provided as 4 datasets in R Data Serialization (RDS) format:

null_field.rds: null homophily values per country, year and field of research.
null_field_comp.rds: null homophily values per year and field of research (only for complete authorships).
null_research.rds: null homophily values per year and area of research.
null_research_comp.rds: null homophily values per year and area of research (only for complete authorships).

All these datasets have the same structure:

country: country (two-letter ISO code).
year: year.
variable: either field or research area name.
m: average homophily.
s: standard error of the homophily.

Finally, some supplementary files used in the descriptive analysis and methods are included:

null_research_l2019.rds: an example of the output from the resampling algorithm for year 2019.
wos_category_to_field.csv: a mapping from WoS categories to more general fields.
jcr_if_2020.csv: the percentiles of the journal impact factor for the JCR 2020.
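
Because each row of the main dataset is one authorship rather than one article, per-article quantities have to be rebuilt by grouping rows on ut. The following is a minimal sketch of that grouping in plain Python, not the authors' method. The sample rows are invented for illustration, and the 0.9 cutoff on probability is an assumption; the dataset description does not prescribe a threshold.

```python
from collections import defaultdict

# Hypothetical authorship rows using the documented column names.
# The values are invented for illustration and do not come from the dataset.
rows = [
    {"ut": "A1", "daisng_id": 101, "author_no": 1, "gender": "female", "probability": 0.98, "n_aut": 2},
    {"ut": "A1", "daisng_id": 102, "author_no": 2, "gender": "male", "probability": 0.95, "n_aut": 2},
    {"ut": "A2", "daisng_id": 103, "author_no": 1, "gender": "male", "probability": 0.99, "n_aut": 3},
    {"ut": "A2", "daisng_id": 104, "author_no": 2, "gender": "male", "probability": 0.60, "n_aut": 3},
    {"ut": "A2", "daisng_id": 101, "author_no": 3, "gender": "female", "probability": 0.98, "n_aut": 3},
]

MIN_PROBABILITY = 0.9  # assumed cutoff, not specified by the dataset description


def gender_counts_per_article(rows, min_probability=MIN_PROBABILITY):
    """Group authorships by article (ut) and count authors by inferred gender.

    Authorships whose gender is missing or whose Genderize.io probability is
    below min_probability are counted as "uncertain".
    """
    counts = defaultdict(lambda: {"male": 0, "female": 0, "uncertain": 0})
    for row in rows:
        confident = (
            row["gender"] in ("male", "female")
            and row["probability"] >= min_probability
        )
        key = row["gender"] if confident else "uncertain"
        counts[row["ut"]][key] += 1
    return dict(counts)


summary = gender_counts_per_article(rows)
for ut, article_counts in sorted(summary.items()):
    print(ut, article_counts)
```

On the real file, the same grouping would more likely be done with a columnar tool that reads Parquet directly, since 111,980,858 rows will not fit comfortably in a list of dictionaries.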
Created:
2024-04-12