Trends in gender homophily in scientific publications (data)
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/7958033
Description:
This dataset contains records of research articles extracted from the Web of Science (WoS), covering 1980 to 2019. In total, it includes 15,642 journals, 28,241,100 articles, and 111,980,858 authorships across 153 research areas.
The main dataset (author_address_article_gend_v3.parquet), in Parquet format, contains all the authorships, where an authorship is defined as an (article, author) pair. Each authorship (row) has the 14 variables listed below:
ut: unique article identifier.
daisng_id: unique author identifier.
author_no: author number, as listed in the article.
country: author country (two-letter ISO code).
date: publication date.
gender: gender of the author ("male" or "female"), as provided by the Genderize.io API.
probability: probability of the gender attribute, as provided by the Genderize.io API.
count: number of entries for the author's first name, as provided by the Genderize.io API.
jsc: journal subject category.
field: field of research.
research_area: area of research.
n_aut: number of authors in this publication.
journal: journal name.
alphabetical: whether the author list for this article is in alphabetical order.
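As a minimal sketch of how these variables might be used, the example below counts female and male authorships per article (ut) while skipping gender labels whose Genderize.io probability falls below a cutoff. The 0.9 cutoff, the function name article_gender_counts, and the sample rows are assumptions made for illustration; they are not part of the dataset or of the authors' method.

```python
from collections import defaultdict

# Column names as listed in the dataset description.
AUTHORSHIP_COLUMNS = [
    "ut", "daisng_id", "author_no", "country", "date", "gender", "probability",
    "count", "jsc", "field", "research_area", "n_aut", "journal", "alphabetical",
]

# With pandas and a Parquet engine installed, the file could be read with, e.g.:
#   pandas.read_parquet("author_address_article_gend_v3.parquet",
#                       columns=["ut", "gender", "probability"])


def article_gender_counts(rows, min_probability=0.9):
    """Count female/male authorships per article, skipping low-confidence labels.

    `rows` is any iterable of dict-like records with the keys "ut", "gender"
    and "probability". The default cutoff of 0.9 is an assumption.
    """
    counts = defaultdict(lambda: {"female": 0, "male": 0})
    for row in rows:
        if row["gender"] in ("female", "male") and row["probability"] >= min_probability:
            counts[row["ut"]][row["gender"]] += 1
    return dict(counts)


# Synthetic rows for illustration only (not taken from the dataset).
sample_rows = [
    {"ut": "A1", "gender": "female", "probability": 0.98},
    {"ut": "A1", "gender": "male", "probability": 0.95},
    {"ut": "A2", "gender": "male", "probability": 0.60},  # below the cutoff
    {"ut": "A2", "gender": "female", "probability": 0.99},
]
sample_counts = article_gender_counts(sample_rows)
```

With 111,980,858 rows, a column-oriented approach (reading only the needed columns and grouping by ut) would likely be preferable to iterating over Python dictionaries; the loop form is used here only to keep the logic explicit.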
A resampler was applied to the main dataset to generate null homophily values for each year. The results are provided as four datasets in R Data Serialization (RDS) format:
null_field.rds: null homophily values per country, year and field of research.
null_field_comp.rds: null homophily values per year and field of research (only for complete authorships).
null_research.rds: null homophily values per year and area of research.
null_research_comp.rds: null homophily values per year and area of research (only for complete authorships).
All these datasets have the same structure:
country: country (two-letter ISO code).
year: year.
variable: either field or research area name.
m: average homophily.
s: standard error of the homophily.
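The description does not define the homophily measure or the resampling scheme, so the following is only an illustrative sketch of how a null mean (m) and spread (s) could be produced. It assumes homophily is the share of co-author pairs with the same gender label, and that the null is built by shuffling gender labels across articles while keeping team sizes fixed. Both choices, along with the function names and the sample data, are assumptions and may differ from the authors' algorithm.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def same_gender_share(articles):
    """Share of co-author pairs with the same gender label (assumed measure)."""
    same = total = 0
    for genders in articles:
        for a, b in combinations(genders, 2):
            total += 1
            same += a == b
    return same / total if total else float("nan")


def null_homophily(articles, n_resamples=200, seed=0):
    """Shuffle labels across articles, keeping team sizes, and return (m, s).

    m is the mean of the resampled values; s is their standard deviation,
    used here as a stand-in for the standard error of the null.
    """
    rng = random.Random(seed)
    labels = [g for genders in articles for g in genders]
    sizes = [len(genders) for genders in articles]
    values = []
    for _ in range(n_resamples):
        rng.shuffle(labels)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(labels[start:start + size])
            start += size
        values.append(same_gender_share(shuffled))
    return mean(values), stdev(values)


# Synthetic example: each inner list holds the gender labels of one article.
articles = [["female", "female"], ["male", "male"], ["female", "male", "male"]]
observed = same_gender_share(articles)
m, s = null_homophily(articles)
```

Under these assumptions, an observed value could be compared with the null through a standardized difference such as (observed - m) / s, provided s is greater than zero.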
Finally, the record includes some supplementary files used in the descriptive analysis and methods:
File null_research_l2019.rds is an example of the output from the resampling algorithm for year 2019.
File wos_category_to_field.csv is a mapping from WoS categories to more general fields.
File jcr_if_2020.csv contains the percentiles of the journal impact factor for the JCR 2020.
Created:
2024-04-12



