Supporting material for "Impact of gender on the formation and outcome of formal mentoring relationships in the life sciences"

NIAID Data Ecosystem, indexed 2026-03-13
Download link:
https://zenodo.org/record/4722020
Resource description:
This repository contains data and analysis code associated with the manuscript: L.P. Schwartz, J. Liénard, S. V. David. (2022) "Impact of gender on formation and outcome of formal mentoring relationships in the life sciences."

Figures and tables in the manuscript can be produced by running the make_figures.ipynb notebook. Figures are marked with headings indicating their position in the manuscript (Figure 1, Figure S1, etc.). In addition, the notebook contains code to reproduce regression analyses that are cited in the text but not directly associated with a figure.

Data on mentoring relationships derive from Academic Family Tree (AFT, www.academictree.org) and public data sources on funding, publications, and awards. Inclusion criteria, public data sources, and procedures for linking across sources are described in the manuscript.

Personal identifiers for researchers have been anonymized but remain consistent across all data in the repository. In other words, the personal identifier "1" refers to the same person in all dataframes in the repository. However, that person is *not* the same researcher identified as "1" on the public AFT website.

Installation

Requires Python 3.x and Pandas. To load required libraries using Anaconda, run:

`conda create --name aft -c conda-forge pandas numpy scipy ipython jupyterlab scikit-learn matplotlib statsmodels seaborn pytables`

Dataframes

Data is stored as a series of Pandas dataframes within HDF5 or CSV files:

* cng_tc: The primary dataset used in the analysis. The name is an acronym for "connections" (i.e., training relationships, "cn"), "gender" ("g"), and "trainee count" ("tc"). Each row contains data on the mentor and trainee in one training relationship. See manuscript for inclusion criteria.
* mentors: Data on mentors. Each row contains data on one mentor. See manuscript for inclusion criteria.
* mentors_grants, mentors_hindex, mentors_locs_ranked: Subsets of mentors with data available for funding (mentors_grants), citation (mentors_hindex), and institution rank (mentors_locs_ranked).
* mentors_nobel, mentors_hhmi, mentors_nas: Subsets of mentors that received a Nobel Prize (mentors_nobel), Howard Hughes Medical Institute grants (mentors_hhmi), or membership in the National Academy of Sciences (mentors_nas). See manuscript for details of data sources and linking procedures.
* cn, cng, first_names, gn, gn_all, locs: Partial data (connections only, inferred gender only, connections and gender only, location only, first names and inferred gender only) for more inclusive sets of researchers in AFT. They are generally not used for analysis, but have been included here to calculate statistics on the total amount of data included and to screen for data from U.S. locations.
* nsf_gender_phds, nsf_gender_pds: National Science Foundation survey data on gender and fraction of PhDs conferred per year (nsf_gender_phds) or fraction of postdocs employed per year (nsf_gender_pds). See manuscript for details of data source.
* photo: Data for validation of gender inference method.
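Because the anonymized identifiers are consistent across dataframes, tables can be joined on them. The following is a minimal sketch of loading a table and joining mentor attributes onto training relationships. The `load_table` helper, the file extensions it accepts, and the use of `pid` as the key column of mentors_hindex are assumptions made for illustration; the actual file names and layout in the repository are not given in this description. The small tables are synthetic stand-ins with invented values.

```python
from pathlib import Path

import pandas as pd


def load_table(path):
    """Hypothetical helper: load one dataframe from a CSV or HDF5 file."""
    path = Path(path)
    if path.suffix == ".csv":
        return pd.read_csv(path)
    if path.suffix in {".h5", ".hdf", ".hdf5"}:
        # Needs the pytables package; assumes one dataframe per file.
        return pd.read_hdf(path)
    raise ValueError(f"Unsupported file type: {path.suffix}")


# Synthetic stand-ins for cng_tc and mentors_hindex (invented values).
cng_tc = pd.DataFrame({
    "pid_mentor": [1, 1, 2],
    "pid_trainee": [10, 11, 12],
    "relation": [1, 2, 1],  # 1: graduate student, 2: postdoc
})
mentors_hindex = pd.DataFrame({"pid": [1, 2], "hindex": [40, 25]})

# Attach each mentor's h-index to the relationships they appear in.
# Assumes mentors_hindex is keyed by "pid" (an assumption, not confirmed).
merged = cng_tc.merge(
    mentors_hindex, left_on="pid_mentor", right_on="pid", how="left"
)
print(merged[["pid_mentor", "pid_trainee", "hindex"]])
```

A left join keeps every training relationship even when the mentor has no citation data, which matters here because mentors_hindex is described as a subset of mentors.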
Dataframe columns

* amount: Mentor's total funding
* amount_adj: Mentor's total funding (adjusted to 2020 dollars)
* broad_field: Mentor's general research area (e.g., life sciences, engineering, based on National Science Foundation classifications)
* continue: Whether trainee went on to become a mentor (i.e., has trainees listed in AFT)
* country: Country in which mentor's current institution is located
* firstname: First name of researcher (table of first names is not aligned with tables containing anonymized personal identifiers)
* first_grant_year: Year of mentor's first grant
* funding_rate: Mentor's annual funding rate (since first grant)
* funding_rate_adj: Mentor's annual funding rate (since first grant) adjusted to 2020 dollars
* hhmi: Whether mentor was granted HHMI funding
* hindex: Mentor's h-index
* location: Name of mentor's current institution
* locid: Identifier for mentor's institution
* locid_rank: Position of mentor's institution in 2015 Quacquarelli-Symonds rankings (lower numbers are better)
* locid_rank_rev: Reversed version of "locid_rank" (i.e., higher numbers are better)
* majorarea: Mentor's specific research area (e.g., neuroscience)
* male_mentor, male_trainee: Whether the probability that a researcher's first name is used by a person identifying as a man meets threshold (see manuscript for details on gender inference using first names)
* match_score: Score for string match between institution or name of awardee and researcher
* mentor_career_start: The date at which the mentor's academic career began
* mentor_continue_rate: Fraction of mentor's trainees that become mentors
* mentor_continue_rate_ft: Fraction of mentor's woman trainees that become mentors
* mentor_continue_rate_mt: Fraction of mentor's man trainees that become mentors
* mentor_t_p_male0: Fraction of mentor's trainees that are men
* mentor_t_p_male0_gs: Fraction of mentor's trainees that are men (graduate students only)
* mentor_t_p_male0_pd: Fraction of mentor's trainees that are men (postdocs only)
* mentor_tcount0: Mentor's total number of trainees
* nas: Whether mentor is a member of the National Academy of Sciences
* nobel: Whether mentor is a Nobel laureate
* p_male_mentor, p_male_trainee: Probability that a researcher's first name is used by a person identifying as a man
* pid: Anonymized identifier of researcher
* pid_mentor: Anonymized identifier of mentor in training relationship
* pid_trainee: Anonymized identifier of trainee in training relationship
* pq: "1" if data on training relationship is drawn from the ProQuest database and has not been manually edited by a human AFT user
* relation: Type of training relationship (1: graduate student, 2: postdoc)
* scorer1, scorer2, scorer3: Results of photo validation of gender inference for each scorer
* start: Training start year
* stop: Training end year
* trainee_tcount: Total number of people that the trainee has trained
* triad: Whether trainee has participated in both a graduate-level and postdoctoral training relationship

The cn dataframe follows slightly different naming conventions, but is not generally used in the analysis (pid1 = pid_trainee, pid2 = pid_mentor, startdate = start, stopdate = stop).
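As an illustration of the column conventions, the sketch below renames cn columns to the names used in the other dataframes and computes a per-mentor continuation rate from the `continue` column. The tables are synthetic stand-ins with invented values, and the rate calculation is only one plausible reading of "fraction of mentor's trainees that become mentors"; the stored mentor_continue_rate may reflect inclusion criteria described in the manuscript that are not applied here.

```python
import pandas as pd

# Mapping stated in the description: cn names -> names used elsewhere.
CN_RENAME = {
    "pid1": "pid_trainee",
    "pid2": "pid_mentor",
    "startdate": "start",
    "stopdate": "stop",
}

# Synthetic stand-in for cn (invented values).
cn = pd.DataFrame({
    "pid1": [10, 11, 12],
    "pid2": [1, 1, 2],
    "startdate": [2001, 2005, 2010],
    "stopdate": [2006, 2008, 2015],
})
cn_renamed = cn.rename(columns=CN_RENAME)

# Synthetic stand-in for cng_tc (invented values).
cng_tc = pd.DataFrame({
    "pid_mentor": [1, 1, 1, 2],
    "continue": [1, 0, 1, 0],
})

# "continue" is a Python keyword, so use bracket access (df["continue"]);
# attribute access (df.continue) is a syntax error.
continue_rate = cng_tc.groupby("pid_mentor")["continue"].mean()
print(continue_rate)
```

The same bracket-access caveat applies inside `DataFrame.query` expressions, where a column named `continue` needs backtick quoting.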
Created:
2022-07-25