
Data from the paper "The landscape of biomedical research"

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/7695389
Description:
Data from the paper "The landscape of biomedical research". The paper used the PubMed 2020 baseline (download date: 26.01.2021, not available anymore) supplemented with additional files from the 2021 baseline (download date: 27.04.2022, not available anymore), both originally obtained from https://www.nlm.nih.gov/databases/download/pubmed_medline.html, courtesy of the U.S. National Library of Medicine. This data can be found in v2 of this repository (https://zenodo.org/records/7849020).

In the latest version of this repository we provide the PubMed 2024 baseline (download date: 06.02.2024) including all papers until the end of 2023, which is not the main data we analyzed in the paper but an updated version including newer articles. The paper contains two supplementary figures (S9 and S10) with the updated embedding.

The latest version provided here includes the following files:

pubmed_landscape_data_2024_v2.zip, which includes:
- from the PubMed database: article title, journal, PMID, and publication year.
- produced by us: t-SNE embedding X and Y coordinates, label, color, whether the paper is retracted or not (combining PubMed and Retraction Watch information), affiliation country (from the first affiliation of the first author), and inferred gender (of both first and last author).
(Note: pubmed_landscape_data_2024_v2.zip is identical to pubmed_landscape_data_2024.zip from v3 of this repository, but includes inferred genders additionally.)

pubmed_landscape_abstracts_2024.zip, which includes:
- from the PubMed database: PMID, and paper abstracts.

PubMedBERT_embeddings_float16_2024.npy, which includes:
- produced by us: PubMedBERT embeddings of the paper abstracts (numpy.ndarray of shape 23,389,083 x 768).
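The embeddings file is large: at the stated shape of 23,389,083 x 768, and assuming 2-byte float16 values as the file name suggests, it comes to roughly 36 GB. The following is a minimal sketch of one way to work with it without loading it all into memory, using NumPy's memory mapping. The helper names `open_embeddings` and `cosine_similarity` are hypothetical and are not part of the dataset or the paper's code.

```python
import numpy as np

# Second dimension as stated in the description (23,389,083 x 768).
EMBEDDING_DIM = 768


def open_embeddings(path):
    """Memory-map a .npy file so rows are read from disk only when accessed."""
    emb = np.load(path, mmap_mode="r")
    if emb.ndim != 2 or emb.shape[1] != EMBEDDING_DIM:
        raise ValueError(f"unexpected embedding shape: {emb.shape}")
    return emb


def cosine_similarity(emb, i, j):
    """Cosine similarity between rows i and j.

    Rows are upcast to float32 first, on the assumption that the stored
    values are float16 and would lose precision in the dot product.
    """
    a = np.asarray(emb[i], dtype=np.float32)
    b = np.asarray(emb[j], dtype=np.float32)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A call such as `emb = open_embeddings("PubMedBERT_embeddings_float16_2024.npy")` followed by `cosine_similarity(emb, 0, 1)` would then touch only two rows on disk. The description does not say whether row order in the embeddings matches row order in the metadata or abstracts files, so that should be verified before joining them by position.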
Created:
2024-07-22