
Open dataset of scholars on Twitter (X)

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link: https://zenodo.org/record/7013517
Description:
This is version 2 of a dataset pairing OpenAlex author IDs (https://docs.openalex.org/about-the-data/author) with Twitter (now X) user IDs.

Major update in this version

When OpenAlex substantially revised its author identification system, the scholars on Twitter dataset, which linked Twitter IDs to OpenAlex author IDs, became outdated. The links had to be re-established in a new way, because no new Twitter data were available to rerun the original method of matching Twitter profiles with scholarly authors.

Instead, a bridge was built between the June 2022 snapshot of the OpenAlex database, which was used in the original matching, and the most recent snapshot from February 2024. The bridge used OpenAlex work IDs and DOIs to match authors across the two snapshots by their shared publications and identical primary names. When a 2022 author and a 2024 author who shared a publication had the same primary name, the new OpenAlex author ID was assigned to the corresponding Twitter ID. When no primary-name match was found, the 2022 name was compared with the alternative names recorded for authors in the 2024 snapshot. This preserved author identities across the system update despite the overhaul of the database.

The method rematched 432,417 OpenAlex author IDs (88% of those in the original dataset) and restored links for 388,968 unique Twitter users, 92% of the original dataset. Of the 432,417 rematched author IDs, 375,316 were matched by primary name and 57,101 by alternative name. The approach was simple and quick to run, and only 8% of the original Twitter-linked scholarly accounts were lost.
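The bridging procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the Author record, the bridge function and the in-memory dictionaries are hypothetical; each author's publications are represented as one set mixing OpenAlex work IDs and DOIs; and names are compared as exact strings, because the description does not say whether names were normalized before matching.

```python
from collections import defaultdict
from typing import NamedTuple


class Author(NamedTuple):
    # Hypothetical record; not the layout of the OpenAlex snapshots.
    name: str                                 # primary display name
    works: frozenset[str]                     # OpenAlex work IDs and/or DOIs
    alt_names: frozenset[str] = frozenset()   # alternative names (2024 side)


def bridge(old_authors, new_authors, old_links):
    """Map (old_author_id, tweeter_id) links onto new author IDs.

    Returns a sorted list of (new_author_id, tweeter_id, alternative) triples,
    where alternative is 0 for a primary-name match and 1 for an
    alternative-name match.
    """
    # Index the new snapshot: work identifier -> new author IDs on that work.
    by_work = defaultdict(set)
    for new_id, author in new_authors.items():
        for work in author.works:
            by_work[work].add(new_id)

    result = set()
    for old_id, tweeter_id in old_links:
        old = old_authors[old_id]
        # Candidates: new authors sharing at least one publication.
        candidates = set()
        for work in old.works:
            candidates |= by_work.get(work, set())
        # First pass: identical primary names (exact string comparison).
        primary = {c for c in candidates if new_authors[c].name == old.name}
        if primary:
            result.update((c, tweeter_id, 0) for c in primary)
            continue
        # Fallback: the old name appears among a candidate's alternative names.
        alternative = {c for c in candidates if old.name in new_authors[c].alt_names}
        result.update((c, tweeter_id, 1) for c in alternative)
    return sorted(result)


# Toy example with made-up IDs.
old = {"A1": Author("Jane Doe", frozenset({"W1", "10.1/x"})),
       "A2": Author("J. Smith", frozenset({"W2"}))}
new = {"A10": Author("Jane Doe", frozenset({"W1"})),
       "A20": Author("John Smith", frozenset({"W2", "W3"}), frozenset({"J. Smith"})),
       "A30": Author("Someone Else", frozenset({"W1"}))}
links = [("A1", "111"), ("A2", "222")]
print(bridge(old, new, links))  # [('A10', '111', 0), ('A20', '222', 1)]
```

Collecting results in a set lets one 2022 author map to several 2024 IDs and one 2024 ID carry several Twitter accounts. That would be consistent with the dataset containing more author-tweeter pairs (462,427) than unique author IDs or unique Twitter IDs, though the source does not say how such cases were handled.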
The dataset includes 432,417 unique author_ids and 388,968 unique tweeter_ids, forming 462,427 unique author-tweeter pairs.

File descriptions

authors_tweeters_2024_02.csv is the dataset itself: OpenAlex author IDs paired with Twitter (tweeter) IDs. The "alternative" column indicates whether the match was made with the primary name (0) or an alternative name (1).

mapping_tweeters_2022_2024.csv records the links between the 2022 and 2024 author IDs, including the author names.

How to cite

When using the dataset, please cite the following article, which describes the matching process in detail:

Mongeon, P., Bowman, T. D., & Costas, R. (2023). An open data set of scholars on Twitter. Quantitative Science Studies, 1–11. https://doi.org/10.1162/qss_a_00250
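The counts of unique author IDs, Twitter IDs and pairs can be recomputed from authors_tweeters_2024_02.csv. The sketch below is hypothetical: it assumes the file has a header row with columns named author_id, tweeter_id and alternative, and that the alternative column holds the strings 0 and 1. Check the actual header of the downloaded file before relying on it.

```python
import csv
import io


def summarize(stream):
    """Count unique IDs and pairs in an authors/tweeters CSV stream.

    Assumes (not confirmed by the source) a header row with columns
    author_id, tweeter_id and alternative (0 = primary name, 1 = alternative).
    """
    authors, tweeters, pairs = set(), set(), set()
    by_flag = {"0": set(), "1": set()}  # unique author IDs per match type
    for row in csv.DictReader(stream):
        author, tweeter = row["author_id"], row["tweeter_id"]
        authors.add(author)
        tweeters.add(tweeter)
        pairs.add((author, tweeter))  # duplicate rows are counted once
        by_flag[row["alternative"].strip()].add(author)
    return {
        "authors": len(authors),
        "tweeters": len(tweeters),
        "pairs": len(pairs),
        "primary_authors": len(by_flag["0"]),
        "alternative_authors": len(by_flag["1"]),
    }


# Toy input with made-up IDs.
sample = io.StringIO(
    "author_id,tweeter_id,alternative\n"
    "A10,111,0\n"
    "A10,112,0\n"
    "A20,222,1\n"
)
print(summarize(sample))
# {'authors': 2, 'tweeters': 3, 'pairs': 3, 'primary_authors': 1, 'alternative_authors': 1}
```

To run it on the real file, open it with open("authors_tweeters_2024_02.csv", newline="", encoding="utf-8") and pass the file object to summarize; the UTF-8 encoding is also an assumption.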
Created: 2024-04-02