
Dataset: ORCID-Linked Labeled Data for Evaluating Author Name Disambiguation at Scale

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/ORCID-Linked_Labeled_Data_for_Evaluating_Author_Name_Disambiguation_at_Scale/13404986
Description:
This page contains four datasets released for the paper entitled "ORCID-Linked Labeled Data for Evaluating Author Name Disambiguation at Scale", to be published in Scientometrics (In print).

1. AUT_ORC.zip: a list of 3M author name instances in MEDLINE linked to Author-ity2009.
2. AUT_NIH.zip: a list of 313K author name instances in MEDLINE linked to NIH PI ID.
3. AUT_SCT_pairs.zip: a list of 6.2M paper pairs and author byline positions in self-citation relation.
4. AUT-SCT_info.zip: a list of 4.7M author name instances in self-citation relation as recorded in AUT_SCT_pairs.

Information about an author name instance in AUT-SCT_pairs can be connected to AUT-SCT_info using the combination of PMID and Byline Position as a key. Please see the paper for details on how the datasets were created.

Kim, J., & Owen-Smith, J. (In print). ORCID-linked labeled data for evaluating author name disambiguation at scale. Scientometrics. doi:10.1007/s11192-020-03826-6

The uploaded datasets were created by combining the data sources below.

1. ORCID data (2018 version) were downloaded from the link below. Please refer to the policies on the use of ORCID data.
   https://info.orcid.org/public-data-file-use-policy/
2. MEDLINE baseline data (2016 version) were downloaded from the link below. Please refer to the policies on the use of MEDLINE data.
   https://www.nlm.nih.gov/databases/download/pubmed_medline.html
3. The Author-ity2009, Ethnea, and Genni datasets were downloaded from the link below. Please refer to the policies on the use of those datasets.
   https://databank.illinois.edu/datasets/IDB-9087546
   Please cite the three papers below to give proper credit to the creators of the original datasets.
   Torvik, V. I., & Smalheiser, N. R. (2009). Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3). doi:10.1145/1552303.1552304
   Torvik, V. I., & Agarwal, S. (2016). Ethnea: an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science, March 22-23, 2016, Library of Congress, Washington, DC, USA. http://hdl.handle.net/2142/88927
   Smith, B., Singh, M., & Torvik, V. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2013), 199-208. doi:10.1145/2467696.2467720
4. The dataset of NIH ID linked to Author-ity2009 was downloaded from the link below.
   https://figshare.com/articles/dataset/PLoS_2016_csv/3407461/1
   Please cite the paper below to give proper credit to the creators of the original dataset.
   Lerchenmueller, M. J., & Sorenson, O. (2016). Author Disambiguation in PubMed: Evidence on the Precision and Recall of Author-ity among NIH-Funded Scientists. PLOS ONE, 11(7), e0158731. doi:10.1371/journal.pone.0158731
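
The description states that records in the self-citation pairs file can be connected to the info file through the combination of PMID and byline position. A minimal sketch of that lookup is given below. The field names (`pmid`, `position`, `citing_pmid`, `citing_position`, `cited_pmid`, `cited_position`, `name`) and the toy rows are hypothetical placeholders, not the actual schema or contents of the released files; check the paper and the file headers before adapting it.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Composite key described on this page: (PMID, byline position).
Key = Tuple[str, int]


def build_info_index(info_rows: Iterable[dict]) -> Dict[Key, dict]:
    """Index info-style rows by (PMID, byline position).

    The field names 'pmid' and 'position' are assumptions for illustration.
    """
    index: Dict[Key, dict] = {}
    for row in info_rows:
        index[(str(row["pmid"]), int(row["position"]))] = row
    return index


def attach_info(pair_rows: Iterable[dict], index: Dict[Key, dict]) -> List[dict]:
    """Look up both author name instances of each self-citation pair.

    A side with no matching info row is returned as None.
    """
    joined: List[dict] = []
    for pair in pair_rows:
        citing_key = (str(pair["citing_pmid"]), int(pair["citing_position"]))
        cited_key = (str(pair["cited_pmid"]), int(pair["cited_position"]))
        citing: Optional[dict] = index.get(citing_key)
        cited: Optional[dict] = index.get(cited_key)
        joined.append({"citing": citing, "cited": cited})
    return joined


# Toy rows standing in for the real files (made up, not taken from the data).
info_rows = [
    {"pmid": "100", "position": 1, "name": "Example, A"},
    {"pmid": "200", "position": 3, "name": "Example, A B"},
]
pair_rows = [
    {"citing_pmid": "200", "citing_position": 3,
     "cited_pmid": "100", "cited_position": 1},
]

index = build_info_index(info_rows)
joined = attach_info(pair_rows, index)
print(joined[0]["citing"]["name"], "cites", joined[0]["cited"]["name"])
```

PMID alone is not enough as a key because one paper has several authors; adding the byline position identifies a single author name instance on that paper.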
Created:
2020-12-17