
ORCID-Linked Labeled Data for Evaluating Author Name Disambiguation at Scale

Figshare | Updated 2021-02-13 | Indexed 2026-04-08
Download link:
https://figshare.com/articles/dataset/ORCID-Linked_Labeled_Data_for_Evaluating_Author_Name_Disambiguation_at_Scale/13404986/2
Description:
This page contains four datasets released for the paper "ORCID-Linked Labeled Data for Evaluating Author Name Disambiguation at Scale", to be published in Scientometrics (in print).

1. AUT_ORC.zip: a list of 3M author name instances in MEDLINE linked to Author-ity2009.
2. AUT_NIH.zip: a list of 313K author name instances in MEDLINE linked to NIH PI ID.
3. AUT_SCT_pairs.zip: a list of 6.2M paper pairs and author byline positions in self-citation relation.
4. AUT-SCT_info.zip: a list of 4.7M author name instances in self-citation relation as recorded in AUT_SCT_pairs. Information about an author name instance in AUT-SCT_pairs can be connected to AUT-SCT_info using the combination of PMID and Byline Position as a key.

Please see the paper for details on how the datasets were created.

Kim, J., & Owen-Smith, J. (In print). ORCID-linked labeled data for evaluating author name disambiguation at scale. Scientometrics. doi:10.1007/s11192-020-03826-6

The uploaded datasets were created by combining the data sources below.

1. ORCID data were downloaded from the link below for the 2018 version. Please refer to the policies on the use of ORCID data.
https://info.orcid.org/public-data-file-use-policy/

2. MEDLINE baseline data were downloaded from the link below for the 2016 version. Please refer to the policies on the use of MEDLINE data.
https://www.nlm.nih.gov/databases/download/pubmed_medline.html

3. Author-ity2009, Ethnea, and Genni datasets were downloaded from the link below. Please refer to the policies on the use of those datasets.
https://databank.illinois.edu/datasets/IDB-9087546

Please cite the three papers below to give proper credit to the creators of the original datasets.

Torvik, V. I., & Smalheiser, N. R. (2009). Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3). doi:10.1145/1552303.1552304

Torvik, V. I., & Agarwal, S. (2016). Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science, March 22-23, 2016, Library of Congress, Washington, DC, USA. http://hdl.handle.net/2142/88927

Smith, B., Singh, M., & Torvik, V. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2013), 199-208. doi:10.1145/2467696.2467720

4. The dataset of NIH ID linked to Author-ity2009 was downloaded from the link below.
https://figshare.com/articles/dataset/PLoS_2016_csv/3407461/1

Please cite the paper below to give proper credit to the creators of the original dataset.

Lerchenmueller, M. J., & Sorenson, O. (2016). Author Disambiguation in PubMed: Evidence on the Precision and Recall of Author-ity among NIH-Funded Scientists. PLOS ONE, 11(7), e0158731. doi:10.1371/journal.pone.0158731
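The description notes that an author name instance in the self-citation pairs file can be connected to the info file using the combination of PMID and byline position as a key. A minimal sketch of that composite-key lookup follows. The field names (pmid, position, citing_pmid, cited_pmid, and so on) and the toy rows are hypothetical placeholders, not the actual column headers or contents of the released files; check the unzipped files and the paper before adapting it.

```python
# Sketch of a composite-key join on (PMID, byline position).
# All field names below are assumptions made for illustration.

def index_info(info_rows):
    """Build a lookup from (pmid, position) to the author-instance row."""
    return {(row["pmid"], row["position"]): row for row in info_rows}


def attach_info(pairs, info_index):
    """Attach author-instance info to both sides of each self-citation pair.

    A side with no matching (pmid, position) key is set to None.
    """
    joined = []
    for pair in pairs:
        citing_key = (pair["citing_pmid"], pair["citing_position"])
        cited_key = (pair["cited_pmid"], pair["cited_position"])
        joined.append({
            "pair": pair,
            "citing": info_index.get(citing_key),
            "cited": info_index.get(cited_key),
        })
    return joined


# Toy rows invented for this example; they are not taken from the dataset.
info_rows = [
    {"pmid": "100", "position": "1", "name": "Doe, J"},
    {"pmid": "200", "position": "3", "name": "Doe, Jane"},
]
pairs = [
    {"citing_pmid": "200", "citing_position": "3",
     "cited_pmid": "100", "cited_position": "1"},
]

joined = attach_info(pairs, index_info(info_rows))
print(joined[0]["citing"]["name"], "cites", joined[0]["cited"]["name"])
```

Keeping PMID and byline position together as a tuple key matters because a PMID alone identifies a paper, not a single author name instance on that paper's byline.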
Created:
2021-02-13