
Researchers' first name variations, based on ORCID

Source: Figshare | Updated: 2025-07-11 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/Researchers_first_name_variations_based_on_ORCID/29544386
Description:
First-name variations based on ORCID records, as available on Dimensions' open data platform.

Query used:

WITH name_variant AS (
  SELECT
    TRIM(LOWER(pname.given_names)) AS first_name,
    TRIM(LOWER(variant.content)) AS variant_name,
    COUNT(DISTINCT s.orcid_identifier.path) AS researcher_count
  FROM `ds-open-datasets.orcid.summaries_2024` AS s,
    UNNEST(s.person.other_names.names) AS variant
  JOIN UNNEST([s.person.name]) AS pname
  WHERE pname.given_names IS NOT NULL
    AND variant.content IS NOT NULL
    AND TRIM(LOWER(pname.given_names)) != TRIM(LOWER(variant.content))
    AND NOT REGEXP_CONTAINS(variant.content, r"\s")  -- remove variants with whitespace
    AND NOT REGEXP_CONTAINS(variant.content, r"[\?\.,]")  -- remove variants with punctuation
    AND NOT REGEXP_CONTAINS(LOWER(variant.content), r"^(dr|professor|phd|doctor|n/a|reviewer|sociologist|lecturer|lecture|researcher|architect|everyone|physician)\b")
    AND NOT REGEXP_CONTAINS(LOWER(variant.content), r"\b(dr|professor|mr|ms|mrs|phd|doctor|n/a|reviewer|sociologist|lecturer|lecture|researcher|architect|everyone|physician)\b")
  GROUP BY first_name, variant_name
  ORDER BY researcher_count DESC
)
SELECT *
FROM name_variant
WHERE researcher_count > 1;
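The query's filtering and counting steps can be mirrored locally to check how a given name pair would be treated. The sketch below is an illustrative Python re-implementation, not the authors' code: the function names `keep_pair` and `name_variants`, the `(orcid_id, given_names, other_names)` record shape, and the sample data are all hypothetical. It assumes Python's `re` behaves like BigQuery's RE2 for these simple patterns, which holds for ASCII input but can differ for non-ASCII whitespace and word boundaries.

```python
import re
from collections import defaultdict

# Rejected when they appear at the start of a variant (first title regex in the query).
LEADING = (r"^(dr|professor|phd|doctor|n/a|reviewer|sociologist|lecturer|lecture"
           r"|researcher|architect|everyone|physician)\b")
# Rejected anywhere as whole words (second title regex; also covers mr|ms|mrs).
ANYWHERE = (r"\b(dr|professor|mr|ms|mrs|phd|doctor|n/a|reviewer|sociologist|lecturer"
            r"|lecture|researcher|architect|everyone|physician)\b")


def normalise(name):
    """Mirror TRIM(LOWER(x)) from the query."""
    return name.lower().strip()


def keep_pair(given, variant):
    """Return True if (given name, variant) passes the query's WHERE clause."""
    if given is None or variant is None:
        return False
    if normalise(given) == normalise(variant):   # identical after normalisation
        return False
    if re.search(r"\s", variant):                # any whitespace in the raw variant
        return False
    if re.search(r"[?.,]", variant):             # question mark, period or comma
        return False
    lowered = variant.lower()
    if re.search(LEADING, lowered) or re.search(ANYWHERE, lowered):
        return False                             # titles, degrees, job words
    return True


def name_variants(records, min_count=2):
    """Count distinct ORCID iDs per (first_name, variant_name) pair.

    min_count=2 corresponds to the query's `researcher_count > 1`.
    Returns a list of ((first_name, variant_name), count), largest count first.
    """
    ids = defaultdict(set)
    for orcid_id, given, others in records:
        for variant in others:
            if keep_pair(given, variant):
                ids[(normalise(given), normalise(variant))].add(orcid_id)
    counts = {pair: len(s) for pair, s in ids.items() if len(s) >= min_count}
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


# Toy input: made-up iDs and names, for illustration only.
sample = [
    ("0000-0000-0000-0001", "Robert", ["Bob", "Dr Robert"]),
    ("0000-0000-0000-0002", "robert ", ["bob"]),
    ("0000-0000-0000-0003", "Robert", ["Rob", "B.", "Robert"]),
    ("0000-0000-0000-0004", "Katherine", ["Kate"]),
]
print(name_variants(sample))  # → [(('robert', 'bob'), 2)]
```

In the toy data, "Dr Robert" is dropped for its whitespace, "B." for its period, and the self-match "Robert" for equality; ("robert", "rob") and ("katherine", "kate") survive filtering but are cut by the count threshold. Note that the word-boundary check lets names such as "Drew" through, since `dr` must be followed by a non-word character to match.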
Created:
2025-07-11