Bibliographic dataset characterizing studies that use online biodiversity databases
Source: Mendeley Data. Updated 2024-03-27; indexed 2024-06-28.
Download link:
https://zenodo.org/record/2589439
Description:
This dataset includes bibliographic information for 501 papers that were published from 2010 to April 2017 (the time of the search) and that use online biodiversity databases for research purposes. Our overarching goal in this study is to determine how research uses of biodiversity data developed during a time of unprecedented growth of online data resources. We also determine which uses have the highest number of citations, how online occurrence data are linked to other data types, and whether and how data quality is addressed.

Specifically, we address the following questions:
1.) What primary biodiversity databases have been cited in published research, and which databases have been cited most often?
2.) Is the biodiversity research community citing databases appropriately, and are the cited databases currently accessible online?
3.) What are the most common uses, general taxa addressed, and data linkages, and how have they changed over time?
4.) What uses have the highest impact, as measured through the mean number of citations per year?
5.) Are certain uses applied more often for plants/invertebrates/vertebrates?
6.) Are links to specific data types associated more often with particular uses?
7.) How often are major data quality issues addressed?
8.) What data quality issues tend to be addressed for the top uses?

Relevant papers for this analysis include those that use online and openly accessible primary occurrence records, or those that add data to an online database. Google Scholar (GS) provides full-text indexing, which was important for identifying data sources that often appear buried in the methods section of a paper. Our search was therefore restricted to GS. All authors discussed and agreed upon representative search terms, which were relatively broad in order to capture a variety of databases hosting primary occurrence records. The terms included: “species occurrence” database (8,800 results), “natural history collection” database (634 results), herbarium database (16,500 results), “biodiversity database” (3,350 results), “primary biodiversity data” database (483 results), “museum collection” database (4,480 results), “digital accessible information” database (10 results), and “digital accessible knowledge” database (52 results). Note that quotation marks are part of the search terms where a specific phrase must be matched in whole.

We downloaded all records returned by each search (or the first 500 if there were more) into a Zotero reference management database. About one third of the 2,500 papers in the final dataset were relevant. Three of the authors with specialized knowledge of the field characterized relevant papers using a standardized tagging protocol based on a series of key topics of interest. We developed a list of potential tags and descriptions for each topic, including: database(s) used, database accessibility, scale of study, region of study, taxa addressed, research use of data, other data types linked to species occurrence data, data quality issues addressed, authors, institutions, and funding sources. Each tagged paper was thoroughly checked by a second tagger.

The final dataset of tagged papers allows us to quantify general areas of research made possible by the expansion of online species occurrence databases, and trends over time. Analyses of these data will be published in a separate quantitative review.
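
The download rule described above (keep every record from a search, or only the first 500 when a search returned more) can be illustrated with a short Python sketch. The result counts are copied from the description; the names `SEARCH_RESULTS`, `CAP`, and `records_downloaded` are hypothetical and are not part of the dataset or of any Zotero or Google Scholar API. The description does not say how records returned by more than one search were handled, so the total computed here is only an upper bound on the number of unique records downloaded.

```python
# Result counts per Google Scholar search term, as reported in the description.
SEARCH_RESULTS = {
    '"species occurrence" database': 8800,
    '"natural history collection" database': 634,
    'herbarium database': 16500,
    '"biodiversity database"': 3350,
    '"primary biodiversity data" database': 483,
    '"museum collection" database': 4480,
    '"digital accessible information" database': 10,
    '"digital accessible knowledge" database': 52,
}

# Only the first 500 records were kept when a search returned more.
CAP = 500


def records_downloaded(results, cap=CAP):
    """Return the number of records kept per search term under the cap."""
    return {term: min(count, cap) for term, count in results.items()}


downloaded = records_downloaded(SEARCH_RESULTS)

# Upper bound on downloaded records, before any removal of overlapping hits
# (an assumption: the source does not describe how overlaps were treated).
total_before_dedup = sum(downloaded.values())

for term, kept in downloaded.items():
    print(f"{term}: {kept} of {SEARCH_RESULTS[term]} results kept")
print(f"Total records kept across all searches: {total_before_dedup}")
```

Five of the eight searches exceed the cap, so most of the downloaded records come from the top 500 results of the broader search terms.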
Created:
2023-06-28



