How reliable are species identifications in biodiversity big data? Evaluating the records of a neotropical fish family in online repositories

Figshare | Updated 2020-04-03 | Indexed 2026-04-28

Download link:

https://figshare.com/articles/dataset/How_reliable_are_species_identifications_in_biodiversity_big_data_Evaluating_the_records_of_a_neotropical_fish_family_in_online_repositories/12078519

Description:

The growth of free and open online biodiversity databases is of paramount importance for current research in ecology and evolution. However, little attention is paid to using updated taxonomy in these "biodiversity big data" repositories, and the quality of their taxonomic information is often questioned. Here we assess how reliable the current use of nomenclatural classification is in the distributional information available from two biodiversity information networks: the Global Biodiversity Information Facility (GBIF) and the Brazilian SpeciesLink. As a case study, we use the records of Auchenipteridae, a Neotropical fish family that has been subject to recent taxonomic revisions. A data filtering procedure was applied to identify and quantify inaccuracies in the taxonomic status of the records in three steps: assessment of identification accuracy at the family, genus, or species level; current validity of the species name; and assignment of inaccurate species records to different categories of classification quality. Synonyms, nonexistent combinations, and outdated combinations were reassigned to currently valid species. A total of 9148 records of Auchenipteridae fishes were analyzed, of which 4165 were from GBIF and 4983 from SpeciesLink, deriving from 46 and 31 sources, respectively. After correcting all possible records following the taxonomic data filtering steps, 6988 records (76.4% of the original) were adequate for describing species distributions, while 2160 remained inaccurate. Most inaccurate records at the species level were due to the use of outdated nomenclature, resulting in synonymy and in invalid combinations of species and genus. Our results reveal a large taxonomic inconsistency among records and, most importantly, show that taxonomic information obtained from repositories should be used with caution. Many inaccuracies may be embedded in the records of biodiversity databases, which could lead researchers to an incomplete or even mistaken view of variation in the natural world.
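
The three filtering steps described above can be illustrated with a minimal Python sketch. This is not the authors' code: the lookup tables `VALID_SPECIES` and `REASSIGN`, the category labels, and the helper names are all hypothetical, and the few species names are toy entries that would need to be checked against a current taxonomic catalogue before any real use.

```python
from collections import Counter

FAMILY = "Auchenipteridae"

# Hypothetical lookup tables: toy entries for illustration only, not an
# authoritative checklist and not taken from the dataset itself.
VALID_SPECIES = {"Trachelyopterus galeatus", "Ageneiosus inermis"}
REASSIGN = {
    # recorded name -> (assumed category, assumed currently valid name)
    "Ageneiosus brevifilis": ("synonym", "Ageneiosus inermis"),
    "Parauchenipterus galeatus": ("outdated combination",
                                  "Trachelyopterus galeatus"),
}
OPEN_NOMENCLATURE = {"sp", "sp.", "spp."}


def identification_level(name):
    """Step 1: is the record identified to family, genus, or species?"""
    parts = name.split()
    if len(parts) >= 2 and parts[1] not in OPEN_NOMENCLATURE:
        return "species"
    if parts and parts[0] != FAMILY:
        return "genus"
    return "family"


def classify(name):
    """Steps 2 and 3: check name validity, then assign a quality category.

    Returns (category, corrected_name), where corrected_name is None when
    the record cannot be tied to a currently valid species.
    """
    level = identification_level(name)
    if level != "species":
        return (f"{level}-level only", None)
    # Keep genus + epithet, dropping any trailing author string.
    binomial = " ".join(name.split()[:2])
    if binomial in VALID_SPECIES:
        return ("valid", binomial)
    if binomial in REASSIGN:
        return REASSIGN[binomial]
    return ("unresolved", None)


def summarize(names):
    """Count records per category and how many end up usable."""
    results = [classify(n) for n in names]
    categories = Counter(category for category, _ in results)
    usable = sum(1 for _, corrected in results if corrected is not None)
    return categories, usable
```

Calling `summarize` on a list of recorded names returns the per-category counts together with the number of records that map to a valid species, which corresponds to the kind of tally reported in the abstract (6988 usable out of 9148, with 2160 remaining inaccurate).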

Created:

2020-04-03