
Improper data practices erode the quality of global ecological databases and impede the progress of ecological research

Source: DataONE. Updated: 2024-01-02. Indexed: 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:f4d2b1937242f8d1f33a8ef5d1901b09434e93e4b0b3bd3534b749471a989615
Resource description:
The scientific community has entered an era of big data. However, with big data comes big responsibilities, and best practices for how data are contributed to databases have not kept pace with the collection, aggregation, and analysis of big data. Here, we rigorously assess the quantity of data for specific leaf area (SLA) available within the largest and most frequently used global plant trait database, the TRY Plant Trait Database, exploring how much of the data were applicable (i.e., original, representative, logical, and comparable) and traceable (i.e., published, cited, and consistent). Over three-quarters of the SLA data in TRY lacked either applicability or traceability, leaving only 22.9% of the original data usable compared to the 64.9% typically deemed usable by standard data cleaning protocols. The remaining usable data differed markedly from the original for many species, which led to altered interpretation of ecological analyses. Though the data we consider here make up onl...

SLA data were downloaded from TRY (traits 3115, 3116, and 3117) for all conifer (Araucariaceae, Cupressaceae, Pinaceae, Podocarpaceae, Sciadopityaceae, and Taxaceae), Plantago, Poa, and Quercus species. The data have not been processed in any way, but additional columns have been added to the dataset that tell the viewer where each data point came from, how it was cited, how it was measured, whether it was uploaded correctly, whether it had already been uploaded to TRY, and whether it was uploaded by the individual who collected the data.

There are two additional documents associated with this publication. One is a Word document that includes a description of each of the 120 datasets that contained SLA data for the four plant groups within the study (conifers, Plantago, Poa, and Quercus). The second is an Excel document that contains the SLA data that were downloaded from TRY and all associated metadata. Missing data codes: NA and N/A
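The missing data codes and trait IDs listed above can be handled when reading the data. The following is a minimal sketch, not the authors' workflow: it assumes the Excel document has been exported to CSV, and the column name `TraitID` and the helper names `clean_row` and `read_sla_rows` are hypothetical and should be adjusted to the actual file.

```python
import csv
import io

# Missing-data codes stated in the dataset description.
MISSING_CODES = {"NA", "N/A"}
# TRY trait IDs for SLA named in the dataset description.
SLA_TRAIT_IDS = {"3115", "3116", "3117"}


def clean_row(row):
    """Return a copy of the row with documented missing codes set to None."""
    cleaned = {}
    for key, value in row.items():
        if isinstance(value, str) and value.strip() in MISSING_CODES:
            cleaned[key] = None
        else:
            cleaned[key] = value
    return cleaned


def read_sla_rows(handle, trait_column="TraitID"):
    """Yield cleaned rows whose trait ID is one of the SLA traits.

    `trait_column` is an assumed column name, not confirmed by the source.
    """
    for row in csv.DictReader(handle):
        row = clean_row(row)
        if row.get(trait_column) in SLA_TRAIT_IDS:
            yield row
```

Calling `list(read_sla_rows(open("sla_export.csv", newline="")))` on a CSV export (the file name is a placeholder) would return only the SLA rows, with `NA` and `N/A` cells replaced by `None`. Values are left as strings so that no numeric conversion is assumed.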

Created:
2025-07-25