
TetrapodTraits Database

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/10530617
Resource description:
Abstract

Tetrapods (amphibians, reptiles, birds and mammals) are model systems for global biodiversity science, but continuing data gaps, limited data standardisation, and ongoing flux in taxonomic nomenclature constrain integrative research on this group and potentially cause biassed inference. We combined and harmonised taxonomic, spatial, phylogenetic, and attribute data with phylogeny-based multiple imputation to provide a comprehensive data resource (TetrapodTraits 1.0.0) that includes values, predictions, and sources for body size, activity time, micro- and macrohabitat, ecosystem, threat status, biogeography, insularity, environmental preferences and human influence, for all 33,281 tetrapod species covered in recent fully sampled phylogenies. We assess gaps and biases across taxa and space, finding that shared data missing in attribute values increased with taxon-level completeness and richness across clades. Prediction of missing attribute values using multiple imputation revealed substantial changes in estimated macroecological patterns. These results highlight biases incurred by non-random missingness and strategies to best address them. While there is an obvious need for further data collection and updates, our phylogeny-informed database of tetrapod traits can support a more comprehensive representation of tetrapod species and their attributes in ecology, evolution, and conservation research.

Additional Information

This work is an output of the VertLife project. To flag errors, provide updates, or leave other comments, please go to vertlife.org. We aim to develop the database into a living resource at vertlife.org, and your feedback is essential to improve data quality and support community use.

Version 1.0.1 (25 May 2024). This minor release addresses a spelling error in the file Tetrapod_360.csv.
The fix replaces white-space characters with underscores in the field Scientific.Name to match the spelling used in the file TetrapodTraits_1.0.0.csv. These corrections affect only 102 species considered extinct and 13 domestic species (Bos_frontalis, Bos_grunniens, Bos_indicus, Bos_taurus, Camelus_bactrianus, Camelus_dromedarius, Capra_hircus, Cavia_porcellus, Equus_caballus, Felis_catus, Lama_glama, Ovis_aries, Vicugna_pacos). All extinct and domestic species in TetrapodTraits have their binomial names separated by underscores instead of white space. Additionally, we have added the file GridCellShapefile.zip, which contains the shapefile required to map species presence across the 110 × 110 km equal-area grid cells (this file was previously provided through an external source).

Version 1.0.0 (19 April 2024). TetrapodTraits, the full phylogenetically coherent database we developed, is being made publicly available to support a range of research applications in ecology, evolution, and conservation and to help minimise the impacts of biassed data in this model system. The database includes 24 species-level attributes linked to their respective sources across 33,281 tetrapod species. Specific fields in TetrapodTraits clearly label data sources and imputations, while additional tables record the 10K imputed values per missing entry per species.

Taxonomy – includes 8 attributes that give scientific names and the respective higher-level taxonomic ranks, authority name, and year of species description. Field names: Scientific.Name, Genus, Family, Suborder, Order, Class, Authority, and YearOfDescription.
Phylogenetic tree – includes 2 attributes that indicate which fully sampled phylogeny contains the species, and whether the species' placement in that phylogeny was imputed. Field names: TreeTaxon, TreeImputed.
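Since extinct and domestic species use underscores rather than white space in Scientific.Name, joins between files (or against a user's own species list) are safer on a normalised key than on the raw string. A minimal sketch, assuming names are read as plain strings; `name_key` and `match_names` are hypothetical helpers, not part of the dataset or its R code, and the species names in the example are illustrative only.

```python
import re


def name_key(name: str) -> str:
    """Matching key for a binomial name.

    Runs of white space and/or underscores collapse to one underscore, so
    'Bos taurus', 'Bos_taurus' and ' Bos  taurus ' all map to 'Bos_taurus'.
    """
    return re.sub(r"[\s_]+", "_", name.strip())


def match_names(left: list[str], right: list[str]) -> dict[str, str]:
    """Map each name in `left` to its counterpart in `right` (if any) via name_key."""
    right_by_key = {name_key(n): n for n in right}
    return {n: right_by_key[name_key(n)] for n in left if name_key(n) in right_by_key}


# Hypothetical name lists, not read from the actual files.
traits_names = ["Bos_taurus", "Felis_catus", "Rhinella marina"]
grid_names = ["Bos taurus", "Felis_catus", "Rhinella marina"]
print(match_names(grid_names, traits_names))
# {'Bos taurus': 'Bos_taurus', 'Felis_catus': 'Felis_catus', 'Rhinella marina': 'Rhinella marina'}
```

The key is used only for matching; the original spellings are returned unchanged so that either file's convention can be kept downstream.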
Body size – includes 7 attributes that give length, mass, data sources on species sizes, and details on the imputation of species length or mass. Field names: BodyLength_mm, LengthMeasure, ImputedLength, SourceBodyLength, BodyMass_g, ImputedMass, SourceBodyMass.
Activity time – includes 5 attributes that describe the period of activity (e.g., diurnal, nocturnal) as dummy (binary) variables, data sources, details on the imputation of species activity time, and a nocturnality score. Field names: Diu, Noc, ImputedActTime, SourceActTime, Nocturnality.
Microhabitat – includes 8 attributes covering habitat use (e.g., fossorial, terrestrial, aquatic, arboreal, aerial) as dummy (binary) variables, data sources, details on the imputation of microhabitat, and a verticality score. Field names: Fos, Ter, Aqu, Arb, Aer, ImputedHabitat, SourceHabitat, Verticality.
Macrohabitat – includes 19 attributes that reflect major habitat types according to the IUCN classification, the sum of major habitats, the data source, and details on the imputation of macrohabitat. Field names: MajorHabitat_1 to MajorHabitat_10, MajorHabitat_12 to MajorHabitat_17, MajorHabitatSum, ImputedMajorHabitat, SourceMajorHabitat. MajorHabitat_11, representing the marine deep ocean floor (unoccupied by any species in the database), is not included.
Ecosystem – includes 6 attributes covering species ecosystem (e.g., terrestrial, freshwater, marine) as dummy (binary) variables, the sum of ecosystem types, data sources, and details on the imputation of ecosystem. Field names: EcoTer, EcoFresh, EcoMar, EcosystemSum, ImputedEcosystem, SourceEcosystem.
Threat status – includes 3 attributes that give the assessed threat status according to the IUCN Red List and related literature. Field names: IUCN_Binomial, AssessedStatus, SourceStatus.
RangeSize – the number of 110 × 110 km grid cells covered by the species range map. Data derived from Map of Life (MOL).
Latitude – latitude of the centroid of the species range map.
Longitude – longitude of the centroid of the species range map.
Biogeography – includes 8 attributes giving the proportion of the species range within each WWF biogeographical realm. Field names: Afrotropic, Australasia, IndoMalay, Nearctic, Neotropic, Oceania, Palearctic, Antarctic.
Insularity – includes 2 attributes that indicate whether a species is an insular endemic (binary, 1 = yes, 0 = no), followed by the respective data source. Field names: Insularity, SourceInsularity.
AnnuMeanTemp – average within-range annual mean temperature (°C). Data derived from CHELSA v. 1.2.
AnnuPrecip – average within-range annual precipitation (mm). Data derived from CHELSA v. 1.2.
TempSeasonality – average within-range temperature seasonality (standard deviation × 100). Data derived from CHELSA v. 1.2.
PrecipSeasonality – average within-range precipitation seasonality (coefficient of variation). Data derived from CHELSA v. 1.2.
Elevation – average within-range elevation (metres). Data derived from topographic layers in EarthEnv.
ETA50K – average within-range estimated travel time to cities with a population > 50K in 2015. Data from Nelson et al. (2019).
HumanDensity – average within-range human population density in 2017. Data derived from HYDE v. 3.2.
PropUrbanArea – proportion of the species range map covered by built-up area (towns, cities, etc.) in 2017. Data derived from HYDE v. 3.2.
PropCroplandArea – proportion of the species range map covered by cropland, identical to FAO's category 'Arable land and permanent crops', in 2017. Data derived from HYDE v. 3.2.
PropPastureArea – proportion of the species range map covered by pasture, defined as grazing land with an aridity index > 0.5, assumed to be more intensively managed (converted in climate models), in 2017. Data derived from HYDE v. 3.2.
PropRangelandArea – proportion of the species range map covered by rangeland, defined as grazing land with an aridity index < 0.5, assumed to be less or not managed (not converted in climate models), in 2017. Data derived from HYDE v. 3.2.

File content

All files use UTF-8 encoding.

ImputedSets.zip – the phylogenetic multiple imputation framework applied to the TetrapodTraits database produced 10,000 imputed values per missing data entry (100 phylogenetic trees × 10 validation folds × 10 multiple imputations). These imputations were developed specifically for four fundamental natural history traits: body length, body mass, activity time, and microhabitat. To make each imputed value easy to evaluate, we provide 10,000 tables containing both observed and imputed data for the 33,281 species in the TetrapodTraits database. Each table covers the four targeted natural history traits, along with designated fields (e.g., ImputedMass) that indicate whether the trait value provided (e.g., BodyMass_g) is observed (e.g., ImputedMass = 0) or imputed (e.g., ImputedMass = 1). Because the complete set of 10,000 tables requires nearly 17 GB of storage, we have split it into zip files of 1,000 tables each to simplify downloading:
ImputedSets_1K.zip – imputations for trees 1 to 10.
ImputedSets_2K.zip – imputations for trees 11 to 20.
ImputedSets_3K.zip – imputations for trees 21 to 30.
ImputedSets_4K.zip – imputations for trees 31 to 40.
ImputedSets_5K.zip – imputations for trees 41 to 50.
ImputedSets_6K.zip – imputations for trees 51 to 60.
ImputedSets_7K.zip – imputations for trees 61 to 70.
ImputedSets_8K.zip – imputations for trees 71 to 80.
ImputedSets_9K.zip – imputations for trees 81 to 90.
ImputedSets_10K.zip – imputations for trees 91 to 100.
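The archive layout follows directly from the imputation design: 100 trees × 10 folds × 10 imputations gives 10,000 tables, and ten trees per archive gives 1,000 tables in each of ten zip files. A small sketch of that arithmetic and of locating the archive for a given tree, based only on the listing above; `imputed_set_zip` is a hypothetical helper, and the sketch assumes nothing about how tables are named inside each archive.

```python
# Imputation design as described for ImputedSets.zip.
N_TREES, N_FOLDS, N_IMPUTATIONS = 100, 10, 10
TREES_PER_ZIP = 10

TABLES_TOTAL = N_TREES * N_FOLDS * N_IMPUTATIONS          # 10,000 tables
TABLES_PER_ZIP = TREES_PER_ZIP * N_FOLDS * N_IMPUTATIONS  # 1,000 tables per archive
N_ZIPS = N_TREES // TREES_PER_ZIP                         # 10 archives


def imputed_set_zip(tree: int) -> str:
    """Name of the ImputedSets_*K.zip archive holding imputations for `tree` (1-100)."""
    if not 1 <= tree <= N_TREES:
        raise ValueError(f"tree must be in 1..{N_TREES}, got {tree}")
    # Trees 1-10 -> 1K, 11-20 -> 2K, ..., 91-100 -> 10K.
    return f"ImputedSets_{(tree - 1) // TREES_PER_ZIP + 1}K.zip"


for tree in (1, 10, 11, 57, 100):
    print(tree, imputed_set_zip(tree))
```

Integer division on `tree - 1` keeps the boundary trees (10, 20, ..., 100) in the archive whose range ends with them, matching the file listing.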
TetrapodTraits_1.0.0.csv – the complete TetrapodTraits database, with missing data entries in natural history traits (body length, body mass, activity time, and microhabitat) replaced by the average across the 10K imputed values obtained through phylogenetic multiple imputation. Note that imputed microhabitat (fields Fos, Ter, Aqu, Arb, Aer) and imputed activity time (fields Diu, Noc) are continuous variables in the 0–1 interval. At the user's discretion, microhabitat and activity-time types can be converted into binary variables using a predefined threshold (e.g., 0.50), although we recommend using the original imputed values.
Tetrapod_360.csv – spatial intersections of the 110 × 110 km grid-cell shapefile (GridCellShapefile.zip) with species geographic range maps from https://mol.org.
GridCellShapefile.zip – contains the grid-cell shapefile at 110 km spatial resolution required to map the species listed in Tetrapod_360.csv. Because shapefile field names are limited in length, the fields of gridcells_110km.shp appear as ("Cl_I110", "Long", "Lat", "WWF_Rlm", "PrpLndA"). Rename them to ("Cell_Id110", "Long", "Lat", "WWF_Realm", "PropLandArea") to match the terminology used for the 110 × 110 km grid cells in other files.

External files

The R code used for data analysis is available at 10.5281/zenodo.10582069.

Funding

São Paulo Research Foundation (FAPESP) for grants supporting MRM (#2021/11840-6 and #2022/12231-6), LFT (#2016/25358-3), KC (#2020/12558-0), and RZC (#2022/15247-0); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for the fellowship to JJMG; Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for research grants supporting FPW (#311504/2020-5) and LFT (#302834/2020-6); U.S. National Science Foundation (NSF) for grants supporting RAP (DEB-1441719), RCKB (DEB-1441652), and WJ (DEB-1441737 and DEB-1441719). WJ also acknowledges support from NASA grants 80NSSC17K0282 and 80NSSC18K0435, and from the E.O. Wilson Biodiversity Foundation.

Citation

Moura, M.R., Ceron, K., Guedes, J.J.M., Chen-Zhao, R., Sica, Y.V., Hart, J., Dorman, W., Gonzalez-del-Pliego, P., Ranipeta, A., Catenazzi, A., Werneck, F.P., Toledo, L.F., Upham, N.S., Tonini, J.F.R., Colston, T.J., Guralnick, R., Bowie, R.C.K., Pyron, R.A., Jetz, W. A phylogeny-informed characterisation of global tetrapod traits addresses data gaps and biases. bioRxiv, 2024, doi: 10.1101/2023.03.04.531098v3

Correspondence to: mariormoura@gmail.com
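For users who choose to convert the continuous imputed microhabitat (Fos, Ter, Aqu, Arb, Aer) and activity-time (Diu, Noc) values in TetrapodTraits_1.0.0.csv into binary classes, a minimal sketch of the thresholding step follows. It assumes each row arrives as a dict of strings, as produced by Python's `csv.DictReader`; the 0.50 cut-off is the example given in the file description, and counting values exactly at the cut-off as 1 is our assumption. Observed entries are already 0 or 1 and pass through unchanged. The sample rows and `binarise` helper are hypothetical.

```python
import csv
import io

# Continuous 0-1 fields in TetrapodTraits_1.0.0.csv (microhabitat, then activity time).
FRACTIONAL_FIELDS = ["Fos", "Ter", "Aqu", "Arb", "Aer", "Diu", "Noc"]


def binarise(row: dict[str, str], threshold: float = 0.50) -> dict[str, int]:
    """Return 0/1 indicators; values >= threshold map to 1 (tie rule is an assumption)."""
    out = {}
    for field in FRACTIONAL_FIELDS:
        value = float(row[field])
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{field}={value} lies outside the 0-1 interval")
        out[field] = int(value >= threshold)
    return out


# Hypothetical two-row extract; values invented for illustration.
sample = io.StringIO(
    "Scientific.Name,Fos,Ter,Aqu,Arb,Aer,Diu,Noc\n"
    "Species_a,0.12,0.91,0.50,0.33,0,0.27,0.81\n"
    "Species_b,0,1,0,0,0,1,0\n"
)
for row in csv.DictReader(sample):
    print(row["Scientific.Name"], binarise(row))
# Species_a {'Fos': 0, 'Ter': 1, 'Aqu': 1, 'Arb': 0, 'Aer': 0, 'Diu': 0, 'Noc': 1}
# Species_b {'Fos': 0, 'Ter': 1, 'Aqu': 0, 'Arb': 0, 'Aer': 0, 'Diu': 1, 'Noc': 0}
```

Returning the indicators as a separate dict leaves the continuous columns intact, in line with the recommendation to keep using the original imputed values where possible.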

Created: 2024-10-09