Functional group classification using consensus clustering
Source: DataCite Commons. Updated 2026-04-28; indexed 2026-05-07.
Download link:
https://rdr.ucl.ac.uk/articles/dataset/Functional_group_classification_using_consensus_clustering/31833526
Resource description:
<b>Repository description</b>
Data and code for "Functional group classification using consensus clustering" (Ubilla Pavez, Paz & Maynard, In Revision, PLOS Computational Biology). This paper presents a consensus clustering method that classifies species into functional groups while accounting for trait uncertainty and trait correlation, using repeated resampling with Gaussian Mixture Models synthesized into a consensus matrix.

This repository contains the input trait data, taxonomic metadata, species name matching tables, and spatial diversity metrics used in the case study of global tree functional group classification. The consensus clustering pipeline code is available at https://github.com/pabloubilla/tree_clustering/.

<b>File descriptions</b>

<b>Estimated_trait_table_with_monos.csv</b>
Species-level trait data for 47,828 tree species across 18 traits (e.g., wood density, leaf area, tree height). Contains both observed and imputed (predicted) values from Maynard et al. (2022, <i>Nature Communications</i>). Each row is a species–trait combination.
Columns: <b>accepted_bin</b> (species binomial), <b>fit</b> (fitting method, e.g. "phy" for phylogenetic), <b>LAT</b> and <b>LON</b> (coordinates of observation, NA if imputed), <b>TraitID</b> (numeric trait identifier), <b>trait</b> (full trait name), <b>trait_short</b> (abbreviated trait name), <b>quant</b> (whether predicted using quantile random forest), <b>pred_value</b> (predicted/imputed trait value), <b>obs_value</b> (observed value, NA if unavailable).

<b>taxonomic_information.csv</b>
Taxonomic classification for each tree species.
Columns: <b>genus</b>, <b>family</b>, <b>order</b>, <b>group</b> (Angiosperms or Gymnosperms), <b>accepted_bin</b> (species binomial, used as join key), <b>mono_fern</b> (whether the species is a monocot or fern).

<b>bgci_v1_3_matched_names.csv</b>
Species name matching table linking names from the Botanic Gardens Conservation International (BGCI) ThreatSearch database (v1.3) to the accepted binomial names used in this study.
Columns: <b>TaxonName</b> (original name in BGCI), <b>Author</b> (taxonomic authority), <b>accepted_bin</b> (matched accepted binomial).

<b>global_tree_search_trees_1_7.csv</b>
Species list from GlobalTreeSearch (v1.7; Beech et al. 2017), a global database of tree species and country distributions maintained by BGCI.
Columns: <b>TaxonName</b> (species name), <b>Author</b> (taxonomic authority).

<b>plot_diversity_metrics/grid_coordinates.csv</b>
Grid cell reference table mapping grid IDs to geographic coordinates.
Columns: <b>grid_id</b> (unique identifier), <b>Latitude</b>, <b>Longitude</b>.

<b>plot_diversity_metrics/Functional_group_results.csv</b>
Functional group diversity metrics calculated per grid cell using the 42 functional groups identified in this study, based on presence–absence data from Paz et al. (2024, <i>Global Ecology and Biogeography</i>).
Columns: <b>Latitude</b>, <b>Longitude</b>, <b>nclust</b> (functional group richness, i.e. number of unique groups present), <b>cluster_simpson</b> (functional redundancy, i.e. Simpson's Index applied to functional groups).

<b>plot_diversity_metrics/Paz_et_al_data.csv</b>
Traditional diversity metrics per grid cell from Paz et al. (2024), used for comparison with the functional group metrics.
Columns: <b>Latitude</b>, <b>Longitude</b>, <b>nspec</b> (species richness), <b>raoq</b> (Rao's quadratic entropy, i.e. mean pairwise trait distance), <b>fdr</b> (functional richness, i.e. convex hull volume in trait space).
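The repository description says that repeated resampling runs are "synthesized into a consensus matrix". The minimal sketch below illustrates only that final step, under stated assumptions: each run is taken to yield one cluster label per species, and the consensus value for a pair of species is the fraction of runs in which the two share a cluster. The function name `consensus_matrix` and the hard-coded labels are hypothetical; the Gaussian Mixture Model fits that would produce such labels are not shown, and the authors' pipeline at the GitHub link above may differ in its details.

```python
def consensus_matrix(label_runs):
    """Return an n x n matrix of co-assignment frequencies.

    label_runs: list of runs; each run is a list holding one cluster label
    per species, with species in the same order in every run.
    """
    n_runs = len(label_runs)
    n = len(label_runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in label_runs:
        if len(labels) != n:
            raise ValueError("every run must label the same species list")
        for i in range(n):
            for j in range(n):
                # Only co-membership matters, so label names may differ
                # between runs without affecting the result.
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_runs for c in row] for row in counts]


# Hypothetical labels from three resampling runs over four species.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first run, labels swapped
    [0, 1, 1, 0],
]
matrix = consensus_matrix(runs)
```

Because only co-membership is counted, the first two runs contribute identically even though their label names are swapped. This is one reason a consensus matrix is a convenient way to combine clusterings whose labels are not comparable across runs.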
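The two per-grid-cell functional group metrics described above, <b>nclust</b> and <b>cluster_simpson</b>, can be illustrated with a short sketch. Several points are assumptions rather than facts from the repository. Because the underlying data are presence–absence, each species present counts once toward its group. Simpson's Index is computed here as the sum of squared group proportions; the description does not say which variant (this form, its complement, or its inverse) the dataset uses. The helper `group_metrics`, the group IDs, and the species-to-group assignments are hypothetical.

```python
from collections import Counter


def group_metrics(species_in_cell, group_of):
    """Return (group richness, Simpson's Index) for one grid cell.

    species_in_cell: species names present in the cell (presence-absence).
    group_of: mapping from species name to functional group ID.
    """
    counts = Counter(group_of[s] for s in species_in_cell)
    total = sum(counts.values())
    richness = len(counts)  # number of unique groups present
    # Assumed variant: sum of squared proportions of species per group.
    simpson = sum((c / total) ** 2 for c in counts.values())
    return richness, simpson


# Hypothetical group assignments, for illustration only.
group_of = {
    "Quercus robur": 1,
    "Quercus petraea": 1,
    "Pinus sylvestris": 2,
    "Betula pendula": 3,
}
cell = ["Quercus robur", "Quercus petraea", "Pinus sylvestris", "Betula pendula"]
richness, simpson = group_metrics(cell, group_of)
```

In this toy cell, three groups are present and two of the four species share a group, so the index is 0.5² + 0.25² + 0.25² = 0.375. Under this variant, higher values mean that species are concentrated in fewer groups.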
Provider:
University College London
Created:
2026-03-23



