
Predicting invasion success of cultivated naturalized plants in China

Source: NIAID Data Ecosystem, indexed 2026-05-02
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.2ngf1vj08
Description:
Plant invasions pose significant threats to native ecosystems, human health, and global economies. However, the complex and multidimensional nature of the factors influencing plant invasions makes it challenging to predict and interpret invasion success accurately. Using a robust machine learning algorithm, random forest, and an extensive suite of characteristics related to environmental niches, species traits, and propagule pressure, we developed a classification model to predict the invasion success of naturalized cultivated plants in China. Based on the final optimal model, we evaluated the relative importance of individual and grouped variables and their prediction performance. Our study identified key individual variables within each of three groupings: climatic suitability and native range size (environmental niches), phylogenetic distance to the closest native taxon and vegetative propagation mode (species traits), and the number of botanical gardens and provinces where species were cultivated (propagule pressure). Remarkably, when variables were evaluated as groups, their relative importance increased dramatically, by 13.5 to 17.7 times, compared to the cumulative importance of the individual variables within a category. However, the relative importance of a category depended primarily on the number of variables it contained rather than on its inherent characteristics. Synthesis and applications. Our findings emphasize the necessity of developing data-driven predictive tools for effective invasion risk assessment using large datasets. We also highlight the importance of grouped variables in enhancing model interpretability. For practical application in China, we recommend prioritizing surveillance of alien plant species with large native ranges and high climatic suitability.
Implementing a tiered risk assessment system based on our random forest model can allow for a more effective allocation of resources for monitoring and managing invasive species. Ultimately, interdisciplinary collaboration is crucial for implementing and applying these predictive tools, thereby protecting biodiversity, ecosystem services, and economic interests.

Methods

Data compilation

We compiled a checklist of the 735 naturalized plant taxa introduced for cultivation in China, based on the Catalogue of Cultivated Plants in China (Lin, 2018) and The Checklist of the Naturalized Plants in China (Yan et al., 2019). The binomial names of these naturalized taxa were standardized according to The Plant List (TPL, version 1.1; http://www.theplantlist.org) using the R package 'Taxonstand' (Cayuela et al., 2021). Among these naturalized taxa, 435 were classified as non-invasive and 300 as invasive in China, based on information from Hao and Ma (2023). Non-invasive taxa are naturalized taxa that form self-sustaining populations outside cultivation but remain limited in spread. Invasive taxa are naturalized taxa that spread widely, causing economic, social, and ecological damage to the invaded ecosystems. These definitions are consistent with those used by Lin et al. (2021). To identify the characteristics that could potentially drive the invasion success of naturalized taxa, we compiled data on 12 characteristics for each taxon, resulting in 34 variables.
The 34 variables were grouped into three categories: species traits and similarity to native species (life form, propagation mode, maximum height, phylogenetic mean pairwise distance [MPD], phylogenetic nearest pairwise distance [NNPD], and weighted mean pairwise distance [wMPD] to the native flora in China), propagule pressure (the number of botanical gardens, the number of provinces in which taxa were cultivated, economic use category, and the number of economic use categories), and environmental niches (climatic suitability, native range size, and continents of origin). Native range size and continents of origin were quantified using the Taxonomic Databases Working Group (TDWG) level-1 (continental) and level-3 (regional) geographical classifications. A phylogenetic tree was constructed using 735 naturalized plant taxa and 30,248 native plant taxa in China. Further details are provided in the Supplementary Material.

Data preparation

Thirty-eight naturalized taxa had missing values for some of the 34 variables. Following the guidance of Breiman (2003), missing data were imputed by weighting the frequency of the non-missing values with proximity values, using the 'rfImpute' function in the R package 'randomForest' (Feng et al., 2020). To standardize the explanatory variables, we scaled all continuous variables to a mean of zero and a standard deviation of one (Table S1). Before scaling, we transformed several variables to achieve more regular distributions: natural log(x + 1) for the number of botanical gardens, the number of provinces, the number of economic use categories, and native range size; natural log(x + 0.001) for climatic suitability; and natural log for MPD, NNPD, wMPD, and maximum height. As maximum height is associated with life form, we scaled maximum height separately within each life-form category.
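The transformation and scaling steps described above can be sketched in a few lines. The study itself was carried out in R; the Python below is only an illustration, and the function names, the use of the sample standard deviation, and all numeric values are assumptions made for the example rather than details taken from the dataset.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def zscore(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    m, s = mean(values), stdev(values)  # needs at least two distinct values
    return [(v - m) / s for v in values]


def log_then_scale(values, offset=0.0):
    """Natural-log transform (with an optional offset), then z-score."""
    return zscore([math.log(v + offset) for v in values])


def log_then_scale_within(values, groups):
    """Natural-log transform, then z-score separately inside each group."""
    index_by_group = defaultdict(list)
    for i, g in enumerate(groups):
        index_by_group[g].append(i)
    scaled = [0.0] * len(values)
    for indices in index_by_group.values():
        z = zscore([math.log(values[i]) for i in indices])
        for i, zi in zip(indices, z):
            scaled[i] = zi
    return scaled


# Hypothetical toy values, not taken from the dataset.
n_gardens = [0, 3, 12, 45, 7]
suitability = [0.0, 0.12, 0.55, 0.9, 0.31]
max_height_m = [0.3, 1.2, 0.8, 15.0, 25.0, 8.0]
life_form = ["herb", "herb", "herb", "tree", "tree", "tree"]

gardens_z = log_then_scale(n_gardens, offset=1)            # log(x + 1)
suitability_z = log_then_scale(suitability, offset=0.001)  # log(x + 0.001)
height_z = log_then_scale_within(max_height_m, life_form)  # per life form
```

Scaling maximum height within each life form means a value of zero marks an average-height herb or an average-height tree, so the variable carries height relative to growth form rather than repeating the life-form signal.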
As a preliminary check on whether the variable grouping was reasonable, we used Pearson correlation analysis to examine bivariate relationships and Self-Organizing Map (SOM) analysis to visualize clustering patterns in the high-dimensional data through dimensionality reduction (Kohonen, 2001). Further details are provided in the Supplementary Material.
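The bivariate part of this check amounts to computing a Pearson correlation for every pair of variables. A minimal sketch in plain Python follows; the column names and values are hypothetical, and the SOM step is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict mapping name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical, already-transformed toy columns (not from the dataset).
columns = {
    "n_gardens": [0.0, 1.4, 2.6, 3.8, 2.1],
    "n_provinces": [0.7, 1.1, 2.9, 3.3, 1.8],
    "native_range_size": [2.2, 0.9, 3.1, 1.5, 2.8],
}

corr = correlation_matrix(columns)
for name, row in corr.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Variables that belong to the same category would be expected to correlate more strongly with each other than with variables from other categories, which is the pattern a check like this looks for.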
Created: 2025-01-03