
Data from: Discovering biogeographic and ecological clusters with a graph theoretic spin on factor analysis

Research Data Australia, indexed 2024-12-14
Download link:
https://researchdata.edu.au/data-from-discovering-factor-analysis/1959068
Description:
Factor analysis (FA) has the advantage of highlighting each semi-distinct cluster of samples in a data set with one axis at a time, as opposed to simply arranging samples across axes to represent gradients. However, in the case of presence-absence data it is confounded by absences when gradients are long. No statistical model can cope with this problem because the raw data simply do not present underlying information about the length of such gradients. Here I propose a simple way to tease out this information. It is a simple emendation of FA called stepping down, which involves giving an absence a negative value when the missing species nowhere co-occurs with the species found in the relevant sample. Specifically, a binary co-occurrence graph is created, and the magnitude of negative values is made a function of how far the graph must be traversed in order to link the missing species with each species that is present. Simulations show that standard FA yields inferior results to FA based on stepped-down matrices in terms of mapping clusters into axes one-by-one. Standard FA is also uninformative when applied to a global bat inventory data set. Step-down FA (SDFA) easily flags the main biogeographic groupings. Methods like correspondence analysis, non-metric multidimensional scaling, and Bayesian latent variable modelling are not commensurate with SDFA because they do not seek to find a one-to-one mapping of axes and clusters. Stepping down seems promising as a means of illustrating clusters of samples, especially when there are subtle or complex discontinuities in gradients.

Usage Notes

bat references (bat_references.txt): A list of references to publications yielding site-specific inventory data for bats from around the world. Raw data are also reposited in the Ecological Register.

bat register (bat_register.txt): Site-specific inventory data for bats from around the world. Each line includes a count of the individuals belonging to a species found at a site. Raw data are also reposited in the Ecological Register.
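
The stepping-down idea described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The abstract only says that the magnitude of a negative value is "a function of" graph distance, so the specific function used here (the mean over present species of the path length minus one), the handling of unreachable species, and the names `cooccurrence_distances` and `step_down` are all assumptions made for illustration.

```python
from collections import deque

import numpy as np


def cooccurrence_distances(pa):
    """Shortest-path lengths between species in the binary co-occurrence graph."""
    pa = np.asarray(pa, dtype=bool)
    n_species = pa.shape[1]
    # Two species are linked if they are found together in at least one sample.
    adj = (pa.T.astype(int) @ pa.astype(int)) > 0
    np.fill_diagonal(adj, False)
    dist = np.full((n_species, n_species), np.inf)
    for start in range(n_species):
        # Breadth-first search from each species.
        dist[start, start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                if np.isinf(dist[start, v]):
                    dist[start, v] = dist[start, u] + 1
                    queue.append(v)
    return dist


def step_down(pa):
    """Return a samples-by-species matrix with some absences made negative."""
    pa = np.asarray(pa, dtype=bool)
    dist = cooccurrence_distances(pa)
    # ASSUMPTION: unreachable pairs get one step more than the longest finite path.
    cap = dist[np.isfinite(dist)].max() + 1
    dist = np.where(np.isfinite(dist), dist, cap)
    out = pa.astype(float)  # presences are 1, absences start at 0
    for s in range(pa.shape[0]):
        present = np.flatnonzero(pa[s])
        if present.size == 0:
            continue
        absent = np.flatnonzero(~pa[s])
        d = dist[np.ix_(absent, present)]
        # Step down only species that co-occur with none of the present species.
        isolated = d.min(axis=1) > 1
        # ASSUMPTION: magnitude is the mean of (path length - 1) over present species.
        out[s, absent[isolated]] = -(d[isolated] - 1).mean(axis=1)
    return out


# Toy gradient: three samples, four species, each sample overlapping the next.
toy = [[1, 1, 0, 0],
       [0, 1, 1, 0],
       [0, 0, 1, 1]]
stepped = step_down(toy)
print(stepped)
```

On this toy gradient, the only absences stepped down are species 4 in the first sample and species 1 in the last sample (both to -1.5), because those species co-occur with none of the species present there. The resulting matrix, not the raw 0/1 matrix, would then be passed to a factor analysis routine.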

Provider:
Macquarie University