
Supplementary file 2_Reframing natural organic matter research through compositional data analysis.pdf

Source: NIAID Data Ecosystem, indexed 2026-05-10
Download link:
https://figshare.com/articles/dataset/Supplementary_file_2_Reframing_natural_organic_matter_research_through_compositional_data_analysis_pdf/31879609
Description:
Compositional data (CoDa) are prevalent in environmental research. They represent parts of a whole, such as percentages, proportions, and relative or absolute abundances. They are arrays of positive data that convey relevant information in the ratios between their components. Standard statistical techniques developed for real random observations often yield spurious results and are therefore unsuitable for CoDa, which has unique geometric properties. CoDa analysis is now widely acknowledged across various research fields, ranging from geoscience to social science, with a recent surge in popularity in microbial genomics. However, its adoption remains limited in natural organic matter (NOM) research, despite NOM data from key analytical tools such as mass spectrometry, fluorescence spectroscopy, and nuclear magnetic resonance spectroscopy all being compositional. Given the structural similarity between NOM and high-throughput sequencing data, for which CoDa analysis has been successfully adopted, we argue that CoDa analysis should also be consistently integrated into NOM research to prevent analytical pitfalls and misleading inferences. A few pioneering studies have applied CoDa analysis to NOM data, and a wide array of useful open-source tools are already available. This paper discusses step-by-step the application of CoDa analysis to NOM research, using ultrahigh-resolution mass spectrometry data as an illustrative example. The goal of the study is to provide the community with an overview of CoDa analysis and guide them on how to use it in practice.
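The description's point that the information in compositional data sits in the ratios between components can be illustrated with the centered log-ratio (clr) transform, a standard CoDa tool. The following is a minimal sketch, not the authors' workflow: the function names `closure` and `clr` and the peak intensities are hypothetical and chosen only for illustration, and zeros are assumed to have been handled beforehand (the sketch simply rejects them).

```python
import numpy as np


def closure(x):
    """Rescale strictly positive parts so that each composition sums to 1."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("all parts must be strictly positive")
    return x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centered log-ratio: log of each part minus the mean log of all parts."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("all parts must be strictly positive")
    log_x = np.log(x)
    return log_x - log_x.mean(axis=-1, keepdims=True)


# Hypothetical peak intensities for three molecular formulas in one sample.
intensities = np.array([120.0, 30.0, 50.0])
proportions = closure(intensities)

# The clr coordinates depend only on ratios between parts, so raw
# intensities and their closed proportions give the same result.
assert np.allclose(clr(intensities), clr(proportions))
```

Because the clr output is unchanged by rescaling the input, it does not matter whether the data are stored as raw intensities, proportions, or percentages. Real mass spectrometry tables usually contain zeros, which need a replacement or imputation step before any log-ratio transform.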

Created:
2026-03-28