Extensive ground truth dataset for the assessment of differential expression analyses in metaproteomics

NIAID Data Ecosystem, indexed 2026-05-10

Download link:

https://www.ncbi.nlm.nih.gov/sra/ERP186249

Resource description:

The high dimensionality, sparsity, and compositionality of metaproteomic data pose serious challenges to statistical analyses aimed at identifying differences in gene expression and protein abundance for species in microbiomes and microbial communities. Although various statistical methods have been used to analyze metaproteomic data, a comprehensive evaluation of these methods is still missing. A main obstacle to testing statistical approaches for identifying differentially abundant proteins is the need for fully controlled datasets of known composition that simulate the complexity of metaproteomic samples. Currently, no such ground truth datasets are available to assess differential gene expression analyses in metaproteomics. Here, we introduce the first metaproteomic dataset with known protein abundance differences between multiple conditions and levels of sample complexity for identification, quantification, and statistical analyses. The dataset consists of 13 mixes, each in quadruplicate, which span three levels of complexity and multiple challenges for metaproteomics. Each mix consists of one of three complex protein matrices to which multiple microbial species were added at different ratios, which represent the challenges for metaproteomics. The species were grown under differing conditions or combined with closely related strains to generate differential gene expression patterns. The species and conditions/strains included Thermus thermophilus at high/low temperature (TTHT/TTLT), Chlamydomonas reinhardtii at high/low light (CRHL/CRLL), Rhizobium leguminosarum strains VF3841 and VF39 (RLVF3841/RLVF39), Escherichia coli in LB or M9 medium (EcoliLB/EcoliM9), and Bacteroides thetaiotaomicron (Btheta) in a control growth condition.
The different mixes were generated in quadruplicate using the different growth conditions as well as different levels of abundance (dominant organism, low abundance, similar abundance but different expression patterns, …) in order to produce datasets of different complexity for metaproteomics.
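
As a rough illustration of the design described above, the following Python sketch lists the condition codes given in the description and enumerates the 13 mixes in quadruplicate. The sample labels (`Mix01_R1` and so on) and the function name `build_sample_sheet` are hypothetical and are not taken from the dataset itself; the actual run names in the repository may differ.

```python
from itertools import product

# Condition codes exactly as listed in the resource description.
CONDITIONS = {
    "TTHT": ("Thermus thermophilus", "high temperature"),
    "TTLT": ("Thermus thermophilus", "low temperature"),
    "CRHL": ("Chlamydomonas reinhardtii", "high light"),
    "CRLL": ("Chlamydomonas reinhardtii", "low light"),
    "RLVF3841": ("Rhizobium leguminosarum", "strain VF3841"),
    "RLVF39": ("Rhizobium leguminosarum", "strain VF39"),
    "EcoliLB": ("Escherichia coli", "LB medium"),
    "EcoliM9": ("Escherichia coli", "M9 medium"),
    "Btheta": ("Bacteroides thetaiotaomicron", "control growth condition"),
}

N_MIXES = 13       # stated in the description
N_REPLICATES = 4   # each mix was prepared in quadruplicate


def build_sample_sheet(n_mixes=N_MIXES, n_replicates=N_REPLICATES):
    """Return one label per mix/replicate pair.

    The label format is an assumption made for illustration only.
    """
    return [
        f"Mix{mix:02d}_R{rep}"
        for mix, rep in product(range(1, n_mixes + 1), range(1, n_replicates + 1))
    ]


samples = build_sample_sheet()
species = sorted({name for name, _ in CONDITIONS.values()})

print(len(samples))  # 13 mixes x 4 replicates
print(species)
```

Under these assumptions the sheet contains 52 entries, covering five species across nine condition or strain codes.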

Created:

2026-01-28