
Optimal Permutation Recovery in Permuted Monotone Matrix Model

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/Optimal_Permutation_Recovery_in_Permuted_Monotone_Matrix_Model/11673999
Resource description:
Motivated by recent research on quantifying bacterial growth dynamics based on genome assemblies, we consider a permuted monotone matrix model Y = ΘΠ + Z, where the rows represent different samples, the columns represent contigs in genome assemblies, and the elements represent log-read counts after preprocessing steps and Guanine-Cytosine (GC) adjustment. In this model, Θ is an unknown mean matrix with monotone entries in each row, Π is a permutation matrix that permutes the columns of Θ, and Z is a noise matrix. This article studies the problem of estimating (recovering) Π given the observed noisy matrix Y. We propose an estimator based on the best linear projection, which is shown to be minimax rate-optimal for both exact recovery, as measured by the 0-1 loss, and partial recovery, as quantified by the normalized Kendall's tau distance. Simulation studies demonstrate the superior empirical performance of the proposed estimator over alternative methods. We demonstrate the methods using a synthetic metagenomics dataset of 45 closely related bacterial species and a real metagenomic dataset to compare the bacterial growth dynamics between responders and nonresponders among inflammatory bowel disease (IBD) patients after 8 weeks of treatment. Supplementary materials for this article are available online.
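
To make the model and the two loss functions concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not say how the "best linear projection" is computed; the sketch assumes it can be approximated by the leading left singular vector of the row-centered Y, obtained here by power iteration, and it assumes every row of Θ is increasing. All function names, the uniform toy distribution for Θ, and the sign heuristic are hypothetical choices made for illustration.

```python
import random


def simulate(n, p, sigma, rng):
    """Draw a toy Y = Theta*Pi + Z; every row of Theta is increasing (assumption)."""
    theta = [sorted(rng.uniform(0.0, 10.0) for _ in range(p)) for _ in range(n)]
    perm = list(range(p))
    rng.shuffle(perm)  # column j of Y holds column perm[j] of Theta
    y = [[theta[i][perm[j]] + rng.gauss(0.0, sigma) for j in range(p)]
         for i in range(n)]
    return y, perm


def leading_direction(y, iters=200):
    """Power iteration for the top left singular vector of the row-centered Y.

    Assumption: this direction stands in for the "best linear projection".
    """
    n, p = len(y), len(y[0])
    centered = []
    for row in y:
        mean = sum(row) / p
        centered.append([v - mean for v in row])
    # n x n Gram matrix of the centered rows
    gram = [[sum(a * b for a, b in zip(ra, rb)) for rb in centered]
            for ra in centered]
    w = [1.0] * n
    for _ in range(iters):
        w = [sum(g * x for g, x in zip(grow, w)) for grow in gram]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # degenerate input: fall back to equal weights
            return [1.0 / n] * n
        w = [x / norm for x in w]
    if sum(w) < 0:  # heuristic sign choice so increasing rows project upward
        w = [-x for x in w]
    return w


def estimate_permutation(y):
    """Rank the columns of Y by their projection onto the leading direction."""
    w = leading_direction(y)
    p = len(y[0])
    scores = [sum(wi * row[j] for wi, row in zip(w, y)) for j in range(p)]
    order = sorted(range(p), key=lambda j: scores[j])
    perm_hat = [0] * p
    for rank, j in enumerate(order):
        perm_hat[j] = rank
    return perm_hat


def zero_one_loss(perm, perm_hat):
    """1 if the permutation is not recovered exactly, else 0."""
    return int(perm != perm_hat)


def normalized_kendall_tau(perm, perm_hat):
    """Fraction of column pairs whose relative order is estimated incorrectly."""
    p = len(perm)
    discordant = sum(
        1
        for j in range(p)
        for k in range(j + 1, p)
        if (perm[j] - perm[k]) * (perm_hat[j] - perm_hat[k]) < 0
    )
    return discordant / (p * (p - 1) / 2)
```

A typical use is to call `simulate(20, 50, 1.0, random.Random(1))`, pass the resulting matrix to `estimate_permutation`, and compare the estimate with the true permutation using both loss functions. Exact recovery (0-1 loss) demands every column be ranked correctly, whereas the Kendall's tau distance degrades gradually as more pairs are swapped.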

Created:
2021-09-29