
Optimal Permutation Recovery in Permuted Monotone Matrix Model

Source: DataCite Commons. Updated 2024-02-12; indexed 2024-07-28.
Download link:
https://tandf.figshare.com/articles/dataset/Optimal_Permutation_Recovery_in_Permuted_Monotone_Matrix_Model/11673999/2
Description:
Motivated by recent research on quantifying bacterial growth dynamics based on genome assemblies, we consider a permuted monotone matrix model $Y = \Theta\Pi + Z$, where the rows represent different samples, the columns represent contigs in genome assemblies, and the elements represent log-read counts after preprocessing steps and guanine-cytosine (GC) adjustment. In this model, $\Theta$ is an unknown mean matrix with monotone entries for each row, $\Pi$ is a permutation matrix that permutes the columns of $\Theta$, and $Z$ is a noise matrix. This article studies the problem of estimation/recovery of $\Pi$ given the observed noisy matrix $Y$. We propose an estimator based on the best linear projection, which is shown to be minimax rate-optimal for both exact recovery, as measured by the 0-1 loss, and partial recovery, as quantified by the normalized Kendall’s tau distance. Simulation studies demonstrate the superior empirical performance of the proposed estimator over alternative methods. We demonstrate the methods using a synthetic metagenomics dataset of 45 closely related bacterial species and a real metagenomic dataset to compare the bacterial growth dynamics between the responders and the nonresponders of the IBD patients after 8 weeks of treatment. Supplementary materials for this article are available online.
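
To make the model and the two loss functions concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the "best linear projection" is computed, so the sketch assumes the projection direction is the leading left singular vector of the row-centered data, with its sign fixed so that scores increase along the monotone direction. The function names (`simulate`, `estimate_permutation`, `zero_one_loss`, `kendall_tau_distance`) and the simulated mean matrix (rows with random positive slopes) are hypothetical choices made only for illustration.

```python
import numpy as np


def simulate(n_rows, n_cols, noise_sd, rng):
    """Draw Y = Theta @ Pi + Z with every row of Theta increasing.

    Assumption: Theta has rows slope_i * (0, 1, ..., n_cols - 1) with
    positive slopes; this is an illustrative choice, not the paper's setup.
    """
    slopes = rng.uniform(1.0, 2.0, size=n_rows)
    theta = slopes[:, None] * np.arange(n_cols)[None, :]
    perm = rng.permutation(n_cols)
    # Column k of Theta @ Pi is column perm[k] of Theta.
    y = theta[:, perm] + noise_sd * rng.normal(size=(n_rows, n_cols))
    return theta, perm, y


def estimate_permutation(y):
    """Rank the columns of y by their score along one projection direction.

    Assumption: the direction is the leading left singular vector of the
    row-centered matrix, standing in for the "best linear projection".
    """
    centered = y - y.mean(axis=1, keepdims=True)
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    w = u[:, 0]
    # Singular vectors are defined up to sign; for increasing rows the
    # direction should have a nonnegative sum.
    if w.sum() < 0:
        w = -w
    scores = w @ y
    # The rank of each column's score estimates its position in Theta.
    return np.argsort(np.argsort(scores))


def zero_one_loss(perm_hat, perm):
    """Exact-recovery loss: 0 if the permutations agree, 1 otherwise."""
    return int(not np.array_equal(perm_hat, perm))


def kendall_tau_distance(p, q):
    """Normalized Kendall's tau distance: fraction of discordant pairs."""
    p, q = np.asarray(p), np.asarray(q)
    i, j = np.triu_indices(len(p), k=1)
    discordant = (p[i] - p[j]) * (q[i] - q[j]) < 0
    return discordant.sum() / len(i)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta, perm, y = simulate(n_rows=20, n_cols=10, noise_sd=0.1, rng=rng)
    perm_hat = estimate_permutation(y)
    print("0-1 loss:", zero_one_loss(perm_hat, perm))
    print("Kendall tau distance:", kendall_tau_distance(perm_hat, perm))
```

Raising `noise_sd` in the demo should make exact recovery fail before the Kendall's tau distance grows large, which is the distinction between the exact and partial recovery criteria named in the abstract.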

Provider:
Taylor & Francis
Created:
2020-08-26