
ARK: Aggregate of Reads by K-Means for Estimation of Bacterial Community Composition

Indexed by NIAID Data Ecosystem on 2026-03-08
Download link:
https://www.ncbi.nlm.nih.gov/bioproject/PRJEB9828
Resource description:
There has been a recent surge of interest in using compressed-sensing-inspired and convex-optimization-based methods to estimate bacterial community composition. These methods typically summarize the sequence data by the frequencies of low-order k-mers and statistically match this summary against a taxonomically structured database. Here we show that the accuracy of the resulting community composition estimates can be substantially improved by aggregating the reads from a sample with an unsupervised machine learning approach before the estimation phase. The aggregation of reads is a pre-processing step in which a standard K-means clustering algorithm partitions a large set of reads into subsets at reasonable computational cost. This yields several vectors of first-order statistics (one per subset) instead of a single k-mer frequency summary for the whole sample. The output of the clustering is then processed further to obtain the final estimate for each sample. The resulting method is called Aggregation of Reads by K-means (ARK), and it is justified by a statistical argument based on a mixture density formulation. ARK is found to improve the fidelity and robustness of several recently introduced methods, with only a modest increase in computational complexity.

Availability: An open-source, platform-independent implementation of the method in the Julia programming language is freely available at https://github.com/dkoslicki/ARK. A Matlab implementation is available at http://www.ee.kth.se/ctsoftware.
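The aggregation step described above can be illustrated with a minimal pure-Python sketch. It is not the ARK implementation. It assumes that each read is represented by its normalized k-mer frequency vector, that clustering is plain Lloyd's K-means, and that the mixture density argument amounts to weighting each cluster's composition estimate by that cluster's share of the reads, i.e. x = Σ_c (n_c / N) · x_c. The function names (`kmer_vector`, `kmeans`, `ark_estimate`) are hypothetical. `toy_estimator`, which assigns a whole cluster to its nearest reference profile, is a hypothetical stand-in for the convex-optimization-based estimators the description refers to.

```python
import random
from itertools import product


def kmer_vector(read, k=2):
    """Normalized k-mer frequency vector over ACGT (k-mers with other symbols are skipped)."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0.0] * len(index)
    for i in range(len(read) - k + 1):
        j = index.get(read[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, n_clusters, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(n_clusters), key=lambda c: sqdist(v, centroids[c]))
                  for v in vectors]
        # Update step: an empty cluster keeps its previous centroid.
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


def toy_estimator(freq, references):
    """HYPOTHETICAL stand-in estimator: all mass goes to the closest reference profile."""
    best = min(references, key=lambda t: sqdist(freq, references[t]))
    return {t: (1.0 if t == best else 0.0) for t in references}


def ark_estimate(reads, references, n_clusters, k=2, estimator=toy_estimator):
    """Cluster reads, estimate each cluster separately, combine with weights n_c / N."""
    vectors = [kmer_vector(r, k) for r in reads]
    labels = kmeans(vectors, n_clusters)
    result = {t: 0.0 for t in references}
    for c in set(labels):
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        weight = len(members) / len(reads)      # mixture weight of cluster c
        cluster_freq = mean(members)            # first-order statistic of cluster c
        for t, p in estimator(cluster_freq, references).items():
            result[t] += weight * p
    return result


if __name__ == "__main__":
    refs = {"taxonA": kmer_vector("ATATATATATATATAT"),
            "taxonB": kmer_vector("GCGCGCGCGCGCGCGC")}
    reads = ["ATATATATAT", "TATATATATA", "ATATTATATA", "GCGCGCGCGC"]
    print(ark_estimate(reads, refs, n_clusters=2))
```

With these two toy reference profiles, the three AT-rich reads and the single GC-rich read end up in separate clusters. The sketch therefore reports a proportion of 0.75 for `taxonA` and 0.25 for `taxonB`. A single whole-sample k-mer vector passed to the same nearest-reference estimator would instead assign everything to one taxon, which is the kind of loss of information that per-cluster statistics avoid.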
Created:
2015-07-09