Phylogenetic double placement of mixed samples

DataONE · Updated 2023-11-17 · Indexed 2025-08-02

Download link:

https://search.dataone.org/view/sha256:7ff06ddb7aa22d73a23d51fb130e02de4cb61594abf814fb18c0f40318aad26e


Description:

Motivation: Consider a simple computational problem. The inputs are (i) the set of mixed reads generated from a sample that combines two organisms and (ii) separate sets of reads for several reference genomes of known origins. The goal is to find the two organisms that constitute the mixed sample. When constituents are absent from the reference set, we seek to phylogenetically position them with respect to the underlying tree of the reference species. This simple yet fundamental problem (which we call phylogenetic double-placement) has enjoyed surprisingly little attention in the literature. As genome skimming (low-pass sequencing of genomes at low coverage, precluding assembly) becomes more prevalent, this problem finds wide-ranging applications in areas as varied as biodiversity research, food production and provenance, and evolutionary reconstruction.

Results: We introduce a model that relates distances between a mixed sample and reference species to the distances between constituent...

# Data from: Phylogenetic double placement of mixed samples

## Citation

Balaban, M., & Mirarab, S. (2020). Phylogenetic double placement of mixed samples. Bioinformatics (Oxford, England), 36(1), i335–i343. [doi:10.1093/bioinformatics/btaa489](https://doi.org/10.1093/bioinformatics/btaa489)

## Description of the data and file structure

In all the datasets, files called `*results*.csv` have the following columns:

* 1st column: `query` gives the query name,
* 2nd column: is one of
  * `alien` is when both parents are removed from ref
  * `partial` is when one parent is removed from ref
  * `present` is when neither parent is removed from ref
* 3rd column: the name of the method
* 4th column: Either Primary or Secondary, for the two placements; primary is always the one with lower error
* 5th column: Placement error in edges
* [optional] 6th column: the `k` value used

### Columbicola (Lice) dataset (simulated mixture)

To evaluate the accuracy of our method on genome skimmi...
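
The column layout described above for `*results*.csv` files can be read with a short script. The following is a minimal sketch, not part of the dataset: it assumes the files are comma-separated with no header row, and the field names (`scenario`, `placement`, `error_edges`), the method name `method_a`, and the sample rows are all hypothetical, made up only to illustrate the layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical labels for the column positions described in the README.
COLUMNS = ["query", "scenario", "method", "placement", "error_edges", "k"]


def read_results(handle):
    """Parse rows of a *results*.csv file into dictionaries.

    Assumes comma-separated values and no header row (an assumption,
    not something the README states).
    """
    rows = []
    for fields in csv.reader(handle):
        if not fields:
            continue  # skip blank lines
        row = dict(zip(COLUMNS, (f.strip() for f in fields)))
        row["error_edges"] = float(row["error_edges"])
        # The 6th column is optional; keep it as text, or None if absent.
        row["k"] = row.get("k") or None
        rows.append(row)
    return rows


def mean_error(rows):
    """Average placement error per (scenario, method, placement)."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in rows:
        key = (r["scenario"], r["method"], r["placement"])
        totals[key][0] += r["error_edges"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up rows for illustration only; these are not values from the dataset.
sample = io.StringIO(
    "q1,alien,method_a,Primary,1,5\n"
    "q1,alien,method_a,Secondary,3,5\n"
    "q2,alien,method_a,Primary,2\n"
)
rows = read_results(sample)
summary = mean_error(rows)
```

For a real file, `read_results(open(path, newline=""))` would take the place of the in-memory sample. If the files turn out to have a header row, the first row would need to be skipped before converting the error column.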

Created:

2025-07-21
