Data from: Fast coalescent-based computation of local branch support from quartet frequencies
Mendeley Data. Updated 2024-04-13; indexed 2024-06-27.
Download link:
https://datadryad.org/stash/dataset/doi:10.5061/dryad.wstqjq2tk
Description:
## Simulations

We used two simulated datasets in this paper.

### ASTRALII dataset

This dataset contains estimated gene trees, true gene trees, true species trees, and species trees inferred with ASTRAL, RAxML, and NJST. Each archive is organized by model condition (e.g., `model.200.10000000.0.0000001`), with one folder per replicate `[rep]` under each condition. File names use the following placeholders:

* `[k]` is the number of genes: 1000, 200, or 50.
* `[method]` is the method used to infer the species tree: `astral`, `njst`, or `concat`.
* `[gt]` is the type of gene trees: true gene trees (`true`) or estimated gene trees (`half`). The `_[gt]` part is omitted for the `concat` method, where it does not apply.

##### Files

* `ASTRALII-BL.tar.gz`: branch length estimates.
  * `model.200.10000000.0.0000001/01/astral-bl-[k]-[gt].txt`: each row is an internal branch. The first column gives its length in coalescent units and the third column gives its length in generations.
  * `model[model]/[rep]/astral/astral_[k]_[gt]_sp.nwk`: the species tree with internal branch lengths in coalescent units. Ignore the terminal branch lengths, which are in generations.
  * `model[model]/[rep]/astral/astral_[k]_[gt]_sp_Stat`: the ASTRAL log file.
* `ASTRALII-pp.tar.gz`: posterior probability estimates. The files are species trees in Newick format annotated with posterior probabilities:
  * `model.[model]/[rep]/astral/[method]_[k]_[gt]`: full ASTRAL annotation (like `-t 12`).
  * Note: the (poorly named) `model.[model]/[rep]/astral/[method]_[k]_[gt]_sp` files correspond to the scored true species trees.
  * `model.[model]/[rep]/astral/[method]_[k]_[gt]Stat`: ASTRAL log files.
* `ASTRALII.tar.gz`: raw simulated datasets.
  Files are under `data/ASTRALII/200-taxa/model.[model]/[rep]/*` and include:
  * True (simulated) species trees (`s_tree.trees`)
  * True (simulated) gene trees (`truegenetrees`)
  * Estimated gene trees (FastTree), with poorly resolved trees removed (`estimatedgenetre.halfresolved`)
  * Species trees inferred with ASTRAL (`astral-v474-p1`), RAxML (`concatenatedtree`), and NJST (`njst`) from 50, 200, or 1000 genes. The 50- and 200-gene sets are the first 50 and 200 genes in the `estimatedgenetre.halfresolved` or `truegenetrees` files.

### Avian simulated dataset

This dataset contains directories of the form `noscale.1000g.[bp]/R[rep]/`, where `[bp]` is the sequence length (250, 500, 1000, or 1500). Each directory contains:

* `avian-astral.tre.blen`: output of ASTRAL applied to the bestML gene trees, with correct branch lengths
* `avian-astral.tre.blen.err`: ASTRAL log file from computing branch lengths, applied to the bestML input tree
* `astral-bl.nwk`: ASTRAL tree with full annotations
* `genetrees.gt`: 1000 bestML gene trees
* `astral/BS.[bsrep].tre`: MLBS ASTRAL results for bootstrap replicate `[bsrep]`. 200 replicates were run.
* `astral/Best.tre`: ASTRAL applied to the bestML gene trees, without branch lengths
* `astral/RAxML_bipartitionsBranchLabels.bestML`: greedy consensus of the MLBS replicates, as computed by RAxML
* `astral/avian-astral-truegene.nwk`: results of applying ASTRAL to the true gene trees for the species tree inferred from this directory's gene trees
* `astral/avian-astral-truegene.nwk.info-estimatedgenetree`: the ASTRAL log file for that same run (true gene trees, species tree inferred from this directory's gene trees)

## Biological datasets

In this paper, we analyzed four biological datasets. The results and the datasets are available in `Biological.tar.gz`:

* The avian biological dataset of Jarvis et al., described in: Jarvis, Erich D., et al. "Phylogenomic analyses data of the avian phylogenomics project." GigaScience 4.1 (2015): 1-9.
  * These data are available at [http://dx.doi.org/10.5524/101041](http://dx.doi.org/10.5524/101041).
* The dataset analyzed by Xi et al., described in: Xi, Zhenxiang, Liang Liu, Joshua S. Rest, and Charles C. Davis. "Coalescent versus Concatenation Methods and the Placement of Amborella as Sister to Water Lilies." Systematic Biology 63, no. 6 (November 1, 2014): 919–932. [http://doi.org/10.1093/sysbio/syu055](http://doi.org/10.1093/sysbio/syu055).
  * See Dryad repository doi:10.5061/dryad.qb251 for the sequences. The gene trees were provided to us by the authors.
* The 1KP dataset analyzed by Naim Matasci et al., published in: Matasci, N., Hung, L.H., Yan, Z., Carpenter, E.J., Wickett, N.J., Mirarab, S., Nguyen, N., Warnow, T., Ayyampalayam, S., Barker, M. and Burleigh, J.G., 2014. Data access for the 1,000 Plants (1KP) project. GigaScience, 3(1), pp. 1-10.
  * The data are also available on iPlant: [https://datacommons.cyverse.org/browse/iplant/home/shared/onekp_pilot](https://datacommons.cyverse.org/browse/iplant/home/shared/onekp_pilot)
* The dataset analyzed by Prum et al., described in: Prum, R.O., Berv, J.S., Dornburg, A., Field, D.J., Townsend, J.P., Lemmon, E.M. and Lemmon, A.R., 2015. A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature.
  * The data are available from the original paper.

## Sharing/Access information

## Code/Software

The simulated data were generated with SimPhy using the scripts given [here](https://github.com/smirarab/astral2sims).

* We used [ASTRAL (posteval)](https://github.com/smirarab/ASTRAL/tree/posteval) version 4.9.1 for scoring and [ASTRAL (master)](https://github.com/smirarab/ASTRAL/tree/master) version 4.9.8 for computing the branch lengths of the trees.
* To obtain posterior probabilities for the branches of the main species tree and the two alternative topologies, we used the [posteval branch](https://github.com/smirarab/ASTRAL/tree/posteval):
```
java -Xmx2000M -jar astral.4.9.1.jar -i [GENE TREES] -q [SPECIES TREE] -t 4
```

To compute the branch lengths of the main species tree, we used the MAP solution with the `master` branch:

```
java -Xmx2000M -jar astral.4.9.8.jar -i [GENE TREES] -q [SPECIES TREE] -t 2
```

To compute the bootstrap support of the alternative topologies, we used `posteval`:

```
java -Xmx2000M -jar astral.4.9.1.jar -i [BS-replicates] -q [SPECIES TREE] -t 5
```
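The three ASTRAL runs in the Code/Software section can also be assembled as argument lists for Python's `subprocess.run`. This keeps every option a plain ASCII hyphen, which avoids the look-alike minus sign (U+2212) that breaks copy-pasted commands. This is a minimal sketch: the jar names, heap size, and `-t` values come from the commands in that section. The helper name `astral_command` and the run labels are hypothetical.

```python
from typing import List

# (jar, -t value) for each run listed in the Code/Software section.
# The dictionary keys are hypothetical labels, not ASTRAL terminology.
RUNS = {
    "posterior_alternatives": ("astral.4.9.1.jar", 4),  # posteval: PP for main tree + 2 alternatives
    "branch_lengths": ("astral.4.9.8.jar", 2),          # master: branch lengths (MAP solution)
    "bootstrap_alternatives": ("astral.4.9.1.jar", 5),  # posteval: -i takes bootstrap replicates
}


def astral_command(run: str, gene_trees: str, species_tree: str,
                   heap: str = "2000M") -> List[str]:
    """Build the java invocation for one named run as a subprocess argument list.

    -i is the input trees (gene trees or bootstrap replicates), -q is the fixed
    species tree to score, and -t is the annotation mode used for that run.
    """
    jar, mode = RUNS[run]
    return ["java", f"-Xmx{heap}", "-jar", jar,
            "-i", gene_trees, "-q", species_tree, "-t", str(mode)]
```

For example, `subprocess.run(astral_command("branch_lengths", "genetrees.gt", "astral/Best.tre"), check=True)` would run the branch-length step. Pairing these two avian files is a hypothetical choice made for illustration, not a documented step of the original analysis.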
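The `astral-bl-[k]-[gt].txt` files in `ASTRALII-BL.tar.gz` list one internal branch per row. The first column gives the length in coalescent units and the third gives the length in generations. A minimal reading sketch follows. It assumes the columns are whitespace-separated; the delimiter and the meaning of the second column are not documented, so the second column is skipped. The function name is hypothetical.

```python
from typing import List, Tuple


def parse_astral_bl(text: str) -> List[Tuple[float, float]]:
    """Return (coalescent_units, generations) for each internal branch row.

    Assumption: whitespace-separated columns, with column 1 in coalescent
    units and column 3 in generations. Rows that are blank, have fewer than
    three fields, or are non-numeric (e.g. a possible header) are skipped.
    """
    branches = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or truncated row
        try:
            branches.append((float(fields[0]), float(fields[2])))
        except ValueError:
            continue  # non-numeric row, such as a header
    return branches
```

Usage: `with open("astral-bl-1000-true.txt") as fh: rows = parse_astral_bl(fh.read())`, where the file name is the template instantiated with `[k]` = 1000 and `[gt]` = `true`.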
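The `model.[model]/[rep]/astral/[method]_[k]_[gt]` naming scheme in `ASTRALII-pp.tar.gz` can be expanded programmatically, for example to check that an extracted archive is complete. The sketch below rests on two assumptions. First, `[model]` stands for the part of the condition name after `model.` (e.g. `200.10000000.0.0000001`). Second, replicate folders are zero-padded as in `01`. It also encodes the documented rule that `concat` files carry no `_[gt]` suffix. The function name is hypothetical.

```python
from typing import List

GENE_COUNTS = (1000, 200, 50)           # values of [k]
METHODS = ("astral", "njst", "concat")  # values of [method]
GENE_TREE_TYPES = ("true", "half")      # values of [gt]


def pp_tree_paths(model: str, rep: str) -> List[str]:
    """Expand model.[model]/[rep]/astral/[method]_[k]_[gt] for one replicate.

    Assumption: `model` is the text after "model." and `rep` is the replicate
    folder name (e.g. "01"). concat paths omit the _[gt] suffix because the
    gene tree type does not apply to concatenation.
    """
    base = f"model.{model}/{rep}/astral"
    paths = []
    for method in METHODS:
        for k in GENE_COUNTS:
            if method == "concat":
                paths.append(f"{base}/{method}_{k}")
            else:
                paths.extend(f"{base}/{method}_{k}_{gt}" for gt in GENE_TREE_TYPES)
    return paths
```

Each replicate should yield 15 annotated trees: 6 each for `astral` and `njst`, and 3 for `concat`. Appending `Stat` gives log file names under the documented `[method]_[k]_[gt]Stat` pattern. Whether the `concat` logs follow that same pattern is not stated.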
Created:
2024-04-06



