
Metazoan Mixtures of Trees Dataset

Source: DataCite Commons. Updated 2024-06-27; indexed 2024-08-26.
Download link:
https://figshare.com/articles/dataset/Metazoan_Mixtures_of_Trees_Dataset/26087386
Description:
Metazoan Mixtures of Trees Dataset
<i>Figshare repository documentation</i>
<i>Caitlin Cherryh 2024</i>

The relationships among animal clades are a contentious and unresolved problem in phylogenetics. Studies that use different parameter choices and phylogenetic pipelines recover different tree topologies. Commonly proposed hypotheses include: Ctenophora (comb jellies) as the first Metazoan clade to diverge; Porifera (sponges) as the first Metazoan clade to diverge; and a monophyletic clade consisting of both Ctenophora and Porifera as the first Metazoan clade to diverge. Because short branches at deep evolutionary timescales are difficult to reconstruct, there is little consensus on the relationships between these clades. Here, we use multitree mixture models to show that all combinations of previously analysed datasets and models exhibit substantial support for a wide range of hypotheses. To do this, we apply the recently developed multitree mixture model MAST (Mixtures Across Sites and Trees) (Wong et al. 2024, doi: 10.1093/sysbio/syae008) to all combinations of 14 previously published datasets and 26 models of molecular evolution. The MAST method uses a mixture of bifurcating trees to represent multiple evolutionary histories for a single concatenated alignment.
This relaxes the treelikeness assumption and allows us to investigate heterogeneous phylogenetic signal within the Metazoa.

The caitlinch/metazoan-mixtures GitHub repository contains all R scripts necessary to repeat these analyses: https://github.com/caitlinch/metazoan-mixtures

Software programs
IQ-Tree2 with MAST implementation (included in all versions > v2.3.0, latest version preferred)

Repository structure

Input files
<b>empirical_datasets.pdf</b>
  Documentation of the 14 empirical alignments analysed in this study, including the original manuscript and a record of where each matrix was obtained.
<b>input_files/</b> directory
  <b>alignment_dimensions.csv</b>
    Details the dataset naming scheme, file path, and number of taxa and sites in each alignment.
  <b>alternative_phylogenetic_hypotheses.nex</b>
    Outlines the 5 hypotheses of Metazoan evolution explored in this study. Used to generate constraint trees.
      Tree 1: Ctenophora-first (Porifera monophyletic)
      Tree 2: Porifera-first (Porifera monophyletic)
      Tree 3: Ctenophora+Porifera-first (Porifera monophyletic)
      Tree 4: Ctenophora first + Porifera paraphyletic
      Tree 5: Porifera first + Porifera paraphyletic
  <b>Cherryh_MAST_metazoa_taxa_reconciliation.csv</b>
    Lists every taxon in every empirical dataset, with both the original name (i.e., the name used in the original alignment) and the relabelled name (i.e., the taxon name updated to have consistent formatting and naming across datasets).
    Used to update taxon names in trees to facilitate comparison across datasets.

Alignments, logs, and output files
<b>alignments/</b>
  Matrices used in this study.
  See <b>empirical_datasets.pdf</b> for details.
<b>maximum_likelihood_trees.zip</b>
  Maximum likelihood trees estimated from every combination of the 14 empirical alignments and 26 models of evolution (14*26=364).
  Models of evolution were grouped into 4 classes for further analysis: Q, Mixture, PMSF (Posterior Mean Site Frequency), and PM (Profile Mixture) models.
<b>pmsf_site_frequency_files.zip</b>
  Site frequency file for each alignment, generated as part of the process for estimating a maximum likelihood tree with a PMSF class model.
  One <b>.zip</b> archive for each alignment, containing site frequency files for the substitution models: Poisson+C20, Poisson+C60, LG+C20, LG+C60.
<b>constraint_tree_files.zip</b>
  Guide trees used to constrain ML inference.
  Each guide tree is labelled constraint_tree_<i>X</i>, where <i>X</i> ∈ 1–5. The number corresponds to the hypothesis in the <b>input_files/alternative_phylogenetic_hypotheses.nex</b> file (i.e., constraint_tree_1 = Ctenophora-first).
  Note that Trees 4 and 5 could only be generated for datasets that included at least 2 Porifera species, with at least 1 Porifera species from the Calcarea or Homoscleromorpha clades AND at least 1 Porifera species from the Hexactinellida or Demospongiae clades.
<b>hypothesis_trees.zip</b>
  Constrained ML trees estimated using guide trees for each hypothesis of evolution.
  The order of trees in each file corresponds with the hypothesis of evolution (i.e., the first tree is constrained by the Ctenophora-sister hypothesis).
  <b>2_trees/</b>: Comparing only the first 2 hypotheses of evolution from alternative_phylogenetic_hypotheses.nex (Ctenophora-sister and Porifera-sister).
  <b>5_trees/</b>: Comparing all 5 hypotheses of evolution from alternative_phylogenetic_hypotheses.nex.
    For alignments that did not meet the requirements for Trees 4 and 5, only 3 trees were compared.
<b>MAST_output.zip</b>
  Results from applying MAST to estimate a mixture of trees from either the first 2 hypotheses of evolution defined by alternative_phylogenetic_hypotheses.nex (2-tree model), or all 5 hypotheses of evolution (5-tree model).
  We calculated a 2-tree and a 5-tree model for each alignment for each of the 4 classes of model (2*14*4=112). Note that a couple of these analyses were not computationally tractable and were excluded from the final results; see the manuscript for more details.
<b>AU_test_output.zip</b>
  Results from applying the AU (approximately unbiased) test to either the trees in the 2-tree MAST model or the trees in the 5-tree MAST model.
  We calculated the AU test with 2 trees and with 5 trees for each alignment, under each of the 4 classes of model (2*14*4=112).

Results
<b>output_files/</b> directory
  <b>alignment_included_taxa.csv</b>: Details which taxa are included in each alignment (1 column per matrix)
  <b>alignment_site_details.csv</b>: Details about alignments, including the number of total sites, constant sites, invariant sites, and informative sites
  <b>results_AU_tree_topology_test_results.csv</b>: Output for AU test, extracted from IQ-Tree log/iqtree files
  <b>results_complete_BIC.csv</b>: BIC scores for ML trees and MAST models
  <b>results_MAST_output.csv</b>: Output for MAST analyses, extracted from IQ-Tree log/iqtree files
  <b>results_maximum_likelihood_iqtreeOutput.csv</b>: BIC, log likelihood, and model parameters (including substitution model, state frequencies, and rate heterogeneity across sites model) for each of the 364 maximum likelihood trees (14 datasets * 26 models of substitution)
  <b>results_ML_tree_topology_ManualCheck.csv</b>: Topology of each ML tree, checked by hand
  <b>summary_all_BIC.csv</b>: Summary of BIC values for MAST and ML tree analyses
  <b>summary_au_test_results_2trees.csv</b>: AU test results for the 2-tree analyses
  <b>summary_au_test_results_5trees.csv</b>: AU test results for the 5-tree analyses
  <b>summary_MAST_treeWeight_results_2trees.csv</b>: MAST results for the 2-tree analyses
  <b>summary_MAST_treeWeight_results_5trees.csv</b>: MAST results for the 5-tree analyses
  <b>summary_ML_tree_topology.csv</b>: Summary of ML tree topology results (number of models with each topology for each dataset)
  <b>all_models_ML_Porifera_topology.csv</b>: Formatted csv for manuscript table, showing the topology of the Porifera clade (either monophyletic or paraphyletic) for each of the 364 maximum likelihood trees
  <b>all_models_ML_tree_topology.csv</b>: Formatted csv for manuscript table, showing the ML tree topology for each of the 364 maximum likelihood trees
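
The eligibility rule for Trees 4 and 5 (described under constraint_tree_files.zip) and its effect on the 5_trees/ comparisons can be sketched in Python. This is an illustrative sketch only, not code from the caitlinch/metazoan-mixtures repository, which is written in R. The function names and the input format (one class label per Porifera taxon in an alignment) are assumptions made for this example.

```python
# Hypothetical helpers; the input is assumed to be a list holding the class
# of each Porifera taxon present in one alignment.
GROUP_A = {"Calcarea", "Homoscleromorpha"}
GROUP_B = {"Hexactinellida", "Demospongiae"}


def can_build_paraphyly_trees(porifera_classes):
    """Return True if Trees 4 and 5 (Porifera paraphyletic) can be generated."""
    has_a = any(c in GROUP_A for c in porifera_classes)
    has_b = any(c in GROUP_B for c in porifera_classes)
    # The two groups are disjoint, so one taxon from each also guarantees
    # the "at least 2 Porifera species" requirement.
    return has_a and has_b


def n_hypothesis_trees(porifera_classes):
    """Number of hypothesis trees compared in the 5_trees/ analyses."""
    return 5 if can_build_paraphyly_trees(porifera_classes) else 3


print(n_hypothesis_trees(["Demospongiae", "Calcarea"]))      # 5
print(n_hypothesis_trees(["Demospongiae", "Demospongiae"]))  # 3
```

An alignment with sponges from only one of the two groups falls back to the three hypotheses in which Porifera is monophyletic, matching the note under 5_trees/.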
Provider:
figshare
Created:
2024-06-24