Metazoan Mixtures of Trees Dataset

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Metazoan_Mixtures_of_Trees_Dataset/26087386
Description:
Figshare repository documentation
Caitlin Cherryh, 2024

The relationships between animal clades are a contentious and unresolved problem in phylogenetics. Different studies, using different parameter choices and phylogenetic pipelines, recover different tree topologies. Commonly proposed hypotheses include: Ctenophora (comb jellies) as the first Metazoan clade to diverge; Porifera (sponges) as the first Metazoan clade to diverge; and a monophyletic clade consisting of both Ctenophora and Porifera as the first Metazoan clade to diverge. Because short branches at deep evolutionary timescales are difficult to reconstruct, there is little consensus on the relationships between these clades. Here, we use multitree mixture models to show that all combinations of previously analysed datasets and models exhibit substantial support for a wide range of hypotheses. To do this, we apply the recently developed multitree mixture model MAST (Mixtures Across Sites and Trees) (Wong et al. 2024, doi: 10.1093/sysbio/syae008) to all combinations of 14 previously published datasets and 26 models of molecular evolution. The MAST method uses a mixture of bifurcating trees to represent multiple evolutionary histories for a single concatenated alignment. This relaxes the treelikeness assumption and allows us to investigate heterogeneous phylogenetic signal within the Metazoa.
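The idea of a mixture of trees can be illustrated numerically. The Python below is a minimal sketch and not the IQ-Tree2 implementation: it assumes that the likelihood of each site under a tree mixture is a weighted sum of that site's likelihoods under the individual trees, with tree weights that sum to 1. The function names, the per-site likelihood values, and the weights are all hypothetical and chosen only for illustration.

```python
import math


def mixture_log_likelihood(site_likelihoods, weights):
    """Log-likelihood of an alignment under a mixture of trees.

    site_likelihoods[i][j] is the (hypothetical) likelihood of site i
    under tree j; weights[j] is the weight of tree j.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("tree weights must sum to 1")
    total = 0.0
    for per_tree in site_likelihoods:
        # Assumed mixture form: weighted sum over trees for this site.
        site_likelihood = sum(w * l for w, l in zip(weights, per_tree))
        total += math.log(site_likelihood)
    return total


def posterior_tree_probabilities(per_tree, weights):
    """Posterior probability that one site belongs to each tree."""
    joint = [w * l for w, l in zip(weights, per_tree)]
    norm = sum(joint)
    return [x / norm for x in joint]


if __name__ == "__main__":
    # Made-up likelihoods for 3 sites under 2 trees
    # (for example, a Ctenophora-first and a Porifera-first tree).
    toy_site_likelihoods = [
        [0.020, 0.005],
        [0.001, 0.004],
        [0.010, 0.010],
    ]
    toy_weights = [0.6, 0.4]
    print(mixture_log_likelihood(toy_site_likelihoods, toy_weights))
    for per_tree in toy_site_likelihoods:
        print(posterior_tree_probabilities(per_tree, toy_weights))
```

In this toy setup a site that fits both trees equally well has posterior probabilities equal to the tree weights, which is one way to see why the weights summarise how much of the alignment supports each hypothesis.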
The caitlinch/metazoan-mixtures GitHub repository contains all R scripts necessary to repeat these analyses: https://github.com/caitlinch/metazoan-mixtures

Software programs

- IQ-Tree2 with MAST implementation (included in all versions > v2.3.0, latest version preferred)

Repository structure

Input files

- empirical_datasets.pdf: Documentation of the 14 empirical alignments analysed in this study, including the original manuscript and a record of where each matrix was obtained.
- input_files/ directory
  - alignment_dimensions.csv: Details the dataset naming scheme, file path, and number of taxa and sites in each alignment.
  - alternative_phylogenetic_hypotheses.nex: Outlines the 5 hypotheses of Metazoan evolution explored in this study. Used to generate constraint trees.
    - Tree 1: Ctenophora-first (Porifera monophyletic)
    - Tree 2: Porifera-first (Porifera monophyletic)
    - Tree 3: Ctenophora+Porifera-first (Porifera monophyletic)
    - Tree 4: Ctenophora first + Porifera paraphyletic
    - Tree 5: Porifera first + Porifera paraphyletic
  - Cherryh_MAST_metazoa_taxa_reconciliation.csv: Lists every taxon in every empirical dataset, with both the original name (i.e., the name used in the original alignment) and the relabelled name (i.e., the taxon name updated to have consistent formatting and naming across datasets). Used to update taxon names in trees to facilitate comparison across datasets.

Alignments, logs, and output files

- alignments/: Matrices used in this study. See empirical_datasets.pdf for details.
- maximum_likelihood_trees.zip: Maximum likelihood trees estimated from every combination of the 14 empirical alignments and 26 models of evolution (14*26=364).
  - Models of evolution were grouped into 4 classes for further analysis: Q, Mixture, PMSF (Posterior Mean Site Frequency), and PM (Profile Mixture) models.
- pmsf_site_frequency_files.zip: Site frequency file for each alignment, generated as part of the process for estimating a maximum likelihood tree with a PMSF class model.
  - One .zip archive for each alignment, containing site frequency files for the substitution models Poisson+C20, Poisson+C60, LG+C20, and LG+C60.
- constraint_tree_files.zip: Guide trees used to constrain ML inference.
  - Each guide tree is labelled constraint_tree_X, where X is a number from 1 to 5. The number corresponds to the hypothesis in the input_files/alternative_phylogenetic_hypotheses.nex file (i.e., constraint_tree_1 = Ctenophora-first).
  - Note that Trees 4 and 5 could only be generated for datasets that included at least 2 Porifera species, with at least 1 Porifera species from the Calcarea or Homoscleromorpha clades AND at least 1 Porifera species from the Hexactinellida or Demospongiae clades.
- hypothesis_trees.zip: Constrained ML trees estimated using guide trees for each hypothesis of evolution.
  - The order of trees in each file corresponds with the hypothesis of evolution (i.e., the first tree is constrained by the Ctenophora-sister hypothesis).
  - 2_trees/: Comparing only the first 2 hypotheses of evolution from alternative_phylogenetic_hypotheses.nex (Ctenophora-sister and Porifera-sister).
  - 5_trees/: Comparing all 5 hypotheses of evolution from alternative_phylogenetic_hypotheses.nex. For alignments that did not meet the requirements for Trees 4 and 5, only 3 trees were compared.
- MAST_output.zip: Results from applying MAST to estimate a mixture of trees from either the first 2 hypotheses of evolution defined by alternative_phylogenetic_hypotheses.nex (2-tree model), or all 5 hypotheses of evolution (5-tree model).
  - We calculated a 2-tree and 5-tree model for each alignment for each of the 4 classes of model (2*14*4=112). Note that a couple of these analyses were not computationally tractable and were excluded from the final results; see the manuscript for more details.
- AU_test_output.zip: Results from applying the AU test to either the trees in the 2-tree MAST model or the trees in the 5-tree MAST model.
  - We calculated the AU test with 2 trees and with 5 trees for each alignment, under each of the 4 classes of model (2*14*4=112).

Results

- output_files/ directory
  - alignment_included_taxa.csv: Details which taxa are included in each alignment (1 column per matrix).
  - alignment_site_details.csv: Details about alignments, including the number of total sites, constant sites, invariant sites, and informative sites.
  - results_AU_tree_topology_test_results.csv: Output for the AU test, extracted from IQ-Tree log/iqtree files.
  - results_complete_BIC.csv: BIC scores for ML trees and MAST models.
  - results_MAST_output.csv: Output for the MAST analyses, extracted from IQ-Tree log/iqtree files.
  - results_maximum_likelihood_iqtreeOutput.csv: BIC, log likelihood, and model parameters (including substitution model, state frequencies, and rate heterogeneity across sites model) for each of the 364 maximum likelihood trees (14 datasets * 26 models of substitution).
  - results_ML_tree_topology_ManualCheck.csv: Topology of each ML tree, checked by hand.
  - summary_all_BIC.csv: Summary of BIC values for MAST and ML tree analyses.
  - summary_au_test_results_2trees.csv: AU test results for the 2-tree analyses.
  - summary_au_test_results_5trees.csv: AU test results for the 5-tree analyses.
  - summary_MAST_treeWeight_results_2trees.csv: MAST results for the 2-tree analyses.
  - summary_MAST_treeWeight_results_5trees.csv: MAST results for the 5-tree analyses.
  - summary_ML_tree_topology.csv: Summary of ML tree topology results (number of models with each topology for each dataset).
  - all_models_ML_Porifera_topology.csv: Formatted csv for the manuscript table; shows the topology of the Porifera clade (either monophyletic or paraphyletic) for each of the 364 maximum likelihood trees.
  - all_models_ML_tree_topology.csv: Formatted csv for the manuscript table; shows the ML tree topology for each of the 364 maximum likelihood trees.
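The eligibility rule for Trees 4 and 5 and the analysis counts quoted in the documentation (14*26=364 and 2*14*4=112) can be checked with a short sketch. The Python below is an illustration written for this listing, not code from the caitlinch/metazoan-mixtures repository: the function names and the input format (one clade name per Porifera species in an alignment) are assumptions.

```python
from itertools import product

# Clade groups named in the documentation's rule for Trees 4 and 5.
GROUP_A = {"Calcarea", "Homoscleromorpha"}
GROUP_B = {"Hexactinellida", "Demospongiae"}


def number_of_hypothesis_trees(porifera_clades):
    """Return how many hypothesis trees the '5_trees' analysis can compare.

    porifera_clades: iterable with one clade name per Porifera species in
    the alignment (hypothetical input format).
    """
    clades = list(porifera_clades)
    has_a = any(c in GROUP_A for c in clades)
    has_b = any(c in GROUP_B for c in clades)
    # The paraphyletic-Porifera trees (4 and 5) need at least 2 Porifera
    # species, with both clade groups represented.
    if len(clades) >= 2 and has_a and has_b:
        return 5
    return 3


def count_analyses(n_alignments=14, n_models=26, n_model_classes=4,
                   tree_counts=(2, 5)):
    """Count ML tree analyses and MAST (or AU test) analyses."""
    ml_runs = list(product(range(n_alignments), range(n_models)))
    mast_runs = list(product(tree_counts, range(n_alignments),
                             range(n_model_classes)))
    return len(ml_runs), len(mast_runs)


if __name__ == "__main__":
    print(number_of_hypothesis_trees(["Calcarea", "Demospongiae"]))
    print(number_of_hypothesis_trees(["Demospongiae", "Hexactinellida"]))
    print(count_analyses())
```

The second count is an upper bound on the analyses actually reported, since the documentation notes that a couple of MAST runs were not computationally tractable.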
Created:
2024-06-27