
Data from: Phylogenomic resolution of the cetacean tree of life using target sequence capture

Source: Research Data Australia (indexed 2024-12-14)
Download link:
https://researchdata.edu.au/data-from-phylogenomic-sequence-capture/1656138
Description:
The evolution of the cetaceans, from their early transition to an aquatic lifestyle to their subsequent diversification, has been the subject of numerous studies. However, while the higher-level relationships among cetacean families have been largely settled, several aspects of the systematics within these groups remain unresolved. Problematic clades include the oceanic dolphins (37 spp.), which have experienced a recent rapid radiation, and the beaked whales (22 spp.), which have not been investigated in detail using nuclear loci. The combined application of high-throughput sequencing with techniques that target specific genomic sequences provides a powerful means of rapidly generating large volumes of orthologous sequence data for use in phylogenomic studies. To elucidate the phylogenetic relationships within the Cetacea, we combined sequence capture with Illumina sequencing to generate data for ~3200 protein-coding genes for 68 cetacean species and their close relatives, including the pygmy hippopotamus. By combining data from >38,000 exons with existing sequences from 11 cetaceans and seven outgroup taxa, we produced the first comprehensive comparative genomic dataset for cetaceans, spanning 6,527,596 aligned base pairs and 89 taxa. Phylogenetic trees reconstructed with maximum likelihood and Bayesian inference of concatenated loci, as well as with coalescence analyses of individual gene trees, produced mostly concordant and well-supported trees. Our results completely resolve the relationships among beaked whales as well as the contentious relationships among oceanic dolphins, especially the problematic subfamily Delphininae, which includes the common and bottlenose dolphins. We performed Bayesian estimation of species divergence times using MCMCtree, integrating as calibration points recently described fossils (e.g., Mystacodon selenensis) that have not been used before.
Integration of new fossil dates in the context of autocorrelated rates indicates that the diversification of Crown Cetacea began before the Late Eocene and the divergence of Crown Delphinidae as early as the Middle Miocene.

Files and supplementary material:

Figure_S1: Maximum likelihood phylogram of Dataset B with the maximum number of partitions. Bootstrap values are 100 for all 3 analyses except at 6 nodes labelled with a red circle; bootstrap values for these are shown in the upper left.
Figure_S2: ASTRAL species tree. All support values are 1.0 unless otherwise noted over the branch.
FigureS3: Time tree of Cetacea using the independent rates (IR) model. Numbers over each node correspond to raw values in Table 3.
Table_S1: Description of values for sequencing (i.e., number of reads), Trinity (i.e., number of contigs), and reciprocal BLAST searches for each sample for which we performed target sequence capture.
Table_S2: List of GenBank accession numbers for sequences included in our analysis for Platanista gangetica and Balaenoptera omurai.
DATASET_A.phylip: Dataset A, concatenated alignment.
DATASET_B: Dataset B (without Platanista gangetica and Balaenoptera omurai).
Cetacea_gene_partition: RAxML partitions for each gene (3,191).
PartitionFinder Partitions: RAxML partitions generated by PartitionFinder. File: partitionfindersets
DATASET_A_RAxML_unpartitioned_best_tree: Best tree for the unpartitioned RAxML analysis of Dataset A. File: RAxML_unpartitioned_best_tree.tree
DATASET_A_RAxML_unpartitioned_bootstrap. File: RAxML_unpartitioned_bootstrap.result
DATASET_A_RAxMLpartitionfinder_best_tree: Best tree of the RAxML analysis of Dataset A using the partition scheme generated by PartitionFinder. File: RAxMLpartitionfinder_best_tree.tre
DATASET_A_RAxML_partitionfinder_bootstrap.result: DATASET_A_RAxML_partitionfinder_bootstrap trees. File: RAxML_partitionfinder_bootstrap.result.txt
DATASET_A_RAxML_partition_by_gene_best_tree: Best tree of the RAxML analysis of Dataset A partitioned by gene. File: RAxML_partition_by_gene_best_tree.tre
DATASET_A_RAxML_bootstrap_partition_by_gene. File: RAxML_bootstrap_partition_by_gene.result
DATASET_B_RAxML_unpartitioned_bestTree. File: RAxML_DATASET_B_unpartitioned_bestTree.result
DATASET_B_RAxML_bootstrap_unpartitioned.result: DATASET_B_RAxML_bootstrap_unpartitioned trees. File: RAxML_bootstrap_unpartitioned.result.txt
DATASET_B_RAxML_partitionfinder_bestTree. File: RAxML_DATASET_B_partitionfinder_bestTree.result
DATASET_B_RAxML_bootstrap_partitionfinder.result. File: RAxML_bootstrap_partitionfinder.result.txt
DATASET_B_RAxML_partition_by_gene_bestTree. File: RAxML_DATASET_B_all_genes_bestTree.result
DATASET_B_RAXML_boostrap_partition_by_gene_result. File: RAXML_boostrap_by_gene_result.txt
Exabayes_tree: Tree resulting from the ExaBayes analysis. File: Bayes_tree.nex
ASTRAL input of RAxML gene trees for each of the 3,191 genes. File: RAXML_gene_trees_ASTRAL_input
ASTRAL_species_tree_results: Results of the ASTRAL species tree analysis. File: ASTRAL_species_tree.txt
MCMCTree input: Dataset including the top 1/3 of genes in terms of divergence between odontocetes and mysticetes. This was the input for all MCMCTree analyses. File: GENE_LIST3.phylip
MCMCTREE.tre: Tree input with calibration points for all MCMCTree analyses.
Hessian matrix file for input in MCMCTree analyses. File: in.BV
Output for MCMCTree strict clock analysis, Run 1. File: out_clock_1_1.txt
Output for MCMCTree strict clock analysis, Run 2. File: out_clock_1_2.txt
Output for MCMCTree IR analysis, Run 1. File: out_clock2_1.txt
Output for MCMCTree IR analysis, Run 2. File: out_clock_2_2.txt
Output for MCMCTree AR analysis, Run 1. File: out_clock3_1.txt
Output for MCMCTree AR analysis, Run 2. File: out_clock3_2.txt
Figure_S2: Figure S2. Tracer file showing convergence of -lnL values for both runs of the Bayesian analysis using ExaBayes.
Figure_S3: Figure S3. Species tree of Dataset B generated by ASTRAL. All nodes have posterior probabilities of 1.0, except for those with values listed above the node.
Figure_S4_Tracer_3_AR: Figure S4. Tracer file showing convergence of -lnL values for both runs of the 3-partition analysis with autocorrelated rates using MCMCTree.
Figure_S5_Tracer_3_IR: Figure S5. Tracer file showing convergence of -lnL values for both runs of the 3-partition analysis with independent rates using MCMCTree.
Figure_S6_Tracer_6_AR: Figure S6. Tracer file showing convergence of -lnL values for both runs of the 6-partition analysis with autocorrelated rates using MCMCTree.
Figure_S7_Tracer_6_IR
Figure_S8: Figure S8. Cetartiodactyl tree with the topology from Figure 3, with nodes labelled corresponding to the list of mean ages and 95% confidence intervals (CIs) for both the AR and IR models of the 6-partition scheme in Table S3.
Figure_S9: Figure S9. Timetree of Cetacea analyzed in the MCMCTree package of PAML 4.9h using 3 partitions and approximate likelihood (Yang, 2007). A time scale in Ma (millions of years) is shown above the tree, with geologic periods labelled below the tree for reference (Q=Quaternary). Above each node the posterior distributions of the AR model (purple) and IR model (white) are shown. Red circles at each node represent calibration
Supplemental_Figure_Captions
Table_S3
Cetacea_ExaBayes Input File: Input file for ExaBayes analyses. File: Cetacea_ExaBayes.phy
Configuration file used in ExaBayes analyses. File: config.nex
Topologies for ExaBayes Run 1. File: ExaBayes_topologies.run-0.Cetacea_1
Parameters for ExaBayes Run 1. File: ExaBayes_parameters.run-0.Cetacea_1
Topologies for ExaBayes Run 2. File: ExaBayes_topologies.run-0.Cetacea_2
Parameters for ExaBayes Run 2. File: ExaBayes_parameters.run-0.Cetacea_2
Cetacea_partition_mcmctree_3: Alignment file for the 3-partition analyses in MCMCTree.
Cetacea_partition_mcmctree_6: Alignment file for the 6-partition analyses in MCMCTree.
Hessian matrix file for input in 3-partition MCMCTree analyses. File: in.BV1-3
Hessian matrix file for input in 6-partition MCMCTree analyses. File: in.BV1-6
Tree file for MCMCTree analyses. File: MCMCTREE.tre
Result file for 3-partition MCMCTree AR Run 1. File: parts_3_mcmctree_AR_mcmc.txt
FigTree result for 3-partition MCMCTree AR Run 1. File: FigTree_parts_3_mcmctree_AR_1.tre
Control file for 3-partition AR analyses in MCMCTree. File: mcmctree_3p_AR.ctl
Result file for 3-partition MCMCTree AR Run 2. File: parts_3_mcmctree_AR_2_mcmc.txt
FigTree result for 3-partition MCMCTree AR Run 2. File: FigTreeparts_3_mcmctree_AR_2.tre
Result file for 3-partition MCMCTree IR Run 1. File: parts_3_mcmctree_IR_mcmc.txt
FigTree result for 3-partition MCMCTree IR Run 1. File: FigTree_parts_3_mcmctree_IR.tre
Control file for 3-partition IR analyses in MCMCTree. File: mcmctree_3p_IR.ctl
Result file for 3-partition MCMCTree IR Run 2. File: parts_3_mcmctree_IR_2_mcmc.txt
FigTree result for 3-partition MCMCTree IR Run 2. File: FigTree_parts_3_mcmctree_IR_2.tre
Result file for 6-partition MCMCTree AR Run 1. File: parts_6_mcmctree_AR_mcmc.txt
FigTree result for 6-partition MCMCTree AR Run 1. File: FigTree_parts_6_mcmctree_AR.tre
Control file for 6-partition AR analyses in MCMCTree. File: mcmctree_6p_AR.ctl
Result file for 6-partition MCMCTree AR Run 2. File: parts_6_mcmctree_AR_mcmc.txt
FigTree result for 6-partition MCMCTree AR Run 2. File: FigTree_parts_6_mcmctree_AR.tre
Result file for 6-partition MCMCTree IR Run 1. File: parts_6_mcmctree_IR_mcmc.txt
FigTree result for 6-partition MCMCTree IR Run 1. File: FigTree_parts_6_mcmctree_IR.tre
Control file for 6-partition IR analyses in MCMCTree. File: mcmctree_6p_IR.ctl
Result file for 6-partition MCMCTree IR Run 2. File: parts_6_mcmctree_IR_2_mcmc.txt
FigTree result for 6-partition MCMCTree IR Run 2. File: FigTree_parts_6_mcmctree_IR_2.tre
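The description lists RAxML partition files (for example Cetacea_gene_partition, said to hold 3,191 per-gene partitions) but does not show their contents. As a rough sanity check after downloading, one might count the partitions and total sites in such a file. The sketch below assumes the file uses the standard RAxML partition syntax (`DNA, name = start-end`, optionally with comma-separated ranges and a `\3`-style stride); the function name `parse_partitions` and the toy input are hypothetical and are not taken from the deposited files.

```python
import re
from typing import List, Tuple

# One RAxML-style partition line looks like:  DNA, gene1 = 1-300
# Ranges may be comma-separated and may carry a stride such as 1-300\3.
_RANGE = re.compile(r"^(\d+)-(\d+)(?:\\(\d+))?$")


def parse_partitions(text: str) -> List[Tuple[str, str, int]]:
    """Return (model, name, n_sites) for every non-empty partition line."""
    partitions = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        left, ranges = line.split("=", 1)
        model, name = (part.strip() for part in left.split(",", 1))
        n_sites = 0
        for chunk in ranges.split(","):
            match = _RANGE.match(chunk.strip())
            if match is None:
                # Single-site entries and other variants are not handled here.
                raise ValueError(f"unrecognised range: {chunk!r}")
            start, end = int(match.group(1)), int(match.group(2))
            stride = int(match.group(3) or 1)
            n_sites += len(range(start, end + 1, stride))
        partitions.append((model, name, n_sites))
    return partitions


# Toy input for illustration only (assumed layout, not real data).
example = """DNA, geneA = 1-300
DNA, geneB = 301-450, 601-650
DNA, geneC_pos3 = 653-700\\3
"""
parts = parse_partitions(example)
total_sites = sum(n for _, _, n in parts)
print(len(parts), total_sites)
```

Run against the real partition file, the partition count could be compared with the 3,191 genes stated above. Whether the summed sites should equal the 6,527,596 aligned base pairs depends on which alignment the partition file was written for, which the description does not specify.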

Provider:
The University of Western Australia