Data from: Taller plants have lower rates of molecular evolution
Source: figshare.mq.edu.au | Updated: 2023-06-15 | Indexed: 2025-01-21
Download link:
https://figshare.mq.edu.au/articles/dataset/Data_from_Taller_plants_have_lower_rates_of_molecular_evolution/20045273/1
Description:
Rates of molecular evolution have a central role in our understanding of many aspects of species’ biology. However, the causes of variation in rates of molecular evolution remain poorly understood, particularly in plants. Here we show that height accounts for about one-fifth of the among-lineage rate variation in the chloroplast and nuclear genomes of plants. This relationship holds across 138 families of flowering plants, and when accounting for variation in species richness, temperature, ultraviolet radiation, latitude and growth form. Our observations can be explained by a link between height and rates of genome copying in plants, and we propose a mechanistic hypothesis to account for this—the ‘rate of mitosis’ hypothesis. This hypothesis has the potential to explain many disparate observations about rates of molecular evolution across the tree of life. Our results have implications for understanding the evolutionary history and future of plant lineages in a changing world.
Usage Notes
01_ML_phylogeny.zip
This file contains all the data necessary to re-run our ML tree using the full alignment containing all 564 species. There are 5 files. 'burleigh_concat.phy' is the alignment file in phylip format. 'commandline' is the command line we used to run RAxML version 7. 'partitions' describes how we partitioned our data (RAxML uses that file). 'RAxML_result.burleigh_concat.raxml.out' has the tree file that results from our analysis. 'raxmlHPC' is the RAxML executable we used to run our analyses, on a Mac desktop computer.

02_sister_pairs.zip
This file contains data and R code for the sister pairs analyses. There are 2 files. 'raw_data_sister_pairs.csv' is a comma-separated values file that contains all of the branch length and life history data we used in the sister pairs analyses. There is one row per sister pair in the analysis, which has data on: the two families in the sister pair, the proportion of genera we had height data for in each family, the number of species in each family, the average height of each family (log-transformed value in mm), nuclear branch length for each family (in units of substitutions/site), chloroplast dN branch length for each family (also in substitutions/site), chloroplast dS branch length for each family (also in substitutions/site), latitude for each family (in distance from the equator), UV for each family (measured as in Davies et al. 2004), and temperature for each family (in kelvins). 'sister_pairs_analyses.r' contains all R code used for the sister pairs analyses. To use it, you will need to download R (it's free), install the relevant packages (listed at the top of the .r file), and then change the line at the top which starts 'setwd' to point to the folder on your computer that contains the raw_data_sister_pairs.csv file. As you go through the R code, it will print out all of the results in the paper that used sister pairs analyses.

03_R8S
This file contains input files and the results files from running R8S on the ML tree, and on 1000 bootstraps of the ML tree. There are 3 files and 2 folders. 'base_files' is just a holder for some basic R8S input files that the Python script uses (see below). This folder contains: the r8s executable we used (compiled for Macs), the basic r8s.txt input file we used for each r8s analysis (which has data on our fossil calibrations), and a .txt file that contains the ML tree and 1000 bootstrapped trees estimated in RAxML as described in the paper. 'bootstrap_rates.txt' contains the results of the R8S analyses: each family is listed on its own row, and each row has 1001 associated columns. The first column contains the ML rate, and subsequent columns contain bootstrap rates (in substitutions/site/myr). This is the main output file produced by the Python script (see next). 'run_BS_r8s.py' is a Python script that will run r8s on the ML tree and the 1000 bootstrap trees. Before running it, create an empty directory called "bootstrap_results" in the same folder as the script, then change the "start_dir" and "tree_file" variables at the top of the script to point to the directory the script is in and to the tree file in the 'base_files' folder, respectively. Then run the script using Python. Briefly, the script takes each tree from the tree file, makes an r8s input file, runs r8s, then parses the output to extract the rates for each family. It then outputs these to the 'bootstrap_rates.txt' file. Be aware that the script stores all r8s results, which can take a lot of space (about 1GB) when the analyses are all complete.

04_PGLS
This file contains 5 files, sufficient to re-do all of our PGLS analyses. 'bootstrap_rates.txt' contains the results of the ML r8s analysis and all 1000 subsequent bootstrap analyses. 'growth_forms.csv' contains information on the growth forms of species in each family. 'PGLS_analyses.r' is an R script which you can use to re-run all of our PGLS analyses. To use it you will need to change the line at the top that starts 'setwd' to point to the folder on your computer that contains all of the input files here. You'll also need to download R, and the packages listed at the top of the file. 'R8S_trees.txt' is a file of the 1001 trees from R8S. These are used in the PGLS analyses to correct for non-independence. The first tree is the ML tree, the rest are bootstrap trees. 'raw_data_sister_pairs.csv' is a csv file of the raw data. It's included here so that the R script will run without additional hassle, but it's identical to the file described in the '02_sister_pairs' section above.
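
For readers who want to inspect 'bootstrap_rates.txt' outside R, the layout described above (one row per family, the ML rate first, then the bootstrap rates) could be read with a short Python sketch like the one below. This is not part of the deposited scripts. It assumes the file is whitespace- or tab-delimited, has no header row, and gives the family name (without spaces) as the first field of each row; none of this is confirmed by the usage notes, so check the file and adjust the parsing if it differs. The function names 'parse_rates' and 'summarise' are illustrative only.

```python
import statistics
from typing import Dict, Iterable, List, Tuple


def parse_rates(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Map each family name to its rates (ML rate first, then bootstraps).

    ASSUMED layout (not confirmed by the usage notes): one family per
    line, whitespace- or tab-delimited, no header row, family name in
    the first field.
    """
    rates: Dict[str, List[float]] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rates[fields[0]] = [float(x) for x in fields[1:]]
    return rates


def summarise(values: List[float]) -> Tuple[float, float, float, float]:
    """Return (ML rate, bootstrap median, lower bound, upper bound).

    The bounds are a simple index-based approximation of the 2.5th and
    97.5th percentiles of the bootstrap rates.
    """
    ml_rate, boots = values[0], sorted(values[1:])
    lower = boots[int(0.025 * (len(boots) - 1))]
    upper = boots[int(0.975 * (len(boots) - 1))]
    return ml_rate, statistics.median(boots), lower, upper


# Usage sketch (the path is an assumption; point it at your own copy):
# with open("bootstrap_rates.txt") as handle:
#     rates = parse_rates(handle)
# for family, values in rates.items():
#     print(family, summarise(values))
```

Comparing the ML rate with the spread of the bootstrap rates gives a quick sense of how uncertain each family's rate estimate is (in substitutions/site/myr) before running the PGLS scripts.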
Provider:
Macquarie University



