
Inferring the evolutionary model of community-structuring traits with convolutional kitchen sinks: Code and data

Source: Mendeley Data. Updated 2024-05-12; indexed 2024-06-27.
Download link:
https://datadryad.org/stash/dataset/doi:10.5061/dryad.zw3r2289q
Resource description:
# Inferring the Evolutionary Model of Community-Structuring Traits with Convolutional Kitchen Sinks: code and data

---

This README file was generated on 2024-05-03 by Avery Michael Kruger.

## GENERAL INFORMATION

Title of Dataset: Inferring the Evolutionary Model of Community-Structuring Traits with Convolutional Kitchen Sinks: code and data

Corresponding Investigator

* Name: Avery Michael Kruger
* Institution: University of British Columbia
* Email: [avery.kruger@botany.ubc.ca](mailto:avery.kruger@botany.ubc.ca)

Co-investigator

* Name: Vaishaal Shankar
* Institution: Apple Inc.
* Former Institution: Amazon.com, Inc.

Co-investigator

* Name: Jonathan Davies
* Institution: University of British Columbia

## SHARING/ACCESS INFORMATION

Data were derived from the following sources:

* Zanne, Amy E. et al. (2014), Data from: Three keys to the radiation of angiosperms into freezing environments, Dryad, Dataset, [https://doi.org/10.5061/dryad.63q27](https://doi.org/10.5061/dryad.63q27)

Recommended citation for this dataset:

Kruger, Avery; Shankar, Vaishaal; Davies, Jonathan (2024), Inferring the evolutionary model of community-structuring traits with convolutional kitchen sinks: Code and data, Dryad, Dataset, [https://doi.org/10.5061/dryad.zw3r2289q](https://doi.org/10.5061/dryad.zw3r2289q)

## DATA & FILE OVERVIEW

Description: These files include Supplementary Figures for Kruger et al., as well as the code and archived simulation data necessary to replicate the figures and results of Kruger et al. These analyses were performed to investigate the ability of two methods to recover the evolutionary model of the traits that communities are assembled on: a machine learning technique termed Convolutional Kitchen Sinks (CKS), and models trained on a series of Mean Pairwise Distance (MPD) metrics, termed an MPD curve.

Communities were simulated on both simulated and empirical phylogenies by evolving traits on the phylogenies according to an Early Burst transformation governed by a normally distributed evolutionary parameter. Data were separated into training and test data. The evolutionary parameters used in simulation were then modeled as a function of the observed simulated communities in the training data, using both the CKS method and the MPD method. The models were then tested by examining the relationship between the predicted and known parameters of the test data. Finally, predictions using trained models were made on the known community of freeze-tolerant dicots.

File List:

```
File 1 Name: SupplementalFigures.pdf
File 1 Description: This file contains Supplementary Figures S1-S7.

File 2 Name: source_functions.R
File 2 Description: This code contains functions called in various scripts. It is called by scripts as needed, so it is not necessary to run this code independently.

File 3 Name: 1_setup.R
File 3 Description: This code creates folders `data`, `data/angiosim`, `plots`, and `output` in the working directory if they do not already exist, and moves angiocomm.rds and angiompd.rds into the data/angiosim folder. This file also contains code to install PhyloMeasures 2.1, an archived package on CRAN, which contains a necessary function, and code to install kitchen (avery-kruger/kitchen) from GitHub.

File 4 Name: 2_community_sims.R
File 4 Description: This code simulates phylogenies and communities for later analysis. Writes to `data` folder and creates it if it does not already exist.

File 5 Name: 3_community_cks.R
File 5 Description: This code tests how well CKS and MPD curve methods perform at predicting parameters used to evolve traits that communities are assembled upon. This file produces Figure 3 of the manuscript.
File 5 Dependency: 2_community_sims.R must be run first to generate data.

File 6 Name: 4_dicottree_trim.R
File 6 Description: This code takes a phylogeny from Zanne et al. 2014 and trims it to contain only species in Magnoliopsida for which freezing data in MinimumFreezingExposure.csv is present.
File 6 Dependency 1: Requires Vascular_Plants_rooted.dated.tre from Zanne et al. 2014
File 6 Dependency 2: Requires MinimumFreezingExposure.csv from Zanne et al. 2014

File 7 Name: 5_dicot_sims.R
File 7 Description: This code simulates communities and MPD curves on a dicot phylogeny.
File 7 Dependency: Requires Zanne.angiosperm.tre, which is generated by 4_dicottree_trim.R.

File 8 Name: 6_dicot_cks.R
File 8 Description: This code trains CKS and MPD models on communities simulated on the Zanne phylogeny and checks predictions against known values. It then predicts an evolutionary parameter given the known community of freeze-tolerant plants. This file creates Figures 7, 8, and 9.
File 8 Dependency: Requires angiocomm.rds and angiompd.rds files in the data/angiosim folder. These already exist, but may be generated by running 5_dicot_sims.R.

File 9 Name: fig4_s2_s3_computetime.R
File 9 Description: This code runs kitchen_sweep across communities and superparameters of different sizes to demonstrate the relationship between those dimensions and computational time. Before running, either run RandTrees_simulations.R to generate the data that computetime.R uses, or alter the code to use other data. Plots are presented in Figure 4 and Supplementary Figures S2 and S3.
File 9 Dependency: Requires files beginning with 1024treesim_comm that are generated by running 2_community_sims.R.

File 10 Name: fig5_altassembly_varytraitcov.R
File 10 Description: This code examines how covariance of traits affects inference. Two plots are presented in Figure 5.

File 11 Name: fig6_s5_altassembly_diffalphas.R
File 11 Description: This code examines how different numbers of independent alphas affect performance of CKS. Plots are presented in Figure 6 and Supplementary Figure S5.

File 12 Name: s1_altassembly_ntraits.R
File 12 Description: This code explores how the number of traits affects inference. Plot is presented in Supplementary Figure S1.

File 13 Name: s4_altprediction_delta.R
File 13 Description: This code compares the performance of MPD curves generated on Early Burst- (EB) and delta-transformed phylogenies. Plot is presented in Supplementary Figure S4.

File 14 Name: s6_s7_altassembly_limsim.R
File 14 Description: This code examines the performance of CKS and MPD methods on communities assembled under limiting similarity. Plots are used in Supplementary Figures S6-S7.

File 15 Name: x_altassembly_varycommsize.R
File 15 Description: This code explores how community size affects performance of CKS. Data were not presented in the manuscript or Supplementary Figures.

File 16 Name: angiocomm.rds
File 16 Description: This RDS file contains the original simulations on the empirical phylogeny that were used in the manuscript. These data are used for training and testing a CKS model that predicts the evolutionary parameter describing evolution of traits the communities were assembled on. These data were originally generated with 5_dicot_sims.R.

File 17 Name: angiompd.rds
File 17 Description: This file contains a data frame of the MPD curve of the corresponding rows in angiocomm.rds. The MPD curve is a series of Mean Pairwise Distance metrics calculated across a series of transformations of the dicot phylogeny. These data are used to train and test a linear model that predicts the evolutionary parameter describing evolution of traits the communities were assembled on. These data were originally generated with 5_dicot_sims.R.
```

## METHODOLOGICAL INFORMATION

Methods for processing the data:

1_setup.R should be run first to ensure folders are set up properly in the working directory and to install PhyloMeasures and kitchen. After that, 2\_ and 3\_ may be run together or 4\_ through 6\_ may be run together, as described below.
2_community_sims.R and 3_community_cks.R should be run sequentially.

4\_ through 6\_ require Vascular_Plants_rooted.dated.tre and MinimumFreezingExposure.csv, files that are available from Zanne et al. (2014), as described in Sharing/Access information. 4_dicottree_trim.R should be run first to create the appropriate phylogeny. 5_dicot_sims.R may optionally be run to simulate data; files angiocomm.rds and angiompd.rds were generated with this code. 6_dicot_cks.R can then be run to perform analyses.

Files 9-14 were used to produce figures for Sensitivity of CKS to Alternative Models and Supplemental Figures. These scripts are all independent of each other and do not need to be run in any particular order.

angiocomm.rds was created with 5_dicot_sims.R. For each simulation, traits were evolved on the dicot phylogeny by rescaling the phylogeny with an Early Burst transformation using a normally distributed parameter with mean 0 and standard deviation 0.08. A random freezing-exposed species was chosen as an optimum, and then the 4,353 species closest in Euclidean trait space were selected as present in a community and assigned the value 1.

angiompd.rds was created with 5_dicot_sims.R. For each row in angiocomm.rds, the mean pairwise distance of the community was calculated on a series of delta transformations of the dicot phylogeny, with transformation parameters ranging from 0.05 to 40.

Instrument- or software-specific information needed to interpret the data:

CRAN Packages:

* ape 5.6-2
* geiger 2.0.10
* ggplot2 3.4.0
* parallel 4.2.2
* phytools 1.2-0

Packages not available on CRAN (see File 3 Description):

* kitchen 0.1.0
* PhyloMeasures 2.1

People involved with analysis: Avery Kruger

## DATA-SPECIFIC INFORMATION FOR: angiocomm.rds

Number of variables: 9,850

Number of cases/rows: 5,000

Element List:

* param: Evolutionary parameter used for evolution of traits
* comm: Data frame of simulated communities. Contains 9,849 species binomial variables, which represent the presence or absence of a dicot species in each simulated community.

Missing data codes: None

## DATA-SPECIFIC INFORMATION FOR: angiompd.rds

Number of variables: 99

Number of cases/rows: 5,000

Variable List:

* param: Evolutionary parameter used for evolution of traits
* 98 numeric variables 0.05, 0.1, ..., 40: Each numerically named variable represents the mean pairwise distance of each simulated community, calculated on a phylogeny transformed by a delta transformation where delta equals the variable name.

Missing data codes: None
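
The assembly step described for angiocomm.rds (choose an optimum species, then mark the species nearest to it in Euclidean trait space as present) and the MPD summary stored in angiompd.rds can be illustrated with a small sketch. This is written in Python purely for illustration and is not the R code deposited with the dataset. The function names `assemble_community` and `mean_pairwise_distance`, the toy trait values, and the toy distance matrix are all hypothetical. The sketch does not reproduce the Early Burst trait evolution or the delta transformation of the phylogeny.

```python
import math
from itertools import combinations


def assemble_community(traits, optimum, size):
    """Return a 0/1 presence vector marking the `size` species nearest the optimum.

    traits  : list of per-species trait tuples (hypothetical toy values here)
    optimum : index of the species chosen as the trait optimum
    size    : number of species to mark as present
    """
    if not 0 < size <= len(traits):
        raise ValueError("size must be between 1 and the number of species")
    # Euclidean distance of every species to the optimum species in trait space.
    distances = [math.dist(t, traits[optimum]) for t in traits]
    # Rank species by distance; ties keep their original order (stable sort).
    ranked = sorted(range(len(traits)), key=lambda i: distances[i])
    present = set(ranked[:size])
    return [1 if i in present else 0 for i in range(len(traits))]


def mean_pairwise_distance(presence, dist_matrix):
    """Mean of the pairwise distances among species marked present."""
    members = [i for i, p in enumerate(presence) if p == 1]
    pairs = list(combinations(members, 2))
    if not pairs:
        raise ValueError("need at least two present species")
    return sum(dist_matrix[i][j] for i, j in pairs) / len(pairs)


# Toy example: five species with two traits each (illustrative values only).
toy_traits = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.0, 0.2), (2.0, 2.0)]
toy_presence = assemble_community(toy_traits, optimum=0, size=3)

# Toy symmetric matrix standing in for phylogenetic distances between species.
toy_dist = [
    [0, 2, 6, 4, 8],
    [2, 0, 6, 4, 8],
    [6, 6, 0, 6, 8],
    [4, 4, 6, 0, 8],
    [8, 8, 8, 8, 0],
]
toy_mpd = mean_pairwise_distance(toy_presence, toy_dist)
```

In the deposited data the same idea applies at a much larger scale: each row of `comm` is a presence/absence vector over 9,849 species with 4,353 marked present, and each column of angiompd.rds holds the MPD of that community on one transformed phylogeny.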
Created:
2023-07-14