Population genomics of flat-tailed horned lizards (Phrynosoma mcallii) informs conservation and management across a fragmented Colorado Desert landscape
Source: Mendeley Data. Updated 2024-04-13. Indexed 2024-06-27.
Download link:
https://datadryad.org/stash/dataset/doi:10.5061/dryad.5x69p8dbj
Resource description:
# Population genomics of flat-tailed horned lizards (*Phrynosoma mcallii*) informs conservation and management across a fragmented Colorado Desert landscape

[https://doi.org/10.5061/dryad.5x69p8dbj](https://doi.org/10.5061/dryad.5x69p8dbj)

A ddRADseq dataset was collected for 45 lizards (including outgroups). Sequencing occurred on an Illumina NextSeq. FASTQ data were processed, including mapping to the *P. platyrhinos* reference genome, using iPyrad. After filtering with VCFtools, the data were analyzed with the software packages adegenet, Admixture, splitstree, IQTree2, EEMS, and moments. Inputs, outputs, intermediate files, jobscripts, and other metadata for these analyses are included in this data package. Methods are detailed in the paper. Questions are welcome; please contact the corresponding author at [gottschoa@si.edu](mailto:gottschoa@si.edu).

## Description of the data and file structure

There are nine zipped files, which unpack to the following directories.

**adegenet.zip** includes the DAPC analysis, run with the adegenet library in R version 4.3.1.

* **adegenet.R** is the script used to run the analyses and generate plots.
* **thin.recode.vcf** contains the filtered input data.
* **coords.csv** contains GPS coordinates (rounded to two decimal places).

**admixture.zip** contains the results of the Admixture analysis.

* **script.sh** is the jobscript with the command line used to run Admixture.
* Directories named **run1...run10** represent 10 replicate runs. Within each run directory, the files are structured as follows. There are eight files named **final.1.P...final.8.P**, eight files named **final.1.Q...final.8.Q**, and eight files named **results1.txt...results8.txt**, which collectively represent the output files under K=1-8. See the Admixture documentation (Alexander et al., 2009) for further details.
* **admixture_CVE_results.numbers** contains the cross-validation errors (CVE) used to determine the optimal K.
* Results of run5 are plotted as **k2.png...k8.png** (under K=2-8; K=1 not plotted). Results of run5 under K=2-4 are presented in the paper (Figure 4, Figure S2).

**EEMS.zip** includes the results of the estimated effective migration surfaces (EEMS) analysis.

* **how2run.txt** contains instructions on how to convert the input .vcf data to the .diffs format necessary for EEMS. The files from this process are found in the **vcf** directory (**thin.recode.vcf, mcallii.bed, mcallii.bim, mcallii.diffs, mcallii.fam, mcallii.log, mcallii.order**).
* The **data** directory contains the inputs to EEMS. **mcallii.coord** contains the GPS coordinates, rounded to two decimal places after the analysis. **mcallii.diffs** contains the genotype data converted as described above. **mcallii.outer** contains the outer bounds of the geographic region analyzed.
* **mcallii300-chain1** is the directory containing the EEMS analysis presented in Figure 5. It holds 31 text files that are beyond the scope of this readme; see Petkova et al. (2016) for extensive documentation.
* A plotting script in R (**eems_plotting.R**) is provided.

**iPyrad.zip** contains the results of iPyrad, the pipeline used to process raw FASTQ data to primary outputs.

* **all_samples** includes the hybrids and outgroup (n=45). This dataset was used for phylogenetic and network analyses. Input parameters to the pipeline are in **params-mcallii.txt**. Output files include **mcallii_stats.txt, mcallii.nex, mcallii.phy, mcallii.snps, and mcallii.vcf**. The recoded VCF and log are also provided (**biallelic.recode.vcf, biallelic.log**).
* **ingroup** excludes the hybrids and outgroup (n=42). This dataset was used for population structure, demographic modeling, and EEMS. Input parameters to the pipeline are in **params-mcallii.txt**. Output files include **mcallii_stats.txt, mcallii.nex, mcallii.phy, mcallii.snps, and mcallii.vcf**.
  The recoded VCF and log are also provided (**biallelic.recode.vcf, biallelic.log**).

**IQtree.zip** includes the results of the rooted phylogenetic analysis.

* The top-level files represent the analysis that was run to select the optimal nucleotide substitution model with ModelFinder. **iqtree_MF.job** is the jobscript used to run the program and **iqtree_MF_011224.log** is the resulting log. **mcallii.phy** is the input matrix. The rest of the files (**mcallii.phy.ckp.gz, mcallii.phy.iqtree, mcallii.phy.log, mcallii.phy.model.gz, mcallii.phy.tre**, and **mcallii.phy.treefile**) are outputs or intermediate files.
* The **bootstrapped** directory contains the files run with the best-fit model and bootstrap analysis, including the final tree files presented in the paper. **iqtree_bb.job** is the jobscript used to run the program and **iqtree_bb_011224.log** is the resulting log. **mcallii.phy** is the input matrix. Outputs and intermediate files include **mcallii.phy.bionj, mcallii.phy.ckp.gz, mcallii.phy.contree, mcallii.phy.iqtree, mcallii.phy.log, mcallii.phy.mldist, mcallii.phy.splits.nex**, **2024_mcallii.phy.tre**, and **mcallii.phy.treefile**. The outgroup was pruned for visualization purposes in Figure 3; see **2024_mcallii_pruned.phy.tre**.

**Moments.zip** contains the inputs, outputs, and intermediate files for the 16 demographic models tested in this study.

* **summary_021623.xlsx** summarizes the best-fit model and **demographic_conversions_021623.xlsx** contains the equations used to convert raw parameters to demographic values (individuals, years).
* There are 13 .R, .py, and .txt files (**Summarize_Outputs.py, Models_2D.py, moments_2D_00_projections.py, moments_Run_2D.py, moments_Run_Optimizations.py, Optimize_Functions.py, Optimize_Functions4plots.py, Optimizer_GOF.py, Plot_GOF.R, plot_model.py, Results_Summary_Extended.txt, Results_Summary_Short.txt, Simulate_and_Optimize.py**) that were used to run the analyses and review results, adapted from Leaché et al. (2019) and Portik et al. (2017).
* There are 32 .txt files that constitute the log and optimized files for the 16 models. For example, **west_east.anc_asym_mig_size.log.txt** and **west_east.anc_asym_mig_size.optimized.txt** represent the files for the anc_asym_mig_size model. The 16 models are presented in Figure S1 and are adapted from Leaché et al. (2019). To identify which model is which, compare the list of .txt files here to Figure S1.
* **west-east.sfs** represents the site frequency spectrum.

**NCBI.zip** documents the BioSample accession numbers and URLs for raw FASTQ files (**BioSampleObjects.txt** and **Objects.txt**).

**splitstree4.zip** includes the input (**ingroup.phy**) and output files (**pmcalli-splitstree**) from this unrooted analysis.

**vcf.zip** contains instructions and logs for how to convert output files from iPyrad into the thinned / filtered versions suitable for downstream analysis.

* **ingroup.vcf** is the input file.
* **thin.recode.vcf** is the output file.
* **how to convert to bed.sh** contains instructions on how to convert to a bed file.
* **final.bed, final.bim, final.fam** represent intermediate files.
* **final.log** and **thin.log** are the log files.

## Sharing/Access information

Raw sequence data in FASTQ format have been deposited in NCBI (BioProject: PRJNA817579; BioSample: SAMN26796821 - SAMN26796865; SRA Accession: SRX14486053 - SRX14486097).
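
As an illustration of the **admixture.zip** layout described above (run1...run10, each holding final.K.P, final.K.Q, and resultsK.txt for K=1-8), the following Python sketch lists the expected files, reports any that are missing after unpacking, and picks the K with the lowest cross-validation error. It is not part of the data package. It assumes that the **resultsK.txt** files are Admixture log output containing the usual `CV error (K=k): value` line, which this readme does not confirm. The function names are hypothetical.

```python
import re
from pathlib import Path

# Layout as described in this readme: 10 replicate runs, K = 1..8.
RUNS = range(1, 11)
KS = range(1, 9)

# ADMIXTURE prints a line such as "CV error (K=3): 0.41235" when run with --cv.
# ASSUMPTION: the resultsK.txt files contain this line.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def expected_files(run):
    """Relative paths expected inside one replicate run directory."""
    names = []
    for k in KS:
        names += [f"final.{k}.P", f"final.{k}.Q", f"results{k}.txt"]
    return [Path(f"run{run}") / name for name in names]


def missing_files(root):
    """Expected files that are absent under the unpacked directory `root`."""
    root = Path(root)
    return [p for run in RUNS for p in expected_files(run)
            if not (root / p).is_file()]


def parse_cv_error(log_text):
    """Return (K, CV error) from log text, or None if no CV line is found."""
    match = CV_LINE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def best_k(cv_by_k):
    """K with the lowest cross-validation error in a {K: error} mapping."""
    return min(cv_by_k, key=cv_by_k.get)
```

Calling `missing_files("admixture")` on the unpacked directory (the directory name is an assumption) should return an empty list if the archive matches the description above. The values returned by `parse_cv_error` can be collected per run into a dictionary and passed to `best_k`, then compared against **admixture_CVE_results.numbers**.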
Created:
2024-04-06



