Data from: Genomic landscapes of divergence among island bird populations: evidence of parallel adaptation but at different loci?
Source: Mendeley Data. Updated: 2024-04-26. Indexed: 2024-06-29.
Download link:
https://datadryad.org/stash/dataset/doi:10.5061/dryad.1g1jwsv4b
Resource description:
# Genomic landscapes of divergence among island bird populations: evidence of parallel adaptation but at different loci?

Claudia A. Martin, Eleanor C. Sheppard, Hisham Ali, Juan Carlos Illera, Alexander Suh, Lewis G. Spurgin and David S. Richardson

[https://doi.org/10.5061/dryad.1g1jwsv4b](https://doi.org/10.5061/dryad.1g1jwsv4b)

For any further queries please contact [Cmarti3@ed.ac.uk](mailto:Cmarti3@ed.ac.uk) or claudia.martin@uea.ac.uk.

## Data

#### Data obtained from published datasets

Original raw reads for the Berthelot's pipit draft genome are not supplied here, as they are already available in the dataset published by Armstrong et al. 2018 at [https://doi.org/10.5061/dryad.9642b](https://doi.org/10.5061/dryad.9642b). These files include "Anthus_berthelotii_PS_816_genome.zip". This zip file contains a BLAST database of the draft Berthelot's pipit genome as described in the Supplementary Methods of Armstrong et al. 2018. This genome was sequenced from sample 816 from Porto Santo. The assembled draft reference genome is also provided here as `a.lines.fasta` and is the only Berthelot's pipit reference file required for the current publication.

Variant call format (VCF) files were also generated and are described in greater detail in a previously published study by our group. Details of the raw VCF datasets used here can be found in Martin, Claudia et al. (2023). Runs of homozygosity reveal past bottlenecks and contemporary inbreeding across diverging populations of an island-colonizing bird [Dataset]. Dryad. [https://doi.org/10.5061/dryad.ksn02v75k](https://doi.org/10.5061/dryad.ksn02v75k)

This includes the following relevant data for this study:

**1) All Pipits, Berthelots, and Tawny VCF files**

These are the three datasets in variant call format as referred to in the manuscript.

**2) Chromosome codes**

The `Genome_chromosome_codes.txt` file (also provided here) contains Zebra finch (*Taeniopygia guttata*) chromosome names and the equivalent numeric codes used in the VCF files. These are the calibrated genome locations for Berthelot's pipit contigs.

**3) Creating VCF datasets**

First, `example_gVCF.sh` contains the pipeline used to generate the GATK haplotype-called gVCF file on an individual-by-individual basis (for further details see `gvcf_pipeline_description.txt`). Second, individual gVCF files are converted to joint-genotype-called VCF datasets, and contig locations are mapped to Zebra finch chromosomes using SatsumaSynteny. This pipeline is detailed in `gvcf_to_vcf.txt`. Third, `Satsuma_output_vcf.R` is used on the output files from SatsumaSynteny to assign contigs to chromosomes and determine their order, location and orientation. Finally, the three VCF datasets detailed in this manuscript are created and filtered.

#### Data included in this dataset

The following additional files, required to run the analyses detailed in this paper, are provided here:

**1) `SCRIPTS.sh` file**

Code used to undertake the analyses outlined in this paper. **NOTE:** These scripts MUST be run prior to the R scripts to produce outputs.

**2) R scripts for plotting and further analyses**

These .R files contain the R scripts needed to produce the output figures and statistics detailed in the manuscript. The files are separated by the outputs of the different sections of the manuscript and refer specifically to the figures used in the manuscript where relevant.

* `FST_correlation_plots.R` contains the code required to produce FST histograms across the population comparisons and correlation plots of these.
* `Divergence_Peaks.R` contains the code required to produce Manhattan plots across chromosomes for each of the divergence comparisons, followed by zoomed plots of regions of strong divergence showing Tajima's D and Pi across genomic windows. The locations of genes within these regions are also plotted.
* `Modelling_simulations.R` contains the code to run the individual-based modelling and to produce the corresponding plot in the paper.

## Code/Software

This study used the following software:

* GATK - variant filtering.
* Bash - command-line running of scripts and file manipulation.
* R, including the packages ggplot2, tidyverse etc. - data presentation and plotting; Glads package - chromosome simulations.
* VCFtools - FST, Pi, Tajima's D, variant statistics.
* Plink 1.9 - PCA, filtering.
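
For readers unfamiliar with the window-based statistics listed above, the following is a minimal Python sketch of how nucleotide diversity (Pi) and Tajima's D can be computed per genomic window from alternate-allele counts. It is not the code in `SCRIPTS.sh` and not the VCFtools implementation. It uses the standard Tajima (1989) estimator for biallelic sites with no missing data. The function names, the `(position, alt_count)` input format and the binning of windows from position 0 are all assumptions made for illustration.

```python
from math import sqrt


def nucleotide_diversity(alt_counts, n):
    """Sum over sites of the mean pairwise difference.

    alt_counts: alternate-allele count at each biallelic site (hypothetical input).
    n: number of sampled chromosomes (2 x number of diploid individuals).
    """
    return sum(2.0 * k * (n - k) / (n * (n - 1)) for k in alt_counts)


def tajimas_d(alt_counts, n):
    """Tajima's D for one window; returns None when it is undefined."""
    # Only sites that are polymorphic in the sample contribute.
    seg = [k for k in alt_counts if 0 < k < n]
    s = len(seg)
    if s == 0 or n < 4:
        # The variance term is zero for n < 4, so D cannot be computed.
        return None

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    pi = nucleotide_diversity(seg, n)
    theta_w = s / a1  # Watterson's estimator
    variance = e1 * s + e2 * s * (s - 1)
    return (pi - theta_w) / sqrt(variance)


def windowed_tajimas_d(sites, n, window_size):
    """Group (position, alt_count) pairs into fixed windows and compute D in each.

    Windows are binned from position 0 (an assumption of this sketch).
    """
    windows = {}
    for pos, k in sites:
        start = (pos // window_size) * window_size
        windows.setdefault(start, []).append(k)
    return {start: tajimas_d(counts, n) for start, counts in sorted(windows.items())}


if __name__ == "__main__":
    # Toy data: rare variants in the first window, intermediate-frequency in the second.
    toy_sites = [(100, 1), (200, 1), (300, 1), (400, 1),
                 (10100, 2), (10200, 2), (10300, 2), (10400, 2)]
    for start, d in windowed_tajimas_d(toy_sites, n=4, window_size=10000).items():
        print(start, d)
```

In this toy input the first window contains only singletons, which gives a negative D, and the second contains only intermediate-frequency variants, which gives a positive D. Real analyses would also need to handle missing genotypes and multiallelic sites, which this sketch ignores.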
Created:
2024-04-22



