
Genetic datasets, climatic conditions at sampled localities, and occurrence data to: Ice age-driven range shifts of diploids and expanding autotetraploids within a conserved niche (Grünig, Patsiou & Parisod, 2024, New Phytologist)

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/12806610
Resource description:
This repository includes:
- An overview of the raw sequencing reads deposited in the European Nucleotide Archive (ENA) for the 370 individuals sampled in 17 diploid and 19 tetraploid field populations
- Scripts used to genotype diploid and autotetraploid samples of Biscutella laevigata from ddRADseq data
- Input data (in VCF format) used in population genetic analyses
- Scripts used to run the different genetic analyses
- Dataset of extracted climatic conditions at sampled localities
- Occurrence dataset used for the climatic niche modelling

Description of the data and file structure

00.ENA_samples_correspondance.txt: provides the ENA project ID, run ID (i.e. raw fastq files), sample ID, and alias for each sample included in the study.

1.scripts_reads_to_vcf.zip consists of the following:
- 1.reads_to_vcf.md: Markdown file with scripts documenting the read quality check, demultiplexing, mapping, SNP calling using GATK4, and filtering steps
- Additional scripts called within 1.reads_to_vcf.md:
  -- 1.3. Mapping: 02_run_mapping_XXX.py and BWA-mem_bisc1_sg.py scripts
  -- 1.4.a. HaplotypeCaller: 03_V1_gvcf.py
  -- 1.4.b. GDBI + genotypeGVCF: 03_V3_gdbi_genotype_per100scaf.py

2.datasets_genetics.tar.gz consists of the following:
- bisc_all370_diminDP15_tetraminDP30.vcf.gz: "Initial SNPs dataset" = biallelic SNPs fulfilling GATK quality hard filtering recommendations, present in at least 50% of samples. Genotypes with DP<15 for diploids and DP<30 for tetraploids are set to no-call. This VCF was used as the basis for fastsimcoal dataset preparation and for the subsequent selection of loci fulfilling the requirements of each analysis. It includes 2246701 biallelic SNPs for 370 samples.
- bisc_all370_diminDP15_tetraminDP30_MD05_pruned.vcf.gz: subset of the "Initial SNPs dataset", retaining SNPs called in at least 50% of samples and pruned for linkage disequilibrium. This VCF includes 107574 biallelic SNPs for 370 samples and was used in the analysis of the proportion of diploid diagnostic alleles shared by tetraploids.
- bisc_all370_diminDP15_tetraminDP30_MD01_pruned.vcf.gz: subset of the "Initial SNPs dataset", retaining SNPs called in at least 90% of samples and pruned for linkage disequilibrium. This VCF includes 4444 biallelic SNPs for 370 samples and was used in the analyses of population diversity and differentiation (SpaGeDi, GenoDive, PCA) and f3-statistics.
- bisc_all370_diminDP15_tetraminDP30_MD0.1_pruned_MAC3rm.vcf.gz: subset of the "Initial SNPs dataset", retaining SNPs called in at least 90% of samples, pruned for linkage disequilibrium, and with a minor allele count of 3. This VCF includes 2593 biallelic SNPs for 370 samples and was used in the STRUCTURE analysis.

3.pres_2x.txt: list of the 128 diploid occurrences used in climatic niche modelling

3.pres_4x_strat_reg.txt: list of the 924 tetraploid occurrences used in climatic niche modelling

biscall_chelsa_ordered_noDEM.txt: climatic data extracted from the CHELSA dataset at sampled localities

4.plot_GTfreqs.md: Markdown file including scripts to plot allele and genotype frequencies

Sharing/Access information

Raw sequencing reads have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under the accession number PRJEB48869: https://www.ebi.ac.uk/ena/browser/view/PRJEB48869
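The ploidy-specific depth rule and the call-rate thresholds described for the VCF files can be illustrated with a minimal Python sketch. This is not the authors' pipeline (the repository's own scripts rely on GATK4); only the thresholds (DP 15 for diploids, DP 30 for tetraploids, 50% and 90% call rates) come from the description, while the function names and the example depths are hypothetical.

```python
# Thresholds taken from the dataset description; all names below are illustrative.
MIN_DP = {2: 15, 4: 30}  # ploidy -> minimum read depth required to keep a genotype


def mask_low_depth(depths, ploidies):
    """Replace genotypes below the ploidy-specific depth with None (no-call)."""
    return [
        dp if dp is not None and dp >= MIN_DP[ploidy] else None
        for dp, ploidy in zip(depths, ploidies)
    ]


def call_rate(masked):
    """Fraction of samples that still carry a genotype call at this site."""
    return sum(dp is not None for dp in masked) / len(masked)


def keep_site(depths, ploidies, min_call_rate=0.5):
    """Keep a site if enough samples remain called after depth masking."""
    return call_rate(mask_low_depth(depths, ploidies)) >= min_call_rate


# Hypothetical site: three diploid and three tetraploid samples.
ploidies = [2, 2, 2, 4, 4, 4]
depths = [20, 14, 15, 29, 30, 45]
masked = mask_low_depth(depths, ploidies)
```

In this made-up example, four of six genotypes survive masking, so the site would pass a 50% call-rate filter but fail a 90% one.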
Created:
2024-07-24