Data from: Widespread selection and gene flow shape the genomic landscape during a radiation of monkeyflowers
Source: Mendeley Data. Updated 2024-06-25; indexed 2024-06-27.
Download link:
https://datadryad.org/stash/dataset/doi:10.5061/dryad.18j0j5q
Description:
Genome-wide data in non-overlapping 500kb windows
This file includes population genetic statistics, measures of genomic features, and estimates of phylogenetic concordance in 500kb non-overlapping windows across the bush monkeyflower genome. These values were used in analyses of genomic landscape evolution. Fst and dxy are included for 36 pairwise comparisons between taxa, and nucleotide diversity is included for all 9 taxa. These statistics were calculated using Python scripts downloaded from https://github.com/simonhmartin/genomics_general. PC1 Fst, PC1 dxy, and PC1 nucleotide diversity for each window were obtained from a PCA. That PCA used the 36 comparisons (Fst or dxy) or the 9 taxa (nucleotide diversity) as variables, and PC1 summarizes variation in each statistic across taxa or taxon comparisons. Gene count is obtained from the genome annotation, and recombination rate (cM/Mb) is based on the genetic map. Tree concordance is the correlation coefficient between the window-based tree and the whole-genome ‘species tree.’
File: 500kb_win_data_nonoverlap.txt

Genome-wide data in non-overlapping 100kb windows
This file contains the same statistics as the 500kb file, calculated in the same way but in 100kb non-overlapping windows across the bush monkeyflower genome. It includes Fst and dxy for 36 pairwise comparisons, nucleotide diversity for all 9 taxa, their PC1 summaries, gene count, recombination rate (cM/Mb), and tree concordance with the whole-genome ‘species tree.’ These values were also used in analyses of genomic landscape evolution.
File: 100kb_win_data_nonoverlap.txt

Genome-wide fd statistic in 500kb windows
This file contains estimates of admixture (fd) calculated in 500kb non-overlapping windows across the genome for 48 different four-taxon comparisons. fd was calculated using a Python script downloaded from https://github.com/simonhmartin/genomics_general.
File: 500kb_window.fd_statistic.txt

Genotypes file for genetic map
This is the input data file used for map construction, in JoinMap format (.loc). It was produced by Stacks 1.3.5 assuming a CP cross design. The first column gives the locus ID assigned to each marker by Stacks 1.3.5. The second column gives the segregation type code for each marker, following the JoinMap 4 convention. Each subsequent column provides the genotypes of one individual for all 9029 markers. Missing data are coded as “--”. The individual IDs are listed in the first column, below the last locus ID.
File: batch_1.genotypes_250.loc

Genetic map
This file contains the full genetic map used to estimate recombination rates and scaffold the genome. LG is the linkage group identifier, which ranges from 1 to 10. stacks_id is the locus ID assigned to each marker by Stacks 1.3.5 (Catchen et al. 2013). bp is the base-pair position of the marker in the chromosome-scale M. aurantiacus assembly. Contig is the assembly contig (scaffold) with which each marker is associated. cM gives the sex-averaged map position estimated for each marker.
File: Genetic_map.txt

Genomic location of mapped markers
This file, in standard SAM format, contains the genomic position of each mapped marker. Each locus ID is the ID assigned to the marker by Stacks 1.3.5. The sequence in the SEQ field is the consensus tag sequence for each marker, exported from Stacks 1.3.5. Mapping was performed with Bowtie 2.2.6.
File: Mapped_markers_to_genome.sam

Tree topologies in 500kb non-overlapping windows
This file contains maximum-likelihood (ML) trees estimated by RAxML, using MVFtools, in 500kb non-overlapping windows. These trees were used to estimate tree concordance from their correlation with the species tree topology.
File: 500kb_win_trees.txt

Tree topologies in 100kb non-overlapping windows
This file contains ML trees estimated by RAxML, using MVFtools, in 100kb non-overlapping windows. These trees were used in the same way to estimate tree concordance with the species tree topology.
File: 100kb_win_trees.txt

Genome-wide variant calls
This file contains the genome-wide variant calls (SNPs) for all 37 individuals included in the study. Variants were called with GATK v3.8 UnifiedGenotyper, following the best-practices workflow.
File: all_9_taxa_G1_vars.vcf.gz

Genome-wide VCF including invariant sites
This VCF file includes genotype calls for all 37 individuals in the study at both variant and invariant sites. It was generated with GATK v3.8 UnifiedGenotyper using the EMIT_ALL_SITES option. It was used to estimate dxy and nucleotide diversity in genomic windows more accurately.
File: all_9_taxa.postBQSR.all_sites.vcf.gz
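The PC1 summaries of Fst, dxy and nucleotide diversity can be illustrated with a small NumPy PCA. This is a minimal sketch, not the authors' script. It assumes a complete window-by-variable matrix with no missing values, and it standardises each column before the PCA, which the dataset description does not specify. The function name `pc1_scores` is hypothetical.

```python
import numpy as np


def pc1_scores(window_stats: np.ndarray) -> np.ndarray:
    """PC1 score for each window from an (n_windows, n_variables) matrix.

    Columns would be, e.g., the 36 pairwise Fst (or dxy) comparisons or the
    9 per-taxon nucleotide-diversity values. Assumes no missing values.
    """
    x = np.asarray(window_stats, dtype=float)
    # Assumption: centre each variable and scale it to unit variance.
    x = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    scores = x @ vt[0]
    # The sign of a principal component is arbitrary; orient PC1 so that
    # higher scores mean higher values on average across the variables.
    if np.corrcoef(scores, x.mean(axis=1))[0, 1] < 0:
        scores = -scores
    return scores


# Usage with simulated data: 200 windows, 36 comparisons sharing one signal.
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
fst = 0.3 + 0.05 * signal[:, None] + 0.01 * rng.normal(size=(200, 36))
pc1 = pc1_scores(fst)
print(round(float(np.corrcoef(pc1, signal)[0, 1]), 3))
```

When every comparison tracks one shared genome-wide signal, PC1 recovers that signal. This is why a single PC1 value can summarise 36 correlated columns.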
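The description says only that recombination rate (cM/Mb) "is based on the genetic map." One plausible way to get a per-window rate, assumed here rather than taken from the study, is linear interpolation. The window's start and end coordinates are interpolated along the marker map of its linkage group, and the cM difference is divided by the window length in Mb. `marker_bp` and `marker_cm` would correspond to the `bp` and `cM` columns of Genetic_map.txt for one `LG`. The function name is hypothetical.

```python
import numpy as np


def window_recombination_rate(marker_bp, marker_cm, window_start, window_end):
    """Recombination rate (cM/Mb) across one window of one linkage group.

    Assumption: map position varies linearly between adjacent markers.
    """
    order = np.argsort(marker_bp)  # np.interp needs increasing x positions
    bp = np.asarray(marker_bp, dtype=float)[order]
    cm = np.asarray(marker_cm, dtype=float)[order]
    # Interpolated map positions at the window boundaries. Beyond the
    # outermost markers, np.interp holds the end values constant.
    cm_start, cm_end = np.interp([window_start, window_end], bp, cm)
    window_mb = (window_end - window_start) / 1e6
    return (cm_end - cm_start) / window_mb


# Three markers: 2 cM over the first Mb, 4 cM over the second.
print(window_recombination_rate([0, 1e6, 2e6], [0, 2, 6], 0, 500_000))
```

With this approach, windows with no markers inherit the rate of the surrounding marker interval. Local map-order inversions would also produce negative rates, so real data would need some cleaning first.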
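Tree concordance is described only as "the correlation coefficient between the window based tree and the whole genome species tree." A reading assumed here, and not confirmed by the description, is a Pearson correlation between the two trees' pairwise tip-to-tip distance matrices, with taxa in the same order. The matrices could be built with a tree library, for example the `distance` method of Biopython's `Bio.Phylo` trees. The sketch below starts from the matrices, and its function name is hypothetical.

```python
import numpy as np


def tree_concordance(window_dist, species_dist) -> float:
    """Pearson correlation between two taxa-by-taxa distance matrices.

    Both matrices must be symmetric and list the taxa in the same order.
    """
    w = np.asarray(window_dist, dtype=float)
    s = np.asarray(species_dist, dtype=float)
    if w.shape != s.shape:
        raise ValueError("distance matrices must have the same shape")
    # Upper triangle without the diagonal: each taxon pair counted once.
    iu = np.triu_indices_from(w, k=1)
    return float(np.corrcoef(w[iu], s[iu])[0, 1])


species = [[0, 1, 2, 3],
           [1, 0, 2, 3],
           [2, 2, 0, 3],
           [3, 3, 3, 0]]
window = [[0, 2, 1, 3],
          [2, 0, 2, 3],
          [1, 2, 0, 3],
          [3, 3, 3, 0]]
print(tree_concordance(species, species), tree_concordance(window, species))
```

A window tree that matches the species tree gives 1. Discordant relationships among tips pull the coefficient down.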
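The fd admixture statistic can be sketched from per-site derived-allele frequencies, following the definition in Martin et al. (2015). The numerator is the summed ABBA minus BABA excess for the P1, P2, P3 and outgroup ordering. The denominator repeats that sum with P2 and P3 both replaced, site by site, by whichever has the higher derived-allele frequency. This is an illustration of the formula, not the genomics_general script itself, and the function name is hypothetical. Each of the 48 comparisons in the fd file corresponds to one such taxon ordering.

```python
import numpy as np


def fd_statistic(p1, p2, p3, p_out) -> float:
    """fd for one window from derived-allele frequencies at each site.

    p1, p2, p3: derived-allele frequencies in P1, P2 and P3.
    p_out: derived-allele frequency in the outgroup.
    """
    p1, p2, p3, p_out = (np.asarray(p, dtype=float) for p in (p1, p2, p3, p_out))

    def s(a, b, c, o):
        # Summed difference between ABBA and BABA site-pattern probabilities.
        abba = (1 - a) * b * c * (1 - o)
        baba = a * (1 - b) * c * (1 - o)
        return np.sum(abba - baba)

    # PD: the higher of the P2 and P3 derived-allele frequencies at each
    # site, standing in for both P2 and P3 under complete introgression.
    p_d = np.maximum(p2, p3)
    return float(s(p1, p2, p3, p_out) / s(p1, p_d, p_d, p_out))


# P2 carries derived alleles at half the frequency of the donor P3.
print(fd_statistic([0.0], [0.5], [1.0], [0.0]))
```

If P2 and P3 have identical frequencies, the numerator equals the denominator and fd is 1. In this illustration, partial sharing gives a proportionally smaller value.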
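The layout of the genotypes file (locus ID, segregation code, then one genotype per individual, with “--” for missing data) is enough to summarise missing data per marker. The parser below is a hypothetical sketch under several assumptions not stated in the description. It assumes whitespace-separated columns, header lines written as `key = value`, and segregation codes in angle brackets such as `<lmxll>`. It also assumes an optional JoinMap phase field in braces, and it skips the trailing one-column list of individual IDs.

```python
from typing import Dict, Iterable


def missing_rate_per_locus(lines: Iterable[str]) -> Dict[str, float]:
    """Fraction of individuals with a missing genotype ("--") per marker."""
    rates: Dict[str, float] = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines, "key = value" header lines, and the trailing
        # one-column list of individual IDs.
        if len(fields) < 3 or "=" in line:
            continue
        locus, seg_type, genotypes = fields[0], fields[1], fields[2:]
        # Assumption: segregation codes look like <lmxll>, <hkxhk>, <abxcd>.
        if not seg_type.startswith("<"):
            continue
        # An optional JoinMap phase field such as {0-} is not a genotype.
        if genotypes and genotypes[0].startswith("{"):
            genotypes = genotypes[1:]
        if genotypes:
            rates[locus] = sum(g == "--" for g in genotypes) / len(genotypes)
    return rates


example = [
    "name = example",
    "popt = CP",
    "1 <lmxll> ll lm -- ll",
    "2 <hkxhk> hh -- -- kk",
    "ind1",
    "ind2",
]
print(missing_rate_per_locus(example))
# For the real file: with open("batch_1.genotypes_250.loc") as fh: ...
```

Passing an open file handle works the same way, because iterating over a file yields its lines.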
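The invariant-sites VCF improves the dxy and nucleotide diversity estimates because both are averages over all callable sites in a window. If invariant sites are dropped, the denominator shrinks and the estimates are inflated. The sketch below uses the standard per-site estimators computed from allele counts. It is an illustration under that assumption, not code from genomics_general, and the function names are hypothetical.

```python
from typing import Sequence


def site_pi(counts: Sequence[int]) -> float:
    """Within-population diversity at one site from allele counts.

    Probability that two alleles sampled without replacement differ.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    homozygosity = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - homozygosity)


def site_dxy(counts_a: Sequence[int], counts_b: Sequence[int]) -> float:
    """Probability that one allele drawn from each population differs."""
    na, nb = sum(counts_a), sum(counts_b)
    same = sum((a / na) * (b / nb) for a, b in zip(counts_a, counts_b))
    return 1.0 - same


def window_mean(values: Sequence[float]) -> float:
    """Average over every callable site in the window, variant or not."""
    return sum(values) / len(values)


# A 10-site window: one variable site plus nine invariant sites, 4 alleles each.
pop_a = [[3, 1]] + [[4, 0]] * 9
pop_b = [[1, 3]] + [[4, 0]] * 9

pi_all_sites = window_mean([site_pi(c) for c in pop_a])
pi_variant_only = site_pi(pop_a[0])  # denominator without invariant sites
dxy_all_sites = window_mean([site_dxy(a, b) for a, b in zip(pop_a, pop_b)])
print(pi_all_sites, pi_variant_only, dxy_all_sites)
```

Here, leaving out the nine invariant sites would overstate nucleotide diversity for the window tenfold.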
Created:
2023-06-28



