7. Ecological genomics of the Northern krill: Genome-scale comparisons of adaptive divergence

Name: 7. Ecological genomics of the Northern krill: Genome-scale comparisons of adaptive divergence
Creator: Uppsala University
Published: 2025-01-15 14:52:35
License: No description available

DataCite Commons | Updated 2025-01-15 | Indexed 2024-07-13

Download link:

https://figshare.scilifelab.se/articles/dataset/7_Ecological_genomics_of_the_Northern_krill_Genome-scale_comparisons_of_adaptive_divergence/22817410

Description:

This item holds multiple tar archives with genome-scale comparisons of divergence between Northern krill populations, including estimated allele frequencies and divergence (e.g. FST), and extended haplotype signatures (XP-nSL estimates). Many analyses were performed in "chunks" (160 in total across both gene-rich and gene-poor sequences), which are described in a previous item.

Population definitions

Population definitions are the same as described in a different item:

"at vs. me" = Atlantic Ocean samples (n=67) vs. the Mediterranean (i.e. Barcelona) samples (n=7).
"we vs. ea" = South-West North Atlantic Ocean (n=20) vs. North-East North Atlantic Ocean (n=47). In files using this contrast, the label "wa" is sometimes used instead of "we" for the South-West North Atlantic Ocean samples.

Contents:

allele_freqs_fst.gene_rich_sequences.at_vs_me.tar, contains per-SNP estimates of allele frequencies and FST between "at" and "me" groups along gene-rich sequences.
allele_freqs_fst.gene_rich_sequences.we_vs_ea.tar, as above but between "we" and "ea" groups.
allele_freqs_fst.gene_poor_sequences.at_vs_me.tar, contains per-SNP estimates of allele frequencies and FST between "at" and "me" groups along gene-poor sequences.
allele_freqs_fst.gene_poor_sequences.we_vs_ea.tar, as above but for "we" and "ea" groups.
allele_freqs_fst.merged_sequences.at_vs_me.csv.gz, contains per-SNP estimates of allele frequencies and FST between "at" and "me" merged into a single TSV file.
allele_freqs_fst.merged_sequences.we_vs_ea.csv.gz, as above but for "we" and "ea".
allele_freqs_fst.gene_rich_sequences_windows.at_vs_me.tar.gz, contains per-window estimates of FST between "at" and "me" groups along gene-rich sequences.
allele_freqs_fst.gene_rich_sequences_windows.we_vs_ea.tar.gz, as above but for "we" and "ea" groups.
allele_freqs_fst.gene_poor_sequences_windows.at_vs_me.tar.gz, contains per-window estimates of FST between "at" and "me" groups along gene-poor sequences.
allele_freqs_fst.gene_poor_sequences_windows.we_vs_ea.tar.gz, as above but for "we" and "ea" groups.
selscan_xpnsl.gene_rich_sequences.tar.gz, contains per-SNP cross-population XP-nSL statistics for gene-rich sequences.
selscan_xpnsl.gene_poor_sequences.tar.gz, contains per-SNP cross-population XP-nSL statistics for gene-poor sequences.
selscan_xpnsl.gene_rich_sequences_windows.tar.gz, contains per-window cross-population XP-nSL statistics for gene-rich sequences.
selscan_xpnsl.gene_poor_sequences_windows.tar.gz, as above but for gene-poor sequences.
fst_vs_xpnsl.per_snp.at_vs_me.csv.gz, contains per-SNP FST, genomic region and XP-nSL values in a single file for the "at vs. me" contrast.
fst_vs_xpnsl.per_snp.we_vs_ea.csv.gz, contains per-SNP FST, genomic region and XP-nSL values in a single file for the "we vs. ea" contrast.
fst_vs_xpnsl_vs_diversity_vs_regions.merged_sequences.at_vs_me.tsv.tar.gz, integrates window-based statistics into a single file for the "at vs. me" contrast.
fst_vs_xpnsl_vs_diversity_vs_regions.merged_sequences.we_vs_ea.tsv.tar.gz, as above but for the "we vs. ea" contrast.

allele_freqs_fst.gene_(rich|poor)_sequences.(at_vs_me|we_vs_ea).tar

The TSV files in these archives contain per-SNP estimates of allele frequencies and FST, along with SNP annotations. There are nine main fields/columns with overlapping/redundant information to accommodate flexible parsing. Large fields have nested subfields that are separated by "|" (first level) or ":" (second level).

1. name of sequence (e.g. "seq_s_1")
2. position of SNP (e.g. "448878")
3. reference allele (e.g. "A")
4. alternate allele (e.g. "G")
5. major column with the FST value, allele frequencies and other data for each population; described below
6. type of SNP (e.g. intron, synonymous, missense, intergenic, ...) and label of associated gene (e.g. missense|REF_STRG_1_4_XLOC_012878)
7. FST tag and value (e.g. fst|0.0653)
8. region, type of SNP and gene label (e.g. region|missense|REF_STRG_1_4_XLOC_012878)
9. gene annotation derived from EnTAP annotations and Drosophila homologs; described below. Uses comma-separated subfields.

Subfields in field 5:

Example: at/me:0.0653:148:1.0000:1.0000:1.0000|at,134,133.0000,1.0000,0.9925,0.0075|me,14,13.0000,1.0000,0.9286,0.0714

This field splits into three major subfields on "|": one about the pairwise comparison and two with metadata about each population.

1st subfield (at/me:0.0653:148:1.0000:1.0000:1.0000), separated by ":":
1. name of contrast (at/me)
2. FST of SNP (0.0653)
3. sample size (148; in this example the sum of the two population sample sizes, 134 + 14)
4. proportion of observed data given overall sample size (1.0000); <1 if there are missing genotypes
5. proportion of observed data given sample size of population 1 (1.0000)
6. as above but for population 2 (1.0000)

2nd and 3rd subfields (at,134,133.0000,1.0000,0.9925,0.0075 and me,14,13.0000,1.0000,0.9286,0.0714), separated by ",":
1. name of population
2. sample size
3. number of observed reference alleles
4. number of observed alternate alleles
5. frequency of reference allele
6. frequency of alternate allele

Subfields in field 9:

Example: annotation|entap,XP_037775362.1 uncharacterized protein LOC119572362 [Penaeus monodon]|blast,FBgn0002526,FBtr0077014,CG10236,LanA,Laminin

1. annotation tag
2. entap annotation (comma-separated subfields)
3. blast annotation (comma-separated subfields)

These datasets are provided for each chunk and in a single merged TSV file for each contrast.

allele_freqs_fst.gene_(rich|poor)_sequences_windows.(at_vs_me|we_vs_ea).tar.gz

The TSV files in these archives contain FST estimates across 100 bp or 1,000 bp non-overlapping windows. Each TSV file has four fields:

CHROM = name of sequence
POS = window start position
N_(contrast) = number of SNPs in the window
FST_(contrast) = average Reynolds' FST of the window

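The nested layout of field 5 in the per-SNP files can be unpacked with a few string splits. The following is a minimal sketch based only on the example string given in this description; the function name `parse_field5` and the two record classes are hypothetical helpers, not part of the dataset or of any tool used to produce it, and real files may contain values (e.g. missing data) that this sketch does not handle.

```python
from dataclasses import dataclass


@dataclass
class Contrast:
    # First "|" subfield, split on ":" (hypothetical record type).
    name: str
    fst: float
    sample_size: int
    prop_observed: float
    prop_observed_pop1: float
    prop_observed_pop2: float


@dataclass
class PopulationCounts:
    # Second and third "|" subfields, split on "," (hypothetical record type).
    name: str
    sample_size: int
    ref_count: float
    alt_count: float
    ref_freq: float
    alt_freq: float


def parse_field5(field: str):
    """Split field 5 into the contrast record and the two population records."""
    contrast_raw, pop1_raw, pop2_raw = field.split("|")

    c = contrast_raw.split(":")
    contrast = Contrast(c[0], float(c[1]), int(c[2]),
                        float(c[3]), float(c[4]), float(c[5]))

    pops = []
    for raw in (pop1_raw, pop2_raw):
        p = raw.split(",")
        pops.append(PopulationCounts(p[0], int(p[1]), float(p[2]),
                                     float(p[3]), float(p[4]), float(p[5])))
    return contrast, pops[0], pops[1]


# Example string copied from the dataset description.
example = ("at/me:0.0653:148:1.0000:1.0000:1.0000"
           "|at,134,133.0000,1.0000,0.9925,0.0075"
           "|me,14,13.0000,1.0000,0.9286,0.0714")
contrast, pop1, pop2 = parse_field5(example)
print(contrast.name, contrast.fst, pop1.alt_freq, pop2.alt_freq)
```

In the example, the reported frequencies agree with the counts (133/134 and 13/14 for the reference allele), which can serve as a quick sanity check when reading the files.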
selscan_xpnsl.gene_rich_sequences.tar.gz and selscan_xpnsl.gene_poor_sequences.tar.gz

The TSV files in these archives contain raw and normalized per-SNP cross-population XP-nSL output from selscan, which was used to test for selective sweeps. The format and meaning of the fields are documented with the original tool selscan: https://github.com/szpiech/selscan

selscan_xpnsl.gene_rich_sequences_windows.tar.gz and selscan_xpnsl.gene_poor_sequences_windows.tar.gz

The TSV files in these archives contain per-window average XP-nSL computed from the normalized SNP estimates at 1,000 or 10,000 bp resolution. The TSV files have the following headers:

CHROM = name of sequence
START = start of window
STOP = stop of window
N = number of SNPs with XP-nSL estimates
N_CRIT = number of SNPs with critical XP-nSL estimates (>=2 or <=-2)
PROP_CRIT = proportion of critical SNPs
MIN = minimal XP-nSL value in window
MAX = maximal XP-nSL value in window
MEAN = mean XP-nSL value in window

fst_vs_xpnsl.per_snp.at_vs_me.csv.gz and fst_vs_xpnsl.per_snp.we_vs_ea.csv.gz

Per-SNP FST and XP-nSL data that have been merged into a single TSV file. Fields:

1. name of sequence
2. position of SNP
3. FST of SNP
4. gene region
5. XP-nSL

fst_vs_xpnsl_vs_diversity_vs_regions.merged_sequences.(at_vs_me|we_vs_ea).tsv.tar.gz

Merged TSV files that integrate window-based FST, XP-nSL, variation and genomic-region data at 1,000 bp resolution. Fields in the TSV files are:

CHROM = name of sequence
START = start of window
N_at_vs_me = number of SNPs
FST_at_vs_me = average FST
MEAN = mean XP-nSL
LENGTH = length of window
COVERED = accessible bases
COVERED_PROP = proportion of accessible bases
all_THETA = Watterson's theta, all data
all_PI = Pi, all data
all_TD = Tajima's D, all data
pop1_VARIABLE = polymorphic sites, population 1
pop1_THETA = Watterson's theta, population 1
pop1_PI = Pi, population 1
pop1_TD = Tajima's D, population 1
pop2_VARIABLE = polymorphic sites, population 2
pop2_THETA = Watterson's theta, population 2
pop2_PI = Pi, population 2
pop2_TD = Tajima's D, population 2
intergenic_COVERED = accessible sites of this region
intergenic_all_THETA = theta for this region across all data

The same pair of columns follows for the other region types:

five_prime_utr_COVERED, five_prime_utr_all_THETA
cds_COVERED, cds_all_THETA
intron_COVERED, intron_all_THETA
three_prime_utr_COVERED, three_prime_utr_all_THETA
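As an illustration of how the per-window XP-nSL summary columns (N, N_CRIT, PROP_CRIT, MIN, MAX, MEAN) relate to the normalized per-SNP values, the sketch below bins SNPs into non-overlapping windows and applies the critical thresholds (>=2 or <=-2) stated in this description. It is not the authors' pipeline: the function name `summarize_windows`, the window coordinate convention (starts at multiples of the window size, STOP = START + window size) and the toy input values are all assumptions made for the example.

```python
from collections import defaultdict


def summarize_windows(snps, window_size=1000, threshold=2.0):
    """Summarize normalized XP-nSL scores per non-overlapping window.

    snps: iterable of (chrom, pos, normalized_xpnsl) tuples.
    The window coordinate convention here is an assumption.
    """
    bins = defaultdict(list)
    for chrom, pos, score in snps:
        start = (pos // window_size) * window_size
        bins[(chrom, start)].append(score)

    rows = []
    for (chrom, start), scores in sorted(bins.items()):
        # A SNP is "critical" if its score is >= threshold or <= -threshold.
        n_crit = sum(1 for s in scores if s >= threshold or s <= -threshold)
        rows.append({
            "CHROM": chrom,
            "START": start,
            "STOP": start + window_size,
            "N": len(scores),
            "N_CRIT": n_crit,
            "PROP_CRIT": n_crit / len(scores),
            "MIN": min(scores),
            "MAX": max(scores),
            "MEAN": sum(scores) / len(scores),
        })
    return rows


# Toy values invented for illustration; they are not taken from the dataset.
toy_snps = [
    ("seq_s_1", 120, 0.5),
    ("seq_s_1", 870, 2.3),
    ("seq_s_1", 1450, -2.0),
    ("seq_s_1", 1980, 1.0),
]
rows = summarize_windows(toy_snps)
for row in rows:
    print(row)
```

Only windows that contain at least one SNP appear in the output of this sketch; whether the archived files also list empty windows is not stated in the description.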

Provider:

Uppsala University

Created:

2024-03-06