7. Ecological genomics of the Northern krill: Genome-scale comparisons of adaptive divergence
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://figshare.com/articles/dataset/7_Ecological_genomics_of_the_Northern_krill_Genome-scale_comparisons_of_adaptive_divergence/22817410
Description:
This item holds multiple tar archives and gzip-compressed tables with genome-scale comparisons of divergence between Northern krill populations. These include estimated allele frequencies, divergence (e.g. FST) and extended-haplotype signatures (XP-nSL estimates). Many analyses were performed in "chunks" (160 in total, across both gene-rich and gene-poor sequences), which are described in a previous item.
Population definitions
Population definitions are the same as described in a different item:
"at vs. me" = Atlantic Ocean samples (n=67) vs. the Mediterranean (i.e. Barcelona) samples (n=7).
"we vs. ea" = South-West North Atlantic Ocean (n=20) vs. North-East North Atlantic Ocean (n=47). Some files for this contrast use the label "wa" instead of "we" for the South-West North Atlantic Ocean samples.
Contents:
allele_freqs_fst.gene_rich_sequences.at_vs_me.tar, contains per-SNP estimates of allele frequencies and FST between "at" and "me" groups along gene-rich sequences.
allele_freqs_fst.gene_rich_sequences.we_vs_ea.tar, as above but between "we" and "ea" groups.
allele_freqs_fst.gene_poor_sequences.at_vs_me.tar, contains per-SNP estimates of allele frequencies and FST between "at" and "me" groups along gene-poor sequences.
allele_freqs_fst.gene_poor_sequences.we_vs_ea.tar, as above but for "we" and "ea" groups.
allele_freqs_fst.merged_sequences.at_vs_me.csv.gz, contains per-SNP estimates of allele frequencies and FST between the "at" and "me" groups, merged across all chunks into a single TSV file.
allele_freqs_fst.merged_sequences.we_vs_ea.csv.gz, as above but for "we" and "ea".
allele_freqs_fst.gene_rich_sequences_windows.at_vs_me.tar.gz, contains per-window estimates of FST between "at" and "me" groups along gene-rich sequences.
allele_freqs_fst.gene_rich_sequences_windows.we_vs_ea.tar.gz, as above but for "we" and "ea" groups.
allele_freqs_fst.gene_poor_sequences_windows.at_vs_me.tar.gz, contains per-window estimates of FST between "at" and "me" groups along gene-poor sequences.
allele_freqs_fst.gene_poor_sequences_windows.we_vs_ea.tar.gz, as above but for "we" and "ea" groups.
selscan_xpnsl.gene_rich_sequences.tar.gz, contains per-SNP cross-population XP-nSL statistics for gene-rich sequences.
selscan_xpnsl.gene_poor_sequences.tar.gz, contains per-SNP cross-population XP-nSL statistics for gene-poor sequences.
selscan_xpnsl.gene_rich_sequences_windows.tar.gz, contains per-window cross-population XP-nSL statistics for gene-rich sequences.
selscan_xpnsl.gene_poor_sequences_windows.tar.gz, as above but for gene-poor sequences.
fst_vs_xpnsl.per_snp.at_vs_me.csv.gz, contains per-SNP FST, genomic region and XP-nSL values in a single file for the "at vs. me" contrast.
fst_vs_xpnsl.per_snp.we_vs_ea.csv.gz, contains per-SNP FST, genomic region and XP-nSL values in a single file for the "we vs. ea" contrast.
fst_vs_xpnsl_vs_diversity_vs_regions.merged_sequences.at_vs_me.tsv.tar.gz, integrates window-based FST, XP-nSL, diversity and genomic-region statistics into a single file for the "at vs. me" contrast.
fst_vs_xpnsl_vs_diversity_vs_regions.merged_sequences.we_vs_ea.tsv.tar.gz, as above but for the "we vs. ea" contrast.
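Most items above are tar archives of per-chunk TSV files. The following Python sketch streams rows from such an archive without unpacking it to disk. It assumes the members are plain tab-separated text files whose names end in ".tsv". The member naming inside the archives is not documented here, so `suffix` may need adjusting. The function name `iter_tsv_rows` is hypothetical.

```python
import csv
import io
import tarfile


def iter_tsv_rows(archive_path, suffix=".tsv"):
    """Yield (member_name, fields) for each line of every matching archive member.

    Mode "r:*" lets tarfile detect whether the archive is uncompressed (.tar)
    or compressed (e.g. .tar.gz), so both kinds of item can be read the same way.
    """
    with tarfile.open(archive_path, mode="r:*") as tar:
        for member in tar:
            # Skip directories and members that do not look like TSV files
            # (the ".tsv" suffix is an assumption about member naming).
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            raw = tar.extractfile(member)
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for fields in csv.reader(text, delimiter="\t"):
                yield member.name, fields


# Example (archive name taken from the list above):
# for name, fields in iter_tsv_rows("allele_freqs_fst.gene_rich_sequences.at_vs_me.tar"):
#     ...
```

Each yielded row is a list of strings. Numeric conversion is left to the caller because the column layout differs between archives.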
allele_freqs_fst.gene_(rich|poor)_sequences.(at_vs_me|we_vs_ea).tar
The TSV files in these archives contain per-SNP estimates of allele frequencies and FST, along with SNP annotations. There are nine main fields (columns), some with overlapping or redundant information to allow flexible parsing. Large fields contain nested subfields separated by "|" at the first level and by ":" or "," at the second level.
1. name of sequence (e.g. "seq_s_1")
2. position of SNP (e.g. "448878")
3. reference allele (e.g. "A")
4. alternate allele (e.g. "G")
5. main field with the FST value, allele frequencies and other data for each population (subfields described below)
6. type of SNP (e.g. intron, synonymous, missense, intergenic, ...) and label of the associated gene (e.g. missense|REF_STRG_1_4_XLOC_012878)
7. FST tag and value (e.g. fst|0.0653)
8. region tag, type of SNP and gene label (e.g. region|missense|REF_STRG_1_4_XLOC_012878)
9. gene annotation derived from EnTAP annotations and Drosophila homologs, with comma-separated subfields (described below)
Subfields in field 5:
Example:
at/me:0.0653:148:1.0000:1.0000:1.0000|at,134,133.0000,1.0000,0.9925,0.0075|me,14,13.0000,1.0000,0.9286,0.0714
This field splits into three major subfields on "|": one about the pairwise comparison and two with metadata about each population.
1st subfield (at/me:0.0653:148:1.0000:1.0000:1.0000)
name of contrast (at/me)
FST of SNP (0.0653)
Total sample size (148); this equals the number of allele copies, i.e. 2 × 74 (67 + 7) individuals
Proportion of observed data relative to the total sample size (1.0000); values below 1 indicate missing genotypes
Proportion of observed data given sample size of population 1 (1.0000)
As above but for population 2 (1.0000)
2nd and 3rd subfields (at,134,133.0000,1.0000,0.9925,0.0075 and me,14,13.0000,1.0000,0.9286,0.0714)
name of population
sample size (allele copies, e.g. 134 for "at")
number of observed reference alleles
number of observed alternate alleles
frequency of reference allele
frequency of alternate allele
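Field 5 can be split with plain string operations. The sketch below parses it into small dataclasses. It assumes the exact layout of the example above: "|" between subfields, ":" inside the contrast subfield and "," inside each population subfield. Class and function names are hypothetical. Entries other than plain numbers, such as a missing-value code, would need extra handling.

```python
from dataclasses import dataclass, field


@dataclass
class PopulationStats:
    name: str
    sample_size: int
    n_ref: float     # observed reference alleles (written with decimals in the files)
    n_alt: float     # observed alternate alleles
    freq_ref: float
    freq_alt: float


@dataclass
class SnpContrast:
    contrast: str
    fst: float
    sample_size: int
    prop_observed: float
    prop_observed_pop1: float
    prop_observed_pop2: float
    populations: list = field(default_factory=list)


def parse_field5(value):
    """Parse field 5, e.g. 'at/me:0.0653:148:...|at,134,...|me,14,...'."""
    contrast_part, *population_parts = value.split("|")
    name, fst, size, prop_all, prop_1, prop_2 = contrast_part.split(":")
    record = SnpContrast(name, float(fst), int(size),
                         float(prop_all), float(prop_1), float(prop_2))
    for part in population_parts:
        pop, pop_size, n_ref, n_alt, f_ref, f_alt = part.split(",")
        record.populations.append(PopulationStats(
            pop, int(pop_size), float(n_ref), float(n_alt),
            float(f_ref), float(f_alt)))
    return record
```

Applied to the example string, the two per-population sample sizes (134 and 14) add up to the total of 148.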
Subfields in field 9:
Example: annotation|entap,XP_037775362.1 uncharacterized protein LOC119572362 [Penaeus monodon]|blast,FBgn0002526,FBtr0077014,CG10236,LanA,Laminin
annotation tag
entap annotation (comma separated sub-fields)
blast annotation (comma separated sub-fields)
These datasets are provided for each chunk and in a single merged TSV file for each contrast.
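Field 9 follows the same nesting pattern. The sketch below turns it into a dictionary keyed by annotation source ("entap", "blast"). It assumes every "|"-separated part starts with a source tag followed by comma-separated subfields. A protein description that itself contains commas would be split further, so this is only a starting point. The function name is hypothetical.

```python
def parse_field9(value):
    """Split field 9 into {source: [subfields]}, e.g. {'entap': [...], 'blast': [...]}."""
    tag, *parts = value.split("|")
    if tag != "annotation":
        raise ValueError(f"expected an 'annotation' tag, got {tag!r}")
    annotation = {}
    for part in parts:
        # First comma-separated item is the source tag; the rest are its subfields.
        source, *subfields = part.split(",")
        annotation[source] = subfields
    return annotation
```

For the example above, `parse_field9(...)["blast"]` gives the FlyBase gene and transcript identifiers followed by the gene symbol and name.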
allele_freqs_fst.gene_(rich|poor)_sequences_windows.(at_vs_me|we_vs_ea).tar.gz
The TSV files in these archives contain FST estimates across 100 bp or 1,000 bp non-overlapping windows. Each TSV file has four fields:
CHROM = name of sequence
POS = window start position
N_(contrast) = number of SNPs in the window
FST_(contrast) = average Reynolds' FST of the window.
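The function below is one sketch of how these window files might be used. It reads one window TSV and returns the highest-FST windows among those with at least a minimum number of SNPs. It assumes a header line with the column names listed above (e.g. `N_at_vs_me`, `FST_at_vs_me`). The `min_snps` cutoff and the skipping of non-numeric or NaN entries are illustrative choices, not part of the dataset. The function name is hypothetical.

```python
import csv
import math


def top_fst_windows(path, contrast="at_vs_me", min_snps=5, top=10):
    """Return up to `top` (CHROM, POS, N, FST) tuples, highest FST first."""
    n_col, fst_col = f"N_{contrast}", f"FST_{contrast}"
    windows = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            try:
                n_snps = int(row[n_col])
                fst = float(row[fst_col])
            except ValueError:
                continue  # non-numeric entry (e.g. "NA"); the missing-value code is an assumption
            if math.isnan(fst) or n_snps < min_snps:
                continue
            windows.append((row["CHROM"], int(row["POS"]), n_snps, fst))
    windows.sort(key=lambda w: w[3], reverse=True)
    return windows[:top]
```

Requiring several SNPs per window avoids ranking windows whose FST rests on one or two sites.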
selscan_xpnsl.gene_rich_sequences.tar.gz and selscan_xpnsl.gene_poor_sequences.tar.gz
The TSV files in these archives contain raw and normalized per-SNP cross-population XP-nSL output from selscan, which was used to test for selective sweeps. The format and meaning of the fields are documented with the original tool selscan: https://github.com/szpiech/selscan
selscan_xpnsl.gene_rich_sequences_windows.tar.gz and selscan_xpnsl.gene_poor_sequences_windows.tar.gz
The TSV files in these archives contain per-window average XP-nSL computed from the normalized per-SNP estimates at 1,000 or 10,000 bp resolution. The TSV files have the following headers:
CHROM = name of sequence
START = start of window
STOP = stop of window
N = number of SNPs with XP-nSL estimates
N_CRIT = number of SNPs with critical XP-nSL estimates (>=2 or <=-2)
PROP_CRIT = proportion of critical SNPs
MIN = minimum XP-nSL value in window
MAX = maximum XP-nSL value in window
MEAN = mean XP-nSL value in window
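To make these column definitions concrete, the sketch below recomputes N, N_CRIT, PROP_CRIT, MIN, MAX and MEAN from normalized per-SNP XP-nSL values. It uses the critical cutoff stated above (values >= 2 or <= -2). The binning convention is an assumption: 1-based positions, with windows starting at 1 and ending at START + size - 1. The files may use a different convention. Function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_window(values, threshold=2.0):
    """Summarize one window's normalized XP-nSL values like the N..MEAN columns."""
    n = len(values)
    if n == 0:
        return {"N": 0, "N_CRIT": 0, "PROP_CRIT": None,
                "MIN": None, "MAX": None, "MEAN": None}
    # A SNP is "critical" if its XP-nSL is >= threshold or <= -threshold.
    n_crit = sum(1 for v in values if v >= threshold or v <= -threshold)
    return {"N": n, "N_CRIT": n_crit, "PROP_CRIT": n_crit / n,
            "MIN": min(values), "MAX": max(values), "MEAN": mean(values)}


def windowed_xpnsl(snps, window_size=1000, threshold=2.0):
    """Group (chrom, pos, xpnsl) tuples into non-overlapping windows and summarize them."""
    bins = defaultdict(list)
    for chrom, pos, value in snps:
        start = (pos - 1) // window_size * window_size + 1  # assumed 1-based windows
        bins[(chrom, start)].append(value)
    rows = []
    for (chrom, start), values in sorted(bins.items()):
        row = {"CHROM": chrom, "START": start, "STOP": start + window_size - 1}
        row.update(summarize_window(values, threshold))
        rows.append(row)
    return rows
```

Windows without SNPs never appear in `bins`, so the sketch only reports windows that contain at least one XP-nSL estimate.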
fst_vs_xpnsl.per_snp.at_vs_me.csv.gz and fst_vs_xpnsl.per_snp.we_vs_ea.csv.gz
Per-SNP FST and XP-nSL data that have been merged into a single TSV file. Fields:
name of sequence
position of SNP
FST of SNP
gene region
XP-nSL
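A common use of this table is to flag SNPs that stand out in both statistics. The sketch below assumes a gzip-compressed file with the five columns above in that order. The description calls it TSV despite the .csv.gz extension, so the delimiter is a parameter. A header line, if present, is skipped because its position field is not an integer. The |XP-nSL| >= 2 cutoff mirrors the "critical" definition used for the window files. The FST cutoff of 0.5 is an arbitrary illustrative value. The function name is hypothetical.

```python
import csv
import gzip


def joint_outliers(path, fst_min=0.5, xpnsl_abs_min=2.0, delimiter="\t"):
    """Return (sequence, position, FST, region, XP-nSL) for SNPs passing both cutoffs."""
    hits = []
    with gzip.open(path, "rt", newline="") as handle:
        for fields in csv.reader(handle, delimiter=delimiter):
            if len(fields) < 5 or not fields[1].isdigit():
                continue  # header line or malformed row
            seq, pos, fst, region, xpnsl = fields[:5]
            try:
                fst_value, xpnsl_value = float(fst), float(xpnsl)
            except ValueError:
                continue  # missing value; its encoding (e.g. "NA") is an assumption
            # NaN values fail both comparisons and are therefore excluded.
            if fst_value >= fst_min and abs(xpnsl_value) >= xpnsl_abs_min:
                hits.append((seq, int(pos), fst_value, region, xpnsl_value))
    return hits
```

Because XP-nSL is signed, `abs()` keeps extreme values in either direction.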
fst_vs_xpnsl_vs_diversity_vs_regions.merged_sequences.(at_vs_me|we_vs_ea).tsv.tar.gz
Merged TSV files that integrate window-based FST, XP-nSL, diversity and genomic-region data at 1,000 bp resolution. Fields in the TSV files are listed below, with column names shown for the "at vs. me" contrast:
CHROM = name of sequence
START = start of window
N_at_vs_me = number of SNPs in the window
FST_at_vs_me = average FST of the window
MEAN = mean XP-nSL of the window
LENGTH = length of window
COVERED = accessible bases
COVERED_PROP = proportion of accessible bases
all_THETA = Watterson's theta, all data
all_PI = Pi (nucleotide diversity), all data
all_TD = Tajima's D, all data
pop1_VARIABLE = polymorphic sites, population 1
pop1_THETA = Watterson's theta, population 1
pop1_PI = Pi, population 1
pop1_TD = Tajima's D, population 1
pop2_VARIABLE = polymorphic sites, population 2
pop2_THETA = Watterson's theta, population 2
pop2_PI = Pi, population 2
pop2_TD = Tajima's D, population 2
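intergenic_COVERED = accessible sites in intergenic regions
intergenic_all_THETA = theta for intergenic regions, all data
five_prime_utr_COVERED = accessible sites in 5' UTRs
five_prime_utr_all_THETA = theta for 5' UTRs, all data
cds_COVERED = accessible sites in coding sequences (CDS)
cds_all_THETA = theta for coding sequences (CDS), all data
intron_COVERED = accessible sites in introns
intron_all_THETA = theta for introns, all data
three_prime_utr_COVERED = accessible sites in 3' UTRs
three_prime_utr_all_THETA = theta for 3' UTRs, all data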
Created:
2024-03-28



