Single Nucleotide Polymorphisms (SNPs) identified from the whole genome sequences of hilsa shad (Tenualosa ilisha) of the Bay of Bengal
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/2538154
Description:
The data file contains 792,939 isolated SNPs identified by discoSnp++ v2.3.x (Uricaru et al., 2015) from the whole genome sequence of T. ilisha of the Bay of Bengal. The central sequence, of length 2k-1 (k being the k-mer size), is shown in upper case, while the flanking sequences are shown in lower case. The fields of each record are described below.
SNP_higher/lower: one of the two alleles.
id: id of the SNP (each SNP has a unique id).
FOR SNPs:
P_i:pos_Alt1/Alt2: information about the ith SNP. If more than one SNP is found, the following format is used: P_1:pos_Alt1/Alt2,P_2:pos_Alt1/Alt2,...
pos: position of the SNP with respect to the starting position of the bubble, i.e. the starting of the upper case sequence.
Alt1: One of the two alleles
Alt2: the other
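As an illustration of the SNP field layout described above, the following is a minimal Python sketch that splits a `P_i:pos_Alt1/Alt2` string into its parts. The function name `parse_snp_field`, the `SnpSite` type, and the restriction of alleles to the letters A, C, G, T and N are assumptions made for this example; they are not part of the data file or of discoSnp++.

```python
import re
from typing import List, NamedTuple


class SnpSite(NamedTuple):
    """One SNP inside a bubble (hypothetical helper type)."""
    index: int   # the i in P_i
    pos: int     # position relative to the start of the upper case sequence
    alt1: str    # one of the two alleles
    alt2: str    # the other allele


# Assumption: alleles are single letters from A, C, G, T, N.
_SNP_RE = re.compile(r"^P_(\d+):(\d+)_([ACGTN])/([ACGTN])$")


def parse_snp_field(field: str) -> List[SnpSite]:
    """Parse 'P_1:pos_Alt1/Alt2,P_2:pos_Alt1/Alt2,...' into SnpSite records."""
    sites = []
    for part in field.split(","):
        match = _SNP_RE.match(part.strip())
        if match is None:
            raise ValueError(f"not a SNP position field: {part!r}")
        index, pos, alt1, alt2 = match.groups()
        sites.append(SnpSite(int(index), int(pos), alt1, alt2))
    return sites


if __name__ == "__main__":
    # Made-up field value, used only to show the call.
    for site in parse_snp_field("P_1:30_A/G,P_2:45_C/T"):
        print(site)
```

The parser raises an error on anything that does not match the documented pattern, so an INDEL field would be rejected rather than silently misread.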
FOR INDELs:
P_1:pos_size_repeatSize
pos: predicted position of the indel with respect to the starting position of the bubble, i.e. the starting of the upper case sequence.
size: predicted size of the indel
repeatSize: Size of the longest sequence both prefix of the indel and prefix of the sequence located just after the insertion.
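The INDEL field described above can be split in the same spirit. This is a minimal Python sketch under the assumption that all three values are non-negative integers; the names `parse_indel_field` and `IndelSite` are hypothetical and exist only for this example.

```python
import re
from typing import NamedTuple


class IndelSite(NamedTuple):
    """One predicted indel inside a bubble (hypothetical helper type)."""
    index: int        # the i in P_i
    pos: int          # predicted position relative to the upper case start
    size: int         # predicted size of the indel
    repeat_size: int  # repeatSize as described in the field list


# Assumption: pos, size and repeatSize are plain non-negative integers.
_INDEL_RE = re.compile(r"^P_(\d+):(\d+)_(\d+)_(\d+)$")


def parse_indel_field(field: str) -> IndelSite:
    """Parse a 'P_1:pos_size_repeatSize' string into an IndelSite."""
    match = _INDEL_RE.match(field.strip())
    if match is None:
        raise ValueError(f"not an indel position field: {field!r}")
    index, pos, size, repeat_size = (int(g) for g in match.groups())
    return IndelSite(index, pos, size, repeat_size)


if __name__ == "__main__":
    # Made-up field value, used only to show the call.
    print(parse_indel_field("P_1:30_4_2"))
```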
high/low: sequence complexity. If the sequence is of low complexity (e.g. ATATATATATATATAT), this variable is set to low.
nb_pol: number of polymorphisms.
left_unitig_length: size of the full left unitig extension.
right_unitig_length: size of the full right unitig extension.
left_contig_length: size of the full left contig extension.
right_contig_length: size of the full right contig extension.
C1: number of reads mapping the central upper case sequence from the first read set.
C2: number of reads mapping the central upper case sequence from the second read set.
Q1 [if reads were given in fastq]: average phred quality of the central nucleotide from the mapped reads from the first read set.
Q2 [if reads were given in fastq]: average phred quality of the central nucleotide from the mapped reads from the second read set.
G1: Genotype of the variant in the first read set.
G2: Genotype of the variant in the second read set.
rank: ranks the predictions according to their read coverage in each condition, favoring SNPs that are discriminant between conditions.
Created:
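Because the description states that the central sequence is written in upper case with lower case flanks and has length 2k-1, a record's sequence can be separated by letter case alone. The sketch below is a minimal Python illustration under that reading; `split_bubble` and `infer_k` are hypothetical helper names, and the example sequence is made up.

```python
import re
from typing import Tuple

# Optional lower case flank, upper case central part, optional lower case flank.
_BUBBLE_RE = re.compile(r"^([a-z]*)([A-Z]+)([a-z]*)$")


def split_bubble(sequence: str) -> Tuple[str, str, str]:
    """Return (left flank, central sequence, right flank) based on letter case."""
    match = _BUBBLE_RE.match(sequence.strip())
    if match is None:
        raise ValueError("expected lower case flanks around one upper case block")
    left, central, right = match.groups()
    return left, central, right


def infer_k(central: str) -> int:
    """Infer k from a central sequence of length 2k-1 (isolated SNP case)."""
    if len(central) % 2 == 0:
        raise ValueError("central length is even, so it cannot equal 2k-1")
    return (len(central) + 1) // 2


if __name__ == "__main__":
    # Made-up sequence, far shorter than a real record.
    left, central, right = split_bubble("acgtACGTAGGtt")
    print(left, central, right, infer_k(central))
```

The 2k-1 relation is taken from the description of isolated SNPs; records carrying several polymorphisms or an indel may not satisfy it, which is why `infer_k` is kept separate from the case-based split.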
2020-01-21



