
Single Nucleotide Polymorphisms (SNPs) identified from the whole genome sequences of hilsa shad (Tenualosa ilisha) of the Bay of Bengal

Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/2538154
Resource description:
The data file contains 792,939 isolated SNPs identified by discoSnp++ v2.3.x (Uricaru et al., 2015) from the whole genome sequence of T. ilisha from the Bay of Bengal. In each sequence, the central region of length 2k-1 is in upper case and the flanking sequences are in lower case. The fields attached to each variant are:

SNP_higher/lower: one of the two alleles.
id: id of the SNP (each SNP has a unique id).

For SNPs:
P_i:pos_Alt1/Alt2: information about the i-th SNP. If more than one SNP is found, the format is P_1:pos_Alt1/Alt2,P_2:pos_Alt1/Alt2,...
pos: position of the SNP relative to the start of the bubble, i.e. the start of the upper-case sequence.
Alt1: one of the two alleles.
Alt2: the other allele.

For indels:
P_1:pos_size_repeatSize
pos: predicted position of the indel relative to the start of the bubble, i.e. the start of the upper-case sequence.
size: predicted size of the indel.
repeatSize: size of the longest sequence that is both a prefix of the indel and a prefix of the sequence located just after the insertion.

high/low: sequence complexity. If the sequence is of low complexity (e.g. ATATATATATATATAT), this field is "low".
nb_pol: number of polymorphisms.
left_unitig_length: size of the full left extension.
right_unitig_length: size of the right extension.
left_contig_length: size of the full left extension.
right_contig_length: size of the right extension.
C1: number of reads mapping the central upper-case sequence from the first read set.
C2: number of reads mapping the central upper-case sequence from the second read set.
Q1 [if reads were given in fastq]: average Phred quality of the central nucleotide over the mapped reads from the first read set.
Q2 [if reads were given in fastq]: average Phred quality of the central nucleotide over the mapped reads from the second read set.
G1: genotype of the variant in the first read set.
G2: genotype of the variant in the second read set.
rank: ranks the predictions according to their read coverage in each condition, favoring SNPs that discriminate between conditions.
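The field list above can be read into a small record with a few lines of Python. This is a minimal sketch, not the tool's own parser: it assumes the fields arrive as one FASTA header line separated by "|", and that numeric fields are written as name_value (e.g. C1_124). The function names parse_header and central_sequence and the sample header used below are hypothetical, written only to match the description.

```python
import re

# Pattern for one SNP entry of the form P_i:pos_Alt1/Alt2
SNP_ENTRY = re.compile(r"P_(\d+):(\d+)_([ACGT])/([ACGT])")


def parse_header(header: str) -> dict:
    """Split a '|'-separated header (assumed layout) into named fields."""
    parts = header.lstrip(">").split("|")
    record = {"name": parts[0], "variants": [], "complexity": None}
    for part in parts[1:]:
        if part.startswith("P_"):
            # Several SNPs in one bubble are comma-separated.
            for entry in part.split(","):
                m = SNP_ENTRY.fullmatch(entry)
                if m:
                    record["variants"].append({
                        "index": int(m.group(1)),
                        "pos": int(m.group(2)),
                        "alt1": m.group(3),
                        "alt2": m.group(4),
                    })
        elif part in ("high", "low"):
            record["complexity"] = part
        else:
            # Assumed name_value layout: split on the last underscore.
            key, _, value = part.rpartition("_")
            record[key] = value
    return record


def central_sequence(sequence: str) -> str:
    """Return the upper-case central part of a sequence, dropping the
    lower-case flanking extensions."""
    return "".join(base for base in sequence if base.isupper())
```

As a usage example under the same assumptions, parse_header(">SNP_higher_path_3|P_1:30_C/G|high|nb_pol_1|C1_124|C2_0|G1_0/0|G2_1/1|rank_1") yields one variant at pos 30 with alleles C and G, complexity "high", and string values for nb_pol, C1, C2, G1, G2 and rank. Values are kept as strings so that nothing is silently coerced; convert them as needed once the real header layout has been checked against the downloaded file.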
Created:
2020-01-21