iSTOP datasets (Billon, Bryant et al, Molecular Cell, 2017)
Source: doi.org. Indexed 2025-03-21.
Download link:
http://doi.org/10.17632/xbdtvf6bvj.1
Description:
iSTOP datasets in 8 eukaryotic species (H. sapiens, M. musculus, R. norvegicus, D. rerio, C. elegans, D. melanogaster, A. thaliana and S. cerevisiae).
Species names and genome assembly IDs are specified in each file name. Each row represents a targetable genomic coordinate within a gene. All ORFs were validated to have start and stop codons, an appropriate sequence length, and no internal stop codons.
Columns are defined as follows:
gene – a single gene name
chr – chromosome name
strand – the strand of the targeted base in the coding sequence
genome_coord – the genomic coordinate of the targeted base
codon – the codon targeted
n_isoforms – the number of isoforms considered for the gene
percent_isoforms – the percentage of a gene's isoforms that are targeted at this coordinate
percent_NMD – the percentage of isoforms predicted to undergo nonsense-mediated decay (NMD), where the prediction is based on targeting of an isoform's coding sequence 55 bases upstream of the final exon-exon junction
rel_pos_largest_isoform – the relative position in the largest isoform targeted at the genomic coordinate (0 = beginning of coding sequence, 1 = end of coding sequence)
no_upstream_G – TRUE indicates there is no G in the 5' position relative to the targeted C
RFLP_Loss – restriction enzymes that uniquely cut within +/- 50 bases of the targeted base in the genomic sequence before editing
RFLP_Gain – restriction enzymes that uniquely cut within +/- 50 bases of the targeted base in the genomic sequence after editing
sgNGG, sgNGA, sgNGCG, sgNGAG, sgNNGRRT, sgNNNRRT – 20 bp guide sequence for the corresponding PAM (the targeted C is lowercase)
sgNGG_off_targets, sgNGA_off_targets, sgNGCG_off_targets, sgNGAG_off_targets, sgNNGRRT_off_targets, sgNNNRRT_off_targets – the number of off-target locations in the genome, determined by searching for matching sequence with up to two mismatches allowed in the first 8 bases of the guide sequence
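As a rough illustration of how the columns above might be used, the following Python sketch filters rows for one PAM. It rests on several assumptions that the description does not confirm: that the files are comma-separated with a header row, that missing guides are written as empty strings or "NA", and that the thresholds shown are sensible for a given experiment. The sample rows, gene names and the helper names select_guides and targeted_c_index are hypothetical and made up for illustration; they are not taken from the dataset.

```python
import csv
import io

# Hypothetical sample rows (made-up values, NOT real dataset content).
# Assumption: the real files are CSV with a header row using these column names.
SAMPLE = """gene,chr,strand,genome_coord,codon,percent_isoforms,percent_NMD,no_upstream_G,sgNGG,sgNGG_off_targets
GENE1,chr1,+,1000,CAG,100,100,TRUE,AGTCTAGGcAGTTACGATCA,0
GENE1,chr1,+,1500,CGA,50,0,TRUE,TTGAcGATTACCGGATAGCA,0
GENE2,chr2,-,2000,TGG,100,100,FALSE,GGATcCATTAGCAGTACGTA,3
GENE3,chr3,+,3000,CAA,100,100,TRUE,NA,NA
"""


def select_guides(rows, pam="sgNGG", min_percent_nmd=50.0, max_off_targets=0):
    """Keep rows that have a guide for `pam` and pass illustrative filters."""
    selected = []
    for row in rows:
        guide = (row.get(pam) or "").strip()
        off = (row.get(pam + "_off_targets") or "").strip()
        # Assumption: a missing guide is encoded as an empty field or "NA".
        if guide in ("", "NA") or off in ("", "NA"):
            continue
        if int(off) > max_off_targets:
            continue
        if float(row["percent_NMD"]) < min_percent_nmd:
            continue
        if row["no_upstream_G"].strip().upper() != "TRUE":
            continue
        selected.append(row)
    return selected


def targeted_c_index(guide):
    """Return the 0-based index of the lowercase (targeted) C, or -1 if absent."""
    return guide.find("c")


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
hits = select_guides(rows)
for hit in hits:
    print(hit["gene"], hit["genome_coord"], hit["sgNGG"], targeted_c_index(hit["sgNGG"]))
```

To try this on a downloaded file, the io.StringIO(SAMPLE) source could be swapped for an open file handle, provided the file really is comma-separated; other PAM columns such as sgNGA can be selected through the pam argument.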
Provider:
Mendeley Data



