Collection of putative promoter sequence of genes on different chromosomes of the human genome
Source: DataCite Commons; updated 2021-02-11; indexed 2024-07-28
Download link:
https://figshare.com/articles/dataset/Collection_of_putative_promoter_sequence_of_genes_on_different_chromosomes_of_the_human_genome/13907348/1
Resource description:
Promoters serve as the communication link between protein-based regulatory control systems and the nucleotide-based sequence determinants that regulate gene expression. As such, promoters should be relatively easy to characterise: stretches of DNA sequence carrying transcription factor binding motifs are candidate promoter sequences for specific genes. However, the perennial problem of defining the start and end nucleotides of a promoter remains unresolved, and promoter sequences in both prokaryotic and eukaryotic systems remain poorly defined. This work takes the commonly used approach of designating the 800 base pair (bp) sequence upstream of the gene start site as the putative promoter sequence, and applies it to all genes in the human genome. Using in-house MATLAB genome analysis software, the start site and sequence of each gene in the human genome were determined, providing the foundation for estimating each gene's putative promoter sequence. Putative promoter sequences defined in this way may contain extraneous elements, but they remain useful for an initial characterisation of the functional basis of a particular gene's promoter, for example, identifying the repertoire of transcription factors that bind to each promoter and their qualitative binding affinities. The collection of promoter sequences reported herein is catalogued by chromosome across the human karyotype and may be useful for understanding possible chromosome-specific features of promoter sequences in the human genome.
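The authors' MATLAB software is not included in this description, so the following is only a minimal Python sketch of the stated rule: take the 800 bp immediately upstream of each gene's start site and group the results by chromosome. Everything else is an assumption for illustration, not the authors' method: the hypothetical `Gene` record and its field names, the 0-based forward-strand coordinate for the start site, reverse-complementing minus-strand promoters so every sequence reads 5' to 3' on the gene's own strand, and clipping at chromosome ends.

```python
from collections import defaultdict
from dataclasses import dataclass

PROMOTER_LENGTH = 800  # bp upstream of the gene start site, as stated in the description

# Complement table for DNA (case preserved, N maps to N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


@dataclass
class Gene:
    """Hypothetical gene record; field names are assumptions, not the dataset's schema."""
    name: str
    chrom: str
    tss: int     # assumed: 0-based forward-strand index of the first transcribed base
    strand: str  # "+" or "-"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def putative_promoter(chrom_seq: str, tss: int, strand: str,
                      length: int = PROMOTER_LENGTH) -> str:
    """Return up to `length` bp immediately upstream of the start site.

    The result reads 5' to 3' on the gene's strand and is shorter than
    `length` only when the gene lies near a chromosome end (assumed clipping).
    """
    if strand == "+":
        # Upstream of a plus-strand gene lies to the left of the start site.
        begin = max(0, tss - length)
        return chrom_seq[begin:tss]
    if strand == "-":
        # Upstream of a minus-strand gene lies to the right; flip it onto the gene's strand.
        end = min(len(chrom_seq), tss + 1 + length)
        return reverse_complement(chrom_seq[tss + 1:end])
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def catalogue_by_chromosome(genes, genome, length=PROMOTER_LENGTH):
    """Group putative promoters as {chromosome: {gene name: sequence}}."""
    catalogue = defaultdict(dict)
    for g in genes:
        catalogue[g.chrom][g.name] = putative_promoter(genome[g.chrom], g.tss, g.strand, length)
    return dict(catalogue)


def to_fasta(promoters: dict) -> str:
    """Serialise one chromosome's {gene name: sequence} mapping as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in promoters.items())


if __name__ == "__main__":
    # Toy genome and a 4 bp window so the result is easy to check by hand.
    genome = {"chrT": "ACGTACGGATCCTTAA"}
    genes = [Gene("g1", "chrT", 8, "+"), Gene("g2", "chrT", 3, "-")]
    catalogue = catalogue_by_chromosome(genes, genome, length=4)
    for chrom, promoters in catalogue.items():
        print(chrom)
        print(to_fasta(promoters), end="")
```

With real data, `genome` would hold one full sequence per chromosome and `genes` the annotated start sites. Writing one FASTA file per chromosome key mirrors the per-chromosome cataloguing described above.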
Provided by:
figshare
Created:
2021-02-11