RNA binding strengths of centromeres and the flanking sequences on 19 and X chromosomes
Science Data Bank; updated 2021-06-26; indexed 2026-04-23
Download link:
https://www.scidb.cn/en/detail?dataSetId=63dfdc9f3fe24566aa15e47b458a4f7f
Resource description:
The centromere regions of human chromosomes are constitutive heterochromatin that is almost entirely non-transcribed. In human cells, interactions between RNA and DNA can activate gene expression in a process known as RNA activation. To describe the likelihood of all RNAs binding to a DNA sequence, we developed a metric we call RNA binding strength. Here we analyzed the RNA binding strengths of the centromere regions of chromosomes 19 and X and their upstream and downstream flanking sequences using bioinformatics. We found that the RNA binding strengths of the centromere regions were significantly lower than those of the corresponding flanking sequences. We concluded that the low RNA binding strength of human centromere regions may contribute to the centromeres' lack of transcription.
Sequence data and software
The nucleotide sequences of the centromere regions and the flanking sequences of human chromosomes X and 19 were obtained from NCBI (GRCh38 Primary Assembly HSCHR6 CTG1, http://www.ncbi.nlm.nih.gov/projects/genome/guide/human). A total of 1,000 genes highly expressed in human tonsil germinal center B cells were selected for analysis based on the results of the Digital Differential Display (NCBI UniGene Lib.289 - NCI_CGAP_GCB1). Gene-Analyser 2.0 software, written by our team (see Gene-Analyser 2.0), was used to count the occurrences of each 7-nucleotide (7-nt) string (4^7 = 16,384 possible strings). Each 50-kb DNA sequence was represented by a long column of numbers whose sum was 49,994, which is the number of overlapping 7-nt windows in 50,000 bp (50,000 - 7 + 1). Microsoft Excel 2003 was used for statistical analysis.
The algorithm for the RNA binding strength
The RNA binding strength algorithm is based on the principle that greater complementarity between RNA and DNA results in more binding between them. For example, when there is one 5'-TTTTTTT DNA molecule and ten 5'-AAAAAAA RNA molecules in a given volume of solution, the likelihood of DNA binding with RNAs is scored as 10 (10×1=10).
If there are ten 5'-TTTTTTT DNA molecules instead, the likelihood of DNA binding with RNAs is scored as 100 (10×10=100). The binding of single-stranded RNA to double-stranded DNA involves competition between RNA and DNA for binding.
The centromere of chromosome 19 (chr19) is located at 24.50Mb-27.19Mb. The DNA sequence from 19Mb to 32Mb of chr19 was divided into 50-kb fragments. Gene-Analyser 2.0 software was used to analyze the 7-nt strings contained in each 50-kb fragment. The results are in the folder “Chr19 sequence and 7 nt strings”. The folder named “19” contains the DNA base sequences and their 7-nt strings from 19000001 bp to 20000000 bp of chr19, as 20 text files and 20 Excel files. Folders 20-32 follow the same layout as folder “19”.
The centromere of chromosome X (chrX) is located at 58.61Mb-62.41Mb. The DNA sequence from 52Mb to 67Mb of chrX was divided into 50-kb fragments. Gene-Analyser 2.0 software was used to analyze the 7-nt strings contained in each 50-kb fragment. The results are in the folder “ChrX sequence and 7 nt strings”. The folder named “52” contains the DNA base sequences and their 7-nt strings from 52000001 bp to 53000000 bp of chrX, as 20 text files and 20 Excel files. Folders 53-67 follow the same layout as folder “52”.
The total RNA of the 1,000 genes highly expressed in tonsil germinal center B cells is given in the Excel file named “7nt strings of RNA expressed by 1000 genes”. For each 50-kb fragment, the count of each 7-nt string is multiplied by the total RNA value of the same string, and the sum of the 16,384 products is the RNA binding strength of that 50-kb fragment.
The 50-kb fragment size was chosen because a transcription unit contains 10-50 kb of sequence (based on DNase I digestion of the hemoglobin and ovalbumin genes).
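The multiply-and-sum step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Gene-Analyser 2.0 implementation: the function names `count_7mers` and `rna_binding_strength` are hypothetical, the toy fragment and RNA values are invented, windows containing characters other than A, C, G, T are skipped (an assumption), and DNA and RNA tables are matched on the same 7-nt string, as the description states.

```python
from collections import Counter

K = 7  # string length used in the description (4**7 = 16,384 possible strings)

def count_7mers(seq, k=K):
    """Count overlapping k-nt strings along one strand of a DNA sequence."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: skip windows containing N or other ambiguity codes.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def rna_binding_strength(dna_counts, rna_counts):
    """Sum over all strings of (DNA count x RNA value for the same string)."""
    return sum(n * rna_counts.get(kmer, 0) for kmer, n in dna_counts.items())

# Toy 50,000-bp fragment and made-up RNA values, for illustration only.
fragment = "ACGT" * 12500
dna_counts = count_7mers(fragment)
rna_counts = {"ACGTACG": 10, "CGTACGT": 2.5}
strength = rna_binding_strength(dna_counts, rna_counts)
```

With a 50,000-bp input free of ambiguous bases, the counts add up to 49,994, which matches the column sum quoted in the description.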
One thousand genes highly expressed in human tonsil germinal center B cells were selected (as described above), and the 7-nt string numbers for these genes were calculated from the sense strand (including introns and exons). The 7-nt string numbers for each gene are multiplied by the expression frequency of the gene (Lib.5601; http://www.ncbi.nlm.nih.gov/UniGene/), which gives the calculated numbers of the 7-nt strings for that gene. The Excel file named “ChrX52 RNA binding strength” shows how to calculate the RNA binding strength value of one 50-kb DNA fragment (from 52000001 bp to 53000000 bp of chrX).
The Excel file named “Chr19” shows the RNA binding strengths of the centromere and the flanking sequences of chr19. The Excel file named “ChrX” shows the RNA binding strengths of the centromere and the flanking sequences of chrX.
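The expression weighting on the RNA side can be sketched the same way. This is an illustrative sketch only: `weighted_rna_profile` is a hypothetical name, and the gene sequences and expression frequencies below are made-up placeholders rather than values from the dataset. It also assumes that the per-gene weighted counts are summed across genes to obtain the total RNA table.

```python
from collections import Counter

def weighted_rna_profile(genes, k=7):
    """Build a table of k-nt string totals weighted by expression frequency.

    genes: iterable of (sense_strand_sequence, expression_frequency) pairs.
    Each occurrence of a string in a gene adds that gene's frequency,
    which equals (count in gene) x (expression frequency) summed over genes.
    """
    profile = Counter()
    for seq, freq in genes:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            profile[seq[i:i + k]] += freq
    return profile

# Two toy "genes" with invented expression frequencies.
genes = [("AAAAAAAAA", 3), ("AAAAAAAC", 2)]
profile = weighted_rna_profile(genes)
```

The resulting table plays the role of the "total RNA" values that are multiplied against each fragment's 7-nt string counts.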
Provided by:
Suleman Shah; Zhixue Song; Zhanjun Lv; Xiaodie Wang; Xiufang Wang; Hebei Medical University; Baixue Lv; Peiyuan Wu; Xiaocui Duan
Created:
2021-06-24



