Table6_Transcriptome-Wide Annotation of m5C RNA Modifications Using Machine Learning.XLSX

NIAID Data Ecosystem, indexed 2026-03-10

Download link:

https://figshare.com/articles/dataset/Table6_Transcriptome-Wide_Annotation_of_m5C_RNA_Modifications_Using_Machine_Learning_XLSX/7411493


Description:

The emergence of the epitranscriptome has opened a new chapter in gene regulation. 5-methylcytosine (m5C), an important post-transcriptional modification, has been implicated in a variety of biological processes, including subcellular localization and translational fidelity. Although high-throughput experimental technologies have been developed and applied to profile m5C modifications under certain conditions, transcriptome-wide studies of m5C are still hindered by the dynamic nature of the modification and the lack of computational prediction methods. In this study, we introduce PEA-m5C, a machine learning-based m5C predictor trained with features extracted from the flanking sequences of m5C modifications. PEA-m5C yielded an average AUC (area under the receiver operating characteristic curve) of 0.939 in 10-fold cross-validation experiments based on known Arabidopsis m5C modifications. Rigorous independent testing showed that PEA-m5C (accuracy [Acc] = 0.835, Matthews correlation coefficient [MCC] = 0.688) substantially outperforms the recently developed m5C predictor iRNAm5C-PseDNC (Acc = 0.665, MCC = 0.332). PEA-m5C has been applied to predict candidate m5C modifications in annotated Arabidopsis transcripts. Further analysis of these candidates showed that the position 4 nt downstream of the translational start site is the most frequently methylated. PEA-m5C is freely available to academic users at: https://github.com/cma2015/PEA-m5C.
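For readers unfamiliar with the quantities quoted in the description, the following is a minimal Python sketch of two ideas it mentions: taking the sequence flanking a candidate cytosine, and computing Acc and MCC from a binary confusion matrix. It is not taken from PEA-m5C. The function names, the flank length of 5 nt, the `N` padding character, and the example counts are all assumptions made for illustration only.

```python
import math


def flanking_window(transcript: str, pos: int, flank: int = 5, pad: str = "N") -> str:
    """Return the window centred on the cytosine at 0-based index `pos`.

    `flank` (nucleotides kept on each side) and `pad` are illustrative
    assumptions; the description does not state the window used by PEA-m5C.
    """
    if transcript[pos].upper() != "C":
        raise ValueError("position is not a cytosine")
    left = transcript[max(0, pos - flank):pos]
    right = transcript[pos + 1:pos + 1 + flank]
    # Pad windows that run off either end of the transcript.
    return left.rjust(flank, pad) + transcript[pos] + right.ljust(flank, pad)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical transcript fragment and confusion counts, not data from the study.
    print(flanking_window("AUGCCGUA", 3, flank=2))
    print(accuracy(tp=45, tn=40, fp=10, fn=5))
    print(round(mcc(tp=45, tn=40, fp=10, fn=5), 3))
```

MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes differ in size, which is one reason it is often reported alongside accuracy.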

Created:

2018-12-03
