
Improved lung cancer classification by employing diverse molecular features of microRNAs

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://www.ncbi.nlm.nih.gov/sra/SRP463829
Description:
Lung adenocarcinoma (LUAD) is one of the most common pathological and histological subtypes of primary lung cancer, with high morbidity and mortality. MicroRNAs (miRNAs) are endogenous small non-coding RNAs that regulate gene expression at the post-transcriptional level. A-to-I miRNA editing has been reported to decrease in tumors, which suggests that miRNA editing may be valuable for cancer classification. However, existing miRNA-based cancer classification models mainly use miRNA abundances. To assess the contribution of miRNA editing information to cancer classification, we extracted three types of miRNA features: the abundances of original miRNAs, the abundances of edited miRNAs, and the editing levels of miRNA editing sites. The four selected classification algorithms (kNN, C4.5, RF and SVM) generally performed better on all features than on the abundances of miRNAs alone. Because the number of features was large, we used three feature selection (FS) methods to further improve the classification models. One of them, the DFL algorithm, selected only three features from 316 training samples: the abundances of hsa-miR-135b-5p, hsa-miR-210-3p and hsa-miR-182 48u (an edited miRNA). All four classification algorithms achieved 100% accuracy with these three features on 79 independent testing samples. These results indicate that the additional information from miRNA editing is useful for improving the classification of LUAD samples, and the three miRNAs selected by DFL potentially represent an effective molecular signature for LUAD diagnosis.

Overall design: Nineteen lung adenocarcinoma (LUAD) tissues and 19 adjacent normal tissues were collected and placed in liquid nitrogen immediately after resection. Total RNA was extracted, and small RNA sequencing libraries were prepared and sequenced by BGI (Shenzhen, China). Mutation and editing sites of miRNAs were then analyzed with the MiRME algorithm in these 38 sRNA-seq profiles and in 357 public LUAD and normal samples. The abundances of original and edited miRNAs and the editing levels of the identified miRNA editing sites were obtained for all 395 samples. Four machine learning algorithms were used to classify the samples as LUAD or normal, and three feature selection algorithms were used to select molecular features that accurately predicted the sample class.
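
The three feature types and the classification step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the input layout, the feature-name prefixes, the placeholder miRNA and site names, and all numeric values are made up for illustration. The editing level is assumed here to be the fraction of reads at a site that carry the edited base, and a small hand-written kNN stands in for the four classifiers named in the description.

```python
import math
from collections import Counter


def editing_level(edited_reads, total_reads):
    """Assumed definition: fraction of reads at a site carrying the edited base."""
    return edited_reads / total_reads if total_reads > 0 else 0.0


def build_features(sample):
    """Concatenate the three feature types for one sample.

    `sample` is a hypothetical dict with three keys:
      "original": {miRNA name: abundance}
      "edited":   {edited miRNA name: abundance}
      "sites":    {site name: (edited_reads, total_reads)}
    """
    features = {}
    for name, abundance in sample["original"].items():
        features["orig:" + name] = abundance
    for name, abundance in sample["edited"].items():
        features["edit:" + name] = abundance
    for name, (edited, total) in sample["sites"].items():
        features["level:" + name] = editing_level(edited, total)
    return features


def knn_predict(train, query, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    keys = sorted(query)

    def dist(feats):
        return math.sqrt(sum((feats.get(key, 0.0) - query[key]) ** 2 for key in keys))

    nearest = sorted(train, key=lambda item: dist(item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def toy_sample(orig, edited, site_reads):
    """Build a one-miRNA, one-site sample with placeholder names."""
    return {
        "original": {"mirna_a": orig},
        "edited": {"mirna_a_edited": edited},
        "sites": {"site_1": site_reads},
    }


# Made-up values, not taken from the dataset.
train = [
    (build_features(toy_sample(10.0, 1.0, (5, 100))), "LUAD"),
    (build_features(toy_sample(12.0, 1.5, (8, 100))), "LUAD"),
    (build_features(toy_sample(2.0, 4.0, (30, 100))), "normal"),
    (build_features(toy_sample(3.0, 5.0, (25, 100))), "normal"),
]
query = build_features(toy_sample(11.0, 1.2, (6, 100)))
prediction = knn_predict(train, query, k=3)
print(prediction)
```

In practice the features would come from the MiRME output for each of the 395 samples, and the abundances would typically be normalized before a distance-based classifier such as kNN is applied; neither step is shown here.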

Created: 2024-02-23