"nRC13"
Source: DataCite Commons | Updated: 2026-05-17 | Indexed: 2026-05-19
Download link:
https://ieee-dataport.org/documents/nrc13
Description:
" The non-coding RNA (ncRNA) dataset (nRC13) is a benchmark set of such sequences, which are developed for multiclass classification of RNA families. The data was introduced by Antonino Fiannaca and collaborators in their paper nRC: non-coding RNA Classifier based on structural features and contains 8,840 RNA sequences, categorized into 13 classes of functional ncRNA: microRNA (miRNA), 5S ribosomal RNA, 5.8S ribosomal RNA, transfer RNA (tRNA), ribozyme, CD-box small nucleolar RNA (snoRNA), H\/ACA-box snoRNA, small Cajal body-specific RNA (scaRNA), leader RNA, riboswitch, internal ribosome entry site (IRES), Group I intron, and Group II intron. Sequences were obtained from the \"curated\" families in the Rfam database, and were provided with high quality annotation and biological relevance. The dataset covers a wide spectrum of sequence lengths, sequence structural complexity and evolutionary conservation patterns that makes it a difficult benchmark for ncRNA classification. As the classes included have overlapping sequence motifs and secondary structures, nRC13 is especially suitable for testing a model's capacity to learn discriminative representations from heterogeneous RNA data. Since its launch, nRC13 has been widely adopted in the evaluation of feature engineering strategies, classic machine learning classifiers and novel deep-learning architectures like convolutional neural networks, recurrent networks, graph neural networks and transformer-based models like RNA language models. Consequently, it is one of the most popular and reliable standards for the non-coding RNA classification research."
Provider:
IEEE DataPort
Created:
2026-05-17



