
Diverse database and machine learning model to narrow the generalization gap in RNA structure prediction

Source: DataONE. Updated 2026-01-29; indexed 2026-02-07.
Download link:
https://search.dataone.org/view/sha256:f5fff240d205e4fc834aac1f65f488bad87fd40bd315bbaf1fb388bf64a8290b
Resource description:
This dataset contains RNA secondary structure data used for training and testing eFold, a deep learning model for RNA secondary structure prediction. The dataset comprises three main components:

1. experimentally determined secondary structure models for 1,098 pri-miRNAs and 1,456 human mRNA regions derived from DMS-MaP-seq chemical probing experiments, representing the original contribution of this work;
2. a curated pre-training dataset combining subsets of bpRNA (base-pair RNA database) and RNAstralign databases, filtered to remove redundant sequences and ArchiveII sequences as described in the associated publication; and
3. benchmark test sets for evaluating model performance on long and diverse RNA structures.

The dataset includes sequence files in FASTA format and corresponding secondary structure annotations in dot-bracket notation. Structure models represent experimentally validated folding patterns with reactivity data from chemical probing assays. The pri-miRNA structures r...

# Diverse database and machine learning model to narrow the generalization gap in RNA structure prediction

Dataset DOI: [10.5061/dryad.79cnp5j95](https://doi.org/10.5061/dryad.79cnp5j95)

## Description of the data and file structure

The data is a json file, structured as follows:

```
sequence_name:
    sequence: AAGUGAAG..  # string of nucleotides
    structure: [[171, 317], [351, 403], ...]  # list of base pairs
    shape: [0.4519, 1.0903, 0.5035, 0.1382,...]  # list of normalized shape reactivities (when available)
    dms: [1.0, 0.7283, -1000.0, -1000.0, ...]  # list of normalized DMS reactivities (when available).
    # Since DMS only reacts to A and C, all reactivities for G and U are set to -1000.
```

### Files and variables

#### File: pri_miRNA.json

**Description:** Original contribution. See section Methods of [https://www.biorxiv.org/content/10.1101/2024.01.24.577093v4](https://www.biorxiv.org/content/10.1101/2024.01.24.577093v4).

#### File: human_mRNA.json

**Description:** Orig...
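The record layout described above (a sequence, a list of base pairs, and optional reactivity lists) can be read with a few lines of Python. The following is a minimal sketch, not the authors' loader. It rests on several assumptions that the description does not confirm: base-pair indices are taken to be 0-based (a `one_based` switch is provided in case they are not), pairs are assumed to be nested (pseudoknots would not be distinguishable in the plain `(`/`)` output), and the record `toy_hairpin` is hand-made for illustration rather than taken from the dataset. The helper names `pairs_to_dot_bracket` and `mask_missing` are hypothetical.

```python
import json
import math

# Sentinel the README describes for G and U positions in DMS data.
MISSING = -1000.0


def pairs_to_dot_bracket(length, pairs, one_based=False):
    """Render a list of [i, j] base pairs as a dot-bracket string.

    Assumes nested pairs; crossing pairs (pseudoknots) are not marked specially.
    """
    chars = ["."] * length
    offset = 1 if one_based else 0
    for i, j in pairs:
        lo, hi = sorted((i - offset, j - offset))
        chars[lo] = "("
        chars[hi] = ")"
    return "".join(chars)


def mask_missing(reactivities, sentinel=MISSING):
    """Replace the sentinel with NaN so it cannot skew averages."""
    return [math.nan if r == sentinel else r for r in reactivities]


# Hand-made record in the documented layout (illustrative, not real data).
example_json = """
{
  "toy_hairpin": {
    "sequence": "GACAAAGUC",
    "structure": [[0, 8], [1, 7], [2, 6]],
    "dms": [-1000.0, 0.1, 0.05, 0.9, 1.0, 0.8, -1000.0, -1000.0, 0.07]
  }
}
"""

records = json.loads(example_json)
for name, rec in records.items():
    seq = rec["sequence"]
    dot_bracket = pairs_to_dot_bracket(len(seq), rec["structure"])
    # "dms" and "shape" are only present when available, so use .get().
    dms = mask_missing(rec.get("dms", []))
    measured = [r for r in dms if not math.isnan(r)]
    print(name, seq, dot_bracket, f"{len(measured)} measured DMS positions")
```

For a real file such as `pri_miRNA.json`, replacing `json.loads(example_json)` with `json.load(open(path))` should work if the file follows the layout above. Converting the `-1000` sentinel to NaN before any statistics avoids treating unmeasured G and U positions as strongly negative reactivities.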
Created:
2026-01-29