Diverse database and machine learning model to narrow the generalization gap in RNA structure prediction
Source: DataONE. Updated 2026-01-29; indexed 2026-02-07.
Download link:
https://search.dataone.org/view/sha256:f5fff240d205e4fc834aac1f65f488bad87fd40bd315bbaf1fb388bf64a8290b
Resource description:
This dataset contains RNA secondary structure data used for training and testing eFold, a deep learning model for RNA secondary structure prediction. The dataset comprises three main components: (1) experimentally determined secondary structure models for 1,098 pri-miRNAs and 1,456 human mRNA regions derived from DMS-MaP-seq chemical probing experiments, representing the original contribution of this work; (2) a curated pre-training dataset combining subsets of bpRNA (base-pair RNA database) and RNAstralign databases, filtered to remove redundant sequences and ArchiveII sequences as described in the associated publication; and (3) benchmark test sets for evaluating model performance on long and diverse RNA structures.
The dataset includes sequence files in FASTA format and corresponding secondary structure annotations in dot-bracket notation. Structure models represent experimentally validated folding patterns with reactivity data from chemical probing assays. The pri-miRNA structures r...

# Diverse database and machine learning model to narrow the generalization gap in RNA structure prediction
Dataset DOI: [10.5061/dryad.79cnp5j95](https://doi.org/10.5061/dryad.79cnp5j95)
## Description of the data and file structure
Each data file is in JSON format, with one entry per sequence, structured as follows:
```
sequence_name:
  sequence: AAGUGAAG..  # string of nucleotides
  structure: [[171, 317], [351, 403], ...]  # list of base pairs
  shape: [0.4519, 1.0903, 0.5035, 0.1382,...]  # list of normalized SHAPE reactivities (when available)
  dms: [1.0, 0.7283, -1000.0, -1000.0, ...]  # list of normalized DMS reactivities (when available). Since DMS only reacts with A and C, all reactivities for G and U are set to -1000.
```
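The layout above can be read with the standard `json` module. The following is a minimal sketch, not the authors' code: the helper names (`load_records`, `pairs_to_dot_bracket`, `mask_reactivities`) are hypothetical, the base-pair indices are assumed to be 0-based (pass `index_base=1` if they turn out to be 1-based), and the dot-bracket conversion assumes a structure without pseudoknots. The toy record is made up for illustration and does not come from the dataset.

```python
import json
import math

MISSING = -1000.0  # sentinel the README describes for G and U positions in `dms`


def load_records(path):
    """Load one data file; the top level maps sequence names to records."""
    with open(path) as handle:
        return json.load(handle)


def pairs_to_dot_bracket(length, pairs, index_base=0):
    """Render a list of base pairs as a dot-bracket string.

    Assumption: indices are 0-based unless index_base says otherwise,
    and the structure has no pseudoknots (only '(' and ')' are used).
    """
    chars = ["."] * length
    for i, j in pairs:
        lo, hi = sorted((i - index_base, j - index_base))
        chars[lo] = "("
        chars[hi] = ")"
    return "".join(chars)


def mask_reactivities(values, sentinel=MISSING):
    """Replace the sentinel with NaN so it is not mistaken for a reactivity."""
    return [math.nan if v == sentinel else v for v in values]


# Toy record in the same shape as the README schema (illustrative values only).
record = {
    "sequence": "GGGAAACCC",
    "structure": [[0, 8], [1, 7], [2, 6]],
    "dms": [-1000.0, -1000.0, -1000.0, 0.9, 1.0, 0.8, 0.1, 0.05, 0.2],
}

dot_bracket = pairs_to_dot_bracket(len(record["sequence"]), record["structure"])
masked_dms = mask_reactivities(record["dms"])
```

For a real file, something like `for name, rec in load_records("pri_miRNA.json").items(): ...` would iterate over the entries. The `shape` and `dms` keys are described as present only when available, so `rec.get("dms")` is a safer access pattern than `rec["dms"]`.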
### Files and variables
#### File: pri_miRNA.json
**Description:** Original contribution. See the Methods section of [https://www.biorxiv.org/content/10.1101/2024.01.24.577093v4](https://www.biorxiv.org/content/10.1101/2024.01.24.577093v4).
#### File: human_mRNA.json
**Description:** Orig...
Created:
2026-01-29



