A long context RNA foundation model for predicting transcriptome architecture

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://www.ncbi.nlm.nih.gov/sra/SRP540164

Resource description:

Linking DNA sequence to genomic function remains one of the grand challenges in genetics and genomics. Here, we combine large-scale single-molecule transcriptome sequencing of diverse cancer cell lines with cutting-edge machine learning to build LoRNASH, an RNA foundation model that learns how the nucleotide sequence of unspliced pre-mRNA dictates transcriptome architecture: the relative abundances and molecular structures of mRNA isoforms. Owing to its use of the StripedHyena architecture, LoRNASH handles extremely long sequence inputs at base-pair resolution (~65 kilobase pairs), allowing for quantitative, zero-shot prediction of all aspects of transcriptome architecture, including isoform abundance, isoform structure, and the impact of DNA sequence variants on transcript structure and abundance. We anticipate that our public data release and the accompanying frontier model will accelerate many aspects of RNA biotechnology. More broadly, we envision the use of LoRNASH as a foundation for fine-tuning of any transcriptome-related downstream prediction task, including cell-type specific gene expression, splicing, and general RNA processing.

Overall design: Cells were cultured until 80% confluence before RNA extraction using TRIzol reagent (ThermoFisher) followed by column purification (Zymo Research). Short-read (Takara Bio SMARTer Stranded Total RNA-seq kit v3) and long-read RNA-sequencing libraries (NEBNext Single Cell/Low Input cDNA Synthesis & Amplification Module, PacBio Iso-Seq Express Oligo Kit and SMRTbell express template prep kit 3.0) were then constructed using matched RNA samples.

Created:

2024-10-25
