mRNA sequences from five species

Name: mRNA sequences from five species
Creator: NIH Datasets API
License: No description available

Indexed from arXiv on 2025-09-30

Download link:

https://www.ncbi.nlm.nih.gov/datasets/docs/v2/reference-docs/command-line/datasets/

Description:

This dataset contains mRNA sequences collected from five species (human, rat, mouse, chicken, and zebrafish) and is intended for pre-training the mRNA2vec model. Each sequence was processed to extract its 5' untranslated region (5'UTR) and coding sequence (CDS), and a specific truncation method was applied so that no padding is needed. The dataset comprises 510,000 sequences with an average length of 459 base pairs. Its task is the pre-training of mRNA embedding models.
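The processing described above (extract the 5'UTR and CDS, then truncate rather than pad) might look roughly like the following sketch. The listing does not specify the actual truncation method, so the rule used here (keep a window around the start codon, splitting the length budget between the two regions) is purely an assumption for illustration. The function names `split_utr5_cds` and `truncate_utr5_cds`, the 0-based end-exclusive coordinates, and the `max_len` parameter are all hypothetical and are not part of mRNA2vec or the NIH Datasets tooling.

```python
from typing import Tuple


def split_utr5_cds(mrna: str, cds_start: int, cds_end: int) -> Tuple[str, str]:
    """Split a transcript into its 5'UTR and CDS.

    Coordinates are assumed to be 0-based and end-exclusive (an assumption,
    not something the dataset listing states).
    """
    if not (0 <= cds_start < cds_end <= len(mrna)):
        raise ValueError("CDS coordinates fall outside the transcript")
    return mrna[:cds_start], mrna[cds_start:cds_end]


def truncate_utr5_cds(utr5: str, cds: str, max_len: int) -> str:
    """Hypothetical truncation rule: keep a window around the start codon.

    Sequences that already fit are returned whole, so nothing is ever padded.
    Longer ones keep the 3' end of the 5'UTR and the 5' end of the CDS.
    """
    if len(utr5) + len(cds) <= max_len:
        return utr5 + cds
    # Offer each region half the budget, then pass any unused budget
    # to the other region so the result is exactly max_len long.
    utr_keep = min(len(utr5), max_len // 2)
    cds_keep = min(len(cds), max_len - utr_keep)
    utr_keep = min(len(utr5), max_len - cds_keep)
    return utr5[len(utr5) - utr_keep:] + cds[:cds_keep]


# Toy transcript: 6 nt 5'UTR, 9 nt CDS, 4 nt 3'UTR (made-up sequence).
toy_mrna = "GGCACC" + "ATGGCTTAA" + "TTTT"
toy_utr5, toy_cds = split_utr5_cds(toy_mrna, cds_start=6, cds_end=15)
toy_input = truncate_utr5_cds(toy_utr5, toy_cds, max_len=8)
```

With this rule every output is at most `max_len` long and shorter transcripts pass through unchanged, which is one simple way to avoid padding; the real pipeline may well differ.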

Provider:

NIH Datasets API
