bm2-lab/GuideRNA-3B
Source: Hugging Face. Updated 2024-08-15; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/bm2-lab/GuideRNA-3B
Description:
GuideRNA-3B is a large transcriptome sequence corpus of more than 3.7 billion sequence pairs, extracted from the transcriptomes of 23 cell lines and the segmented genomes of more than 200 RNA viruses. The corpus is intended as training data for a foundation model that characterizes the manifold of CRISPR guide RNA targeting regions, supporting downstream tasks such as universal CRISPR-based, amplification-free detection and inhibition of RNA viruses. The dataset is distributed as JSON Lines text compressed into Arrow format, with two main columns: before, a nucleotide sequence from a cell-line transcriptome or RNA virus, and next, the same sequence shifted right by one position.
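The relationship between the before and next columns can be illustrated with a small sketch. It assumes that "shifted right by one position" means a fixed-length window advanced by one nucleotide along the source sequence; the function name shifted_pairs, the window length, and the toy sequence are all hypothetical and are not taken from the dataset or its tooling.

```python
from typing import Dict, Iterator


def shifted_pairs(sequence: str, window: int) -> Iterator[Dict[str, str]]:
    """Yield records whose "next" field is the "before" window moved one base to the right.

    Assumption: this mirrors the before/next layout described above; the real
    dataset may build its pairs differently.
    """
    # Stop one position early so "next" always has the same length as "before".
    for start in range(len(sequence) - window):
        yield {
            "before": sequence[start : start + window],
            "next": sequence[start + 1 : start + window + 1],
        }


# Toy sequence, made up for illustration only.
records = list(shifted_pairs("AUGGCUAC", window=5))
for record in records:
    print(record)
```

Under this reading, every next value overlaps its before value in all but one nucleotide, which is the usual input/target arrangement for next-token training.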
Provider:
bm2-lab



