ChengsenWang/TinyDNABERT-PretrainData-V1
Source: Hugging Face. Updated: 2025-07-21. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/ChengsenWang/TinyDNABERT-PretrainData-V1
Description:
The pretraining dataset for TinyDNABERT-20M-V1. It contains 4,904,969 training samples, obtained by removing all "N" markers from the human reference genome GRCh38.p14 and randomly splitting the remaining sequence into segments of 256 to 1024 base pairs. The dataset is tagged for biology and genomics and falls in the 1M to 10M size category.
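The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual pipeline: the function name `split_sequence`, the uppercasing of soft-masked bases, and the choice to drop a trailing remainder shorter than 256 bp are all assumptions made for the example.

```python
import random


def split_sequence(seq, min_len=256, max_len=1024, rng=None):
    """Remove 'N' markers, then cut the sequence into random-length segments.

    Hypothetical helper: the name and the handling of the final remainder
    are assumptions, not details taken from the dataset description.
    """
    rng = rng or random.Random()

    # Assumption: treat soft-masked (lowercase) bases like uppercase ones,
    # then delete every 'N' marker.
    cleaned = seq.upper().replace("N", "")

    segments = []
    pos = 0
    # Keep cutting while at least min_len bases remain.
    while len(cleaned) - pos >= min_len:
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        # Slicing truncates at the end of the sequence, so the last segment
        # may be shorter than the drawn length but never below min_len.
        segment = cleaned[pos:pos + length]
        segments.append(segment)
        pos += len(segment)

    # Assumption: a trailing remainder shorter than min_len is discarded.
    return segments
```

Calling `split_sequence(chromosome, rng=random.Random(0))` on each chromosome string would give reproducible segments; the real dataset may have been built with a different seed, length distribution, or remainder policy.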
Provided by:
ChengsenWang



