T2T Primate Genomes
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/aglabx/dnaBPE
Resource description:
This dataset applies Byte Pair Encoding (BPE) to the genomes of nine primate species to examine conservation and divergence in genomic sequence, with particular attention to the impact of species-specific repetitive elements present at high copy numbers. Only 11,569 tokens are shared across all nine genomes, which shows how unique repetitive elements complicate the development of a universal tokenizer for genomic sequences. The study's scope is the genomes of nine primate species, and its core tasks are comparative genomics and tokenization analysis.
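The description implies two steps: learning a BPE vocabulary for each genome, then intersecting those vocabularies to find tokens common to every species. The following is a minimal sketch of that idea under stated assumptions, not the dnaBPE implementation. The helper names `train_bpe` and `shared_tokens`, the merge count, and the two toy "species" sequences are hypothetical, and real chromosome-scale input would need far more merges and much more efficient data structures.

```python
from collections import Counter


def train_bpe(seq: str, num_merges: int) -> set[str]:
    """Learn a toy BPE vocabulary from one DNA string (hypothetical helper)."""
    tokens = list(seq)          # start from single nucleotides
    vocab = set(tokens)
    for _ in range(num_merges):
        # Count adjacent token pairs (overlapping pairs, a common simplification).
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]  # ties go to the first pair seen
        if count < 2:
            break               # nothing repeats, so no merge is worthwhile
        merged = a + b
        vocab.add(merged)
        # Replace non-overlapping occurrences of (a, b), scanning left to right.
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(merged)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return vocab


def shared_tokens(genomes: dict[str, str], num_merges: int) -> set[str]:
    """Tokens present in every genome's BPE vocabulary (hypothetical helper)."""
    vocabs = [train_bpe(seq, num_merges) for seq in genomes.values()]
    return set.intersection(*vocabs)


if __name__ == "__main__":
    # Toy sequences: sp1 carries an "ACGT" repeat, sp2 carries a poly-T run.
    toy = {"sp1": "ACGTACGTACGT", "sp2": "ACGTACGTTTTT"}
    common = shared_tokens(toy, num_merges=3)
    print(sorted(common))  # ['A', 'AC', 'ACG', 'C', 'G', 'T']
    for name, seq in toy.items():
        print(name, sorted(train_bpe(seq, 3) - common))  # species-specific tokens
```

In this toy run each species' dominant repeat produces its own token (`ACGT` for sp1, `TT` for sp2) that never enters the shared set, which is a small-scale analogue of why species-specific repeats shrink the vocabulary common to all genomes.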
Provider:
Custom tool, dnaBPE



