ThorKl/VHHCorpus-2M-CDR-Deduplicated
Source: Hugging Face | Updated: 2025-07-25 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/ThorKl/VHHCorpus-2M-CDR-Deduplicated
Description:
The VHHCorpus-2M-CDR-Deduplicated dataset contains approximately 1.99 million deduplicated VHH sequences (VHH: the variable domain of heavy-chain-only antibodies, also known as nanobodies). It includes the sequence itself, annotations for the key regions CDR1, CDR2, and CDR3, and an organism field indicating the species of origin of each sequence. Deduplication removes repeated entries so that the data is not dominated by redundant sequences.
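As a rough illustration of how records with these fields might be handled, the sketch below checks that each CDR occurs inside its sequence in the expected order and counts records per organism. The column names (`sequence`, `cdr1`, `cdr2`, `cdr3`, `organism`) are assumptions inferred from the description, and the records are made-up toy strings, not rows from the dataset.

```python
from collections import Counter

# Assumed column names; verify against the actual dataset schema before use.
CDR_FIELDS = ("cdr1", "cdr2", "cdr3")


def cdrs_in_order(record):
    """Return True if CDR1, CDR2 and CDR3 occur in the sequence in that order."""
    seq = record["sequence"]
    pos = 0
    for field in CDR_FIELDS:
        idx = seq.find(record[field], pos)
        if idx == -1:
            return False
        # Continue searching after the end of the CDR just found.
        pos = idx + len(record[field])
    return True


def count_by_organism(records):
    """Count how many records belong to each organism."""
    return Counter(r["organism"] for r in records)


# Toy placeholder records (hypothetical, not taken from the dataset).
toy_records = [
    {"sequence": "AAAGFTFSSYAWWWISWSGGSTYYYAADLRGYDDD",
     "cdr1": "GFTFSSYA", "cdr2": "ISWSGGST", "cdr3": "AADLRGY",
     "organism": "Lama glama"},
    {"sequence": "AAAGRTFSNYAWWWINSGGSTYYYAKDPWGDDD",
     "cdr1": "GRTFSNYA", "cdr2": "INSGGST", "cdr3": "AKDPWG",
     "organism": "Vicugna pacos"},
    {"sequence": "AAAGFTFWWW",
     "cdr1": "GFTF", "cdr2": "ZZZ", "cdr3": "QQQ",
     "organism": "Lama glama"},
]

# Keep only records whose CDR annotations are consistent with the sequence.
consistent = [r for r in toy_records if cdrs_in_order(r)]
organism_counts = count_by_organism(consistent)
```

The real data could presumably be loaded with the Hugging Face `datasets` library via `load_dataset("ThorKl/VHHCorpus-2M-CDR-Deduplicated")`, but the split and column names would need to be checked against the dataset card first.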
Provider:
ThorKl



