OASfiltered/OAS95-aligned-cleaned

Source: Hugging Face. Updated 2026-03-30; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/OASfiltered/OAS95-aligned-cleaned
Resource description:
---
viewer: true
---

# OAS95-aligned-cleaned

A cleaned and IMGT-aligned antibody sequence dataset derived from the Observed Antibody Space (OAS) database.

## Preparation

- Started from the OAS dataset preprocessed with the IgLM pipeline and clustered at 95% sequence identity.
- Numbered all sequences with ANARCII using the IMGT numbering scheme.
- Removed heavy- and light-chain sequences with numbering gaps at the first framework position.
- Removed light-chain sequences shorter than 90 residues.
- Retained 203,968,932 training sequences and 11,819,793 test sequences across 6 organism labels and 2 chain types.

## Columns

| Column | Description |
| --- | --- |
| `sequence` | IMGT-aligned amino acid sequence with gap characters. |
| `init_seq` | Original amino acid sequence without alignment gaps. |
| `class` | Organism label. |
| `type` | Antibody chain type: `Heavy` or `Light`. |

## Split Sizes

- train: 203,968,932
- test: 11,819,793
- total: 215,788,725
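
The filtering rules and column relationships described above can be illustrated with a small sketch. This is not the dataset authors' pipeline; it only mirrors the stated rules on toy rows. Several details are assumptions: that gaps are written as `-` or `.`, that the first character of `sequence` corresponds to the first framework position, and that the 90-residue threshold applies to the ungapped `init_seq`. The helper names `strip_gaps`, `is_consistent`, and `passes_filters` are hypothetical.

```python
from typing import Dict

# Assumption: alignment gaps are written as '-' or '.'; the dataset card
# only says "gap characters" without naming them.
GAP_CHARS = "-."


def strip_gaps(aligned: str) -> str:
    """Remove gap characters from an aligned sequence."""
    return "".join(ch for ch in aligned if ch not in GAP_CHARS)


def is_consistent(row: Dict[str, str]) -> bool:
    """Check that `sequence` with gaps removed equals `init_seq`."""
    return strip_gaps(row["sequence"]) == row["init_seq"]


def passes_filters(row: Dict[str, str], min_light_len: int = 90) -> bool:
    """Mirror the two removal rules listed under Preparation.

    Assumptions: the first character of `sequence` is the first framework
    position, and length is measured on the ungapped `init_seq`.
    """
    aligned = row["sequence"]
    # Rule 1: drop any chain with a gap at the first framework position.
    if not aligned or aligned[0] in GAP_CHARS:
        return False
    # Rule 2: drop light chains shorter than the minimum length.
    if row["type"] == "Light" and len(row["init_seq"]) < min_light_len:
        return False
    return True


# Toy rows (not real dataset entries) using the documented column names.
toy_rows = [
    {"sequence": "QVQ-LV", "init_seq": "QVQLV", "class": "human", "type": "Heavy"},
    {"sequence": "-VQLVE", "init_seq": "VQLVE", "class": "human", "type": "Heavy"},
    {"sequence": "DIQ-MT", "init_seq": "DIQMT", "class": "human", "type": "Light"},
]
kept = [row for row in toy_rows if passes_filters(row)]

# The split sizes on the card add up to the stated total.
TRAIN_SIZE, TEST_SIZE = 203_968_932, 11_819_793
TOTAL_SIZE = TRAIN_SIZE + TEST_SIZE
```

In this toy run only the first row is kept: the second has a gap at the first position and the third is a light chain far below 90 residues. On the real data, rows could be read lazily with `datasets.load_dataset("OASfiltered/OAS95-aligned-cleaned", split="test", streaming=True)` and passed through the same checks, provided the assumptions above hold for the published files.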
Provider:
OASfiltered