OASfiltered/OAS95-aligned-cleaned
Source: Hugging Face. Updated 2026-03-30; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/OASfiltered/OAS95-aligned-cleaned
Description:
---
viewer: true
---
# OAS95-aligned-cleaned
A cleaned and IMGT-aligned antibody sequence dataset derived from the Observed Antibody Space (OAS) database.
## Preparation
- Started from the OAS dataset preprocessed with the IgLM pipeline and clustered at 95% sequence identity.
- Numbered all sequences with ANARCII using the IMGT numbering scheme.
- Removed heavy- and light-chain sequences with numbering gaps at the first framework position.
- Removed light-chain sequences shorter than 90 residues.
- Retained 203,968,932 training sequences and 11,819,793 test sequences across 6 organism labels and 2 chain types.
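The two removal steps above can be sketched as a simple predicate over aligned records. This is a minimal illustration, not the pipeline actually used to build the dataset. It assumes the gap character is `-`, that the first framework position corresponds to the first character of the aligned string, and that the 90-residue threshold applies to the gap-free length. The names `keep_record` and `filter_records` are hypothetical.

```python
from typing import Iterable, Iterator

GAP = "-"              # assumed gap character; the card does not name it
MIN_LIGHT_LENGTH = 90  # threshold stated in the preparation notes


def keep_record(aligned: str, chain_type: str, gap: str = GAP) -> bool:
    """Return True if an aligned sequence passes both filters (hypothetical helper)."""
    if not aligned:
        return False
    # Filter 1: drop sequences with a gap at the first framework position
    # (assumed here to be index 0 of the aligned string).
    if aligned[0] == gap:
        return False
    # Filter 2: drop light chains with fewer than 90 residues,
    # counted after removing gap characters (an assumption).
    n_residues = len(aligned) - aligned.count(gap)
    if chain_type == "Light" and n_residues < MIN_LIGHT_LENGTH:
        return False
    return True


def filter_records(records: Iterable[dict]) -> Iterator[dict]:
    """Yield only the records that pass keep_record."""
    for rec in records:
        if keep_record(rec["sequence"], rec["type"]):
            yield rec
```

Heavy chains are only subject to the first filter in this sketch, matching the wording of the list above, which applies the length cutoff to light chains only.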
## Columns
| Column | Description |
| --- | --- |
| `sequence` | IMGT-aligned amino acid sequence with gap characters. |
| `init_seq` | Original amino acid sequence without alignment gaps. |
| `class` | Organism label. |
| `type` | Antibody chain type: `Heavy` or `Light`. |
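As a rough illustration of how these columns relate, the sketch below streams a few rows and checks that `init_seq` equals `sequence` with gaps removed. It assumes the gap character is `-`, that the repository loads with the Hugging Face `datasets` library under its default configuration, and that the gap-free aligned sequence matches `init_seq` exactly, which the card implies but does not state. The helper names are hypothetical.

```python
GAP = "-"  # assumed gap character; not specified in the card


def strip_gaps(aligned: str, gap: str = GAP) -> str:
    """Remove alignment gap characters from an aligned sequence."""
    return aligned.replace(gap, "")


def row_is_consistent(row: dict) -> bool:
    """Check one row against the column descriptions (hypothetical sanity check)."""
    return (
        strip_gaps(row["sequence"]) == row["init_seq"]
        and row["type"] in {"Heavy", "Light"}
    )


def stream_rows(split: str = "train", limit: int = 5):
    """Yield the first `limit` rows of a split without downloading all of it."""
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset("OASfiltered/OAS95-aligned-cleaned", split=split, streaming=True)
    for i, row in enumerate(ds):
        if i >= limit:
            break
        yield row
```

Streaming is used here because the training split holds over 200 million rows. A call such as `all(row_is_consistent(r) for r in stream_rows("test"))` would run the check on a handful of test rows.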
## Split Sizes
- train: 203,968,932
- test: 11,819,793
- total: 215,788,725
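The split counts can be cross-checked with a few lines of arithmetic. The snippet below uses only the figures listed above and also prints each split's share of the total, which the card does not report directly.

```python
# Split sizes as listed in the dataset card.
SPLIT_SIZES = {"train": 203_968_932, "test": 11_819_793}

total = sum(SPLIT_SIZES.values())
fractions = {name: n / total for name, n in SPLIT_SIZES.items()}

for name, n in SPLIT_SIZES.items():
    print(f"{name}: {n:,} ({fractions[name]:.2%})")
print(f"total: {total:,}")
```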
Provider:
OASfiltered



