metaXu264/generator-cas-ft-dataset
Source: Hugging Face; updated 2026-02-02; indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/metaXu264/generator-cas-ft-dataset
Resource description:
---
license: cc-by-4.0
task_categories:
- sequence-modeling
language:
- dna
size_categories:
- 1M+
---
# GENERator Cas Fine-tuning Dataset
This dataset is designed for fine-tuning **GENERator** on CRISPR-Cas–related DNA sequences.
## File
- `generator_cas_ft_dataset.jsonl`
## Format
Each line is a JSON object, typically containing:
- `input` : DNA sequence (length multiple of 6 for 6-mer tokenizer)
- `output` : DNA sequence (autoregressive target)
- `role` : cas_protein / crispr / tracrRNA / spacer (if present)
- `source` : CRISPR-Cas Atlas–derived
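The field layout above can be sanity-checked before fine-tuning. The following is a minimal sketch, not part of the dataset's own tooling: the helper names `check_record` and `split_kmers`, the accepted alphabet (`ACGTN`), and the non-overlapping 6-mer chunking are all assumptions made for illustration, and the example record is made up rather than taken from the file.

```python
import json

K = 6  # k-mer size named in the Format section
DNA_ALPHABET = set("ACGTN")  # assumption: standard bases plus N


def split_kmers(seq, k=K):
    """Split a sequence into non-overlapping k-mers (assumed chunking scheme)."""
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def check_record(line):
    """Parse one JSONL line and return a list of problems (empty if none found)."""
    record = json.loads(line)
    problems = []
    for field in ("input", "output"):
        seq = record.get(field)
        if not isinstance(seq, str):
            problems.append(f"{field}: missing or not a string")
            continue
        if not set(seq.upper()) <= DNA_ALPHABET:
            problems.append(f"{field}: non-DNA characters")
    # Only `input` is documented as needing a length that is a multiple of 6.
    if isinstance(record.get("input"), str) and len(record["input"]) % K != 0:
        problems.append("input: length is not a multiple of 6")
    return problems


# Made-up record for illustration only; not taken from the dataset.
example = json.dumps({"input": "ACGTACGGATTC", "output": "TTGACA", "role": "spacer"})
print(check_record(example))
print(split_kmers("ACGTACGGATTC"))
```

Running `check_record` over each line of `generator_cas_ft_dataset.jsonl` would flag records that do not match the documented layout; optional fields such as `role` and `source` are deliberately not checked here.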
## Usage
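```python
from datasets import load_dataset

# Load the single JSONL file from the Hugging Face Hub repository.
ds = load_dataset(
    "metaXu264/generator-cas-ft-dataset",
    data_files="generator_cas_ft_dataset.jsonl",
)

# A single data file is exposed as the default "train" split.
print(ds["train"][0])
```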
Provider:
metaXu264



