henry1477/pcbslm-static-v2-unsloth-vlm
Hugging Face · updated 2026-04-15 · indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/henry1477/pcbslm-static-v2-unsloth-vlm
Resource description:
---
license: other
task_categories:
- image-text-to-text
- visual-question-answering
language:
- en
pretty_name: PCBSLM static-v2 Unsloth VLM
tags:
- unsloth
- gemma-4
- multimodal
- pcb
- electronics
size_categories:
- 1K<n<10K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/vlm_train.jsonl
  - split: validation
    path: data/vlm_val.jsonl
  - split: test
    path: data/vlm_test.jsonl
---
# PCBSLM static-v2 Unsloth VLM
Portable multimodal Unsloth dataset for PCB layout/document-grounded training.
The JSONL splits use Unsloth/Gemma-style chat messages:
```json
{
  "messages": [
    {"role": "user", "content": [
      {"type": "image", "image": "assets/raw_docs/.../images/page.png"},
      {"type": "text", "text": "instruction..."}
    ]},
    {"role": "assistant", "content": [
      {"type": "text", "text": "{...json answer...}"}
    ]}
  ]
}
```
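As a rough illustration of how one line in this format can be unpacked, the sketch below separates the image paths, the user instruction, and the assistant answer. The function name `split_record` and the sample record (including its image path and answer text) are hypothetical placeholders rather than content from the dataset, and the code assumes every content part carries a `type` of either `image` or `text`, as in the example above.

```python
import json


def split_record(line):
    """Return (image_paths, prompt_text, answer_text) for one JSONL line."""
    record = json.loads(line)
    images, prompts, answers = [], [], []
    for message in record["messages"]:
        is_assistant = message["role"] == "assistant"
        for part in message["content"]:
            if part["type"] == "image":
                images.append(part["image"])
            elif part["type"] == "text":
                # Assistant text is the target; everything else is the prompt.
                (answers if is_assistant else prompts).append(part["text"])
    return images, "\n".join(prompts), "\n".join(answers)


# Hypothetical record that follows the schema shown above.
sample_line = json.dumps({
    "messages": [
        {"role": "user", "content": [
            {"type": "image", "image": "assets/board_images/example.png"},
            {"type": "text", "text": "Describe the board."},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": "{\"summary\": \"placeholder\"}"},
        ]},
    ]
})

images, prompt, answer = split_record(sample_line)
print(images, prompt, answer)
```

If the assistant text is itself valid JSON, as the `{...json answer...}` placeholder suggests, it can be decoded with a second `json.loads(answer)` call.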
## Files
- `data/vlm_train.jsonl`: 4307 multimodal examples
- `data/vlm_val.jsonl`: 289 multimodal examples
- `data/vlm_test.jsonl`: 494 multimodal examples
- `assets/raw_docs/`: source documents, rendered pages, and figure crops referenced by examples/metadata
- `assets/board_images/`: board render images referenced by examples
- `metadata_bundle.tar.gz`: document/evidence/rule metadata with repo-relative asset paths
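A quick way to confirm that a local copy is complete is to count the examples in each split and compare them with the numbers listed above. The following is a minimal sketch using only the standard library; the helper names `count_examples` and `check_counts` are hypothetical, and the code assumes each non-empty line is a JSON object with a `messages` key.

```python
import json
from pathlib import Path

# Example counts as stated in the file list above.
EXPECTED_COUNTS = {
    "data/vlm_train.jsonl": 4307,
    "data/vlm_val.jsonl": 289,
    "data/vlm_test.jsonl": 494,
}


def count_examples(lines):
    """Count non-empty lines, checking that each one has a 'messages' key."""
    count = 0
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        if "messages" not in json.loads(line):
            raise ValueError(f"line {number} has no 'messages' key")
        count += 1
    return count


def check_counts(dataset_root):
    """Return {relative_path: (found, expected)} for split files that exist."""
    results = {}
    for relative_path, expected in EXPECTED_COUNTS.items():
        path = Path(dataset_root) / relative_path
        if not path.is_file():
            continue  # Skip splits that are not present locally.
        with path.open(encoding="utf-8") as handle:
            results[relative_path] = (count_examples(handle), expected)
    return results
```

Calling `check_counts(".")` from the dataset root returns one `(found, expected)` pair per split file it finds, so any mismatch is easy to spot.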
## Unsloth Smoke Test
The dataset was verified locally with `unsloth/gemma-4-E2B-it` by running a two-step smoke training run:
```bash
python scripts/smoke_train_unsloth_vlm.py \
  data/vlm_train.jsonl \
  --model-name unsloth/gemma-4-E2B-it \
  --limit 4 \
  --max-steps 2 \
  --max-images 1 \
  --max-seq-length 512 \
  --resize 256
```
If training outside a Hugging Face snapshot checkout, resolve image paths relative to the dataset root.
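One possible way to do that resolution is sketched below. The function name `resolve_image_paths` is hypothetical, and `dataset_root` stands for whichever directory holds your copy of the repository (for example, the path returned by `huggingface_hub.snapshot_download` with `repo_type="dataset"`). The sketch assumes image parts store repo-relative paths under the `image` key, as in the message format above.

```python
import copy
from pathlib import Path


def resolve_image_paths(record, dataset_root):
    """Return a copy of record with relative image paths joined to dataset_root."""
    root = Path(dataset_root)
    resolved = copy.deepcopy(record)  # Leave the caller's record untouched.
    for message in resolved["messages"]:
        for part in message["content"]:
            if part.get("type") != "image":
                continue
            image_path = Path(part["image"])
            if not image_path.is_absolute():
                part["image"] = str(root / image_path)
    return resolved


# Hypothetical record and root directory, for illustration only.
example_record = {
    "messages": [
        {"role": "user", "content": [
            {"type": "image", "image": "assets/board_images/example.png"},
            {"type": "text", "text": "Describe the board."},
        ]},
    ]
}
resolved_record = resolve_image_paths(example_record, "local_dataset_copy")
print(resolved_record["messages"][0]["content"][0]["image"])
```

Paths that are already absolute are left unchanged, so the function can be applied more than once without double-prefixing.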
HF repo: `henry1477/pcbslm-static-v2-unsloth-vlm`
Provider:
henry1477



