JackertheHacker/svg-stack-qwen-captioned-subset
收藏Hugging Face2026-04-18 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/JackertheHacker/svg-stack-qwen-captioned-subset
下载链接
链接失效反馈官方服务:
资源简介:
---
configs:
- config_name: default
data_files:
- split: train
path: data/train-*.parquet
- split: validation
path: data/validation-*.parquet
- split: test
path: data/test-*.parquet
---
# SVG Stack Qwen Captioned Subset
This dataset is a relabeled subset derived from StarVector's
`starvector/svg-stack` dataset. It pairs SVG source code with generated natural
language captions describing the rendered visual appearance of each SVG.
The dataset is intended for supervised fine-tuning of text-to-SVG generation
models. A typical training format is to use `Caption` as the input prompt and
`Svg` as the target completion.
Each row contains:
- `Filename`: unique SVG filename from the source dataset
- `Svg`: SVG source code
- `Caption`: generated visual caption
Rows with missing, skipped, or empty captions have been excluded.
## Splits
| Split | Rows | Files |
|---|---:|---|
| train | 340,394 | `data/train-00000-of-00002.parquet`, `data/train-00001-of-00002.parquet` |
| validation | 51,063 | `data/validation-00000-of-00001.parquet` |
| test | 5,664 | `data/test-00000-of-00001.parquet` |
## Captioning
Captions were generated from 384px JPEG renders of the SVGs using
`unsloth/Qwen3.6-35B-A3B`.
Generation settings:
- max tokens: 125
- temperature: 0.7
- top p: 0.8
- top k: 20
- presence penalty: 1.5
- reasoning: off
## Intended Use
Use this dataset to train or evaluate models that generate SVG code from a
short visual description.
提供机构:
JackertheHacker



