
JackertheHacker/svg-stack-qwen-captioned-subset

Source: Hugging Face. Updated 2026-04-18. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/JackertheHacker/svg-stack-qwen-captioned-subset
Description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*.parquet
  - split: validation
    path: data/validation-*.parquet
  - split: test
    path: data/test-*.parquet
---

# SVG Stack Qwen Captioned Subset

This dataset is a relabeled subset derived from StarVector's `starvector/svg-stack` dataset. It pairs SVG source code with generated natural language captions describing the rendered visual appearance of each SVG.

The dataset is intended for supervised fine-tuning of text-to-SVG generation models. A typical training format is to use `Caption` as the input prompt and `Svg` as the target completion.

Each row contains:

- `Filename`: unique SVG filename from the source dataset
- `Svg`: SVG source code
- `Caption`: generated visual caption

Rows with missing, skipped, or empty captions have been excluded.

## Splits

| Split | Rows | Files |
|---|---:|---|
| train | 340,394 | `data/train-00000-of-00002.parquet`, `data/train-00001-of-00002.parquet` |
| validation | 51,063 | `data/validation-00000-of-00001.parquet` |
| test | 5,664 | `data/test-00000-of-00001.parquet` |

## Captioning

Captions were generated from 384px JPEG renders of the SVGs using `unsloth/Qwen3.6-35B-A3B`.

Generation settings:

- max tokens: 125
- temperature: 0.7
- top p: 0.8
- top k: 20
- presence penalty: 1.5
- reasoning: off

## Intended Use

Use this dataset to train or evaluate models that generate SVG code from a short visual description.
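The training format described above (`Caption` as the prompt, `Svg` as the completion) could be sketched roughly as follows. This is a minimal illustration, not the author's pipeline: the helper names `to_sft_example` and `iter_sft_examples` and the `prompt`/`completion` keys are assumptions introduced here, and loading assumes the Hugging Face `datasets` library is installed and that the repository ID from the page title resolves on the Hub.

```python
from typing import Dict, Iterator, Optional

# Repository ID as given in the page title.
DATASET_ID = "JackertheHacker/svg-stack-qwen-captioned-subset"


def to_sft_example(row: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Map one row to a prompt/completion pair (hypothetical helper).

    Returns None when the caption or the SVG is empty. The card says such
    rows were already excluded, so this is only a defensive check.
    """
    caption = (row.get("Caption") or "").strip()
    svg = (row.get("Svg") or "").strip()
    if not caption or not svg:
        return None
    return {"prompt": caption, "completion": svg}


def iter_sft_examples(split: str = "validation") -> Iterator[Dict[str, str]]:
    """Yield prompt/completion pairs for one split (hypothetical helper)."""
    # Imported lazily so the formatting helper works without `datasets`.
    from datasets import load_dataset

    ds = load_dataset(DATASET_ID, split=split)
    for row in ds:
        example = to_sft_example(row)
        if example is not None:
            yield example
```

Calling `next(iter_sft_examples("test"))` would download the smallest split and return its first pair. Keeping the formatting step separate from loading makes it easy to swap in a chat template or an instruction prefix later.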
Provider:
JackertheHacker