JackertheHacker/svg-stack-qwen-captioned-subset

Name: JackertheHacker/svg-stack-qwen-captioned-subset
Creator: JackertheHacker
Published: 2026-04-18 04:15:12
License: No description provided

Source: Hugging Face | Updated: 2026-04-18 | Indexed: 2026-04-26

Download link:

https://hf-mirror.com/datasets/JackertheHacker/svg-stack-qwen-captioned-subset

Description:

---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*.parquet
  - split: validation
    path: data/validation-*.parquet
  - split: test
    path: data/test-*.parquet
---

# SVG Stack Qwen Captioned Subset

This dataset is a relabeled subset derived from StarVector's `starvector/svg-stack` dataset. It pairs SVG source code with generated natural language captions describing the rendered visual appearance of each SVG.

The dataset is intended for supervised fine-tuning of text-to-SVG generation models. A typical training format is to use `Caption` as the input prompt and `Svg` as the target completion.

Each row contains:

- `Filename`: unique SVG filename from the source dataset
- `Svg`: SVG source code
- `Caption`: generated visual caption

Rows with missing, skipped, or empty captions have been excluded.

## Splits

| Split | Rows | Files |
|---|---:|---|
| train | 340,394 | `data/train-00000-of-00002.parquet`, `data/train-00001-of-00002.parquet` |
| validation | 51,063 | `data/validation-00000-of-00001.parquet` |
| test | 5,664 | `data/test-00000-of-00001.parquet` |

## Captioning

Captions were generated from 384px JPEG renders of the SVGs using `unsloth/Qwen3.6-35B-A3B`.

Generation settings:

- max tokens: 125
- temperature: 0.7
- top p: 0.8
- top k: 20
- presence penalty: 1.5
- reasoning: off

## Intended Use

Use this dataset to train or evaluate models that generate SVG code from a short visual description.
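The card suggests a training format that uses `Caption` as the input prompt and `Svg` as the target completion. Below is a minimal plain-Python sketch of that mapping. It assumes rows arrive as dicts keyed by the card's column names, which is what you get when iterating a Hugging Face `datasets.Dataset`. The `to_sft_examples` helper and the `prompt`/`completion` output keys are hypothetical and chosen for illustration. The empty-text check is a defensive assumption, because the card says rows with empty captions were already removed.

```python
from typing import Iterable, Iterator, TypedDict


class Row(TypedDict):
    """One dataset row, using the column names listed in the dataset card."""
    Filename: str
    Svg: str
    Caption: str


class SFTExample(TypedDict):
    """Hypothetical prompt/completion record for supervised fine-tuning."""
    prompt: str
    completion: str


def to_sft_examples(rows: Iterable[Row]) -> Iterator[SFTExample]:
    """Hypothetical helper: map each row to a Caption -> Svg training pair.

    The caption is stripped of surrounding whitespace and used as the prompt.
    The SVG source is passed through unchanged as the completion. Rows whose
    caption or SVG is blank are skipped as a defensive assumption.
    """
    for row in rows:
        caption = (row.get("Caption") or "").strip()
        svg = row.get("Svg") or ""
        if not caption or not svg.strip():
            continue  # nothing usable to train on for this row
        yield {"prompt": caption, "completion": svg}


if __name__ == "__main__":
    # Toy rows for illustration only; these are not taken from the dataset.
    rows: list[Row] = [
        {"Filename": "a.svg", "Svg": "<svg xmlns='http://www.w3.org/2000/svg'/>",
         "Caption": "An empty square canvas."},
        {"Filename": "b.svg", "Svg": "<svg/>", "Caption": "   "},
    ]
    examples = list(to_sft_examples(rows))
    print(len(examples))  # 1 (the blank-caption row is skipped)
```

The function is written as a generator so it can stream over a large split, such as the 340,394-row train split, without building a second copy in memory. If the `datasets` library is used, a loaded split such as `ds["train"]` can be passed in directly because it yields row dicts.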

Provided by:

JackertheHacker