duxiaodan/ControlSketch-Part
Source: Hugging Face. Updated 2026-04-17; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/duxiaodan/ControlSketch-Part
Resource description:
---
license: cc-by-4.0
task_categories:
- text-to-image
- image-to-text
language:
- en
size_categories:
- 10K<n<100K
tags:
- sketch
- sketch-generation
- text-to-sketch
- text-to-sketch-generation
- vector-graphics
- svg
- bezier
- part-segmentation
pretty_name: ControlSketch-Part
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*.parquet
  - split: validation
    path: data/validation-*.parquet
  - split: test
    path: data/test-*.parquet
---
# ControlSketch-Part
ControlSketch-Part supports training and evaluating agents that generate vector sketches **incrementally, one semantic part at a time** rather than all at once. Each sketch is encoded as a sequence of cubic Bézier strokes on a 512×512 canvas and is paired with a short text caption, a list of semantic parts, and a per-stroke assignment that maps each stroke to exactly one part.
The underlying SVG sketch data is taken directly from the ControlSketch dataset released with the SwiftSketch paper (Arar et al., SIGGRAPH 2025). This release contributes the text captions, part lists, and stroke→part assignments on top of those sketches; the paper introducing these part annotations is *Teaching an Agent to Sketch One Part at a Time* (Du et al., 2026).
## Splits & category design
The train / validation / test partitioning and the per-split category lists are identical to the original ControlSketch dataset released with SwiftSketch (Arar et al., SIGGRAPH 2025).
| Split | Categories | Sketches |
|--------------|------------|----------|
| `train` | 15 | 14,999 |
| `validation` | 15 (same as train) | 3,000 |
| `test` | 85 (disjoint from train/validation) | 16,990 |
| **total** | **100** | **34,989** |
**Train / validation categories (15):** angel, astronaut, bear, bicycle, car, cat, chair, crab, dog, fish, horse, rabbit, robot, sculpture, woman.
**Test categories (85):** The Eiffel Tower, ant, apple, backpack, banana, bed, bee, beer, boat, book, broccoli, bus, butterfly, cabin, cake, camel, camera, candle, carrot, castle, child, clock, cow, cup, deer, dolphin, dragon, drill, duck, elephant, flamingo, floor lamp, flower, fork, giraffe, hammer, hat, helicopter, ice cream, jacket, kangaroo, kimono, laptop, lion, lobster, man, margarita, mermaid, moon, motorcycle, mountain, octopus, parrot, pen, phone, pig, pizza, purse, quiche, sandwich, scissors, shark, sheep, spider, squirrel, star, strawberry, submarine, sword, t-shirt, table, teapot, television, tiger, tomato, train, truck, vase, waffle, watch, whale, windmill, wine bottle, yoga, zebra.
## Data fields
| Field | Type | Description |
|-------------------|-----------------------------------------|-------------|
| `category` | `string` | Object category, matching the source folder name (e.g. `"horse"`). |
| `sketch_id` | `string` | Stable per-sketch identifier, e.g. `"horse_1000"`. Unique within a split. |
| `path_data` | `int32` array of shape `(32, 8)` | 32 cubic Bézier strokes on a 512×512 canvas. Each row is `[x0, y0, x1, y1, x2, y2, x3, y3]`: the start point `(x0, y0)`, two interior control points `(x1, y1)` and `(x2, y2)`, and the end point `(x3, y3)`, matching the SVG path `M x0 y0 C x1 y1 x2 y2 x3 y3`. |
| `path_assignment` | `int32` sequence of length `32` | For each stroke `i`, an index into `parts` (`0 ≤ path_assignment[i] < len(parts)`) giving the semantic part that stroke belongs to. |
| `svg` | `string` | Full SVG rendering of the sketch (≈5–6 KB). Identical content to what `path_data` encodes, but directly renderable. |
| `short_caption` | `string` | One-sentence natural-language description of the sketch. |
| `parts` | variable-length sequence of `string` | Natural-language description of each semantic part (typically 2–5 parts per sketch). |
All sketches are normalized to **exactly 32 strokes**, inherited from the underlying ControlSketch representation.
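The row layout of `path_data` maps directly onto an SVG path command, so a stroke can be turned back into markup with a few lines of code. The following is a minimal sketch that assumes plain Python lists. The helper names `stroke_to_path_d` and `strokes_to_svg` are hypothetical and not part of the dataset tooling, and the stroke styling (black, width 2, no fill) is an arbitrary choice that may differ from what the `svg` field contains.

```python
from typing import Sequence


def stroke_to_path_d(stroke: Sequence[int]) -> str:
    """Format one [x0, y0, x1, y1, x2, y2, x3, y3] row as an SVG path 'd' string."""
    if len(stroke) != 8:
        raise ValueError(f"expected 8 coordinates, got {len(stroke)}")
    x0, y0, x1, y1, x2, y2, x3, y3 = stroke
    return f"M {x0} {y0} C {x1} {y1} {x2} {y2} {x3} {y3}"


def strokes_to_svg(path_data: Sequence[Sequence[int]], size: int = 512) -> str:
    """Wrap every stroke in a <path> element on a size x size canvas.

    The styling attributes are illustrative assumptions, not the dataset's own.
    """
    paths = "\n".join(
        f'  <path d="{stroke_to_path_d(s)}" fill="none" stroke="black" stroke-width="2"/>'
        for s in path_data
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}" '
        f'viewBox="0 0 {size} {size}">\n{paths}\n</svg>'
    )


# Two strokes copied from the "horse_1000" example row on this card.
demo = [[323, 434, 348, 456, 400, 310, 317, 435],
        [295, 193, 356, 171, 365, 170, 409, 212]]
print(stroke_to_path_d(demo[0]))  # M 323 434 C 348 456 400 310 317 435
```

Passing a subset of rows to `strokes_to_svg` renders a partial sketch, which can be handy when inspecting individual parts.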
## Example row
```python
{
"category": "horse",
"sketch_id": "horse_1000",
"path_data": [[323, 434, 348, 456, 400, 310, 317, 435],
[295, 193, 356, 171, 365, 170, 409, 212],
...], # 32 rows total
"path_assignment": [2, 1, 0, 0, 2, 2, 0, 0, ...], # 32 indices
"svg": "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<svg xmlns=...>...</svg>",
"short_caption": "A horse facing left features a raised front leg, extended hind leg, arched neck, pointed ears, and trailing tail.",
"parts": [
"head and neck facing left, featuring pointed ears, an eye, and a mane",
"torso with a curved back and belly",
"two legs, consisting of a raised front leg and an extended hind leg",
"tail extending from the rear",
],
}
```
## Usage
```python
from datasets import load_dataset
ds = load_dataset("duxiaodan/ControlSketch-Part")
print(ds)
# DatasetDict({
# train: Dataset({features: [...], num_rows: 14999}),
# validation: Dataset({features: [...], num_rows: 3000}),
# test: Dataset({features: [...], num_rows: 16990}),
# })
row = ds["train"][0]
# row["path_data"] is a 32×8 int array of cubic Bézier control points.
# row["parts"] is a list of natural-language part descriptions.
# row["path_assignment"][i] ∈ [0, len(row["parts"])) — tells you which part stroke i belongs to.
```
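Since `path_assignment` indexes into `parts`, the strokes of a row can be regrouped part by part. The sketch below is one possible way to do that; the function names `strokes_by_part` and `incremental_steps` are hypothetical. It also assumes that the order of `parts` is a reasonable drawing order, which this card does not state.

```python
from typing import Iterator, List, Sequence, Tuple

Stroke = Sequence[int]


def strokes_by_part(
    path_data: Sequence[Stroke],
    path_assignment: Sequence[int],
    parts: Sequence[str],
) -> List[List[Stroke]]:
    """Return one list of strokes per entry in `parts`, preserving stroke order."""
    groups: List[List[Stroke]] = [[] for _ in parts]
    for stroke, part_idx in zip(path_data, path_assignment):
        groups[part_idx].append(stroke)
    return groups


def incremental_steps(
    path_data: Sequence[Stroke],
    path_assignment: Sequence[int],
    parts: Sequence[str],
) -> Iterator[Tuple[str, List[Stroke]]]:
    """Yield (part description, all strokes drawn so far) after each part is added.

    Assumption: parts are added in the order they appear in `parts`.
    """
    drawn: List[Stroke] = []
    for part, strokes in zip(parts, strokes_by_part(path_data, path_assignment, parts)):
        drawn.extend(strokes)
        yield part, list(drawn)


# Toy data (not from the dataset): three strokes, two parts.
toy_parts = ["head", "body"]
toy_strokes = [[0] * 8, [1] * 8, [2] * 8]
toy_assignment = [1, 0, 1]
for part, strokes in incremental_steps(toy_strokes, toy_assignment, toy_parts):
    print(part, len(strokes))
# head 1
# body 3
```

With a real row, the call would be `incremental_steps(row["path_data"], row["path_assignment"], row["parts"])`. Groups are indexed by position rather than keyed by description so that two parts with identical text would not be merged.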
## Known limitations
- `parts` and `short_caption` are LLM-generated (Gemini batch pipelines) and reviewed/revised, but may contain occasional noise — the part-segmentation is a soft semantic signal, not a human gold standard.
- All sketches are normalized to exactly 32 strokes, inherited from the ControlSketch representation; some complex objects may be truncated and some simple objects padded by the source pipeline.
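Given these caveats, a lightweight per-row sanity check can be useful before training. The sketch below tests only the invariants stated in the Data fields table (32 strokes of 8 values each, and assignment indices within range of `parts`). The function name `check_row` is hypothetical, and the final "part with no strokes" check is an extra assumption about what is worth flagging, not a guarantee documented by this card.

```python
from typing import List


def check_row(row: dict, n_strokes: int = 32) -> List[str]:
    """Return a list of human-readable problems found in one row (empty if none)."""
    problems: List[str] = []
    path_data = row["path_data"]
    assignment = row["path_assignment"]
    n_parts = len(row["parts"])

    # Documented shape: (32, 8).
    if len(path_data) != n_strokes or any(len(s) != 8 for s in path_data):
        problems.append(f"path_data is not {n_strokes} rows of 8 values")

    # Documented length: one part index per stroke.
    if len(assignment) != n_strokes:
        problems.append(f"path_assignment has {len(assignment)} entries, expected {n_strokes}")

    # Documented range: 0 <= path_assignment[i] < len(parts).
    if any(not 0 <= a < n_parts for a in assignment):
        problems.append("path_assignment index out of range")

    # Not documented: flag parts that receive no stroke at all (assumed suspicious).
    unused = sorted(set(range(n_parts)) - set(assignment))
    if unused:
        problems.append(f"parts with no strokes: {unused}")

    return problems


# Synthetic rows (not from the dataset) to exercise the checks.
good = {
    "path_data": [[0] * 8 for _ in range(32)],
    "path_assignment": [i % 2 for i in range(32)],
    "parts": ["a", "b"],
}
bad = dict(good, path_assignment=[5] * 32)
print(check_row(good))  # []
print(len(check_row(bad)))  # 2
```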
## License
Released under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). You are free to share and adapt the data, provided you give appropriate credit — see the Citation section below. The underlying sketches follow the license of the original ControlSketch / SwiftSketch release.
## Citation
If you use this dataset, please cite **both** the original SwiftSketch / ControlSketch paper (source of the sketches) **and** this part-annotated release (source of the captions, part lists, and stroke→part assignments):
```bibtex
@article{du2026sketch,
title = {Teaching an Agent to Sketch One Part at a Time},
author = {Du, Xiaodan and Xu, Ruize and Yunis, David and Vinker, Yael and Shakhnarovich, Greg},
journal = {arXiv preprint arXiv:2603.19500},
year = {2026}
}
@inproceedings{10.1145/3721238.3730612,
author = {Arar, Ellie and Frenkel, Yarden and Cohen-Or, Daniel and Shamir, Ariel and Vinker, Yael},
title = {SwiftSketch: A Diffusion Model for Image-to-Vector Sketch Generation},
year = {2025},
isbn = {9798400715402},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3721238.3730612},
doi = {10.1145/3721238.3730612},
booktitle = {Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
articleno = {82},
numpages = {12},
keywords = {Sketch Synthesis, Image-to-Vector Generation, Image-based Rendering, Vector Graphics, Diffusion Models, Stroke-based Representation},
series = {SIGGRAPH Conference Papers '25}
}
```
## Contact
Xiaodan Du — xdu@ttic.edu
Provider:
duxiaodan



