nllg/DaTikZ-V4
Source: Hugging Face · Updated 2026-03-17 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/nllg/DaTikZ-V4
Resource description:
---
license: apache-2.0
pipeline_tag: text-generation
task_categories:
- text-generation
size_categories:
- 100K<n<1M
tags:
- tikz
- latex
- code-generation
- scientific-figures
---
# Dataset Card for DaTikZ-V4
DaTikZ-V4 is the dataset used to train **TikZilla-3B**, **TikZilla-3B-RL**, **TikZilla-8B**, and **TikZilla-8B-RL** for generating TikZ/LaTeX figures from natural language descriptions.
The TikZ code was sourced from **arXiv**, **GitHub**, and **TeX Stack Exchange**. Descriptions of the scientific figures were generated using **Qwen2.5-VL-7B-Instruct**.
## Dataset fields
Each sample contains:
- `file_id`: unique identifier
- `caption`: original caption
- `vlm_description`: detailed visual description generated by a VLM
- `tikz_code`: full LaTeX/TikZ source code
- `source`: data source (e.g., arxiv, github, tex, synthetic)
- `png_image`: rendered image of the TikZ figure
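The fields above can be checked and summarized with a few lines of plain Python. The following is a minimal sketch, assuming each sample behaves like a dictionary keyed by the field names listed above; the helper names `EXPECTED_FIELDS`, `missing_fields`, and `count_by_source` are hypothetical and not part of the dataset or the `datasets` library.

```python
from collections import Counter
from typing import Any, Iterable, List, Mapping

# Field names as listed in this dataset card.
EXPECTED_FIELDS = (
    "file_id",
    "caption",
    "vlm_description",
    "tikz_code",
    "source",
    "png_image",
)


def missing_fields(sample: Mapping[str, Any]) -> List[str]:
    """Return the documented field names that are absent from a sample."""
    return [name for name in EXPECTED_FIELDS if name not in sample]


def count_by_source(samples: Iterable[Mapping[str, Any]]) -> Counter:
    """Count samples per value of the `source` field."""
    return Counter(sample["source"] for sample in samples)
```

Both helpers accept any iterable of dictionary-like samples, so they can be tried on a handful of rows before running over the full dataset. Iterating over every row also decodes every image, so for a full count it is probably cheaper to pass only the `source` column to `Counter`.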
## Installation
```bash
pip install datasets
```
## Usage
```python
from datasets import load_dataset

# Load the training split from the Hugging Face Hub.
dataset_id = "nllg/DaTikZ-V4"
ds = load_dataset(dataset_id, split="train")

# Inspect the first sample.
sample = ds[0]
print(sample["file_id"])
print(sample["caption"])
print(sample["vlm_description"])
print(sample["tikz_code"])
image = sample["png_image"]  # rendered image of the TikZ figure
```
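To compile or inspect a figure outside Python, it can help to write a sample to disk. The sketch below is one way to do that, under two assumptions that the card does not confirm: that `file_id` is usable as a file name once path separators are replaced, and that the decoded `png_image` exposes a PIL-style `save()` method. The helper name `export_sample` is hypothetical.

```python
from pathlib import Path
from typing import Any, Mapping


def export_sample(sample: Mapping[str, Any], out_dir: str) -> Path:
    """Write one sample's TikZ source (and image, if present) to out_dir."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Assumption: file_id works as a file name once path separators are replaced.
    stem = str(sample["file_id"]).replace("/", "_").replace("\\", "_")

    # Save the full LaTeX/TikZ source as a .tex file.
    tex_path = target / f"{stem}.tex"
    tex_path.write_text(sample["tikz_code"], encoding="utf-8")

    # Assumption: the decoded image object has a PIL-style save() method.
    image = sample.get("png_image")
    if image is not None and hasattr(image, "save"):
        image.save(target / f"{stem}.png")

    return tex_path
```

Calling `export_sample(sample, "out")` on the sample loaded above would write the `.tex` file, plus a `.png` next to it if the image object supports saving. The source can then be compiled with a local LaTeX installation.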
Provider:
nllg



