nllg/DaTikZ-V4

Name: nllg/DaTikZ-V4
Creator: nllg
Published: 2026-03-17 14:04:01
License: not specified in this listing (the dataset card states apache-2.0)

Source: Hugging Face. Updated 2026-03-17; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/nllg/DaTikZ-V4


Resource description:

---
license: apache-2.0
pipeline_tag: text-generation
task_categories:
- text-generation
size_categories:
- 100K<n<1M
tags:
- tikz
- latex
- code-generation
- scientific-figures
---

# Dataset Card for DaTikZ-V4

DaTikZ-V4 is the dataset used to train **TikZilla-3B**, **TikZilla-3B-RL**, **TikZilla-8B**, and **TikZilla-8B-RL** for generating TikZ/LaTeX figures from natural language descriptions. The TikZ code has been sourced from **ArXiv**, **GitHub**, and **TeXStackExchange**. Scientific figure descriptions were generated using **Qwen2.5-VL-7B-Instruct**.

## Dataset fields

Each sample contains:

- `file_id`: unique identifier
- `caption`: original caption
- `vlm_description`: detailed visual description generated by a VLM
- `tikz_code`: full LaTeX/TikZ source code
- `source`: data source (e.g., arxiv, github, tex, synthetic)
- `png_image`: rendered image of the TikZ figure

## Installation

```bash
pip install datasets
```

## Usage

```python
from datasets import load_dataset

dataset_id = "nllg/DaTikZ-V4"
ds = load_dataset(dataset_id, split="train")

sample = ds[0]
print(sample["file_id"])
print(sample["caption"])
print(sample["vlm_description"])
print(sample["tikz_code"])
image = sample["png_image"]
```
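
As a follow-up to the usage snippet, the sketch below shows one way the documented fields might be put to work: counting samples per `source` and writing each `tikz_code` string to a `.tex` file named after its `file_id`. It uses only the standard library and runs on hypothetical stand-in records, so it needs no download. The helper names `count_by_source` and `export_tikz`, and the sample values themselves, are illustrative assumptions and are not part of the dataset or the `datasets` library.

```python
from collections import Counter
from pathlib import Path
import tempfile

# Hypothetical stand-in rows that reuse the field names from the dataset card.
# The values are invented for illustration; they are not real DaTikZ-V4 samples.
samples = [
    {"file_id": "demo_0", "source": "arxiv", "caption": "A circle.",
     "tikz_code": r"\begin{tikzpicture}\draw (0,0) circle (1);\end{tikzpicture}"},
    {"file_id": "demo_1", "source": "github", "caption": "A line.",
     "tikz_code": r"\begin{tikzpicture}\draw (0,0) -- (1,1);\end{tikzpicture}"},
    {"file_id": "demo_2", "source": "arxiv", "caption": "A square.",
     "tikz_code": r"\begin{tikzpicture}\draw (0,0) rectangle (1,1);\end{tikzpicture}"},
]


def count_by_source(rows):
    """Return a Counter mapping each `source` value to its number of rows."""
    return Counter(row["source"] for row in rows)


def export_tikz(row, out_dir):
    """Write the row's `tikz_code` to <out_dir>/<file_id>.tex and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{row['file_id']}.tex"
    path.write_text(row["tikz_code"], encoding="utf-8")
    return path


source_counts = count_by_source(samples)
out_dir = tempfile.mkdtemp()  # temporary directory, so nothing is overwritten
written = [export_tikz(row, out_dir) for row in samples]

print(source_counts)
print([p.name for p in written])
```

With the real dataset, the same helpers could presumably be applied to rows obtained from `ds`, since iterating over a loaded split yields one dictionary per sample. Working on a small slice first (for example `ds.select(range(100))`) may be preferable, because each row also carries a rendered image.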

Provider:

nllg
