
nllg/DaTikZ-V4

Source: Hugging Face. Updated 2026-03-17; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/nllg/DaTikZ-V4
Resource description:
---
license: apache-2.0
pipeline_tag: text-generation
task_categories:
- text-generation
size_categories:
- 100K<n<1M
tags:
- tikz
- latex
- code-generation
- scientific-figures
---

# Dataset Card for DaTikZ-V4

DaTikZ-V4 is the dataset used to train **TikZilla-3B**, **TikZilla-3B-RL**, **TikZilla-8B**, and **TikZilla-8B-RL** for generating TikZ/LaTeX figures from natural language descriptions.

The TikZ code has been sourced from **ArXiv**, **GitHub**, and **TeXStackExchange**. Scientific figure descriptions were generated using **Qwen2.5-VL-7B-Instruct**.

## Dataset fields

Each sample contains:

- `file_id`: unique identifier
- `caption`: original caption
- `vlm_description`: detailed visual description generated by a VLM
- `tikz_code`: full LaTeX/TikZ source code
- `source`: data source (e.g., arxiv, github, tex, synthetic)
- `png_image`: rendered image of the TikZ figure

## Installation

```bash
pip install datasets
```

## Usage

```python
from datasets import load_dataset

dataset_id = "nllg/DaTikZ-V4"
ds = load_dataset(dataset_id, split="train")

sample = ds[0]
print(sample["file_id"])
print(sample["caption"])
print(sample["vlm_description"])
print(sample["tikz_code"])

image = sample["png_image"]
```
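
As a follow-up to the usage snippet, the sketch below shows one possible way to work with the fields listed in the card: tallying samples per `source` and writing a sample's `tikz_code` to a `.tex` file. The helper names `count_sources` and `export_sample` are hypothetical and not part of the dataset or the `datasets` library. It also assumes that `png_image` is either `None` or an object with a PIL-style `.save(path)` method, which the card does not state.

```python
from collections import Counter
from pathlib import Path


def count_sources(samples):
    """Count how many samples carry each value of the `source` field."""
    return Counter(sample["source"] for sample in samples)


def export_sample(sample, out_dir):
    """Write one sample's TikZ code (and image, when possible) to out_dir.

    Assumption: `png_image` is either None or an object exposing a
    PIL-style `.save(path)` method; the dataset card does not say so.
    Returns the path of the .tex file and the path of the .png file
    (or None if no image was written).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # file_id becomes the file stem. Path separators are replaced
    # defensively because the card does not specify the id format.
    stem = str(sample["file_id"]).replace("/", "_").replace("\\", "_")

    tex_path = out / f"{stem}.tex"
    tex_path.write_text(sample["tikz_code"], encoding="utf-8")

    image = sample.get("png_image")
    png_path = None
    if image is not None and hasattr(image, "save"):
        png_path = out / f"{stem}.png"
        image.save(png_path)

    return tex_path, png_path
```

With the `ds` object from the usage snippet, `export_sample(ds[0], "out")` would write the first sample to an `out` directory. Passing the full dataset to `count_sources` iterates every row, so counting over the `source` column alone is likely cheaper for a dataset of this size.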
Provider:
nllg