
Zebra-CoT

ModelScope community, updated 2025-12-05, indexed 2025-08-02
Download link:
https://modelscope.cn/datasets/multimodal-reasoning-lab/Zebra-CoT
Resource description:
# Zebra‑CoT

> A diverse large-scale dataset for interleaved vision‑language reasoning traces.

[![Paper on ArXiv](https://img.shields.io/badge/arxiv-2507.16746-red)](https://arxiv.org/abs/2507.16746)
[![Code on GitHub](https://img.shields.io/badge/github-Code-black)](https://github.com/multimodal-reasoning-lab/Bagel-Zebra-CoT)
[![Dataset on Hugging Face](https://img.shields.io/badge/huggingface-Zebra--CoT-lightblue)](https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT)
[![Model on Hugging Face](https://img.shields.io/badge/huggingface-Anole--Zebra--CoT-green)](https://huggingface.co/multimodal-reasoning-lab/Anole-Zebra-CoT)
[![Model on Hugging Face](https://img.shields.io/badge/huggingface-Bagel--Zebra--CoT-orange)](https://huggingface.co/multimodal-reasoning-lab/Bagel-Zebra-CoT)

![Image](zebra_cot_datacard.png)

---

## Dataset Description

Zebra‑CoT is a diverse large‑scale dataset with 182,384 samples containing logically coherent interleaved text‑image reasoning traces across four major categories: scientific reasoning, 2D visual reasoning, 3D visual reasoning, and visual logic & strategic games.

---

## Dataset Structure

Each example in Zebra‑CoT consists of:

* **Problem statement**: A textual description of the question.
* **Problem image**: Zero or more images accompanying the problem, depending on its nature.
* **Reasoning image**: One or more visual aids that support intermediate reasoning steps during problem solving.
* **Text Reasoning Trace**: A sequence of text thoughts (`THOUGHT x`) interleaved with placeholders for the corresponding visual sketches or diagrams, such as `<image_start>[problem_image_1]<image_end>` and `<image_start>[reasoning_image_1]<image_end>`.
* **Final answer**: The solution to the problem.

---

## Usage

* To prepare the interleaved text-image traces for training, replace all `<image_start>[problem_image_x]<image_end>` and `<image_start>[reasoning_image_x]<image_end>` in the text trace with the actual images. We performed careful data cleaning to make sure each image and image placeholder has a one-to-one mapping.
* For process-supervision-related training, you can search for the pattern `THOUGHT_x` and treat it as a step. We also performed rigorous checks to make sure each `THOUGHT_x` appears only once in a single reasoning trace.
* Additionally, to wrap the text thoughts with thinking tokens such as `<think>` and `</think>`, look for adjacent image placeholders such as `<image_start>[reasoning_image_i]<image_end>` and `<image_start>[reasoning_image_{i+1}]<image_end>`, and wrap the text between them in the thinking tokens. You can further remove the `THOUGHT_x` patterns to create clean thinking flows.

---

## Statistics

| General Category               | Sample Count | Percentage |
| :----------------------------- | -----------: | ---------: |
| [2D Visual Reasoning](https://huggingface.co/collections/multimodal-reasoning-lab/zebra-cot-v10-2d-visual-reasoning-687d70857ea0d27207bc3b33) | 51,899 | 28.5% |
| [3D Visual Reasoning](https://huggingface.co/collections/multimodal-reasoning-lab/zebra-cot-v10-3d-visual-reasoning-687d7271fe67cd788003e715) | 39,610 | 21.7% |
| [Scientific Reasoning](https://huggingface.co/collections/multimodal-reasoning-lab/zebra-cot-v10-scientific-reasoning-687d6fd3e18f55a97e14b0b5) | 24,021 | 13.2% |
| [Visual Logic & Strategic Games](https://huggingface.co/collections/multimodal-reasoning-lab/zebra-cot-v10-visual-logic-and-strategic-games-687d71776e9680460237f533) | 66,854 | 36.7% |
| **Total**                      | 182,384 | 100.0% |

Statistics are detailed in Table 3 of the paper.

---

## Models Finetuned with Zebra‑CoT

* **Anole‑Zebra‑CoT**: A 7B parameter vision–language model based on Anole‑7B and fine‑tuned on Zebra‑CoT to generate interleaved visual Chain‑of‑Thought (CoT) reasoning.
  [![Model on Hugging Face](https://img.shields.io/badge/huggingface-Anole--Zebra--CoT-orange)](https://huggingface.co/multimodal-reasoning-lab/Anole-Zebra-CoT)
* **Bagel‑Zebra‑CoT**: A 7B parameter vision–language model based on Bagel‑7B and fine‑tuned on Zebra‑CoT to generate interleaved visual Chain‑of‑Thought (CoT) reasoning.
  [![Model on Hugging Face](https://img.shields.io/badge/huggingface-Bagel--Zebra--CoT-orange)](https://huggingface.co/multimodal-reasoning-lab/Bagel-Zebra-CoT)

---

## Citation

If you use Zebra‑CoT, please cite:

```bibtex
@misc{li2025zebracot,
  title={Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning},
  author={Ang Li and Charles Wang and Kaiyu Yue and Zikui Cai and Ollie Liu and Deqing Fu and Peng Guo and Wang Bill Zhu and Vatsal Sharan and Robin Jia and Willie Neiswanger and Furong Huang and Tom Goldstein and Micah Goldblum},
  year={2025},
  eprint={2507.16746},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2507.16746},
}
```
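The three steps in the Usage section (substituting images for placeholders, splitting a trace into `THOUGHT` steps, and wrapping text in thinking tokens) can be sketched in plain Python as follows. This is a minimal illustration and not the authors' preprocessing code. It assumes the placeholder syntax shown in the card, accepts both the `THOUGHT x` and `THOUGHT_x` spellings that the card uses, and assumes a hypothetical `images` mapping keyed by placeholder name (for example `"reasoning_image_1"`); the dataset's real field names may differ. As a simplification, `wrap_thinking` wraps every text run between any two placeholders, not only between reasoning-image placeholders.

```python
import re

# Placeholder syntax as shown in the card's examples (assumed to be exact).
PLACEHOLDER = re.compile(
    r"<image_start>\[((?:problem|reasoning)_image_\d+)\]<image_end>"
)
# The card spells the marker both `THOUGHT x` and `THOUGHT_x`; accept either,
# plus an optional trailing colon (an assumption about the raw text).
THOUGHT = re.compile(r"THOUGHT[ _]\d+:?\s*")


def chunks(trace):
    """Yield ("text", str) and ("image", placeholder_name) in reading order."""
    pos = 0
    for match in PLACEHOLDER.finditer(trace):
        text = trace[pos:match.start()].strip()
        if text:
            yield ("text", text)
        yield ("image", match.group(1))
        pos = match.end()
    tail = trace[pos:].strip()
    if tail:
        yield ("text", tail)


def interleave(trace, images):
    """Replace each placeholder with its image from the hypothetical mapping."""
    return [
        (kind, images[value] if kind == "image" else value)
        for kind, value in chunks(trace)
    ]


def split_steps(trace):
    """Split a trace into one string per THOUGHT step, markers removed."""
    return [part.strip() for part in THOUGHT.split(trace) if part.strip()]


def wrap_thinking(trace, strip_markers=True):
    """Wrap each text run between placeholders in <think>...</think>."""
    out = []
    for kind, value in chunks(trace):
        if kind == "image":
            out.append(f"<image_start>[{value}]<image_end>")
            continue
        if strip_markers:
            value = THOUGHT.sub("", value).strip()
        if value:
            out.append(f"<think>{value}</think>")
    return "\n".join(out)


if __name__ == "__main__":
    # Toy trace written for this sketch; not taken from the dataset.
    demo = (
        "THOUGHT 0: Look at the grid. <image_start>[problem_image_1]<image_end> "
        "THOUGHT 1: Sketch the path. <image_start>[reasoning_image_1]<image_end> "
        "THOUGHT 2: The path has length 4."
    )
    print(split_steps(demo))
    print(wrap_thinking(demo))
```

In a real pipeline the values in `images` would be decoded image objects, and the `("text", ...)` / `("image", ...)` pairs returned by `interleave` would be passed to whatever processor the target model expects.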

Provider:
maas
Created:
2025-08-01