
Accio-Lab/Metis-ColdStart

Hugging Face · Updated 2026-04-10 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/Accio-Lab/Metis-ColdStart
Resource description:
---
license: apache-2.0
task_categories:
- visual-question-answering
- image-text-to-text
language:
- en
tags:
- multimodal
- tool-use
- agentic
- sft
- vision-language
- meta-cognitive
size_categories:
- 10K<n<100K
---

# Metis-ColdStart

**Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models**

Metis-ColdStart is the **supervised fine-tuning (SFT) dataset** used to train the [Metis-8B-ColdStart](https://huggingface.co/Accio-Lab/Metis-8B-ColdStart) model. It contains ~27K high-quality, tool-augmented multimodal reasoning trajectories that have been rigorously curated to ensure genuine tool necessity and reasoning quality.

[[Paper (arXiv)]](https://arxiv.org/abs/2604.08545) | [[GitHub]](https://github.com/Accio-Lab/Metis) | [[ColdStart Model]](https://huggingface.co/Accio-Lab/Metis-8B-ColdStart) | [[RL Model]](https://huggingface.co/Accio-Lab/Metis-8B-RL) | [[RL Data]](https://huggingface.co/datasets/Accio-Lab/Metis-RL)

## Dataset Details

| Attribute | Value |
|---|---|
| Size | ~26.8K samples |
| Format | Parquet |
| Modalities | Text + Image |
| Purpose | Supervised fine-tuning (Cold Start) for agentic multimodal tool use |
| License | Apache-2.0 |

## Data Curation Pipeline

A key contribution of Metis is the rigorous three-stage curation pipeline that transforms raw tool-augmented trajectories into high-quality SFT data:

### Stage 1: Eradicating Hallucinated Environmental Dynamics

Raw trajectories from existing datasets often contain **hallucinated tool outputs**: the model imagines plausible but incorrect execution results. We execute all code snippets in a sandboxed environment and **discard** any trajectory where execution fails or produces results inconsistent with the original trajectory.

### Stage 2: Isolating Genuine Tool Necessity

Many trajectories invoke tools unnecessarily for problems the base model can solve directly. We filter out samples where **Qwen3-VL-8B-Instruct achieves pass@8 = 1 without any tools**, ensuring the remaining data genuinely requires tool augmentation.

### Stage 3: Multidimensional Meta-Cognitive Filtering

An LLM judge evaluates each trajectory along three dimensions:

- **Visual relevance**: Does the tool invocation relate to the visual content?
- **Reasoning coherence**: Is the reasoning chain logically consistent?
- **Tool-use rationale**: Is there a justified reason for each tool call?

### Source Datasets

The raw trajectories are drawn from publicly available tool-augmented multimodal datasets:

- DeepEyesV2
- V-Interaction
- Thyme
- OpenMMReasoner

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("Accio-Lab/Metis-ColdStart", split="train")
print(f"Number of samples: {len(dataset)}")
print(dataset[0].keys())
```

## Training Pipeline

```
Metis-ColdStart (~27K samples)  ← (this dataset)
        │
        ▼  SFT
Metis-8B-ColdStart
        │
        ▼  HDPO with Metis-RL (~5K prompts)
Metis-8B-RL (final model)
```

## Citation

```bibtex
@article{yan2026metis,
  title={Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models},
  author={Yan, Shilin and Tong, Jintao and Xue, Hongwei and Tang, Xiaojun and Wang, Yangyang and Shi, Kunyu and Zhang, Guannan and Li, Ruixuan and Zou, Yixiong},
  journal={arXiv preprint arXiv:2604.08545},
  year={2026}
}
```

## Acknowledgments

Metis is built upon [verl](https://github.com/volcengine/verl), [verl-tool](https://github.com/TIGER-AI-Lab/verl-tool), and [Qwen3-VL](https://github.com/QwenLM/Qwen3-VL).
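
As an illustration of the Stage 2 filter described in the curation pipeline, the following is a minimal sketch and not the authors' implementation. It assumes that "pass@8 = 1" means at least one of eight tool-free attempts is correct (the usual pass@k reading; the card does not spell this out). The names `passes_at_k`, `keep_tool_necessary`, and `toy_attempt` are hypothetical, and `toy_attempt` is a stand-in for querying the base model without tools and grading its answer.

```python
from typing import Callable, Dict, List

Sample = Dict[str, object]
AttemptFn = Callable[[Sample, int], bool]


def passes_at_k(sample: Sample, attempt_fn: AttemptFn, k: int = 8) -> bool:
    """Return True if any of k tool-free attempts is judged correct.

    Assumption: pass@k = 1 for a single sample means at least one success.
    """
    return any(attempt_fn(sample, i) for i in range(k))


def keep_tool_necessary(
    samples: List[Sample], attempt_fn: AttemptFn, k: int = 8
) -> List[Sample]:
    """Drop samples the base model can already solve without tools."""
    return [s for s in samples if not passes_at_k(s, attempt_fn, k)]


def toy_attempt(sample: Sample, attempt_index: int) -> bool:
    """Hypothetical grader: a real one would sample the base model
    without tools and compare its answer to the reference."""
    return attempt_index in sample["correct_attempts"]


toy_samples = [
    {"id": "a", "correct_attempts": {3}},             # solved once: dropped
    {"id": "b", "correct_attempts": set()},           # never solved: kept
    {"id": "c", "correct_attempts": set(range(8))},   # always solved: dropped
]

kept = keep_tool_necessary(toy_samples, toy_attempt)
print([s["id"] for s in kept])  # prints ['b']
```

Under this reading, a single lucky tool-free success is enough to discard a sample, which makes the filter strict about what counts as requiring tools. A stricter or looser threshold would only change the predicate inside `passes_at_k`.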
Provided by:
Accio-Lab