program-cota-llava

Name: program-cota-llava
Creator: maas
Published: 2025-12-05 16:46:41
License: 暂无描述

魔搭社区2025-12-05 更新2025-12-06 收录

下载链接：

https://modelscope.cn/datasets/Salesforce/program-cota-llava

下载链接

链接失效反馈

官方服务：

资源简介：

# 🌮 TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action <h3 align="left"> <a href="https://taco-project.github.io/">🌐 Website</a> | <a href="https://arxiv.org/pdf/2412.05479">📑 Arxiv</a> | <a href="https://github.com/SalesforceAIResearch/CoTA">💻 Code</a>| <a href="https://huggingface.co/collections/Salesforce/cota-datasets-675333e57dd34a4adc5f3ff4">🤗 Datasets</a> <h5 align="left"> If you like our project or are interested in its updates, please star us :) Thank you! ⭐ </h2> ## Summary TLDR: CoTA is a large-scale dataset of synthetic Chains-of-Thought-and-Action (CoTA) generated by programs. ## Load data ``` from datasets import load_dataset dataset = load_dataset("Salesforce/program-cota-llava", split="program_cota_mc_970k") ``` ## Dataset Card ### Dataset Details This dataset contains synthetic chains of thoughts and actions. ### Uses  The intended use of this dataset is to finetune multi-modal language models to produce chains of thoughts and actions to answer difficult and complex visual questions. ### Direct Use  You can directly use this dataset to train LLaVA-OneVision-based models with our [codebase](https://github.com/SalesforceAIResearch/TACO). To train Mantis models, please use ```program-cota-mantis``` in the [collection](https://huggingface.co/collections/Salesforce/cota-datasets-675333e57dd34a4adc5f3ff4). To train other multi-modal language models, you might need to adapt the conversation format to work for your particular models. ### Out-of-Scope Use  This dataset should not be used for testing models. ### Source Data  The source data comes from [Cauldron](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron) and [Mantis-Instruct](https://huggingface.co/datasets/TIGER-Lab/Mantis-Instruct). They are collected from various existing datasets, including COCO, AOKVQA, ScienceQA, Visual Genome, etc. #### Data Collection and Processing  <img src="data_gen.png" width=1000>  ## Bias, Risks, and Limitations  Our dataset has the following limitations: - The chains of thoughts and actions are generated by gpt-4o-2024-08-06 and thus inherit its biases; - The actions are somewhat limited as they cover mostly vision-centric tools such as DepthEstimation and some generic tools such as QueryKnowledgeBase. - Please refer to the paper for additional limitations. ## License The CoTA datasets are licensed under the noncommerical license [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). Users need to make their own assessment regarding any obligations or responsibilities under the corresponding licenses or terms and conditions pertaining to the original datasets and data. This release is for research purposes only in support of an academic paper. ## Citation ``` @misc{ma2024tacolearningmultimodalaction, title={TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action}, author={Zixian Ma and Jianguo Zhang and Zhiwei Liu and Jieyu Zhang and Juntao Tan and Manli Shu and Juan Carlos Niebles and Shelby Heinecke and Huan Wang and Caiming Xiong and Ranjay Krishna and Silvio Savarese}, year={2024}, eprint={2412.05479}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2412.05479}, } ```

# 🌮 TACO：基于合成思维-行动链的多模态行动模型学习 <h3 align="left"> <a href="https://taco-project.github.io/">🌐 官方网站</a> | <a href="https://arxiv.org/pdf/2412.05479">📑 arXiv预印本</a> | <a href="https://github.com/SalesforceAIResearch/CoTA">💻 代码仓库</a>| <a href="https://huggingface.co/collections/Salesforce/cota-datasets-675333e57dd34a4adc5f3ff4">🤗 数据集集合</a> <h5 align="left"> 如果您喜爱本项目或关注其后续更新，欢迎为我们点亮Star ⭐，感谢您的支持！</h5> ## 摘要 ### 核心要点：CoTA是由程序生成的大规模合成思维-行动链（Chains-of-Thought-and-Action，简称CoTA）数据集。 ## 数据加载 python from datasets import load_dataset dataset = load_dataset("Salesforce/program-cota-llava", split="program_cota_mc_970k") ## 数据集卡片 ### 数据集详情本数据集包含合成生成的思维-行动链。 ### 数据集用途  本数据集的设计用途为：微调多模态大语言模型，使其能够生成思维-行动链，以解答复杂且高难度的视觉问答问题。 ### 直接使用场景  您可直接使用本数据集，结合我们提供的[代码仓库](https://github.com/SalesforceAIResearch/TACO)，训练基于LLaVA-OneVision的模型。若需训练Mantis模型，请使用数据集集合中的`program-cota-mantis`。若需训练其他多模态大语言模型，您可能需要根据目标模型的特性调整对话格式。 ### 不适配使用场景  本数据集不可用于模型测试。 ### 源数据来源  本数据集的源数据来自[Cauldron](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron)与[Mantis-Instruct](https://huggingface.co/datasets/TIGER-Lab/Mantis-Instruct)，其采集自多个现有数据集，包括COCO、AOKVQA、ScienceQA、Visual Genome等。 #### 数据采集与处理流程  <img src="data_gen.png" width=1000>  ## 偏差、风险与局限性  本数据集存在以下局限性： 1. 思维-行动链由`gpt-4o-2024-08-06`生成，因此会继承该模型的固有偏差； 2. 所覆盖的行动类型相对有限，主要涵盖以视觉为中心的工具（如深度估计），以及少量通用工具（如查询知识库）； 3. 更多局限性细节请参考原论文。 ## 授权协议 CoTA数据集采用非商业授权协议[CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)。使用者需自行评估原数据集相关授权协议或条款下的义务与责任。本数据集仅用于支持学术论文发表的研究用途。 ## 引用格式 bibtex @misc{ma2024tacolearningmultimodalaction, title={TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action}, author={Zixian Ma and Jianguo Zhang and Zhiwei Liu and Jieyu Zhang and Juntao Tan and Manli Shu and Juan Carlos Niebles and Shelby Heinecke and Huan Wang and Caiming Xiong and Ranjay Krishna and Silvio Savarese}, year={2024}, eprint={2412.05479}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2412.05479}, }

提供机构：

maas

创建时间：

2025-08-16

5,000+

优质数据集

54 个

任务类型

进入经典数据集