program-cota-llava
收藏魔搭社区2025-12-05 更新2025-12-06 收录
下载链接:
https://modelscope.cn/datasets/Salesforce/program-cota-llava
下载链接
链接失效反馈官方服务:
资源简介:
# 🌮 TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action
<h3 align="left"> <a href="https://taco-project.github.io/">🌐 Website</a> | <a href="https://arxiv.org/pdf/2412.05479">📑 Arxiv</a> | <a href="https://github.com/SalesforceAIResearch/CoTA">💻 Code</a>| <a href="https://huggingface.co/collections/Salesforce/cota-datasets-675333e57dd34a4adc5f3ff4">🤗 Datasets</a>
<h5 align="left"> If you like our project or are interested in its updates, please star us :) Thank you! ⭐ </h2>
## Summary
TLDR: CoTA is a large-scale dataset of synthetic Chains-of-Thought-and-Action (CoTA) generated by programs.
## Load data
```
from datasets import load_dataset
dataset = load_dataset("Salesforce/program-cota-llava", split="program_cota_mc_970k")
```
## Dataset Card
### Dataset Details
This dataset contains synthetic chains of thoughts and actions.
### Uses
<!-- Address questions around how the dataset is intended to be used. -->
The intended use of this dataset is to finetune multi-modal language models to produce chains of thoughts and actions to answer difficult and complex visual questions.
### Direct Use
<!-- This section describes suitable use cases for the dataset. -->
You can directly use this dataset to train LLaVA-OneVision-based models with our [codebase](https://github.com/SalesforceAIResearch/TACO). To train Mantis models, please use ```program-cota-mantis``` in the [collection](https://huggingface.co/collections/Salesforce/cota-datasets-675333e57dd34a4adc5f3ff4).
To train other multi-modal language models, you might need to adapt the conversation format to work for your particular models.
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the dataset will not work well for. -->
This dataset should not be used for testing models.
### Source Data
<!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). -->
The source data comes from [Cauldron](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron) and [Mantis-Instruct](https://huggingface.co/datasets/TIGER-Lab/Mantis-Instruct).
They are collected from various existing datasets, including COCO, AOKVQA, ScienceQA, Visual Genome, etc.
#### Data Collection and Processing
<!-- This section describes the data collection and processing process such as data selection criteria, filtering and normalization methods, tools and libraries used, etc. -->
<img src="data_gen.png" width=1000>
<!--  -->
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
Our dataset has the following limitations:
- The chains of thoughts and actions are generated by gpt-4o-2024-08-06 and thus inherit its biases;
- The actions are somewhat limited as they cover mostly vision-centric tools such as DepthEstimation and some generic tools such as QueryKnowledgeBase.
- Please refer to the paper for additional limitations.
## License
The CoTA datasets are licensed under the noncommerical license [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). Users need to make their own assessment regarding any obligations or responsibilities under the corresponding licenses or terms and conditions pertaining to the original datasets and data. This release is for research purposes only in support of an academic paper.
## Citation
```
@misc{ma2024tacolearningmultimodalaction,
title={TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action},
author={Zixian Ma and Jianguo Zhang and Zhiwei Liu and Jieyu Zhang and Juntao Tan and Manli Shu and Juan Carlos Niebles and Shelby Heinecke and Huan Wang and Caiming Xiong and Ranjay Krishna and Silvio Savarese},
year={2024},
eprint={2412.05479},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.05479},
}
```
# 🌮 TACO:基于合成思维-行动链的多模态行动模型学习
<h3 align="left"> <a href="https://taco-project.github.io/">🌐 官方网站</a> | <a href="https://arxiv.org/pdf/2412.05479">📑 arXiv预印本</a> | <a href="https://github.com/SalesforceAIResearch/CoTA">💻 代码仓库</a>| <a href="https://huggingface.co/collections/Salesforce/cota-datasets-675333e57dd34a4adc5f3ff4">🤗 数据集集合</a>
<h5 align="left"> 如果您喜爱本项目或关注其后续更新,欢迎为我们点亮Star ⭐,感谢您的支持!</h5>
## 摘要
### 核心要点:CoTA是由程序生成的大规模合成思维-行动链(Chains-of-Thought-and-Action,简称CoTA)数据集。
## 数据加载
python
from datasets import load_dataset
dataset = load_dataset("Salesforce/program-cota-llava", split="program_cota_mc_970k")
## 数据集卡片
### 数据集详情
本数据集包含合成生成的思维-行动链。
### 数据集用途
<!-- 说明数据集的预期使用场景。 -->
本数据集的设计用途为:微调多模态大语言模型,使其能够生成思维-行动链,以解答复杂且高难度的视觉问答问题。
### 直接使用场景
<!-- 本节描述本数据集适配的各类使用案例。 -->
您可直接使用本数据集,结合我们提供的[代码仓库](https://github.com/SalesforceAIResearch/TACO),训练基于LLaVA-OneVision的模型。若需训练Mantis模型,请使用数据集集合中的`program-cota-mantis`。若需训练其他多模态大语言模型,您可能需要根据目标模型的特性调整对话格式。
### 不适配使用场景
<!-- 本节说明误用、恶意使用,以及本数据集无法良好适配的使用场景。 -->
本数据集不可用于模型测试。
### 源数据来源
<!-- 本节描述源数据的构成,例如新闻文本与标题、社交媒体帖文、译句等。 -->
本数据集的源数据来自[Cauldron](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron)与[Mantis-Instruct](https://huggingface.co/datasets/TIGER-Lab/Mantis-Instruct),其采集自多个现有数据集,包括COCO、AOKVQA、ScienceQA、Visual Genome等。
#### 数据采集与处理流程
<!-- 本节描述数据采集与处理的全过程,包括数据筛选标准、过滤与归一化方法、所使用的工具与库等。 -->
<img src="data_gen.png" width=1000>
<!--  -->
## 偏差、风险与局限性
<!-- 本节旨在说明技术与社会技术层面的局限性。 -->
本数据集存在以下局限性:
1. 思维-行动链由`gpt-4o-2024-08-06`生成,因此会继承该模型的固有偏差;
2. 所覆盖的行动类型相对有限,主要涵盖以视觉为中心的工具(如深度估计),以及少量通用工具(如查询知识库);
3. 更多局限性细节请参考原论文。
## 授权协议
CoTA数据集采用非商业授权协议[CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)。使用者需自行评估原数据集相关授权协议或条款下的义务与责任。本数据集仅用于支持学术论文发表的研究用途。
## 引用格式
bibtex
@misc{ma2024tacolearningmultimodalaction,
title={TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action},
author={Zixian Ma and Jianguo Zhang and Zhiwei Liu and Jieyu Zhang and Juntao Tan and Manli Shu and Juan Carlos Niebles and Shelby Heinecke and Huan Wang and Caiming Xiong and Ranjay Krishna and Silvio Savarese},
year={2024},
eprint={2412.05479},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.05479},
}
提供机构:
maas
创建时间:
2025-08-16



