five

program-cota-llava|多模态学习数据集|思维链生成数据集

收藏
魔搭社区2025-12-05 更新2025-12-06 收录
下载链接:
https://modelscope.cn/datasets/Salesforce/program-cota-llava
下载链接
链接失效反馈
资源简介:
# 🌮 TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action <h3 align="left"> <a href="https://taco-project.github.io/">🌐 Website</a> | <a href="https://arxiv.org/pdf/2412.05479">📑 Arxiv</a> | <a href="https://github.com/SalesforceAIResearch/CoTA">💻 Code</a>| <a href="https://huggingface.co/collections/Salesforce/cota-datasets-675333e57dd34a4adc5f3ff4">🤗 Datasets</a> <h5 align="left"> If you like our project or are interested in its updates, please star us :) Thank you! ⭐ </h2> ## Summary TLDR: CoTA is a large-scale dataset of synthetic Chains-of-Thought-and-Action (CoTA) generated by programs. ## Load data ``` from datasets import load_dataset dataset = load_dataset("Salesforce/program-cota-llava", split="program_cota_mc_970k") ``` ## Dataset Card ### Dataset Details This dataset contains synthetic chains of thoughts and actions. ### Uses <!-- Address questions around how the dataset is intended to be used. --> The intended use of this dataset is to finetune multi-modal language models to produce chains of thoughts and actions to answer difficult and complex visual questions. ### Direct Use <!-- This section describes suitable use cases for the dataset. --> You can directly use this dataset to train LLaVA-OneVision-based models with our [codebase](https://github.com/SalesforceAIResearch/TACO). To train Mantis models, please use ```program-cota-mantis``` in the [collection](https://huggingface.co/collections/Salesforce/cota-datasets-675333e57dd34a4adc5f3ff4). To train other multi-modal language models, you might need to adapt the conversation format to work for your particular models. ### Out-of-Scope Use <!-- This section addresses misuse, malicious use, and uses that the dataset will not work well for. --> This dataset should not be used for testing models. ### Source Data <!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). --> The source data comes from [Cauldron](https://huggingface.co/datasets/HuggingFaceM4/the_cauldron) and [Mantis-Instruct](https://huggingface.co/datasets/TIGER-Lab/Mantis-Instruct). They are collected from various existing datasets, including COCO, AOKVQA, ScienceQA, Visual Genome, etc. #### Data Collection and Processing <!-- This section describes the data collection and processing process such as data selection criteria, filtering and normalization methods, tools and libraries used, etc. --> <img src="data_gen.png" width=1000> <!-- ![Dataset generation](dataset_gen.png "Dataset generation process") --> ## Bias, Risks, and Limitations <!-- This section is meant to convey both technical and sociotechnical limitations. --> Our dataset has the following limitations: - The chains of thoughts and actions are generated by gpt-4o-2024-08-06 and thus inherit its biases; - The actions are somewhat limited as they cover mostly vision-centric tools such as DepthEstimation and some generic tools such as QueryKnowledgeBase. - Please refer to the paper for additional limitations. ## License The CoTA datasets are licensed under the noncommerical license [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). Users need to make their own assessment regarding any obligations or responsibilities under the corresponding licenses or terms and conditions pertaining to the original datasets and data. This release is for research purposes only in support of an academic paper. ## Citation ``` @misc{ma2024tacolearningmultimodalaction, title={TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action}, author={Zixian Ma and Jianguo Zhang and Zhiwei Liu and Jieyu Zhang and Juntao Tan and Manli Shu and Juan Carlos Niebles and Shelby Heinecke and Huan Wang and Caiming Xiong and Ranjay Krishna and Silvio Savarese}, year={2024}, eprint={2412.05479}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2412.05479}, } ```
提供机构:
maas
创建时间:
2025-08-16
用户留言
有没有相关的论文或文献参考?
这个数据集是基于什么背景创建的?
数据集的作者是谁?
能帮我联系到这个数据集的作者吗?
这个数据集如何下载?
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作