CogAlign

Name: CogAlign
Creator: maas
Published: 2025-12-05 11:51:44
License: No description available

ModelScope community · Updated 2025-12-05 · Indexed 2025-08-23

Download link:

https://modelscope.cn/datasets/Salesforce/CogAlign

Description:

# Dataset Card for CogAlign

- [Dataset Description](https://huggingface.co/datasets/Salesforce/CogAlign/blob/main/README.md#dataset-description)
- [Citation](https://huggingface.co/datasets/khhuang/CHOCOLATE/blob/main/README.md#citation)

## Dataset Description

**CogAlign** is a post-training strategy for Vision Language Models (VLMs) aimed at enhancing their visual arithmetic capabilities. This repository presents the training data for CogAlign, a synthetic dataset containing 64,000 examples designed to facilitate this post-training process.

CogAlign is inspired by Piaget's theory of cognitive development and focuses on improving a VLM's understanding of conservation and decentration. Each example includes a visual input, a query prompting comparison of a specific property, a positive response consistent with the visual input, and a negative response that contradicts it. Training VLMs with CogAlign leads to performance improvements in downstream tasks that rely on visual arithmetic, specifically:

- **Chart Understanding**: When used to train VLMs, CogAlign leads to an average performance increase of 4.6% on the [CHOCOLATE](https://arxiv.org/abs/2312.10160) chart understanding dataset.
- **Geometric Problem-Solving**: Models trained with CogAlign exhibit an average performance gain of 2.9% on the subset of the [MATH-VISION](https://arxiv.org/abs/2402.14804) dataset that focuses on geometry-related questions.

This dataset allows VLMs to learn fundamental visual arithmetic, leading to better performance in tasks involving visual arithmetic. Importantly, CogAlign has been shown to achieve comparable or even better performance than task-specific SFT methods, while requiring significantly less (60%) training data.

## Citation

If you find CogAlign useful in your research, please consider citing:

```
@misc{huang-etal-2025-cogalign,
    title = "Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding",
    author = "Huang, Kung-Hsiang and Qin, Can and Qiu, Haoyi and Laban, Philippe and Joty, Shafiq and Xiong, Caiming and Wu, Chien-Sheng",
    year = "2025",
    archivePrefix = "arXiv",
    primaryClass={cs.AI}
}
```
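
## Example Structure (illustrative sketch)

The dataset description says each example pairs a visual input and a query with one positive and one negative response. The sketch below shows one way such an example could be represented and mapped to a prompt/chosen/rejected record, a layout often used for preference-style post-training. All field names (`image_path`, `query`, `positive`, `negative`), the helper `to_preference_record`, and the sample values are assumptions made for illustration. They are not the dataset's actual schema, so check the real column names before relying on them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CogAlignExample:
    """One example as described in the dataset card.

    Field names are hypothetical; the real dataset may use different columns.
    """

    image_path: str  # visual input (assumed here to be a file path)
    query: str       # question asking to compare a specific property
    positive: str    # response consistent with the image
    negative: str    # response that contradicts the image


def to_preference_record(example: CogAlignExample) -> dict:
    """Map an example to a prompt/chosen/rejected record (assumed layout)."""
    if example.positive == example.negative:
        raise ValueError("positive and negative responses must differ")
    return {
        "image": example.image_path,
        "prompt": example.query,
        "chosen": example.positive,
        "rejected": example.negative,
    }


# Made-up sample values, not taken from the dataset.
example = CogAlignExample(
    image_path="images/000001.png",
    query="Which line segment is longer, A or B?",
    positive="Segment A is longer than segment B.",
    negative="Segment B is longer than segment A.",
)
record = to_preference_record(example)
```

Keeping the positive and negative responses side by side in one record makes the contrast explicit, which is the property the description emphasizes. Whether a given training pipeline expects exactly these keys is an assumption to verify against that pipeline's documentation.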

Provider: maas

Created: 2025-08-16
