VisuLogic | Multimodal Understanding Dataset | Logical Reasoning Dataset

Source: GitHub. Updated 2025-04-10. Indexed 2025-04-09.

Multimodal understanding

Logical reasoning

Download link:

https://github.com/VisuLogic-Benchmark/VisuLogic-Eval

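Resource overview:

The first benchmark to combine visual perception with logical reasoning. It contains 1,000 carefully designed questions spanning 6 domains and 24 subcategories, and it is built to avoid language bias so that each task depends on genuine visual reasoning.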

Created:

2025-04-07

Original Information Summary

VisuLogic Dataset Overview

Basic Information

Dataset name: VisuLogic
Release date: 2025-04-08
Maintainer: VisuLogic-Benchmark
Contact:
- Jiahao Wang: wjhwdscience@stu.xjtu.edu.cn
- Weiye Xu: ustcxwy0271@mail.ustc.edu.cn

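Dataset Features

Core challenge: the first multimodal evaluation benchmark to integrate visual perception with logical reasoning
Rigorous design: 1,000 carefully designed questions across 6 domains and 24 subcategories
Resistant to language shortcuts: vision-centric reasoning tasks that require genuine multimodal understanding
Human-aligned evaluation:
- Human accuracy: >50.0%
- SOTA MLLM accuracy: <30%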
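
To put the two accuracy figures in context, the sketch below estimates the sampling uncertainty of an accuracy measured on 1,000 questions, using the standard binomial standard-error formula. It assumes every question is scored as simply right or wrong and that questions are independent; the function name is illustrative and not part of the VisuLogic tooling.

```python
import math


def accuracy_standard_error(accuracy: float, n_questions: int) -> float:
    """Binomial standard error of an accuracy estimated from n_questions items."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if n_questions <= 0:
        raise ValueError("n_questions must be positive")
    return math.sqrt(accuracy * (1.0 - accuracy) / n_questions)


# Figures from the dataset card, treated here as point values (an assumption:
# the card only gives bounds, ">50.0%" for humans and "<30%" for SOTA MLLMs).
N_QUESTIONS = 1000
model_se = accuracy_standard_error(0.30, N_QUESTIONS)
human_se = accuracy_standard_error(0.50, N_QUESTIONS)

print(f"model SE: {model_se:.4f}, human SE: {human_se:.4f}")
```

With 1,000 questions both standard errors come out under two percentage points, so if both figures are measured on the full question set, a gap of roughly twenty points is far larger than sampling noise alone would explain.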

Dataset Contents

Size: 1,000 questions
Domain coverage: 6 main domains
Subcategories: 24

Access

Hugging Face dataset: https://huggingface.co/datasets/VisuLogic/VisuLogic
GitHub repository: https://github.com/VisuLogic-Benchmark/VisuLogic-Eval.git
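
One way to fetch the Hugging Face copy programmatically is sketched below. It assumes the `huggingface_hub` package is installed and that the repository ID matches the path in the Hugging Face URL above (`VisuLogic/VisuLogic`); the function name and the default target directory are illustrative, and the file layout inside the repository is not described on this page.

```python
REPO_ID = "VisuLogic/VisuLogic"  # taken from the Hugging Face URL above


def download_visulogic(local_dir: str = "visulogic_data") -> str:
    """Download a snapshot of the dataset repository and return its local path.

    Requires network access and `pip install huggingface_hub`.
    """
    # Imported lazily so that merely defining this helper needs no extra package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the URL is under /datasets/, so this is a dataset repo
        local_dir=local_dir,
    )
```

Calling `download_visulogic()` returns the local directory holding the downloaded files, which can then be inspected before wiring them into an evaluation run.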

Evaluation

Environment setup (bash):

    git clone https://github.com/VisuLogic-Benchmark/VisuLogic-Eval.git
    pip install -r requirements.txt

Run the evaluation (bash):

    cd scripts
    bash eval_qwen2.5vl_7b_multi.sh

Citation

BibTeX:

    @misc{visulogic,
      title        = {VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models},
      author       = {VisuLogic-Benchmark},
      howpublished = {\url{https://github.com/VisuLogic-Benchmark/VisuLogic-Eval}},
      year         = {2025},
      note         = {Accessed: 2025-04-08}
    }

Planned Releases

[ ] Training code
[ ] Research paper
[ ] Training dataset
[ ] Model checkpoints

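AI-Collected Summary

Dataset Introduction

Construction

VisuLogic was built around current thinking on multimodal reasoning evaluation. Through interdisciplinary collaboration, the research team designed 1,000 visual logic questions covering 6 domains and 24 subcategories. Each question is made vision-centric so that it has to be solved through genuine visual understanding rather than language shortcuts. The dataset uses a hierarchical annotation scheme, and all samples are expert-verified and compared against a human baseline. Human accuracy on the benchmark exceeds 50%, which gives a reference point for evaluating multimodal large language models.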

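Features

The dataset combines visual perception with logical reasoning and moves away from text-dominated evaluation. Question design emphasizes dependence on visual input: the current state-of-the-art multimodal large language models score below 30% on this benchmark, well short of human performance (above 50%). The dataset has a fine-grained domain taxonomy and graded difficulty, and each sample involves an interaction between visual cues and logical constraints, so it can assess multimodal understanding along several dimensions.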

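Usage

Researchers can obtain the standardized evaluation toolchain from the official GitHub repository; it requires a Python environment and the listed dependencies. The evaluation pipeline is modular, so users can either run the preset model test scripts or customize the evaluation parameters. A typical workflow is to clone the repository, install the dependencies, change into the scripts directory, and run the evaluation script for the target model, for example the preconfigured bash script for Qwen2.5-VL-Instruct. Evaluation results are aligned with the human performance baseline and support fine-grained analysis by capability dimension.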
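
The fine-grained analysis mentioned above can be approximated with a per-category accuracy breakdown. This is a minimal sketch, not the repository's scoring code: the record fields `category`, `answer`, and `prediction` are hypothetical names, the category labels in the example are placeholders, and answers are assumed to be short labels compared case-insensitively.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def accuracy_by_category(records: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Return the fraction of correct predictions for each category."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for record in records:
        category = record["category"]
        totals[category] += 1
        # Normalize whitespace and case before comparing answer labels.
        if record["prediction"].strip().upper() == record["answer"].strip().upper():
            correct[category] += 1
    return {category: correct[category] / totals[category] for category in totals}


# Hypothetical records; real field names and labels depend on the evaluation output.
example_records = [
    {"category": "domain_1", "answer": "B", "prediction": "b"},
    {"category": "domain_1", "answer": "C", "prediction": "A"},
    {"category": "domain_2", "answer": "D", "prediction": " D "},
]

for name, score in sorted(accuracy_by_category(example_records).items()):
    print(f"{name}: {score:.1%}")
```

Grouping by category rather than reporting one overall number makes it easier to see which of the benchmark's domains a model handles worst.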

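Background and Challenges

Background

VisuLogic was released in 2025 by the VisuLogic-Benchmark team to evaluate multimodal large language models on visual reasoning tasks. Developed jointly by researchers from Xi'an Jiaotong University and the University of Science and Technology of China, it focuses on the intersection of visual perception and logical reasoning and addresses a gap in existing benchmarks for evaluating genuine multimodal understanding. Its core research question is how models integrate visual information with logical rules to carry out complex reasoning. The dataset contains 1,000 carefully designed questions across 6 domains and 24 subcategories and is positioned as a standard for measuring a model's visual logic ability.

Current Challenges

VisuLogic involves two kinds of challenge. On the task side, the current state-of-the-art multimodal large language models reach less than 30% accuracy, far below human accuracy of over 50%, which reflects how hard visual logical reasoning is. On the construction side, the researchers had to design vision-centric tasks and guard against language shortcuts so that every question requires genuine multimodal understanding. Building the dataset also involved integrating knowledge across domains and aligning with human cognition. Together, these issues are key bottlenecks for multimodal reasoning research.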

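Common Scenarios

Classic Use Case

Evaluating the visual reasoning ability of multimodal large language models (MLLMs) has long been a key challenge in artificial intelligence. By integrating visual perception with logical reasoning, VisuLogic gives researchers a benchmark for assessing how well models reason in complex visual scenes. The dataset contains 1,000 carefully designed questions covering 6 domains and 24 subcategories, with an emphasis on vision-centric reasoning tasks that prevent language shortcuts from influencing the result.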

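Derived Work

The release of VisuLogic has prompted further research on multimodal reasoning. Building on the benchmark, researchers have developed new model architectures and training methods, such as vision-language joint attention mechanisms and cross-modal knowledge distillation. This work has improved model performance on visual reasoning tasks and has pushed evaluation standards toward a more comprehensive assessment that is closer to human cognition.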

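Recent Research on the Dataset

Latest Research Directions

In multimodal learning, the integration of vision and logical reasoning is an active research direction. VisuLogic addresses a gap in evaluating multimodal large language models on vision-centric logical reasoning. With 1,000 carefully designed cross-domain questions, it forms an evaluation suite covering 6 domains and 24 subcategories, and its resistance to language shortcuts requires models to understand how visual content relates to logical relationships. Notably, humans score above 50% on the benchmark, while the current state-of-the-art multimodal large language models remain below 30%; this gap points to a technical bottleneck that the field has yet to overcome. As the training code and model checkpoints are released, VisuLogic may help advance multimodal models in settings that demand precise visual reasoning, such as medical diagnosis and autonomous driving.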

The content above was collected and summarized by AI.
