HumanEval-V

Name: HumanEval-V
Creator: Open-sourced by the authors
License: No description available

Source: arXiv (indexed 2025-09-30)

Download link:

https://github.com/HumanEval-V/HumanEval-V-Benchmark

Description:

This dataset is a benchmark designed to evaluate the visual understanding and reasoning capabilities of Large Multimodal Models (LMMs) through code generation. It contains 108 entry-level Python programming tasks that incorporate visual elements. The tasks are adapted from platforms such as CodeForces and Stack Overflow and come with hand-crafted test cases for evaluation. The core task is to generate code from a visual context and a predefined function signature.
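
To illustrate how test-case-based evaluation of generated code typically works, the sketch below executes a candidate solution against (arguments, expected output) pairs and then computes pass@k with the standard unbiased estimator used for HumanEval-style benchmarks. This is not the official HumanEval-V harness: the task `count_shaded`, its test cases, and the helper names `run_candidate` and `pass_at_k` are hypothetical, and the actual task format and metrics in the repository may differ. Note that `exec` on untrusted model output is unsafe outside a sandbox.

```python
import math
from typing import Any, List, Tuple

def run_candidate(source: str, entry_point: str,
                  tests: List[Tuple[tuple, Any]]) -> bool:
    """Run model-generated source and check it against (args, expected) pairs.

    Hypothetical helper; exec() is NOT sandboxed and should only be used
    on trusted code or inside an isolated process/container.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)          # define the candidate function
        fn = namespace[entry_point]      # look up the predefined signature
        return all(fn(*args) == expected for args, expected in tests)
    except Exception:
        return False                     # crashes count as failures

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n = samples generated, c = samples that passed all tests.
    """
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Hypothetical task: count the shaded (1) cells in a grid shown in an image.
tests = [(([[0, 1], [1, 1]],), 3), (([[0, 0]],), 0)]
good = "def count_shaded(grid):\n    return sum(sum(row) for row in grid)\n"
bad = "def count_shaded(grid):\n    return len(grid)\n"

results = [run_candidate(s, "count_shaded", tests) for s in (good, bad)]
print(results)                                        # [True, False]
print(pass_at_k(n=len(results), c=sum(results), k=1))  # 0.5
```

Scoring each sample with all-or-nothing test passing, then aggregating with pass@k, is the usual design for code-generation benchmarks because partial credit on hand-crafted tests is hard to interpret.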

Provided by:

Open-sourced by the authors