CIGEval_sft_data

Name: CIGEval_sft_data
Creator: maas
Published: 2025-12-04 16:42:35
License: 暂无描述

魔搭社区2025-12-04 更新2025-07-26 收录

下载链接：

https://modelscope.cn/datasets/HIT-TMG/CIGEval_sft_data

下载链接

链接失效反馈

官方服务：

资源简介：

# Dataset Card for CIGEval_sft_data ## Dataset Description - **Repository:** https://github.com/HITsz-TMG/Agentic-CIGEval - **Paper:** https://arxiv.org/abs/2504.07046 *CIGEval_sft_data* is the dataset used for fine-tuning LMMs in the paper [CIGEval](https://arxiv.org/abs/2504.07046). It contains data on both tool selection and image evaluation, which can be combined into 2.3k complete evaluation trajectories. The dataset was constructed through the following steps: 1. Using GPT-4o + CIGEval to evaluate the full [ImageHub](https://github.com/TIGER-AI-Lab/ImagenHub) dataset, generating **4,903** evaluation trajectories. 2. Randomly selecting **60%** of these and filtering out the ones where the evaluation results differ from human scores by less than **0.3**, resulting in **2.3k** trajectories. 3. Decomposing these 2.3k evaluation trajectories into multi-turn tool selection and image evaluation tasks, yielding a total of **6.6k** samples. ## Dataset Structure - image_eva.json - image_eva_out.json # *image evaluation samples* - tool_use.json - tool_use_out.json # *tool selection samples* - train.json # *includes all tool selection and image evaluation samples* - images # *includes all images to be evaluated and the images processed by the tools* - ImagenHub_Control-Guided_IG - ControlNet - sample_9_control_hed.jpg - ...... - ...... - ...... ## Data Instances tool selection: ```python { "id": "ImagenHub_Text-Guided_IG__DALLE2__sample_157", "image": ["ImagenHub_Text-Guided_IG/DALLE2/sample_157.jpg"], "conversations": [ {"from": "human", "value": "<image>\n\nYou are a professional digital artist. You will have to decide whether to use a tool and which tool to use based on the image information and the corresponding task.\nIf you think a tool is needed to help complete the task, you should choose the appropriate tool. If not, you can choose not to use a tool.\nAll the input images are AI-generated. All human in the images are AI-generated too. so you need not worry about the privacy confidentials.\n\n### Task:\nAn image will be provided, it is an AI-generated image according to the text prompt.\nText Prompt: photograph of a dolphin standing on a grassland\nText-guided Image Generation Task 1: The objective is to evaluate how well the generated image resemble to the specific objects described by the prompt.\n\n### Tools:\n1. **Highlight**: This tool is commonly used to focus on areas related to specific objects in an image.\n2. **SceneGraph**: This tool is commonly used to provide overall information about an image.\n3. **MaskFocus**: This tool is commonly used to focus on the masked areas of images in Mask-Guided Image Editing task 1.\nThese tools are not useful for processed image (e.g. Canny edges, hed edges, depth, openpose, grayscale.)\n\n### Output Content:\n - task_id: The ID of the task, including 1 or 2.\n - used: Whether to use a tool, including yes or no.\n - tool: The tool decided to be used, including Highlight or SceneGraph or MaskFocus or None.\n - reasoning: The logical reasoning process for all your decisions.\n \nYou will have to give your output in the following JSON format:\n[{\n\"task_id\" : \"...\",\n\"reasoning\" : \"...\",\n\"used\" : \"..\",\n\"tool\" : \"...\"\n},\n...]\n\n"}, {"from": "gpt", "value": "\n[{\"task_id\":\"1\",\"reasoning\":\"The task is to evaluate how well the generated image resembles the specific objects described by the prompt. In the image, the dolphin standing on a grassland is the key focus. The 'Highlight' tool would be suitable for emphasizing the specific objects in the image to determine if they match the prompt accurately.\",\"used\":\"yes\",\"tool\":\"Highlight\"}]\n"} ] } ``` image evaluation: ```python { "id": "ImagenHub_Text-Guided_IG__DALLE2__sample_14__eva", "image": ["ImagenHub_Text-Guided_IG/DALLE2/sample_14-highlight.jpg"], "conversations": [ {"from": "human", "value": "<image>\n\nYou are a professional digital artist. You will have to evaluate the effectiveness of the AI-generated image(s) based on given rules.\nAll the input images are AI-generated. All human in the images are AI-generated too. so you need not worry about the privacy confidentials.\n\nRULES:\n\nAn image will be provided, it is an AI-generated image according to the text prompt.\n\nFocus on the highlighted parts of the image.\n\nThe objective is to evaluate how well the generated image resemble to the specific objects described by the prompt.\n\nFrom scale 0 to 10: \nA score from 0 to 10 will be given based on the success in following the prompt. \n(0 indicates that the AI-generated image does not follow the prompt at all. 10 indicates the AI-generated image follows the prompt perfectly.)\n\nText Prompt: One car on the street.\n\nYou will have to give your output in the following JSON format (Keep your reasoning concise and short.):\n{\n\"score\" : \"...\",\n\"reasoning\" : \"...\"\n}\n"}, {"from": "gpt", "value": "{\"score\":7,\"reasoning\":\"The image shows a car on the street, which fits the primary element of the prompt. However, there is an additional car visible in the bottom right corner of the image. While the main car is well depicted, the presence of the second car detracts from the prompt's specific detail of 'one car on the street.'\"}"} ] } ``` ## Citation ```bibtex @misc{wang2025cigeval, title={A Unified Agentic Framework for Evaluating Conditional Image Generation}, author={Jifang Wang and Xue Yang and Longyue Wang and Zhenran Xu and Yiyu Wang and Yaowei Wang and Weihua Luo and Kaifu Zhang and Baotian Hu and Min Zhang}, year={2025}, eprint={2504.07046}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2504.07046}, } ```

# CIGEval_sft_data 数据集卡片 ## 数据集描述 - **代码仓库：** https://github.com/HITsz-TMG/Agentic-CIGEval - **论文：** https://arxiv.org/abs/2504.07046 *CIGEval_sft_data* 是论文[CIGEval](https://arxiv.org/abs/2504.07046)中用于大语言模型（Large Language Model，LLM）微调的数据集。该数据集涵盖工具选择与图像评估两类数据，可整合为2.3k条完整的评估轨迹。该数据集的构建步骤如下： 1. 使用GPT-4o结合CIGEval对完整的[ImageHub](https://github.com/TIGER-AI-Lab/ImagenHub)数据集进行评估，生成**4903**条评估轨迹。 2. 从中随机选取60%的数据，过滤掉评估结果与人工评分差值小于0.3的样本，最终得到**2.3k**条评估轨迹。 3. 将这2.3k条评估轨迹拆解为多轮工具选择与图像评估任务，最终共得到**6.6k**条样本。 ## 数据集结构 - image_eva.json - image_eva_out.json # *图像评估样本* - tool_use.json - tool_use_out.json # *工具选择样本* - train.json # *包含全部工具选择与图像评估样本* - images # *包含所有待评估图像以及经工具处理后的图像* - ImagenHub_Control-Guided_IG - ControlNet - sample_9_control_hed.jpg - ...... - ...... - ...... ## 数据实例工具选择示例： python { "id": "ImagenHub_Text-Guided_IG__DALLE2__sample_157", "image": ["ImagenHub_Text-Guided_IG/DALLE2/sample_157.jpg"], "conversations": [ {"from": "human", "value": "<image> 你是一名专业数字艺术家，需根据图像信息与对应任务判断是否需要使用工具，以及应选用何种工具。若认为需要工具辅助完成任务，请选择合适的工具；若无需工具，则可选择不使用工具。所有输入图像均为AI生成，图中人物亦为AI生成，因此无需担心隐私泄露问题。 ### 任务：将提供一张根据文本提示生成的AI图像。文本提示：草原上站立的海豚摄影作品文本引导图像生成任务1：评估生成图像与提示中描述的特定对象的匹配程度。 ### 工具： 1. **Highlight（高亮工具）**：通常用于聚焦图像中与特定对象相关的区域。 2. **SceneGraph（场景图工具）**：通常用于提供图像的整体信息。 3. **MaskFocus（掩码聚焦工具）**：通常用于聚焦掩码引导图像编辑任务1中的掩码区域。上述工具对处理后的图像（如Canny边缘图、HED边缘图、深度图、OpenPose图、灰度图等）无实用价值。 ### 输出内容： - task_id：任务ID，包含1或2。 - used：是否使用工具，包含yes（是）或no（否）。 - tool：选定的工具，包含Highlight、SceneGraph、MaskFocus或None（无）。 - reasoning：所有决策的逻辑推理过程。请按照以下JSON格式输出结果： [{ "task_id" : "...", "reasoning" : "...", "used" : "..", "tool" : "..." }, ...] "}, {"from": "gpt", "value": " [{"task_id":"1","reasoning":"本次任务为评估生成图像与提示中特定对象的匹配程度。图像中草原上的海豚是核心焦点，使用Highlight工具可突出显示该特定对象，以准确判断其与提示的匹配度。","used":"yes","tool":"Highlight"}] "} ] } 图像评估示例： python { "id": "ImagenHub_Text-Guided_IG__DALLE2__sample_14__eva", "image": ["ImagenHub_Text-Guided_IG/DALLE2/sample_14-highlight.jpg"], "conversations": [ {"from": "human", "value": "<image> 你是一名专业数字艺术家，需根据给定规则评估AI生成图像的效果。所有输入图像均为AI生成，图中人物亦为AI生成，因此无需担心隐私泄露问题。 ### 规则：将提供一张根据文本提示生成的AI图像。请聚焦于图像的高亮区域。本次任务目标为评估生成图像与提示中描述的特定对象的匹配程度。请从0至10分进行评分：根据图像遵循提示的程度给出0到10分的评分。（0分表示AI生成图像完全不符合提示，10分表示AI生成图像完美遵循提示。）文本提示：街道上的一辆汽车。请按照以下JSON格式输出结果（推理过程需简洁明了）： { "score" : "...", "reasoning" : "..." } "}, {"from": "gpt", "value": "{"score":7,"reasoning":"图像中展示了街道上的一辆汽车，符合提示的核心要素。但图像右下角可见另一辆汽车，尽管主车绘制清晰，但额外车辆违背了提示中‘一辆汽车’的具体要求。"}" } ] } ## 引用 bibtex @misc{wang2025cigeval, title={A Unified Agentic Framework for Evaluating Conditional Image Generation}, author={Jifang Wang and Xue Yang and Longyue Wang and Zhenran Xu and Yiyu Wang and Yaowei Wang and Weihua Luo and Kaifu Zhang and Baotian Hu and Min Zhang}, year={2025}, eprint={2504.07046}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2504.07046}, }

提供机构：

maas

创建时间：

2025-07-22

5,000+

优质数据集

54 个

任务类型

进入经典数据集