llava-critic-113k

Name: llava-critic-113k
Creator: maas
Published: 2025-12-03 17:06:38
License: No description available

ModelScope community. Updated 2025-12-03; indexed 2024-11-23

Download link:

https://modelscope.cn/datasets/lmms-lab/llava-critic-113k

Resource description:

# Dataset Card for LLaVA-Critic-113k

- 🪐 Project Page: https://llava-vl.github.io/blog/2024-10-03-llava-critic/
- 📰 Paper: https://arxiv.org/abs/2410.02712
- 🤗 Huggingface Collection: https://huggingface.co/collections/lmms-lab/llava-critic-66fe3ef8c6e586d8435b4af8
- 👋 Point of Contact: [Tianyi Xiong](https://tyxiong23.github.io/)

## Dataset Summary

LLaVA-Critic-113k is a high-quality **critic instruction-following dataset** tailored to following instructions in complex evaluation settings, providing both **quantitative judgments** and the **corresponding reasoning process**. It consists of 46k images with 113k evaluation instruction samples, primarily covering two evaluation settings:

- Pointwise Scoring: Assign a score to an individual candidate response. We collect instruction-response pairs across 8 multimodal datasets and 13 response models, gather evaluation prompts from 7 open-ended benchmarks, and use GPT-4o to produce judgment scores and reasons.

  *Data Format* (`Input` + Output): `Image`, `Question`, `Response`, `Reference(optional)`, `Evaluation Criteria`, Score, Reason

- Pairwise Ranking: Compare two candidate responses to determine their relative quality. We gather pairwise responses with known preferences, design a set of 30 pairwise evaluation prompt templates, and ask GPT-4o to generate a justification for the preference.

  *Data Format* (`Input` + Output): `Image`, `Question`, `Response 1&2`, `Evaluation Criteria`, Preference, Reason

### Data Statistics

<img src="./data_statistics_critic_detail.png" width="750px"/>

### Example Data

<img src="https://llava-vl.github.io/blog/2024-10-03-llava-critic/static/images/example_critic_data.png" width="750px"/>

## Citation

```
@article{xiong2024llavacritic,
  title={LLaVA-Critic: Learning to Evaluate Multimodal Models},
  author={Xiong, Tianyi and Wang, Xiyao and Guo, Dong and Ye, Qinghao and Fan, Haoqi and Gu, Quanquan and Huang, Heng and Li, Chunyuan},
  year={2024},
  eprint={2410.02712},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2410.02712},
}
```
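### Data Format Sketch

The two data formats listed in the Dataset Summary can be pictured as two record types. The sketch below is only illustrative: the class names, field names, field types, and the `describe` helper are all hypothetical, and the released files may use a different schema (for example, a conversation-style layout). It simply separates the input fields from the output fields (score or preference, plus reason) for each setting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PointwiseSample:
    """Hypothetical record for the Pointwise Scoring setting."""
    # Inputs
    image: str                 # image path or identifier (assumed to be a string)
    question: str
    response: str
    evaluation_criteria: str
    # Outputs
    score: float               # judgment score (numeric type is an assumption)
    reason: str                # reasoning behind the score
    # Optional input
    reference: Optional[str] = None


@dataclass
class PairwiseSample:
    """Hypothetical record for the Pairwise Ranking setting."""
    # Inputs
    image: str
    question: str
    response_1: str
    response_2: str
    evaluation_criteria: str
    # Outputs
    preference: str            # which response is preferred (encoding is an assumption)
    reason: str                # justification for the preference


def describe(sample) -> str:
    """Return a one-line summary of a sample, depending on its setting."""
    if isinstance(sample, PointwiseSample):
        ref = "with reference" if sample.reference is not None else "no reference"
        return f"pointwise ({ref}): score={sample.score}"
    return f"pairwise: preferred response {sample.preference}"
```

Modelling `reference` as an optional field with a `None` default mirrors the `Reference(optional)` entry in the pointwise format; the pairwise format lists no reference field, so none is included there.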

Provider:

maas

Created:

2024-10-06

Collection summary

Dataset introduction