llava-critic-113k
Source: ModelScope community (魔搭社区). Updated 2025-12-03; indexed 2024-11-23.
Download link:
https://modelscope.cn/datasets/lmms-lab/llava-critic-113k
# Dataset Card for LLaVA-Critic-113k
- 🪐 Project Page: https://llava-vl.github.io/blog/2024-10-03-llava-critic/
- 📰 Paper: https://arxiv.org/abs/2410.02712
- 🤗 Huggingface Collection: https://huggingface.co/collections/lmms-lab/llava-critic-66fe3ef8c6e586d8435b4af8
- 👋 Point of Contact: [Tianyi Xiong](https://tyxiong23.github.io/)
## Dataset Summary
LLaVA-Critic-113k is a high-quality **critic instruction-following dataset** tailored to following instructions in complex evaluation settings, providing both **quantitative judgments** and the **corresponding reasoning process**. It consists of 46k images with 113k evaluation instruction samples, covering two main evaluation settings:
- <span style="color:red"><b>Pointwise Scoring</b>: Assign a score to an individual candidate response.</span>
We collect instruction-response pairs across 8 multimodal datasets and 13 response models, gather evaluation prompts from 7 open-ended benchmarks, and use GPT-4o to produce judgment scores and reasons.
*Data Format* (`Input` + <span style="color:green">Output</span>):
`Image`, `Question`, `Response`, `Reference(optional)`, `Evaluation Criteria`, <span style="color:green">Score</span>, <span style="color:green">Reason</span>
- <span style="color:blue"><b>Pairwise Ranking</b>: Compare two candidate responses to determine their relative quality.</span>
We gather pairwise responses with known preferences, design a set of 30 pairwise evaluation prompt templates, and ask GPT-4o to generate a justification for each preference.
*Data Format* (`Input` + <span style="color:green">Output</span>):
`Image`, `Question`, `Response 1&2`, `Evaluation Criteria`, <span style="color:green">Preference</span>, <span style="color:green">Reason</span>
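
The two data formats above can be pictured as simple record types. The following is only a minimal sketch: the class names, field names, field types, and the prompt layout in `build_pointwise_input` are assumptions made for illustration, and the released files may use a different schema and different prompt templates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PointwiseSample:
    """One pointwise-scoring record (field names are hypothetical)."""
    # Inputs
    image: str                  # path or URL; the storage format is an assumption
    question: str
    response: str
    evaluation_criteria: str
    # Outputs
    score: str                  # kept as text because the card does not state a score scale
    reason: str
    # Optional input, listed last because dataclass defaults must come last
    reference: Optional[str] = None


@dataclass
class PairwiseSample:
    """One pairwise-ranking record (field names are hypothetical)."""
    # Inputs
    image: str
    question: str
    response_1: str
    response_2: str
    evaluation_criteria: str
    # Outputs
    preference: str             # the encoding of the preference is an assumption
    reason: str


def build_pointwise_input(sample: PointwiseSample) -> str:
    """Join the text inputs of a pointwise sample into one prompt string.

    The layout is illustrative only; it is not the dataset's actual template.
    """
    parts = [
        f"Question: {sample.question}",
        f"Response: {sample.response}",
    ]
    # The reference answer is optional, so include it only when present.
    if sample.reference is not None:
        parts.append(f"Reference: {sample.reference}")
    parts.append(f"Evaluation Criteria: {sample.evaluation_criteria}")
    return "\n".join(parts)
```

Keeping inputs and outputs as separate groups of fields mirrors the `Input` + Output split in the card: a critic model would be given the input fields and trained to produce the score (or preference) together with the reason.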
### Data Statistics
<img src="./data_statistics_critic_detail.png" width="750px"/>
### Example Data
<img src="https://llava-vl.github.io/blog/2024-10-03-llava-critic/static/images/example_critic_data.png" width="750px"/>
## Citation
```
@article{xiong2024llavacritic,
title={LLaVA-Critic: Learning to Evaluate Multimodal Models},
author={Xiong, Tianyi and Wang, Xiyao and Guo, Dong and Ye, Qinghao and Fan, Haoqi and Gu, Quanquan and Huang, Heng and Li, Chunyuan},
year={2024},
eprint={2410.02712},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2410.02712},
}
```
Provider: maas
Created: 2024-10-06



