
llava-critic-113k

ModelScope community · Updated 2025-12-03 · Indexed 2024-11-23
Download link:
https://modelscope.cn/datasets/lmms-lab/llava-critic-113k
Resource description:
# Dataset Card for LLaVA-Critic-113k

- 🪐 Project Page: https://llava-vl.github.io/blog/2024-10-03-llava-critic/
- 📰 Paper: https://arxiv.org/abs/2410.02712
- 🤗 Huggingface Collection: https://huggingface.co/collections/lmms-lab/llava-critic-66fe3ef8c6e586d8435b4af8
- 👋 Point of Contact: [Tianyi Xiong](https://tyxiong23.github.io/)

## Dataset Summary

LLaVA-Critic-113k is a high-quality **critic instruction-following dataset** tailored to following instructions in complex evaluation settings, providing both **quantitative judgments** and the **corresponding reasoning process**. It consists of 46k images with 113k evaluation instruction samples and primarily covers two evaluation settings:

- <span style="color:red"><b>Pointwise Scoring</b>: Assign a score to an individual candidate response.</span>
  We collect instruction-response pairs across 8 multimodal datasets and 13 response models, gather evaluation prompts from 7 open-ended benchmarks, and use GPT-4o to produce judgment scores and reasons.

  *Data Format* (`Input` + <span style="color:green">Output</span>):
  `Image`, `Question`, `Response`, `Reference(optional)`, `Evaluation Criteria`, <span style="color:green">Score</span>, <span style="color:green">Reason</span>

- <span style="color:blue"><b>Pairwise Ranking</b>: Compare two candidate responses to determine their relative quality.</span>
  We gather pairwise responses with known preferences, design a set of 30 pairwise evaluation prompt templates, and ask GPT-4o to generate a justification for the preference.

  *Data Format* (`Input` + <span style="color:green">Output</span>):
  `Image`, `Question`, `Response 1&2`, `Evaluation Criteria`, <span style="color:green">Preference</span>, <span style="color:green">Reason</span>

### Data Statistics

<img src="./data_statistics_critic_detail.png" width="750px"/>

### Example Data

<img src="https://llava-vl.github.io/blog/2024-10-03-llava-critic/static/images/example_critic_data.png" width="750px"/>

## Citation

```
@article{xiong2024llavacritic,
  title={LLaVA-Critic: Learning to Evaluate Multimodal Models},
  author={Xiong, Tianyi and Wang, Xiyao and Guo, Dong and Ye, Qinghao and Fan, Haoqi and Gu, Quanquan and Huang, Heng and Li, Chunyuan},
  year={2024},
  eprint={2410.02712},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2410.02712},
}
```
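
The two data formats listed in the Dataset Summary can be pictured as simple records. The following is a minimal sketch only: the class names, field names, prompt layout, and preference encoding are assumptions made for illustration and are not the dataset's actual schema or the authors' prompt templates. It shows how the `Input` fields might be assembled into an evaluation prompt and the green `Output` fields into a training target.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PointwiseSample:
    """Hypothetical record for the Pointwise Scoring format."""
    image: str                 # path or identifier of the image (assumed representation)
    question: str
    response: str
    evaluation_criteria: str
    score: float               # output: judgment score
    reason: str                # output: reasoning behind the score
    reference: Optional[str] = None  # optional reference answer


@dataclass
class PairwiseSample:
    """Hypothetical record for the Pairwise Ranking format."""
    image: str
    question: str
    response_1: str
    response_2: str
    evaluation_criteria: str
    preference: str            # output: which response is preferred (encoding assumed)
    reason: str                # output: justification for the preference


def build_pointwise_prompt(s: PointwiseSample) -> str:
    """Join the text input fields; the reference is included only when present."""
    lines = [s.evaluation_criteria, f"Question: {s.question}", f"Response: {s.response}"]
    if s.reference is not None:
        lines.append(f"Reference: {s.reference}")
    return "\n".join(lines)


def build_pairwise_prompt(s: PairwiseSample) -> str:
    """Join the text input fields for a two-response comparison."""
    return "\n".join([
        s.evaluation_criteria,
        f"Question: {s.question}",
        f"Response 1: {s.response_1}",
        f"Response 2: {s.response_2}",
    ])


def pointwise_target(s: PointwiseSample) -> str:
    """Output side of a pointwise sample: score plus reason."""
    return f"Score: {s.score}\nReason: {s.reason}"


def pairwise_target(s: PairwiseSample) -> str:
    """Output side of a pairwise sample: preference plus reason."""
    return f"Preference: {s.preference}\nReason: {s.reason}"
```

In this sketch the image is carried alongside the text rather than embedded in the prompt string, since how the image is passed to a model depends on the model's own interface.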

Provider:
maas
Created:
2024-10-06