fivl-instruct
收藏魔搭社区2025-08-15 更新2025-08-02 收录
下载链接:
https://modelscope.cn/datasets/Intel/fivl-instruct
下载链接
链接失效反馈官方服务:
资源简介:
# FiVL-Instruct Dataset
[FiVL: A Frameword for Improved Vision-Language Alignment](path_to_arxiv) introduces grounded datasets for both training and evaluation, building upon existing vision-question-answer and instruction datasets
Each sample in the original datasets was augmented with key expressions, along with their corresponding bounding box indices and segmentation masks within the images.
## Dataset Details
- **Creators**: Intel Labs
- **Version**: 1.0 (Updated: 2024-12-18)
- **License**: CC BY 4.0
- **Number of Training Samples**:
- **Number of Test Samples**:
- **Format**:
### Dataset Description
FiVL-Instruct, is built upon the [LLaVA-1.5-mix-665K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json) instruction tuning dataset, a public vision-language instruction dataset containing 665K structured dialogues between users and GPT. Most interactions begin with a user-provided image, followed by questions related to the visual content, with GPT offering responses, each question-answer pair is referred as a turn.
We augmented the original LLaVA-1.5-mix-665K dataset by integrating the key expressions and their segmentation masks according to the pipeline described below.
We further trained a model [FiVL-VM](https://huggingface.co/Intel/fivl-vm) as describe in our paper.
### Dataset Sources
- **Code**: [Github Repository](https://github.com/IntelLabs/multimodal_cognitive_ai/FiVL)
- **Model**: [FiVL-VM](https://huggingface.co/Intel/fivl-vm)
- **Paper**: [FiVL: A Framework for Improved Vision-Language Alignment](arxiv)
- **Project page**: [Website](https://intellabs.github.io/multimodal_cognitive_ai/FiVL/)
## Uses
Examples of usage of this dataset are described in our [repository](https://github.com/IntelLabs/multimodal_cognitive_ai/FiVL).
- For training, users can refer to our [methodlogy](https://github.com/IntelLabs/multimodal_cognitive_ai/FiVL/tree/main/training/LLaVA)
- For evaluation, we introduced a new measurement of Visual Reliance of models and Benchmarks. Results can be reproduced using our [code](https://github.com/IntelLabs/multimodal_cognitive_ai/FiVL/tree/main/evaluation)
- Finally for explainability, our [code](https://github.com/IntelLabs/multimodal_cognitive_ai/FiVL/tree/main/xai) will also provide examples of usage.
## Dataset Structure
You will find here, the key expression and their related segmentation masks for the samples of the original dataset as well as the segmentation masks in dataset_grounded.
## Dataset Creation
Our [repository](https://github.com/IntelLabs/multimodal_cognitive_ai/FiVL/tree/main/pipeline/augment_dataset) describes how to reproduce and regenerate this dataset. It also provides [details](https://github.com/IntelLabs/multimodal_cognitive_ai/FiVL/tree/main/pipeline/method_evaluation) on how to evaluate it
## Evaluation
We evaluated our dataset against manual as well as automatic annotations using LLM-as-a-judge methodology. Results can be found in section 4 of our paper.
## Ethical Considerations
Intel is committed to respecting human rights and avoiding causing or contributing to adverse impacts on human rights. See Intel’s Global Human Rights Principles. Intel’s products and software are intended only to be used in applications that do not cause or contribute to adverse impacts on human rights.
## Contact Information
**Issues**: For any issues or questions regarding the dataset, please contact the maintainers or open an issue in the dataset repository.
# FiVL-Instruct 数据集
《FiVL:优化视觉-语言对齐的框架》(原论文链接:path_to_arxiv)提出了面向训练与评估的锚定型(grounded)数据集,其构建基于现有的视觉问答与指令数据集。原始数据集中的每个样本均被补充了关键表达式,以及其在图像中对应的边界框索引与分割掩码。
## 数据集详情
- **创作者**:英特尔实验室(Intel Labs)
- **版本**:1.0(更新时间:2024-12-18)
- **许可协议**:CC BY 4.0
- **训练样本数量**:
- **测试样本数量**:
- **数据格式**:
### 数据集描述
FiVL-Instruct 构建自 [LLaVA-1.5-mix-665K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json) 指令微调数据集,这是一个公开的视觉-语言指令数据集,包含66.5万组用户与GPT的结构化对话。绝大多数交互以用户提供的图像为起点,随后是与视觉内容相关的提问,由GPT给出应答,每组问答对被称为一个"轮次(turn)"。
我们通过下述流程,为原始的LLaVA-1.5-mix-665K数据集补充了关键表达式及其分割掩码。
我们还按照论文中的方法训练了模型 [FiVL-VM](https://huggingface.co/Intel/fivl-vm)。
### 数据集来源
- **代码仓库**:[GitHub 仓库](https://github.com/IntelLabs/multimodal_cognitive_ai/FiVL)
- **模型**:[FiVL-VM](https://huggingface.co/Intel/fivl-vm)
- **论文**:[《FiVL:优化视觉-语言对齐的框架》](arxiv)
- **项目主页**:[官方网站](https://intellabs.github.io/multimodal_cognitive_ai/FiVL/)
## 使用场景
该数据集的使用示例可参考我们的 [代码仓库](https://github.com/IntelLabs/multimodal_cognitive_ai/FiVL)。
- 训练场景:用户可参考我们的 [训练方法](https://github.com/IntelLabs/multimodal_cognitive_ai/FiVL/tree/main/training/LLaVA)
- 评估场景:我们提出了一种全新的模型视觉依赖度评测指标与基准测试集。相关结果可通过我们的 [评估代码](https://github.com/IntelLabs/multimodal_cognitive_ai/FiVL/tree/main/evaluation) 复现
- 可解释性场景:我们的 [可解释性代码](https://github.com/IntelLabs/multimodal_cognitive_ai/FiVL/tree/main/xai) 也提供了使用示例。
## 数据集结构
在`dataset_grounded`目录下,您将找到原始数据集样本对应的关键表达式及其相关分割掩码。
## 数据集构建
我们的 [代码仓库](https://github.com/IntelLabs/multimodal_cognitive_ai/FiVL/tree/main/pipeline/augment_dataset) 详细说明了如何复现与重构该数据集,同时还提供了 [评测细节](https://github.com/IntelLabs/multimodal_cognitive_ai/FiVL/tree/main/pipeline/method_evaluation) 相关说明。
## 评测说明
我们采用人工标注与大语言模型作为评判者的自动标注方法对该数据集进行了评测。相关评测结果可参见论文第4章节。
## 伦理考量
英特尔(Intel)致力于尊重人权,避免使用或参与会对人权造成负面影响的应用场景。详见英特尔《全球人权原则》。英特尔的产品与软件仅可用于不会造成或加剧人权负面影响的应用场景。
## 联系方式
**问题反馈**:若您对该数据集有任何疑问或问题,请联系维护者或在数据集仓库中提交Issue。
提供机构:
maas
创建时间:
2025-08-01



