ABC-VG-Instruct

Name: ABC-VG-Instruct
Creator: maas
Published: 2025-12-10 16:24:48
License: no description provided

ModelScope community · Updated 2025-12-10 · Indexed 2025-03-01

Download link:

https://modelscope.cn/datasets/TIGER-Lab/ABC-VG-Instruct


Resource description:

## VG Instruct

This is the instruction fine-tuning dataset for *ABC: Achieving Better Control of Multimodal Embeddings using VLMs*. Each element in this dataset contains four instruction-caption pairs for an image in the Visual Genome dataset, each pair corresponding to a different bounding box in that image. We use this dataset to train an embedding model that can use instructions to embed specific aspects of a scene.

![My Image](https://huggingface.co/datasets/TIGER-Lab/ABC-VG-Instruct/resolve/main/example.png)

Combined with our pretraining step, this results in a model that can create high-quality embeddings from images containing multiple, potentially distracting elements.

## Paper, Website, and Code

For more information, please refer to the [Paper](https://huggingface.co/papers/2503.00329), [Website](https://tiger-ai-lab.github.io/ABC/), and [Code](https://github.com/TIGER-AI-Lab/ABC).

## Sample Usage

### Loading the dataset

You can load the text data and dataset metadata using Hugging Face's `load_dataset` utility:

```python
from datasets import load_dataset

# Downloads the text/metadata only; images are fetched separately (see "Fetching Images").
dataset = load_dataset("TIGER-Lab/ABC-VG-Instruct")
print(dataset)
# DatasetDict({
#     train: Dataset({
#         features: ['0', '1', '2', '3'],
#         num_rows: 12500
#     })
# })

# Each row holds four region entries under the keys '0' to '3'.
print(dataset['train'][0])
# Example (output will vary):
# {
#     '0': {'height': 200, 'id': 2379374, 'image': 123, 'instruction': 'the person on the left', 'phrase': 'man', 'width': 100, 'x': 50, 'y': 100},
#     '1': {'height': 150, 'id': 2379375, 'image': 123, 'instruction': 'the woman in the middle', 'phrase': 'woman', 'width': 75, 'x': 180, 'y': 120},
#     '2': {'height': 180, 'id': 2379376, 'image': 123, 'instruction': 'the building in the background', 'phrase': 'building', 'width': 300, 'x': 0, 'y': 0},
#     '3': {'height': 50, 'id': 2379377, 'image': 123, 'instruction': 'the car on the right', 'phrase': 'car', 'width': 80, 'x': 350, 'y': 200}
# }
```

### Fetching Images

To fetch the images for our datasets, we provide scripts in the `fetch_datasets` directory of the [GitHub repository](https://github.com/TIGER-AI-Lab/ABC). These scripts pull the pretraining/fine-tuning image data from the Hub and unpack it in your Hugging Face datasets cache (under a directory called `tigerlab`).

Run `python ./fetch_datasets/instruct.py` to get the fine-tuning dataset's images.

### Quick Start with the Associated Model

To quickly get started with making multimodal embeddings using the ABC model, follow these steps from the project's GitHub repository:

1. **Install Dependencies:**

   ```bash
   git clone https://github.com/TIGER-AI-Lab/ABC
   cd ABC
   pip install -r requirements.txt
   ```

2. **Start making multimodal embeddings!**

   ```bash
   python -i ./quick_start.py
   ```

## Citation

```bibtex
@misc{schneider2025abcachievingbettercontrol,
      title={ABC: Achieving Better Control of Multimodal Embeddings using VLMs},
      author={Benjamin Schneider and Florian Kerschbaum and Wenhu Chen},
      year={2025},
      eprint={2503.00329},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.00329},
}
```
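
## Example: unpacking a record

The record layout shown under "Loading the dataset" can be flattened into one entry per region. The following is a minimal sketch, not part of the official tooling. It assumes that `x` and `y` give the top-left corner of the bounding box and that `width` and `height` are in pixels, which the dataset card does not state explicitly. The helper name `region_pairs` is hypothetical, and the sample values are copied from the illustrative output above, so real rows will differ.

```python
from typing import Any, Dict, List

# Illustrative record copied from the example output on this card (not real data).
sample: Dict[str, Dict[str, Any]] = {
    "0": {"height": 200, "id": 2379374, "image": 123, "instruction": "the person on the left",
          "phrase": "man", "width": 100, "x": 50, "y": 100},
    "1": {"height": 150, "id": 2379375, "image": 123, "instruction": "the woman in the middle",
          "phrase": "woman", "width": 75, "x": 180, "y": 120},
    "2": {"height": 180, "id": 2379376, "image": 123, "instruction": "the building in the background",
          "phrase": "building", "width": 300, "x": 0, "y": 0},
    "3": {"height": 50, "id": 2379377, "image": 123, "instruction": "the car on the right",
          "phrase": "car", "width": 80, "x": 350, "y": 200},
}


def region_pairs(record: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten one record into a list of region entries, ordered by key '0'..'3'."""
    pairs = []
    for key in sorted(record, key=int):
        region = record[key]
        # Assumption: (x, y) is the top-left corner; convert to (left, top, right, bottom).
        box = (
            region["x"],
            region["y"],
            region["x"] + region["width"],
            region["y"] + region["height"],
        )
        pairs.append({
            "image": region["image"],
            "instruction": region["instruction"],
            "phrase": region["phrase"],
            "box": box,
        })
    return pairs


for pair in region_pairs(sample):
    print(pair["image"], "|", pair["instruction"], "->", pair["phrase"], pair["box"])
```

If the corner assumption holds, each `box` tuple is in the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so it could be used to cut out the region once the images have been fetched.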


Provider: maas

Created: 2025-02-27
