MyVLM
Source: ModelScope community. Updated: 2025-11-07. Indexed: 2025-11-08.
Download link:
https://modelscope.cn/datasets/lhn526/MyVLM
Description:
# MyVLM
**Paper:** https://arxiv.org/abs/2403.14599
**Project Page:** https://snap-research.github.io/MyVLM/
**Code:** https://github.com/snap-research/MyVLM
# MyVLM Objects Dataset
<p align="center">
<img src="docs/myvlm-data.png" width="600px"/>
Example images for each object in our constructed dataset.
</p>
As part of our MyVLM code release, we have also released the object dataset introduced in the paper.
The dataset contains 29 user-specific objects. Each object has ~10 images, and each image has 5 corresponding personalized captions.
Your data should be organized using the following structure:
```
data_root
├── <concept_name>
│ ├── <image1>.jpg
│ ├── <image2>.jpg
│ ├── ...
│ ├── captions.json (or captions_augmented.json)
│ └── additional_llava_vqa_data.json (optional, used for personalized VQA using LLaVA, see next section).
└── <concept_name_2>
```
That is, the root directory should contain a sub-directory for each concept. Then, in each concept directory, you should have:
1. the set of images we want to use either for training or inference.
2. a `json` file containing the captions for each image, named `captions.json` or `captions_augmented.json`.
This file should be in the following format:
```
{
"<image1>.jpg": ["<caption1>", "<caption2>", ...],
"<image2>.jpg": ["<caption1>", "<caption2>", ...],
...
}
```
That is, we have a dictionary mapping each image path to a list of target captions.
As described in the paper, at each optimization step we will randomly sample a caption from this list to use as the target caption for the image.
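
The layout and caption format described above can be checked with a short loader. The following is a minimal sketch and not part of the MyVLM code release: the function names `load_concept_captions` and `sample_target_caption` are hypothetical, the keys of the captions file are assumed to be image file names relative to the concept directory, `captions.json` is assumed to take precedence when both caption files exist, and the sampling is assumed to be uniform.

```python
import json
import random
from pathlib import Path

# The two caption file names mentioned above. Preferring captions.json
# when both are present is an assumption of this sketch.
CAPTION_FILENAMES = ("captions.json", "captions_augmented.json")


def load_concept_captions(concept_dir):
    """Return {image_path: [caption, ...]} for one concept directory."""
    concept_dir = Path(concept_dir)
    for name in CAPTION_FILENAMES:
        captions_path = concept_dir / name
        if captions_path.is_file():
            break
    else:
        raise FileNotFoundError(f"no captions file found in {concept_dir}")

    with captions_path.open(encoding="utf-8") as f:
        captions = json.load(f)

    result = {}
    for image_name, image_captions in captions.items():
        # Assumption: keys are file names relative to the concept directory.
        image_path = concept_dir / image_name
        if not image_path.is_file():
            raise FileNotFoundError(
                f"{image_path} is listed in {captions_path.name} but is missing"
            )
        if not isinstance(image_captions, list) or not image_captions:
            raise ValueError(f"{image_name} must map to a non-empty list of captions")
        result[image_path] = image_captions
    return result


def sample_target_caption(captions, rng=random):
    """Pick one caption at random (uniformly, by assumption) for one step."""
    return rng.choice(captions)
```

In a training loop, `sample_target_caption` would be called once per image per optimization step, so the same image is paired with different target captions over time. Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible.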
## License
This sample code is made available by Snap Inc. for non-commercial, academic purposes only.
Please see the full license [here](https://github.com/snap-research/MyVLM/blob/master/LICENSE).
Provider: maas
Created: 2025-08-26



