
mmE5-synthetic

ModelScope Community · Updated 2025-05-22 · Indexed 2025-02-15
Download link:
https://modelscope.cn/datasets/AI-ModelScope/mmE5-synthetic
Resource description:
# mmE5 Synthetic Data

This dataset contains the synthetic datasets used to fine-tune mmE5 ([mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data](https://arxiv.org/abs/2502.08468)):

- **Classification**
- **Retrieval**
- **VQA**

[GitHub](https://github.com/haon-chen/mmE5)

## Image Preparation

First, prepare the images used for training.

### Image Downloads

- **Download Links**: Download the image resources for the dataset via the following link:
  - [**LAION-images**](https://huggingface.co/datasets/Haon-Chen/mmE5-synthetic/blob/main/LAION_Synthetic.tar.gz)

### Image Organization

```
images/
├── laion/
└── ... .jpg
```

Refer to the image paths in each subset to see how the images are organized. You can also customize the image paths by altering the `image_path` fields.

## Citation

If you use this dataset in your research, please cite the associated paper.

```
@article{chen2025mmE5,
  title={mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data},
  author={Chen, Haonan and Wang, Liang and Yang, Nan and Zhu, Yutao and Zhao, Ziliang and Wei, Furu and Dou, Zhicheng},
  journal={arXiv preprint arXiv:2502.08468},
  year={2025}
}
```
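The image preparation step described under "Image Preparation" can be sketched in Python as follows. This is a minimal sketch, not the authors' tooling: it assumes `LAION_Synthetic.tar.gz` has already been downloaded to a local path, and it assumes (without having verified the archive's internal layout) that unpacking it under `images/` produces the `laion/` folder shown in the tree. The helper name `extract_images` is hypothetical.

```python
import tarfile
from pathlib import Path


def extract_images(archive_path: str, images_root: str = "images") -> list[Path]:
    """Unpack a .tar.gz image archive under images_root and list the .jpg files.

    Hypothetical helper: the archive's internal layout is an assumption.
    """
    root = Path(images_root).resolve()
    root.mkdir(parents=True, exist_ok=True)

    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        # Reject entries whose names would land outside images_root
        # (name-based check only; it does not inspect symlink targets).
        for member in members:
            target = (root / member.name).resolve()
            if not target.is_relative_to(root):
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(root, members=members)

    # Return every extracted .jpg so the layout can be compared with the
    # image_path fields in each subset.
    return sorted(root.rglob("*.jpg"))
```

Calling `extract_images("LAION_Synthetic.tar.gz")` would then return the extracted `.jpg` paths; if they do not line up with the `image_path` values in a subset, either move the folder or adjust the `image_path` fields as noted above.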

Provider:
maas
Created:
2025-02-13