CogVLM-SFT-311K

Name: CogVLM-SFT-311K
Creator: maas
Published: 2026-01-02 16:16:34
License: No description provided

ModelScope community. Updated: 2026-01-02. Indexed: 2024-06-22.

Download link:

https://modelscope.cn/datasets/ZhipuAI/CogVLM-SFT-311K

Resource overview:

# CogVLM-SFT-311K: Bilingual Visual Instruction Data in CogVLM SFT

CogVLM-SFT-311K is the primary aligned corpus used in the initial training of CogVLM v1.0. The dataset was constructed as follows:

1. Approximately 3500 high-quality data samples were selected from the open-source [MiniGPT-4](https://huggingface.co/datasets/Vision-CAIR/cc_sbu_align) data; this subset is referred to as minigpt4-3500.
2. Minigpt4-3500 was merged with [Llava-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) and translated into Chinese using a language model.
3. We discovered significant noise in the detailed description part of minigpt4-3500 and Llava-Instruct. Thus, we corrected these Chinese corpora and retranslated them into English.

## Dataset Information

The dataset contains three folders, corresponding to the mixed detailed descriptions from minigpt4-3500 and LLaVA, the LLaVA multi-turn conversations, and the LLaVA single-turn conversations. Their layout is as follows:

```
.CogVLM-SFT-311K
├── llava_details-minigpt4_3500_formate
├── llava_instruction_multi_conversations_formate
└── llava_instruction_single_conversation_formate
```

In our open-source data, each folder is organized as follows:

```
.llava_details-minigpt4_3500_formate
├── images
│   └── 00000001.jpg
└── labels
    └── 00000001.json
```

Images are stored in the `images` folder, while labels containing the corresponding image description or dialogue are stored in the `labels` folder.

## Dataset Quantity

+ llava_details-minigpt4_3500_formate: 22,464 images and descriptions
+ llava_instruction_multi_conversations_formate: 56,673 images and multi-turn conversations
+ llava_instruction_single_conversation_formate: 76,634 images and single-turn conversations

## Dataset Format

### Caption format for image description

```
{
  "captions": [
    {
      "role": "caption",
      "content": "The photograph features a beach scene with a group of people surfing in the ocean. There are ten individuals spaced out in the water, sitting or lying on various sizes of body boards. The surfers line up in a row, close to the sandy shoreline. Some of the body boards are visible floating on the surface of the water, with the surfers riding on them or nearby."
    }
  ]
}
```

### Conversation format for image dialogue

```
{
  "conversations": [
    {
      "role": "user",
      "content": "What can be inferred about the zebras' behavior and surroundings?"
    },
    {
      "role": "assistant",
      "content": "Based on the image, we can infer that the two zebras are likely seeking relief from the sun's heat, as they are standing side by side under the branches of a thorny tree. This shade-providing tree offers some respite from the sun, possibly during the hottest part of the day. The zebras are in a green field with grass, providing them with an ideal environment to graze and eat while staying near their source of shelter. This shows that the zebras' behavior is influenced by the conditions and available resources in their surroundings. It also highlights that these animals adopt strategies to adapt to the fluctuating conditions of their environment, such as cooperation and seeking shelter, to survive and thrive in their natural habitat."
    }
  ]
}
```

## License

+ Due to non-commercial agreements, we did not use these data in the bilingual version of CogVLM or any other models involving commercialization.
+ The dataset is licensed under Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). Its use must also abide by the policy of OpenAI: https://openai.com/policies/terms-of-use. You may not use these data for any **commercial activities**.

## References

This project utilizes data and concepts based on the following research papers:

- Zhu, D., Chen, J., Shen, X., Li, X., & Elhoseiny, M. (2023). MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models. arXiv preprint arXiv:2304.10592.
- Liu, H., Li, C., Wu, Q., & Lee, Y. J. (2023). Visual Instruction Tuning. arXiv preprint arXiv:2304.08485.
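
## Example: reading one folder

The folder layout and the two label formats described above can be read with a short loader. What follows is a minimal Python sketch under stated assumptions, not code from the dataset release: the function name `iter_samples` is hypothetical, and it assumes every `labels/<id>.json` file has a matching `images/<id>.jpg` with the same file stem, as the `00000001` example suggests.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_samples(subset_dir: Path) -> Iterator[tuple[Path, list[dict]]]:
    """Yield (image_path, messages) pairs from one subset folder.

    Assumption: labels/<id>.json pairs with images/<id>.jpg.
    Label files without a matching image are skipped.
    """
    for label_path in sorted((subset_dir / "labels").glob("*.json")):
        image_path = subset_dir / "images" / f"{label_path.stem}.jpg"
        if not image_path.exists():
            continue
        with label_path.open(encoding="utf-8") as f:
            label = json.load(f)
        # Description files use the "captions" key; dialogue files use
        # "conversations". Both hold a list of {"role", "content"} dicts.
        messages = label.get("captions") or label.get("conversations") or []
        yield image_path, messages
```

Calling `iter_samples(Path("CogVLM-SFT-311K") / "llava_details-minigpt4_3500_formate")` would then yield one pair per image, where the local path is a placeholder for wherever the dataset was downloaded. Returning the message list unchanged keeps the loader agnostic to whether a folder holds captions, single-turn, or multi-turn dialogue.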

Provider: maas

Created: 2024-08-19
