llava_zh_en_2k_hf

ModelScope Community. Updated 2025-11-25; indexed 2025-04-19
Download link:
https://modelscope.cn/datasets/Tina12345/llava_zh_en_2k_hf
Resource description:
This dataset is composed of:

* 1k examples of English Visual Instruction Data from [LLaVA](https://github.com/haotian-liu/LLaVA).
* 1k examples of Chinese Visual Instruction Data from [openbmb](https://huggingface.co/datasets/openbmb/llava_zh).

You can register the dataset in the `dataset_info.json` file in LLaMA Factory like this:

```
"llava_1k_en": {
  "hf_hub_url": "BUAADreamer/llava-en-zh-2k",
  "subset": "en",
  "formatting": "sharegpt",
  "columns": {
    "messages": "messages",
    "images": "images"
  },
  "tags": {
    "role_tag": "role",
    "content_tag": "content",
    "user_tag": "user",
    "assistant_tag": "assistant"
  }
},
"llava_1k_zh": {
  "hf_hub_url": "BUAADreamer/llava-en-zh-2k",
  "subset": "zh",
  "formatting": "sharegpt",
  "columns": {
    "messages": "messages",
    "images": "images"
  },
  "tags": {
    "role_tag": "role",
    "content_tag": "content",
    "user_tag": "user",
    "assistant_tag": "assistant"
  }
},
```

You can then use it in [LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) by specifying `--dataset llava_1k_en,llava_1k_zh`.
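The two entries in the description differ only in their name and `subset` value, so they can also be generated rather than written by hand. The following is a minimal sketch, not part of the dataset card: the helper names `make_entry` and `merge_entries` are hypothetical, and it assumes `dataset_info.json` is a single JSON object keyed by dataset name, as the fragment in the description suggests.

```python
import json


def make_entry(subset: str) -> dict:
    """Build one dataset_info.json entry for a subset ("en" or "zh").

    Field values are copied from the dataset description; this helper
    itself is hypothetical and not part of LLaMA Factory.
    """
    return {
        "hf_hub_url": "BUAADreamer/llava-en-zh-2k",
        "subset": subset,
        "formatting": "sharegpt",
        "columns": {"messages": "messages", "images": "images"},
        "tags": {
            "role_tag": "role",
            "content_tag": "content",
            "user_tag": "user",
            "assistant_tag": "assistant",
        },
    }


def merge_entries(dataset_info: dict) -> dict:
    """Return a copy of dataset_info with both llava_1k_* entries added."""
    merged = dict(dataset_info)
    for subset in ("en", "zh"):
        merged[f"llava_1k_{subset}"] = make_entry(subset)
    return merged


if __name__ == "__main__":
    # In practice, load the existing dataset_info.json with json.load()
    # and write the merged result back with json.dump().
    existing = {}
    print(json.dumps(merge_entries(existing), indent=2))
```

Existing entries in the input dictionary are preserved, so the result can be written back over the original file before passing `--dataset llava_1k_en,llava_1k_zh`.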

Provider:
maas
Created:
2025-04-17
Collected summary
Dataset introduction
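Background and challenges
Background overview
The dataset contains 2,000 visual instruction examples: 1,000 in English from LLaVA and 1,000 in Chinese from openbmb (the `llava_zh` source, exposed as the `zh` subset). It follows the sharegpt format and can be used with LLaMA Factory for data processing and model training.
The summary above was collected and generated by 遇见数据集.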