AntGroup-MI/Osprey-724K
Source: Hugging Face | Updated: 2024-02-05 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/AntGroup-MI/Osprey-724K
Resource overview:
---
license: cc-by-nc-4.0
task_categories:
- conversational
- text-generation
- summarization
- question-answering
language:
- en
---
### Osprey-724K Dataset Card
Osprey-724K is an instruction dataset of mask-text pairs, containing around 724K GPT-generated multimodal dialogues intended to help MLLMs develop fine-grained, pixel-level image understanding. It includes object-level and part-level samples, plus additional instruction samples for robustness and flexibility.
#### Dataset type:
- Object-level: [osprey_conversation.json](https://huggingface.co/datasets/AntGroup-MI/Osprey-724K/resolve/main/osprey_conversation.json?download=true), [osprey_detail_description.json](https://huggingface.co/datasets/AntGroup-MI/Osprey-724K/resolve/main/osprey_detail_description.json?download=true)
- Part-level: [osprey_part_level.json](https://huggingface.co/datasets/AntGroup-MI/Osprey-724K/resolve/main/osprey_part_level.json?download=true)
- Robustness & Flexibility: [osprey_lvis_positive_negative.json](https://huggingface.co/datasets/AntGroup-MI/Osprey-724K/resolve/main/osprey_lvis_positive_negative.json?download=true), [osprey_short_form.json](https://huggingface.co/datasets/AntGroup-MI/Osprey-724K/resolve/main/osprey_short_form.json?download=true)
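
The files listed above are plain JSON, so they can be fetched with the standard library alone. The following is a minimal sketch, not an official loader: the helper names (`resolve_url`, `fetch`, `summarize`) and the local directory `osprey_724k` are assumptions made for illustration, and the card does not document the record schema, so the sketch only reports the top-level JSON type and size.

```python
import json
import urllib.request
from pathlib import Path

# Repository id and file names are copied from the links in the list above.
REPO_ID = "AntGroup-MI/Osprey-724K"
FILES = {
    "object-level": ["osprey_conversation.json", "osprey_detail_description.json"],
    "part-level": ["osprey_part_level.json"],
    "robustness-flexibility": [
        "osprey_lvis_positive_negative.json",
        "osprey_short_form.json",
    ],
}


def resolve_url(filename: str, host: str = "https://huggingface.co") -> str:
    """Build a direct-download URL in the same form as the card's links."""
    return f"{host}/datasets/{REPO_ID}/resolve/main/{filename}?download=true"


def fetch(filename: str, dest_dir: str = "osprey_724k") -> Path:
    """Download one file unless it is already on disk (hypothetical helper)."""
    dest = Path(dest_dir) / filename
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(resolve_url(filename), dest)
    return dest


def summarize(path: Path) -> str:
    """Report the top-level JSON type and size without assuming a schema."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    size = len(data) if isinstance(data, (list, dict)) else 1
    return f"{path.name}: {type(data).__name__} with {size} top-level entries"


if __name__ == "__main__":
    # Requires network access; the annotation files may be large.
    print(summarize(fetch("osprey_short_form.json")))
```

Passing `host="https://hf-mirror.com"` to `resolve_url` would target the mirror from the download link instead, assuming the mirror serves the same `resolve/main` path layout.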
### Paper and Code
Paper: [https://arxiv.org/abs/2312.10032](https://arxiv.org/abs/2312.10032) \
Code: [https://github.com/CircleRadon/Osprey](https://github.com/CircleRadon/Osprey)
### License
Attribution-NonCommercial 4.0 International \
Use of the dataset should also abide by OpenAI's terms of use: https://openai.com/policies/terms-of-use.
### Citations
```
@misc{Osprey,
title={Osprey: Pixel Understanding with Visual Instruction Tuning},
author={Yuqian Yuan and Wentong Li and Jian Liu and Dongqi Tang and Xinjie Luo and Chi Qin and Lei Zhang and Jianke Zhu},
year={2023},
eprint={2312.10032},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
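Provided by:
AntGroup-MI
Summary of original information
Dataset overview
Dataset name
Osprey-724K
Dataset contents
- Type: instruction dataset
- Feature: contains mask-text pairs
- Scale: about 724,000 GPT-generated multimodal dialogues
Dataset purpose
- Promote fine-grained, pixel-level image understanding in multimodal large language models (MLLMs)
Dataset structure
- Contains object-level, part-level, and additional instruction samples to improve robustness and flexibility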



