OmniInstruct_v1

Name: OmniInstruct_v1
Creator: maas
Published: 2026-01-02 16:19:24
License: No description available

ModelScope community · Updated 2026-01-02 · Indexed 2025-02-01

Download link:

https://modelscope.cn/datasets/m-a-p/OmniInstruct_v1

Resource description:

# OmniBench

[**🌐 Homepage**](https://m-a-p.ai/OmniBench/) | [**🏆 Leaderboard**](https://m-a-p.ai/OmniBench/#leaderboard) | [**📖 Arxiv Paper**](https://arxiv.org/abs/2409.15272) | [**🤗 Paper**](https://huggingface.co/papers/2409.15272) | [**🤗 OmniBench Dataset**](https://huggingface.co/datasets/m-a-p/OmniBench) | [**🤗 OmniInstruct_V1 Dataset**](https://huggingface.co/datasets/m-a-p/OmniInstruct_v1/) | [**🦜 Tweets**](https://x.com/yizhilll/status/1838942877142962502)

The project introduces **OmniBench**, a novel benchmark designed to rigorously evaluate models' ability to recognize, interpret, and reason across **visual**, **acoustic**, and **textual** inputs simultaneously. We define models capable of such tri-modal processing as omni-language models (OLMs).

## Mini Leaderboard

The table below reports the accuracy of omni-language models under the full evaluation setting of OmniBench, with "Image & Audio", "Audio", and "Image" as input contexts. More results can be found on the [live leaderboard](https://m-a-p.ai/OmniBench/#leaderboard).

| **Model** | **Image & Audio** | **Audio** | **Image** |
|---------------------|----------------------|---------------------|---------------------|
| MIO-SFT (13B) | 11.12% | 11.82% | 13.57% |
| AnyGPT (7B) | 2.71% | 2.36% | 1.23% |
| video-SALMONN (13B) | 11.30% | 11.56% | 11.38% |
| UnifiedIO2-large (1.1B) | 22.68% | 24.69% | 24.52% |
| UnifiedIO2-xlarge (3.2B) | 20.40% | 24.78% | 24.34% |
| UnifiedIO2-xxlarge (6.8B) | 23.29% | 27.06% | 25.04% |
| Gemini-1.5-Pro | 47.56% | 38.53% | 34.68% |
| Reka-core-20240501 | 36.10% | 35.07% | 34.39% |

## Dataset

Each sample in the dataset has the following keys:

- `"index"`: an integer giving the question ID.
- `"task type"`: a string naming one of the 7 task types.
- `"audio type"`: a string naming one of the 3 audio types (speech, sound event, and music).
- `"question"`: a string containing the question.
- `"options"`: a list of four strings, the choices for the multiple-choice question.
- `"answer"`: a string containing the correct response; it must appear in `"options"`.
- `"audio_path"`: the basename of the audio file; prepend `mm_data/audio` before using it.
- `"image_path"`: the basename of the image file; prepend `mm_data/image` before using it.
- `"audio"` (HF version only): the NumPy array for the WAV file.
- `"image"` (HF version only): the `PIL.Image()` object for the image.
- `"audio content"`: the human-annotated audio transcript, used in the text-alternative experiments.
- `"image content"`: the VLM-generated caption for the image, used in the text-alternative experiments.

### Download from Huggingface

```python
from datasets import load_dataset

dataset = load_dataset("m-a-p/OmniBench")

# check on the data samples
print(dataset)
print(dataset['train'][0])

# OmniInstruct_v1 is loaded the same way
dataset = load_dataset("m-a-p/OmniInstruct_v1")
```

## Reference

```bib
@misc{li2024omnibench,
    title={OmniBench: Towards The Future of Universal Omni-Language Models},
    author={Yizhi Li and Ge Zhang and Yinghao Ma and Ruibin Yuan and Kang Zhu and Hangyu Guo and Yiming Liang and Jiaheng Liu and Jian Yang and Siwei Wu and Xingwei Qu and Jinjie Shi and Xinyue Zhang and Zhenzhu Yang and Xiangzhou Wang and Zhaoxiang Zhang and Zachary Liu and Emmanouil Benetos and Wenhao Huang and Chenghua Lin},
    year={2024},
    eprint={2409.15272},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2409.15272},
}
```
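## Working with a record (sketch)

The dataset keys described above imply a few mechanical steps when using the non-HF files: joining `"audio_path"` and `"image_path"` with their directory prefixes, checking that `"answer"` is one of the four `"options"`, and scoring predictions by accuracy. The following is a minimal sketch of those steps, not the project's official tooling. The function names, the `data_root` parameter, the sample record, and the use of exact string matching for scoring are all assumptions made for illustration.

```python
from pathlib import Path

# Directory prefixes named in the field descriptions.
AUDIO_PREFIX = Path("mm_data/audio")
IMAGE_PREFIX = Path("mm_data/image")


def resolve_media_paths(record, data_root="."):
    """Join the basenames stored in a record with their directory prefixes.

    `data_root` is a hypothetical parameter: the folder that holds `mm_data/`.
    """
    root = Path(data_root)
    return (
        root / AUDIO_PREFIX / record["audio_path"],
        root / IMAGE_PREFIX / record["image_path"],
    )


def check_record(record):
    """Raise ValueError if a record breaks the constraints in the field list."""
    options = record["options"]
    if len(options) != 4:
        raise ValueError(f"expected 4 options, got {len(options)}")
    if record["answer"] not in options:
        raise ValueError("answer does not appear in options")


def accuracy(records, predictions):
    """Fraction of predictions that exactly match each record's answer.

    Exact string matching is an assumption of this sketch and may differ
    from the official OmniBench scoring procedure.
    """
    if len(records) != len(predictions):
        raise ValueError("records and predictions differ in length")
    if not records:
        return 0.0
    correct = sum(pred == rec["answer"] for rec, pred in zip(records, predictions))
    return correct / len(records)


# Made-up record for illustration only; it is not taken from the dataset.
sample = {
    "index": 0,
    "question": "What is the person most likely doing?",
    "options": ["Cooking", "Driving", "Singing", "Reading"],
    "answer": "Cooking",
    "audio_path": "clip_000.wav",
    "image_path": "img_000.jpg",
}

check_record(sample)
audio_file, image_file = resolve_media_paths(sample, data_root="OmniBench")
print(audio_file)
print(image_file)
print(accuracy([sample], ["Cooking"]))
```

With the HF version, the `"audio"` and `"image"` keys already hold decoded data, so the path-joining step should only be needed when reading the raw files.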


Provider: maas

Created: 2024-12-29
