EvalAnything-InstructionFollowing

Name: EvalAnything-InstructionFollowing
Creator: maas
Published: 2025-11-02 16:22:03
License: No description provided

ModelScope Community | Updated 2025-11-02 | Indexed 2025-02-08

Download link:

https://modelscope.cn/datasets/PKU-Alignment/EvalAnything-InstructionFollowing

Resource description:

# All-modality Generation (Instruction Following Part)

<span style="color: red;">The All-Modality Generation benchmark evaluates a model's ability to follow instructions, automatically select appropriate modalities, and create synergistic outputs across different modalities (text, visual, audio) while avoiding redundancy.</span>

[🏠 Homepage](https://github.com/PKU-Alignment/align-anything) | [👍 Our Official Code Repo](https://github.com/PKU-Alignment/align-anything)

[🤗 All-Modality Understanding Benchmark](https://huggingface.co/datasets/PKU-Alignment/EvalAnything-AMU)

[🤗 All-Modality Generation Benchmark (Instruction Following Part)](https://huggingface.co/datasets/PKU-Alignment/EvalAnything-InstructionFollowing)

[🤗 All-Modality Generation Benchmark (Modality Selection and Synergy Part)](https://huggingface.co/datasets/PKU-Alignment/EvalAnything-Selection_Synergy)

[🤗 All-Modality Generation Reward Model](https://huggingface.co/PKU-Alignment/AnyRewardModel)

## Data Example

<div align="center">
  <img src="example-amg.png" width="100%"/>
</div>

## Load dataset

### Loading Evaluation Datasets

The default loading method is:

```python
from datasets import load_dataset

dataset = load_dataset(
    "PKU-Alignment/EvalAnything-InstructionFollowing",
    trust_remote_code=True
)
```

To load test data for a single modality (e.g., images), use:

```python
dataset = load_dataset(
    "PKU-Alignment/EvalAnything-InstructionFollowing",
    name="image_instruct",
    trust_remote_code=True
)
```

## Model Evaluation

The AMG Instruction-Following evaluation covers Text/Image/Audio/Video Generation across four dimensions. The evaluation code is located in the `eval_anything/amg/if` folder. To start the evaluation (e.g., images):

1. Use [eval_anything/amg/if/image_instruct/example.py](https://github.com/PKU-Alignment/align-anything/tree/main/align_anything/evaluation/eval_anything/amg/if/image_instruct/example.py) to generate multi-modal results.
2. Use [eval_anything/amg/if/image_instruct/eval_instruct.py](https://github.com/PKU-Alignment/align-anything/tree/main/align_anything/evaluation/eval_anything/amg/if/image_instruct/eval_instruct.py) to evaluate the generated results.
3. Since the community has no widely accepted video or audio large language model that can serve as a judge model for direct scoring, video and audio evaluations use multiple-choice questions to check whether the generated output contains the instructed content. For the text and image modalities, we continue to use GPT as a judge model to provide direct scores.

**Note:** The current code is a sample script for the All-Modality Generation subtask of Eval Anything. In the future, we will integrate Eval Anything's evaluation into the framework to make it more convenient for the community to use.

## Citation

Please cite our work if you use our benchmark or model in your paper.

```bibtex
@inproceedings{ji2024align,
  title={Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback},
  author={Jiaming Ji and Jiayi Zhou and Hantao Lou and Boyuan Chen and Donghai Hong and Xuyao Wang and Wenqi Chen and Kaile Wang and Rui Pan and Jiahao Li and Mohan Wang and Josef Dai and Tianyi Qiu and Hua Xu and Dong Li and Weipeng Chen and Jun Song and Bo Zheng and Yaodong Yang},
  year={2024},
  url={https://arxiv.org/abs/2412.15838}
}
```
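As an illustration of the multiple-choice check described under Model Evaluation for audio and video outputs, the sketch below turns a set of question/answer pairs and a judge's replies into a single accuracy score. It is a minimal sketch under stated assumptions, not the repository's implementation: the `ChoiceQuestion` schema and the names `normalize_choice` and `instruction_following_accuracy` are hypothetical, and the judge replies are assumed to have been collected elsewhere.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChoiceQuestion:
    """One multiple-choice check about a generated clip (hypothetical schema)."""
    question: str
    options: List[str]
    answer: str  # expected option letter, e.g. "B"


def normalize_choice(raw: str) -> str:
    """Return the first alphabetic character of a reply, upper-cased ('' if none)."""
    for ch in raw.strip():
        if ch.isalpha():
            return ch.upper()
    return ""


def instruction_following_accuracy(
    questions: List[ChoiceQuestion], judge_replies: List[str]
) -> float:
    """Fraction of questions where the judge's choice matches the expected option."""
    if len(questions) != len(judge_replies):
        raise ValueError("need exactly one reply per question")
    if not questions:
        return 0.0
    correct = sum(
        normalize_choice(reply) == q.answer.upper()
        for q, reply in zip(questions, judge_replies)
    )
    return correct / len(questions)


if __name__ == "__main__":
    # Illustrative questions only; they are not taken from the benchmark.
    questions = [
        ChoiceQuestion("Which sound is present?", ["A. rain", "B. dog barking"], "B"),
        ChoiceQuestion("How many speakers are heard?", ["A. one", "B. two"], "A"),
    ]
    print(instruction_following_accuracy(questions, ["B", "(b)"]))
```

Because `normalize_choice` only looks at the first letter, a reply such as "Answer: B" would be misread; a real pipeline would likely constrain the judge's output format or parse it more strictly.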


Provider:

maas

Created:

2025-02-07
