IsoBench
Source: arXiv | Updated: 2024-04-02 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2404.01266v2
Description:
IsoBench is a benchmark dataset of 1,630 samples created by the University of Southern California and Duke University. It covers four domains: mathematics, science, algorithms, and games. Each sample is provided in multiple isomorphic input representations, such as visual, textual, and mathematical forms. IsoBench is designed to evaluate how the performance of multimodal foundation models differs across input modalities, with particular attention to models' preference for textual representations. It uses fine-grained feedback to diagnose performance gaps, which helps show how model capabilities vary with the input modality, and it introduces two techniques, IsoCombination and IsoScratchPad, to improve model performance.
Provided by:
University of Southern California
Created:
2024-04-02



