PixelWorld

Name: PixelWorld
Creator: maas
Published: 2025-12-05 16:22:18
License: No description available

ModelScope community, updated 2025-12-05, indexed 2025-02-08

Download link:

https://modelscope.cn/datasets/TIGER-Lab/PixelWorld

Resource description:

# PixelWorld

[📜 Paper](https://arxiv.org/abs/2501.19339) | [💾 GitHub](https://github.com/TIGER-AI-Lab/PixelWorld) | [📂 HuggingFace Dataset](https://huggingface.co/datasets/TIGER-Lab/PixelWorld)

**PixelWorld** is a multimodal benchmark that unifies text, tables, code, diagrams, and images into **pixel-based inputs** (PEAP: *Perceive Everything as Pixels*). It enables direct comparison between token-based and pixel-based processing.

### 🔹 Features

- 📚 **Broad Coverage**: Text-only (GLUE, SuperGLUE, MMLU-Pro), structured (TableBench), and multimodal tasks (SlidesVQA, WikiSS-QA, MathVerse).
- 🖼️ **Unified Input**: Converts text and tables into images while preserving native visual formats for multimodal data.
- ⚖️ **Parallel Evaluation**: Each task is available in both a text version and a pixel version, which allows direct performance comparison.

🚀 **PixelWorld** helps assess models’ ability to process text as visual input and benchmark their multimodal generalization.

<p align="center"> <img src="https://tiger-ai-lab.github.io/PixelWorld/static/images/table1.jpg" alt="PixelWorld Composition Overview" width="75%"/> </p>

## 📊 Data Format

To be updated.

## 🚀 Usage

### 1. Direct Loading from Hugging Face

```python
import datasets

# Load the "text_only" configuration of the benchmark from the Hugging Face Hub.
dataset = datasets.load_dataset("TIGER-Lab/PixelWorld", "text_only", split="train")
print(dataset)
```

### 2. Use through the GitHub Codebase

```bash
python data.py --dataset WikiSS_QADataset --model GPT4o --mode text --prompt base --from_hf
```

## 📌 Citation

```bibtex
@article{lyu2024pixelworld,
  title={PixelWorld: Towards Perceiving Everything as Pixels},
  author={Lyu, Zhiheng and Ma, Xueguang and Chen, Wenhu},
  year={2025},
  eprint={2501.19339},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={http://arxiv.org/abs/2501.19339},
}
```

## ❓ Q&A

For questions, open an issue or email:

- 📧 zhiheng.lyu@uwaterloo.ca
- 📧 wenhuchen@uwaterloo.ca
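To illustrate the "Parallel Evaluation" idea mentioned in the features, here is a minimal sketch of how the text and pixel versions of the same examples might be compared. The record layout (`id`, `mode`, `prediction`, `answer`), the helper name `accuracy_by_mode`, and the toy records are all assumptions made for illustration; they are not the PixelWorld data format (still listed as to be updated) and not output of the repository's `data.py`.

```python
from collections import defaultdict


def accuracy_by_mode(records):
    """Return {mode: accuracy} for a list of prediction records.

    Each record is a hypothetical dict with the keys
    "mode" ("text" or "pixel"), "prediction", and "answer".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["mode"]] += 1
        if rec["prediction"] == rec["answer"]:
            correct[rec["mode"]] += 1
    return {mode: correct[mode] / total[mode] for mode in total}


# Toy, made-up records: the same two questions answered once from
# text input and once from pixel input. These are NOT benchmark results.
records = [
    {"id": 0, "mode": "text", "prediction": "A", "answer": "A"},
    {"id": 0, "mode": "pixel", "prediction": "A", "answer": "A"},
    {"id": 1, "mode": "text", "prediction": "B", "answer": "B"},
    {"id": 1, "mode": "pixel", "prediction": "C", "answer": "B"},
]

scores = accuracy_by_mode(records)
# Positive gap means the text version scored higher than the pixel version.
gap = scores["text"] - scores["pixel"]
print(scores, gap)
```

Keeping both modes in one list keyed by `id` makes it easy to extend the comparison to per-example agreement, should that be useful.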

Provider:

maas

Created:

2025-02-04
