PixelWorld
Source: ModelScope community. Updated: 2025-12-05. Indexed: 2025-02-08.
Download link:
https://modelscope.cn/datasets/TIGER-Lab/PixelWorld
Description:
# PixelWorld
[📜 Paper](https://arxiv.org/abs/2501.19339) |
[💾 GitHub](https://github.com/TIGER-AI-Lab/PixelWorld) |
[📂 HuggingFace Dataset](https://huggingface.co/datasets/TIGER-Lab/PixelWorld)
**PixelWorld** is a multimodal benchmark that unifies text, tables, code, diagrams, and images into **pixel-based inputs** (PEAP: *Perceive Everything as Pixels*). It enables direct comparison between token-based and pixel-based processing.
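To make the idea of pixel-based input concrete, the sketch below draws a text passage onto a white canvas with Pillow. It is an illustration only, not the PixelWorld rendering pipeline: the function name `render_text_to_image`, the wrap width, the padding, and the assumed character cell size are all hypothetical choices.

```python
import textwrap

from PIL import Image, ImageDraw, ImageFont


def render_text_to_image(text, wrap_width=60, padding=16, char_width=8, line_height=16):
    """Render plain text as black-on-white pixels (hypothetical helper, not PixelWorld's renderer)."""
    font = ImageFont.load_default()

    # Wrap each paragraph separately so existing line breaks survive.
    lines = []
    for paragraph in text.splitlines() or [""]:
        lines.extend(textwrap.wrap(paragraph, width=wrap_width) or [""])

    # Size the canvas from an assumed character cell (an approximation for
    # Pillow's default font, not a measured glyph size).
    width = 2 * padding + char_width * max(len(line) for line in lines)
    height = 2 * padding + line_height * len(lines)

    image = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(image)
    for row, line in enumerate(lines):
        if line:  # skip blank lines, leaving an empty row
            draw.text((padding, padding + row * line_height), line, fill="black", font=font)
    return image
```

Calling `render_text_to_image("some passage").save("sample.png")` would write the rendered passage to disk, which could then be passed to a vision-language model in place of the raw string.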
### 🔹 Features
- 📚 **Broad Coverage**: Text-only (GLUE, SuperGLUE, MMLU-Pro), structured (TableBench), and multimodal tasks (SlidesVQA, WikiSS-QA, MathVerse).
- 🖼️ **Unified Input**: Converts text and tables into images while preserving native visual formats for multimodal data.
- ⚖️ **Parallel Evaluation**: Both text and pixel versions allow direct performance comparison.
🚀 **PixelWorld** helps assess models’ ability to process text as visual input and benchmark their multimodal generalization.
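The parallel-evaluation idea can be sketched as a per-mode accuracy comparison. The snippet assumes each evaluation record carries a `mode` (`"text"` or `"pixel"`), a `prediction`, and an `answer`; these field names and the exact-match metric are illustrative assumptions, not the schema or scoring used by the PixelWorld codebase.

```python
from collections import defaultdict


def accuracy_by_mode(records):
    """Exact-match accuracy per input mode; the record fields are hypothetical."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        mode = record["mode"]
        total[mode] += 1
        # Case- and whitespace-insensitive exact match.
        if record["prediction"].strip().lower() == record["answer"].strip().lower():
            correct[mode] += 1
    return {mode: correct[mode] / total[mode] for mode in total}


# Made-up records for illustration only; these are not PixelWorld results.
records = [
    {"mode": "text", "prediction": "Paris", "answer": "paris"},
    {"mode": "text", "prediction": "4", "answer": "5"},
    {"mode": "pixel", "prediction": "Paris ", "answer": "paris"},
    {"mode": "pixel", "prediction": "5", "answer": "5"},
]
scores = accuracy_by_mode(records)
gap = scores["pixel"] - scores["text"]  # positive means the pixel input scored higher
```

Reporting the gap alongside both scores keeps the comparison tied to the same model and the same questions, which is the point of having two versions of each input.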
<p align="center">
<img src="https://tiger-ai-lab.github.io/PixelWorld/static/images/table1.jpg" alt="PixelWorld Composition Overview" width="75%"/>
</p>
## 📊 Data Format
To be updated.
## 🚀 Usage
### 1. Direct Loading from Hugging Face
```python
import datasets
dataset = datasets.load_dataset("TIGER-Lab/PixelWorld", "text_only", split="train")
print(dataset)
```
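Only the `text_only` configuration is shown above. One way to discover what else the repository exposes is to query the Hub with the `datasets` helpers `get_dataset_config_names` and `get_dataset_split_names`. The wrapper name `list_configs_and_splits` is hypothetical, and calling it needs network access to the Hugging Face Hub.

```python
def list_configs_and_splits(repo_id="TIGER-Lab/PixelWorld"):
    """Map each configuration of a Hub dataset to its split names (hypothetical helper)."""
    # Imported here so that defining the helper does not require the dependency.
    import datasets

    return {
        config: datasets.get_dataset_split_names(repo_id, config)
        for config in datasets.get_dataset_config_names(repo_id)
    }


# Example (requires network access):
# for config, splits in list_configs_and_splits().items():
#     print(config, splits)
```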
### 2. Use through the GitHub Codebase
```bash
python data.py --dataset WikiSS_QADataset --model GPT4o --mode text --prompt base --from_hf
```
## 📌 Citation
```bibtex
@article{lyu2024pixelworld,
title={PixelWorld: Towards Perceiving Everything as Pixels},
author={Lyu, Zhiheng and Ma, Xueguang and Chen, Wenhu},
year={2025},
eprint={2501.19339},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={http://arxiv.org/abs/2501.19339},
}
```
## ❓ Q&A
For questions, open an issue or email:
📧 zhiheng.lyu@uwaterloo.ca
📧 wenhuchen@uwaterloo.ca
Provider: maas
Created: 2025-02-04



