
PixelWorld

ModelScope Community · Updated 2025-12-05 · Indexed 2025-02-08
Download link:
https://modelscope.cn/datasets/TIGER-Lab/PixelWorld
Description:
# PixelWorld

[📜 Paper](https://arxiv.org/abs/2501.19339) | [💾 GitHub](https://github.com/TIGER-AI-Lab/PixelWorld) | [📂 HuggingFace Dataset](https://huggingface.co/datasets/TIGER-Lab/PixelWorld)

**PixelWorld** is a multimodal benchmark that unifies text, tables, code, diagrams, and images into **pixel-based inputs** (PEAP: *Perceive Everything as Pixels*). It enables a direct comparison between token-based and pixel-based processing of the same content.

### 🔹 Features

- 📚 **Broad Coverage**: text-only tasks (GLUE, SuperGLUE, MMLU-Pro), structured tasks (TableBench), and multimodal tasks (SlidesVQA, WikiSS-QA, MathVerse).
- 🖼️ **Unified Input**: converts text and tables into images while preserving the native visual format of multimodal data.
- ⚖️ **Parallel Evaluation**: text and pixel versions of the inputs allow a direct performance comparison.

🚀 **PixelWorld** helps assess a model's ability to process text as visual input and benchmarks its multimodal generalization.

<p align="center">
  <img src="https://tiger-ai-lab.github.io/PixelWorld/static/images/table1.jpg" alt="PixelWorld Composition Overview" width="75%"/>
</p>

## 📊 Data Format

To be updated.

## 🚀 Usage

### 1. Direct Loading from Hugging Face

```python
import datasets

# Load the "text_only" configuration of PixelWorld from the Hugging Face Hub
dataset = datasets.load_dataset("TIGER-Lab/PixelWorld", "text_only", split="train")
print(dataset)
```

### 2. Use through the GitHub Codebase

```bash
python data.py --dataset WikiSS_QADataset --model GPT4o --mode text --prompt base --from_hf
```

## 📌 Citation

```bibtex
@article{lyu2024pixelworld,
  title={PixelWorld: Towards Perceiving Everything as Pixels},
  author={Lyu, Zhiheng and Ma, Xueguang and Chen, Wenhu},
  year={2025},
  eprint={2501.19339},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={http://arxiv.org/abs/2501.19339},
}
```

## ❓ Q&A

For questions, open an issue or email:

- 📧 zhiheng.lyu@uwaterloo.ca
- 📧 wenhuchen@uwaterloo.ca
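The parallel evaluation described above comes down to scoring the same items once per input modality and comparing the results. The following is a minimal sketch in Python, assuming you already have per-item predictions from both a text run and a pixel run. The `Example` record, the `compare_modalities` helper and the case-insensitive exact-match rule are hypothetical illustrations, not part of the PixelWorld codebase, whose tasks may use task-specific metrics.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical record: one benchmark item scored under both input modalities."""
    gold: str        # reference answer
    pred_text: str   # model answer when the item is given as plain text (tokens)
    pred_pixel: str  # model answer when the same item is rendered as an image (pixels)


def normalize(answer: str) -> str:
    # Assumed scoring rule: exact match after lowercasing and collapsing whitespace.
    return " ".join(answer.lower().split())


def accuracy(golds: list[str], preds: list[str]) -> float:
    if not golds or len(golds) != len(preds):
        raise ValueError("need the same non-zero number of answers and predictions")
    hits = sum(normalize(g) == normalize(p) for g, p in zip(golds, preds))
    return hits / len(golds)


def compare_modalities(examples: list[Example]) -> dict[str, float]:
    """Score the text and pixel runs on the same items and report the gap."""
    golds = [e.gold for e in examples]
    text_acc = accuracy(golds, [e.pred_text for e in examples])
    pixel_acc = accuracy(golds, [e.pred_pixel for e in examples])
    # A positive gap means the model did better on token input than on pixel input.
    return {"text": text_acc, "pixel": pixel_acc, "gap": text_acc - pixel_acc}
```

Because both runs are scored against the same gold answers, the gap reflects the change of input modality rather than a difference in the item mix.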

Provider: maas
Created: 2025-02-04