Mind2Web_train_llava
ModelScope community · Updated 2025-12-05 · Indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/neulab/Mind2Web_train_llava
Description:
#### Mind2Web training set for the paper: [Harnessing Webpage UIs for Text-Rich Visual Understanding](https://arxiv.org/abs/2410.13824)
🌐 [Homepage](https://neulab.github.io/MultiUI/) | 🐍 [GitHub](https://github.com/neulab/multiui) | 📖 [arXiv](https://arxiv.org/abs/2410.13824)
## Introduction
We introduce **MultiUI**, a dataset containing 7.3 million samples from 1 million websites, covering diverse multimodal tasks and UI layouts. Models trained on **MultiUI** not only excel in web UI tasks, achieving up to a 48% improvement on VisualWebBench and a 19.1% boost in action accuracy on the web agent dataset Mind2Web, but also generalize surprisingly well to non-web UI tasks and even to non-UI domains, such as document understanding, OCR, and chart interpretation.
<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/vk7yT4Y7ydBOHM6BojmlI.mp4"></video>
## Contact
* Junpeng Liu: jpliu@link.cuhk.edu.hk
* Xiang Yue: xyue2@andrew.cmu.edu
## Citation
If you find this work helpful, please cite our paper:
````
@misc{liu2024harnessingwebpageuistextrich,
title={Harnessing Webpage UIs for Text-Rich Visual Understanding},
author={Junpeng Liu and Tianyue Ou and Yifan Song and Yuxiao Qu and Wai Lam and Chenyan Xiong and Wenhu Chen and Graham Neubig and Xiang Yue},
year={2024},
eprint={2410.13824},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2410.13824},
}
````
Provider:
maas
Created:
2025-10-10



