
Mind2Web_train_llava

ModelScope community. Updated: 2025-12-05. Indexed: 2025-11-03.
Download link:
https://modelscope.cn/datasets/neulab/Mind2Web_train_llava
Resource description:
#### Mind2Web training set for the paper: [Harnessing Webpage UIs for Text-Rich Visual Understanding](https://arxiv.org/abs/2410.13824)

🌐 [Homepage](https://neulab.github.io/MultiUI/) | 🐍 [GitHub](https://github.com/neulab/multiui) | 📖 [arXiv](https://arxiv.org/abs/2410.13824)

## Introduction

We introduce **MultiUI**, a dataset containing 7.3 million samples from 1 million websites, covering diverse multimodal tasks and UI layouts. Models trained on **MultiUI** not only excel in web UI tasks, achieving up to a 48% improvement on VisualWebBench and a 19.1% boost in action accuracy on the web agent dataset Mind2Web, but also generalize surprisingly well to non-web UI tasks and even to non-UI domains such as document understanding, OCR, and chart interpretation.

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/vk7yT4Y7ydBOHM6BojmlI.mp4"></video>

## Contact

* Junpeng Liu: jpliu@link.cuhk.edu.hk
* Xiang Yue: xyue2@andrew.cmu.edu

## Citation

If you find this work helpful, please cite our paper:

````
@misc{liu2024harnessingwebpageuistextrich,
      title={Harnessing Webpage UIs for Text-Rich Visual Understanding},
      author={Junpeng Liu and Tianyue Ou and Yifan Song and Yuxiao Qu and Wai Lam and Chenyan Xiong and Wenhu Chen and Graham Neubig and Xiang Yue},
      year={2024},
      eprint={2410.13824},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.13824},
}
````

Provider:
maas
Created:
2025-10-10