Mind2Web_train_llava

Name: Mind2Web_train_llava
Creator: maas
Published: 2025-12-05 11:37:46
License: No description available

ModelScope: updated 2025-12-05, indexed 2025-11-03

Download link:

https://modelscope.cn/datasets/neulab/Mind2Web_train_llava


Resource summary:

#### Mind2Web training set for the paper: [Harnessing Webpage UIs for Text-Rich Visual Understanding](https://arxiv.org/abs/2410.13824)

🌐 [Homepage](https://neulab.github.io/MultiUI/) | 🐍 [GitHub](https://github.com/neulab/multiui) | 📖 [arXiv](https://arxiv.org/abs/2410.13824)

## Introduction

We introduce **MultiUI**, a dataset containing 7.3 million samples from 1 million websites, covering diverse multimodal tasks and UI layouts. Models trained on **MultiUI** not only excel in web UI tasks, achieving up to a 48% improvement on VisualWebBench and a 19.1% boost in action accuracy on the web agent dataset Mind2Web. They also generalize surprisingly well to non-web UI tasks and even to non-UI domains such as document understanding, OCR, and chart interpretation.

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/vk7yT4Y7ydBOHM6BojmlI.mp4"></video>

## Contact

* Junpeng Liu: jpliu@link.cuhk.edu.hk
* Xiang Yue: xyue2@andrew.cmu.edu

## Citation

If you find this work helpful, please cite our paper:

````
@misc{liu2024harnessingwebpageuistextrich,
  title={Harnessing Webpage UIs for Text-Rich Visual Understanding},
  author={Junpeng Liu and Tianyue Ou and Yifan Song and Yuxiao Qu and Wai Lam and Chenyan Xiong and Wenhu Chen and Graham Neubig and Xiang Yue},
  year={2024},
  eprint={2410.13824},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2410.13824},
}
````


Provider:

maas

Created:

2025-10-10
