
UGround-V1-Data-Box

ModelScope community · Updated 2025-12-05 · Indexed 2025-07-05
Download link:
https://modelscope.cn/datasets/osunlp/UGround-V1-Data-Box
Resource description:
## Updates

- **[May 1, 2025]** [**Bounding Box Data**](https://huggingface.co/datasets/osunlp/UGround-V1-Data-Box): We have added a bounding box version of Web-Hybrid. For convenience, no conversation template is applied to this version of the data. All coordinates (x1, y1, x2, y2) are, as always, normalized to [0, 999]. The data has also been filtered (757k datapoints remain after content moderation).

### Notes for Requests

If you have applied for access to this dataset but have not received approval, please contact us via email (Boyu Gou) with your name, institution, and research purpose. Requests are typically approved within one day.

### Notes for Data

This repo contains the two datasets mentioned in our paper: Web-Hybrid and Web-Direct. The former is the primary source of performance gains.

For clarity, here is how we saved the data:

```
import json

# `item` is one annotation entry: an image path, the image size, and its conversations.
with open(item["image"], "rb") as img_file:
    img_bytes = img_file.read()

record = {
    "width": item["width"],
    "height": item["height"],
    # Conversations are stored as a JSON string; non-ASCII text is kept as-is.
    "conversations": json.dumps(item["conversations"], ensure_ascii=False),
    # The image is stored as raw file bytes.
    "image": img_bytes,
}
```

The coordinates have been processed into Qwen2-VL's format, i.e., [0, 999].

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6500870f1e14749e84f8f887/ZYFehfgCfbjoHwzSAcZKp.png)
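As a rough illustration of reading such data back, the sketch below reverses the `json.dumps` step from the saving snippet and maps a normalized box to pixel coordinates. The helper names `decode_record` and `box_to_pixels` are hypothetical, and the divisor of 1000 is an assumption (integer values in [0, 999] read as thousandths of the image size); the exact normalization formula is not stated here, so check it against the data before relying on it.

```python
import json


def decode_record(record):
    """Return a copy of a saved record with `conversations` parsed back from JSON.

    Field names follow the saving snippet; the image bytes are passed through unchanged.
    """
    return {
        "width": record["width"],
        "height": record["height"],
        "conversations": json.loads(record["conversations"]),
        "image": record["image"],
    }


def box_to_pixels(box, width, height, scale=1000):
    """Map a normalized (x1, y1, x2, y2) box to pixel coordinates.

    scale=1000 is an assumption, not a documented constant of the dataset.
    """
    x1, y1, x2, y2 = box
    return (
        x1 * width / scale,
        y1 * height / scale,
        x2 * width / scale,
        y2 * height / scale,
    )
```

For example, under the same assumption, `box_to_pixels((0, 500, 999, 250), 1920, 1080)` places the right edge about two pixels short of the full image width, which is what a [0, 999] range implies when read as thousandths.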

Provider:
maas
Created:
2025-07-04