
Doc-750K

Source: ModelScope community · updated 2026-01-06 · indexed 2025-07-26
Download link:
https://modelscope.cn/datasets/OpenGVLab/Doc-750K
Description:
This dataset is introduced in the paper [Docopilot: Improving Multimodal Models for Document-Level Understanding](https://arxiv.org/abs/2507.14675). Please refer to https://github.com/OpenGVLab/Docopilot for details.

## FAQ

### Unzipping Split Archives on Linux

If you run into problems when unzipping the image archive on Linux, such as:

- zip bomb warnings
- bad zipfile offset errors

please try the following solutions:

1. Zip bomb warning

   Some systems trigger a zip bomb detection warning because the archive contains a very large number of small image files. You can bypass this by disabling the detection:

   ```bash
   export UNZIP_DISABLE_ZIPBOMB_DETECTION=TRUE
   ```

2. Bad zipfile offset error

   If you are working with a split zip archive (e.g., `images.z01`, `images.z02`, ..., `images.zip`), merge the parts into a single archive before unzipping:

   ```bash
   zip -s 0 images.zip --out images_full.zip
   unzip images_full.zip
   ```

   This reconstructs the full archive, which can then be unzipped normally.

Note: The image dataset is very large. Make sure you have sufficient disk space, and expect extraction to take a long time.
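
The two FAQ fixes can be combined into one script that also checks the download first. The following is only a sketch and is not part of the Docopilot repository. It assumes Info-ZIP `zip` and `unzip` are installed and that you run it in the directory that holds the parts. The script itself, the helper names `check_parts` and `enough_space`, the default output directory `extracted`, and the "2x the archive size" free-space rule of thumb are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reassemble and extract a split ZIP set such as
# images.z01, images.z02, ..., images.zip. Assumes Info-ZIP zip/unzip exist.

# Print how many BASE.zNN segments exist. Fail if one is missing from the
# sequence or if the final segment BASE.zip is absent.
check_parts() {
  local base=$1 n=0 total=0 f
  # Walk BASE.z01, BASE.z02, ... until the first gap.
  while [ -f "$(printf '%s.z%02d' "$base" $((n + 1)))" ]; do
    n=$((n + 1))
  done
  # Count every BASE.z<digits>* file. More files than the contiguous run means a gap.
  for f in "$base".z[0-9]*; do
    if [ -f "$f" ]; then total=$((total + 1)); fi
  done
  if [ "$total" -ne "$n" ]; then
    echo "missing segment: $(printf '%s.z%02d' "$base" $((n + 1)))" >&2
    return 1
  fi
  if [ ! -f "$base.zip" ]; then
    echo "missing final segment: $base.zip" >&2
    return 1
  fi
  echo "$n"
}

# Rough space check (assumption): require free space of about twice the
# combined segment size. One copy is for the merged archive and one is for
# the extracted images, assuming the images barely shrink inside the ZIP.
enough_space() {
  local dir=$1; shift
  local need_kb avail_kb
  need_kb=$(du -ck "$@" | tail -n 1 | cut -f1)
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $((need_kb * 2)) ]
}

main() {
  local base=${1:-images} dest=${2:-extracted} n
  n=$(check_parts "$base")
  echo "found $n .zNN segments plus $base.zip"
  mkdir -p "$dest"
  shopt -s nullglob   # so the glob below expands to nothing if there are no .zNN parts
  if ! enough_space "$dest" "$base".z[0-9]* "$base.zip"; then
    echo "not enough free space in $dest" >&2
    return 1
  fi
  # FAQ fix 2: merge the split set into one archive (-s 0 turns splitting off).
  zip -s 0 "$base.zip" --out "${base}_full.zip"
  # FAQ fix 1: disable the zip-bomb heuristic for this single unzip call only.
  UNZIP_DISABLE_ZIPBOMB_DETECTION=TRUE unzip -q "${base}_full.zip" -d "$dest"
}

# Run only when executed directly with arguments, e.g. `bash extract.sh images extracted`.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
  set -euo pipefail
  main "$@"
fi
```

Setting the environment variable inline, instead of using `export`, limits the zip-bomb override to the one `unzip` call. The contiguity check reports a missing download by file name before `zip` ever runs.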

Provider:
maas
Created:
2025-07-20