EVisRAG-Test-DocVQA
Source: ModelScope community. Updated 2026-01-06; indexed 2025-12-20.
Download link:
https://modelscope.cn/datasets/OpenBMB/EVisRAG-Test-DocVQA
Resource overview:
**Dataset Description**
This is a visual question answering (VQA) dataset of document images drawn from [DocVQA](https://arxiv.org/abs/2007.00398).
**Load the dataset**
```python
import os
import sys

import pandas as pd

# Dataset name, passed as the first command-line argument
data_name = sys.argv[1]
output_dir = f"data/{data_name}"

# Each row holds the raw image bytes and a relative output path
df = pd.read_parquet(f"{output_dir}/images.parquet", engine="pyarrow")
os.makedirs(f"{output_dir}/imgs", exist_ok=True)

for idx, row in df.iterrows():
    img_bytes = row["image"]["bytes"]
    output_path = os.path.join(output_dir, row["path"])
    with open(output_path, "wb") as f:
        f.write(img_bytes)
```
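The script above writes each image to `row["path"]` under the dataset folder, and it only creates the `imgs` subfolder in advance. If some paths point elsewhere, a variant that creates each parent folder on demand may be more robust. The following is a minimal sketch under that assumption, not part of the original card. The helper name `export_images` and the demo row are hypothetical. Rows are plain mappings with the same `path` and `image.bytes` fields used above.

```python
from pathlib import Path
from typing import Iterable, Mapping


def export_images(rows: Iterable[Mapping], output_dir: str) -> list[Path]:
    """Write each row's image bytes to output_dir / row["path"].

    Hypothetical helper: assumes every row has a relative "path" and an
    "image" mapping with a "bytes" entry, as in the script above.
    """
    root = Path(output_dir)
    written = []
    for row in rows:
        target = root / row["path"]
        # Create the parent folder on demand instead of assuming "imgs/".
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(row["image"]["bytes"])
        written.append(target)
    return written


if __name__ == "__main__":
    import tempfile

    # Stand-in row for illustration only; not real dataset content.
    demo_rows = [
        {"path": "imgs/example_0.png", "image": {"bytes": b"\x89PNG\r\n\x1a\n"}}
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for path in export_images(demo_rows, tmp):
            print(path.name, path.stat().st_size)
```

With pandas, the rows could be supplied as `df.to_dict("records")` after reading `images.parquet`.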
Provider:
maas
Created:
2025-12-08



