EVisRAG-Train
Source: ModelScope community. Updated 2026-01-06; indexed 2025-12-20.
Download link:
https://modelscope.cn/datasets/OpenBMB/EVisRAG-Train
Resource overview:
paper: [2510.09733](https://arxiv.org/abs/2510.09733)
**Dataset Description**
This is a visual question answering (VQA) training dataset, collected from [ChartQA](https://arxiv.org/abs/2203.10244), [InfographicVQA](https://arxiv.org/abs/2104.12756), and [MMLongBench-Doc](https://arxiv.org/abs/2407.01523).
**Load the dataset**
```python
import os
import sys

import pandas as pd

# Name of the dataset subdirectory under data/, passed on the command line.
data_name = sys.argv[1]
df = pd.read_parquet(f"data/{data_name}/images.parquet", engine="pyarrow")

output_dir = f"data/{data_name}"
os.makedirs(f"{output_dir}/imgs", exist_ok=True)

# Write each row's raw image bytes to the relative path stored in its "path" column.
for idx, row in df.iterrows():
    img_bytes = row["image"]["bytes"]
    output_path = os.path.join(output_dir, row["path"])
    with open(output_path, "wb") as f:
        f.write(img_bytes)
```
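After running the extraction script, it can be useful to check that the files written to disk look like images. The following is a minimal sketch using only the standard library; it is not part of the dataset's tooling. The helper names `sniff_image_format` and `find_suspect_files` are hypothetical, and the set of formats checked (PNG, JPEG, GIF) is an assumption, since the dataset card does not say which image formats are stored.

```python
import os

# Leading bytes ("magic numbers") of a few common image formats.
# Which formats the dataset actually contains is an assumption here.
IMAGE_SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "gif": b"GIF8",
}


def sniff_image_format(path):
    """Return the format whose signature matches the file's first bytes, or None."""
    with open(path, "rb") as f:
        head = f.read(16)
    for name, signature in IMAGE_SIGNATURES.items():
        if head.startswith(signature):
            return name
    return None


def find_suspect_files(root):
    """Walk `root` and return sorted paths of files that are empty or unrecognized."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.getsize(path) == 0 or sniff_image_format(path) is None:
                suspects.append(path)
    return sorted(suspects)
```

For example, `find_suspect_files(f"data/{data_name}/imgs")` would list any extracted file that is empty or does not start with one of the signatures above. A file flagged this way is not necessarily corrupt; it may simply be in a format the sketch does not check.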
Provider:
maas
Created:
2025-12-08



