VisRAG-Ret-Test-MP-DocVQA
Source: ModelScope community. Updated 2025-12-18. Indexed 2025-05-17.
Download link:
https://modelscope.cn/datasets/OpenBMB/VisRAG-Ret-Test-MP-DocVQA
Resource description:
## Dataset Description
This is a VQA dataset built on industrial documents from the [MP-DocVQA](https://www.docvqa.org/datasets/docvqa) dataset.
### Load the dataset
```python
import csv

from datasets import load_dataset


def load_beir_qrels(qrels_file):
    """Read a BEIR-style qrels TSV into {query-id: {corpus-id: score}}."""
    qrels = {}
    with open(qrels_file) as f:
        tsvreader = csv.DictReader(f, delimiter="\t")
        for row in tsvreader:
            qid = row["query-id"]
            pid = row["corpus-id"]
            rel = int(row["score"])
            if qid in qrels:
                qrels[qid][pid] = rel
            else:
                qrels[qid] = {pid: rel}
    return qrels


corpus_ds = load_dataset("openbmb/VisRAG-Ret-Test-MP-DocVQA", name="corpus", split="train")
queries_ds = load_dataset("openbmb/VisRAG-Ret-Test-MP-DocVQA", name="queries", split="train")

qrels_path = "xxxx"  # path to the qrels file, found under the qrels folder in the repo
qrels = load_beir_qrels(qrels_path)
```
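Once the qrels are loaded, they are typically used to score a retriever's ranked output. The following is a minimal sketch, not part of the dataset's official tooling. It assumes the qrels file carries the `query-id`, `corpus-id`, and `score` columns that the loader above reads. The helper names `parse_qrels` and `recall_at_k`, the sample IDs, and the `sample_run` structure (query ID mapped to a ranked list of corpus IDs) are all hypothetical and chosen only for illustration.

```python
import csv
import io


def parse_qrels(handle):
    """Parse BEIR-style qrels (query-id, corpus-id, score) from a text handle."""
    qrels = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        qrels.setdefault(row["query-id"], {})[row["corpus-id"]] = int(row["score"])
    return qrels


def recall_at_k(qrels, run, k):
    """Mean fraction of relevant documents found in each query's top-k results."""
    per_query = []
    for qid, judged in qrels.items():
        relevant = {pid for pid, rel in judged.items() if rel > 0}
        if not relevant:
            continue  # skip queries with no positive judgments
        top_k = run.get(qid, [])[:k]
        per_query.append(len(relevant.intersection(top_k)) / len(relevant))
    return sum(per_query) / len(per_query) if per_query else 0.0


# Hypothetical qrels content; the IDs are invented for illustration.
sample_tsv = "query-id\tcorpus-id\tscore\nq1\tdoc_a\t1\nq1\tdoc_b\t0\nq2\tdoc_c\t1\n"
sample_qrels = parse_qrels(io.StringIO(sample_tsv))

# Hypothetical ranked results from some retriever, best match first.
sample_run = {"q1": ["doc_b", "doc_a"], "q2": ["doc_x", "doc_y"]}

print(recall_at_k(sample_qrels, sample_run, k=1))
print(recall_at_k(sample_qrels, sample_run, k=2))
```

With real data, `sample_qrels` would be replaced by the dictionary returned from `load_beir_qrels`, and `sample_run` by the ranked corpus IDs your retriever produces for each query.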
Provider:
maas
Created:
2025-05-15



