jinaai/arxivqa
Source: Hugging Face, updated 2025-08-19, indexed 2026-02-07
Download link:
https://hf-mirror.com/datasets/jinaai/arxivqa
Resource description:
---
dataset_info:
  features:
  - name: query
    dtype: string
  - name: image_filename
    dtype: string
  - name: image
    dtype: image
  - name: text_description
    dtype: string
  splits:
  - name: test
    num_bytes: 90158921.0
    num_examples: 499
  download_size: 76874997
  dataset_size: 90158921.0
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
---
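Below is a minimal sketch of how the metadata above can be used programmatically. It copies the declared features, the split sizes, and the `data/test-*` pattern of the default config into plain Python constants. It then derives two figures from them: the average stored size per example and the ratio of download size to dataset size. The helper names `files_for_test_split` and `missing_features` are hypothetical, and so are the shard file names used to exercise them. None of these come from the card.

```python
from fnmatch import fnmatchcase

# Feature names and dtypes as declared in the dataset_info block.
FEATURES = {
    "query": "string",
    "image_filename": "string",
    "image": "image",
    "text_description": "string",
}

# Split metadata copied from the card (the test split is the only split).
NUM_BYTES = 90_158_921
NUM_EXAMPLES = 499
DOWNLOAD_SIZE = 76_874_997

# The default config assigns every file matching this glob to the "test" split.
TEST_PATTERN = "data/test-*"


def files_for_test_split(repo_files):
    """Return the repository paths that match the test-split glob, sorted (hypothetical helper)."""
    return sorted(f for f in repo_files if fnmatchcase(f, TEST_PATTERN))


def missing_features(row):
    """Return declared feature names absent from a row dict, in declaration order (hypothetical helper)."""
    return [name for name in FEATURES if name not in row]


# Average stored size per example: about 180,679 bytes (roughly 176 KiB).
avg_bytes = NUM_BYTES / NUM_EXAMPLES

# Download size relative to dataset size: about 0.853.
compression = DOWNLOAD_SIZE / NUM_BYTES

if __name__ == "__main__":
    # Hypothetical repository listing; only the test shard matches the glob.
    print(files_for_test_split(["README.md", "data/test-00000-of-00001.parquet"]))
    print(missing_features({"query": "q", "image_filename": "f.png", "image": None}))
    print(round(avg_bytes, 1), round(compression, 3))
```

Python's `fnmatch` lets `*` match `/`, so this check is looser than a path-aware glob. It is only meant to illustrate the mapping. To load the actual rows, the Hugging Face `datasets` library's `load_dataset("jinaai/arxivqa", split="test")` should return the 499 test examples, assuming network access to the Hub.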
# Creation
This dataset is built upon the corresponding dataset from the [ViDoRe Benchmark](https://huggingface.co/collections/vidore/vidore-benchmark-667173f98e70a1c0fa4db00d). For details on how it was filtered, please read [our paper](https://arxiv.org/abs/2506.18902) or [this discussion on GitHub](https://github.com/embeddings-benchmark/mteb/pull/2942#discussion_r2240711654).
# Disclaimer
This dataset may contain publicly available images or text data. All data is provided for research and educational purposes only. If you are the rights holder of any content and have concerns regarding intellectual property or copyright, please contact us at "support-data (at) jina.ai" to request removal. We do not intentionally collect or process personal, sensitive, or private information. If you believe this dataset includes such content (e.g., portraits, location-linked images, medical or financial data, or NSFW content), please notify us and we will take appropriate action.
# Copyright
All rights are reserved by the original authors of the documents.
Provided by:
jinaai



