WenxingZhu/vidore_v3_computer_science_embedding
Source: Hugging Face | Updated: 2025-11-12 | Indexed: 2025-11-15
Download link:
https://hf-mirror.com/datasets/WenxingZhu/vidore_v3_computer_science_embedding
Resource description:
The ViDoRe V3: Computer Science dataset is a corpus of computer science textbooks intended for long-document understanding tasks. It is part of the ViDoRe v3 benchmark and contains 2 textbooks from the openstacks website, totaling 1,360 pages. The dataset includes 1,290 queries, each associated with 4.6 pages on average. It is in English and provides statistics on query types, formats, and content types.
Provided by:
WenxingZhu



