PaDaS-Lab/webfaq-v2
Source: Hugging Face | Updated 2026-04-27 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/PaDaS-Lab/webfaq-v2
Summary:
This is a multilingual question-answer retrieval dataset covering more than 100 languages, including Afrikaans, Amharic, Arabic, and Chinese. It is designed primarily for text-retrieval tasks, specifically document retrieval. Each entry contains an ID, origin, URL, question, answer, and semantic similarity score; for some languages, entries also carry topic and question-type features. The dataset is large: the English configuration alone has more than 55 million examples, while other languages range from a few hundred to several million. It is suited to training and evaluating multilingual retrieval models and supports cross-lingual information retrieval research.
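A minimal sketch of how question-answer pairs like these could be used to evaluate retrieval, assuming each question's relevant document is its own answer and all answers form the corpus. The record field names (`id`, `origin`, `url`, `question`, `answer`, `semantic_similarity_score`), the example records, the score threshold, and the helper functions `filter_pairs` and `mrr` are all hypothetical and chosen for illustration; they are not the dataset's confirmed schema or an official evaluation protocol. A simple bag-of-words cosine similarity stands in for a real retrieval model.

```python
import math
import re
from collections import Counter

# Hypothetical records: field names mirror the features listed above
# (ID, origin, URL, question, answer, semantic similarity score) but are
# assumptions, not the dataset's confirmed column names.
records = [
    {"id": "a1", "origin": "example.com", "url": "https://example.com/faq",
     "question": "How do I reset my password?",
     "answer": "Open account settings and choose reset password.",
     "semantic_similarity_score": 0.91},
    {"id": "a2", "origin": "example.org", "url": "https://example.org/help",
     "question": "What are your opening hours?",
     "answer": "We are open from 9 am to 5 pm on weekdays.",
     "semantic_similarity_score": 0.88},
    {"id": "a3", "origin": "example.net", "url": "https://example.net/q",
     "question": "Do you ship abroad?",
     "answer": "Click here.",
     "semantic_similarity_score": 0.32},
]


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def filter_pairs(rows, min_score):
    """Keep pairs whose similarity score meets a (hypothetical) threshold."""
    return [r for r in rows if r["semantic_similarity_score"] >= min_score]


def mrr(rows):
    """Mean reciprocal rank: each question's gold document is its own answer."""
    docs = [Counter(tokenize(r["answer"])) for r in rows]
    total = 0.0
    for i, r in enumerate(rows):
        query = Counter(tokenize(r["question"]))
        scores = [cosine(query, d) for d in docs]
        # Rank = 1 + number of documents scoring strictly higher than the gold one
        # (ties are resolved in the gold document's favor).
        rank = 1 + sum(1 for s in scores if s > scores[i])
        total += 1.0 / rank
    return total / len(rows) if rows else 0.0


kept = filter_pairs(records, 0.5)
print([r["id"] for r in kept], mrr(kept))
```

Using the similarity score as a quality filter before evaluation is one plausible choice, not a documented recommendation; in practice the bag-of-words scorer would be replaced by the retrieval model under test.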
Provided by:
PaDaS-Lab



