
ybyby624/wiki-fixed

Source: Hugging Face. Updated 2026-04-28. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/ybyby624/wiki-fixed
Description:
License: MIT. Language: English (en).

This repo contains the fixed wiki corpus, which is based on the wiki-18 corpus and supplemented by the HotpotQA, 2WikiMultiHopQA, and Musique datasets. Compared with the original Wiki-18 corpus, this version contains 295,311 new documents that are critical for answering the questions in those three datasets.

We have already embedded the corpus with Qwen3-8B-Embedding. Because Qwen3-8B-Embedding is trained with MRL (Matryoshka Representation Learning), we provide only the 4096-dimension version. If your computing resources are limited, run the following to convert the index to a smaller dimension:

```
python convert_faiss_dim.py --input {your_faiss_manifest_path} --output {your_target_file_name_without_postfix} --target_dim {any_dim_between_32_and_4096} --normalize
```

You can then launch the search service with the converted index along with the original corpus.

To enable Qwen3-8B-Embedding's MRL capability at query time, launch the vLLM server with `--hf-overrides '{"is_matryoshka": true}'` and send each request with the dimension argument, so that query embeddings match the dimension of the converted index.

If you find this repo helpful, please cite this repo for now. We will update this card when the paper is released.
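The card does not show what `convert_faiss_dim.py` does internally. As a rough illustration of the idea behind MRL-based dimension reduction, the sketch below keeps the leading components of each embedding and re-normalizes the result. The function name `truncate_embeddings` and its parameters are hypothetical and are not taken from the repo; the actual script may work differently, for example by operating directly on the FAISS index files.

```python
import numpy as np


def truncate_embeddings(vectors: np.ndarray, target_dim: int,
                        normalize: bool = True) -> np.ndarray:
    """Hypothetical helper: keep the first `target_dim` components of each row.

    With MRL-trained embeddings, the leading components are intended to be
    usable on their own, so truncation followed by L2 re-normalization is the
    usual way to obtain a lower-dimensional vector.
    """
    if vectors.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_vectors, dim)")
    full_dim = vectors.shape[1]
    if not 1 <= target_dim <= full_dim:
        raise ValueError(f"target_dim must be between 1 and {full_dim}")

    # Slice off the leading components and copy into a contiguous float32 array.
    truncated = np.ascontiguousarray(vectors[:, :target_dim], dtype=np.float32)

    if normalize:
        # Re-normalize so inner-product search behaves like cosine similarity.
        norms = np.linalg.norm(truncated, axis=1, keepdims=True)
        truncated = truncated / np.maximum(norms, 1e-12)
    return truncated


if __name__ == "__main__":
    # Random stand-in data; real embeddings would come from the provided index.
    rng = np.random.default_rng(0)
    full = rng.standard_normal((4, 4096)).astype(np.float32)
    small = truncate_embeddings(full, target_dim=256)
    print(small.shape)
```

Whatever dimension is chosen for the index, queries would need to be embedded at the same dimension, which is why the server-side MRL setting matters.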
Provider:
ybyby624