ybyby624/wiki-fixed
Source: Hugging Face | Updated: 2026-04-28 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/ybyby624/wiki-fixed
Resource description:
---
license: mit
language:
- en
---
This repo contains the fixed wiki corpus, which is based on the Wiki-18 corpus and supplemented with documents from the HotpotQA, 2WikiMultiHopQA, and MuSiQue datasets. Compared with the original Wiki-18 corpus, this version adds 295,311 documents that are needed to answer questions from these three datasets.
We have already embedded the corpus with Qwen3-Embedding-8B. Because Qwen3-Embedding-8B is trained with Matryoshka Representation Learning (MRL), we provide only the full 4096-dimensional index; a lower-dimensional index can be derived from it. If your compute or memory is limited, convert the index to a smaller dimension with:
```
python convert_faiss_dim.py --input {your_faiss_manifest_path} --output {your_target_file_name_without_postfix} --target_dim {any_dim_between_32_and_4096} --normalize
```
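MRL trains the leading coordinates of each embedding to carry most of the information, so a lower-dimensional vector can be obtained by keeping the first `d` coordinates and, optionally, re-normalizing to unit length. The sketch below illustrates that idea on a single vector. It is an assumption about what `--target_dim` and `--normalize` do, not the actual code of `convert_faiss_dim.py`, and the helper name `truncate_embedding` is hypothetical.

```python
import math
from typing import Sequence


def truncate_embedding(vec: Sequence[float], target_dim: int, normalize: bool = True) -> list[float]:
    """Keep the first `target_dim` coordinates of an MRL embedding (hypothetical helper).

    With normalize=True the truncated vector is rescaled to unit L2 norm, so that
    inner-product search over the reduced vectors behaves like cosine similarity.
    """
    # The card allows target dimensions between 32 and the full size (4096).
    if not 32 <= target_dim <= len(vec):
        raise ValueError(f"target_dim must be in [32, {len(vec)}], got {target_dim}")
    head = [float(x) for x in vec[:target_dim]]
    if normalize:
        norm = math.sqrt(sum(x * x for x in head))
        if norm > 0.0:
            head = [x / norm for x in head]
    return head
```

Query embeddings must go through the same truncation (same `target_dim`, same normalization) as the index; vectors of different lengths cannot be compared against it.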
Then launch the search service with the converted index and the original corpus. To enable the MRL capability of Qwen3-Embedding-8B, start the vLLM server with `--hf-overrides '{"is_matryoshka": true}'` and pass the `dimensions` argument in each embedding request, set to the same value as `--target_dim` so that query vectors match the dimension of the index.
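A minimal sketch of a query-embedding request, assuming the server was started with something like `vllm serve Qwen/Qwen3-Embedding-8B --hf-overrides '{"is_matryoshka": true}'` and listens on vLLM's default port 8000. It posts to the OpenAI-compatible `/v1/embeddings` endpoint with a `dimensions` field; the URL, model ID, and helper names (`build_embedding_payload`, `embed`) are illustrative assumptions, not part of this repo.

```python
import json
import urllib.request

# Assumed values: vLLM's default local address and the Hugging Face model ID.
EMBEDDINGS_URL = "http://localhost:8000/v1/embeddings"
MODEL_NAME = "Qwen/Qwen3-Embedding-8B"


def build_embedding_payload(texts: list[str], dimensions: int) -> dict:
    """Build an OpenAI-style embeddings request body asking for MRL-truncated vectors."""
    if not 32 <= dimensions <= 4096:
        raise ValueError(f"dimensions must be in [32, 4096], got {dimensions}")
    return {"model": MODEL_NAME, "input": texts, "dimensions": dimensions}


def embed(texts: list[str], dimensions: int, url: str = EMBEDDINGS_URL) -> list[list[float]]:
    """POST the texts to the embeddings endpoint and return one vector per text."""
    body = json.dumps(build_embedding_payload(texts, dimensions)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # OpenAI-compatible responses list embeddings under "data", each tagged with an "index".
    items = sorted(result["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

With the server running, `embed(["Who directed the film Jaws?"], dimensions=1024)` should return one 1024-dimensional vector per input, which can then be searched against an index converted with `--target_dim 1024`.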
If you find this repo helpful, please cite this repo for now. We will update this card with the paper citation once the paper is released.
Provided by:
ybyby624



