
nanobeir-multilingual

Source: ModelScope community. Updated 2025-12-04; indexed 2025-09-20.
Download link:
https://modelscope.cn/datasets/lightonai/nanobeir-multilingual
Description:
This multilingual collection is derived from the original English NanoBEIR datasets, which are smaller versions of the BEIR datasets. Their compact size makes them well suited to quick, efficient evaluation during training. To facilitate broader research in cross-lingual information retrieval, the dataset has been machine-translated from the original English into eight additional languages: Arabic (ar), German (de), Spanish (es), French (fr), Italian (it), Norwegian (no), Portuguese (pt), and Swedish (sv).

The original dataset is available at [zeta-alpha-ai](https://huggingface.co/collections/zeta-alpha-ai/nanobeir-66e1a0af21dfd93e620cd9f6).

```python
from datasets import load_dataset

languages = [
    "ar",
    "de",
    "en",
    "es",
    "fr",
    "it",
    "no",
    "pt",
    "sv",
]

datasets = [
    "NanoArguAna",
    "NanoClimateFEVER",
    "NanoDBPedia",
    "NanoFEVER",
    "NanoFiQA2018",
    "NanoHotpotQA",
    "NanoMSMARCO",
    "NanoNFCorpus",
    "NanoNQ",
    "NanoQuoraRetrieval",
    "NanoSCIDOCS",
    "NanoSciFact",
    "NanoTouche2020",
]

language = "fr"

corpus = load_dataset(
    "lightonai/nanobeir-multilingual",
    f"NanoQuoraRetrieval_{language}",
    split="corpus",
)

queries = load_dataset(
    "lightonai/nanobeir-multilingual",
    f"NanoQuoraRetrieval_{language}",
    split="queries",
)

qrels = load_dataset(
    "lightonai/nanobeir-multilingual",
    "NanoQuoraRetrieval",
    split="qrels",
)
```

```
@misc{nanobeir-multilingual,
  author = {Sourty, Raphaël},
  title = {NanoBeir-Multilingual: Multilingual version of NanoBeir for quick evaluation.},
  year = {2025},
  url = {https://huggingface.co/datasets/lightonai/nanobeir-multilingual}
}
```
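The loading snippet only shows `NanoQuoraRetrieval` in French. As a minimal sketch, the configuration names for every dataset and language pair could be enumerated as below. It assumes the `{dataset}_{language}` naming for the corpus and queries, and the bare dataset name for the qrels, hold across all thirteen datasets; the card only demonstrates this for `NanoQuoraRetrieval`. The helper `config_names` is a hypothetical name introduced here for illustration and is not part of the dataset or the `datasets` library.

```python
from itertools import product

LANGUAGES = ["ar", "de", "en", "es", "fr", "it", "no", "pt", "sv"]

DATASETS = [
    "NanoArguAna", "NanoClimateFEVER", "NanoDBPedia", "NanoFEVER",
    "NanoFiQA2018", "NanoHotpotQA", "NanoMSMARCO", "NanoNFCorpus",
    "NanoNQ", "NanoQuoraRetrieval", "NanoSCIDOCS", "NanoSciFact",
    "NanoTouche2020",
]


def config_names(datasets=DATASETS, languages=LANGUAGES):
    """Return (text_config, qrels_config) pairs for every combination.

    Assumption: corpus and queries live under "<dataset>_<language>",
    while qrels live under "<dataset>" with no language suffix, as the
    card shows for NanoQuoraRetrieval.
    """
    return [
        (f"{dataset}_{language}", dataset)
        for dataset, language in product(datasets, languages)
    ]


pairs = config_names()
print(len(pairs))  # 13 datasets x 9 languages = 117
```

Each pair can then be passed to `load_dataset("lightonai/nanobeir-multilingual", text_config, split="corpus")` and its `queries` and `qrels` counterparts, in the same way as the single-language example.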

Provider:
maas
Created:
2025-09-17