
mFollowIR-cross-lingual-parquet

ModelScope (魔搭社区) · Updated 2025-09-16 · Indexed 2025-09-13
Download link:
https://modelscope.cn/datasets/jhu-clsp/mFollowIR-cross-lingual-parquet
Resource description:
# mFollowIR-cross-lingual-parquet

This is a Parquet version of the mFollowIR cross-lingual dataset that can be loaded directly with `load_dataset()`. The original dataset can be found at [jhu-clsp/mFollowIR-cross-lingual](https://huggingface.co/datasets/jhu-clsp/mFollowIR-cross-lingual).

## Dataset Structure

The dataset contains the following configurations for each target language (`fas`: Persian, `rus`: Russian, `zho`: Chinese):

### Configurations

- `qrels_og_[lang]`: Original relevance judgments (test split)
- `qrels_changed_[lang]`: Modified relevance judgments (test split)
- `corpus_[lang]`: Document collection
- `queries_[lang]`: Query set with instructions
- `top_ranked_[lang]`: Top ranked documents

## Usage

```python
from datasets import load_dataset

# Load a specific configuration
dataset = load_dataset("jhu-clsp/mFollowIR-cross-lingual-parquet", "queries_fas")  # or any other config

# Load multiple configurations: load_dataset() takes one config name per call,
# so load each configuration separately and collect the results in a dict
datasets = {
    name: load_dataset("jhu-clsp/mFollowIR-cross-lingual-parquet", name)
    for name in ["queries_fas", "corpus_fas"]
}
```

## Citation

```bibtex
@article{weller2024mfollowir,
  title="mFollowIR: a Multilingual Benchmark for Instruction Following in Information Retrieval",
  author="Weller, Orion and Chang, Benjamin and Yang, Eugene and Yarmohammadi, Mahsa and Barham, Sam and MacAvaney, Sean and Cohan, Arman and Soldaini, Luca and Van Durme, Benjamin and Lawrie, Dawn",
  journal="arXiv preprint TODO",
  year="2024"
}
```
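The configuration names follow a `<kind>_<lang>` pattern, so the full set can be enumerated rather than typed out. The following is a minimal sketch, assuming every combination of the five kinds and three language codes listed in the card exists as a configuration; the helper names `config_names` and `load_language` are hypothetical and not part of the dataset or the `datasets` library.

```python
from itertools import product

# Name pieces taken from the dataset card; the assumption is that every
# kind/language combination exists as a configuration.
LANGS = ("fas", "rus", "zho")
KINDS = ("qrels_og", "qrels_changed", "corpus", "queries", "top_ranked")
REPO_ID = "jhu-clsp/mFollowIR-cross-lingual-parquet"


def config_names(langs=LANGS, kinds=KINDS):
    """Return every `<kind>_<lang>` configuration name, grouped by language."""
    return [f"{kind}_{lang}" for lang, kind in product(langs, kinds)]


def load_language(lang):
    """Load all five configurations for one language (hypothetical helper).

    Needs network access; `datasets` is imported lazily so that
    `config_names` can be used without the library installed.
    """
    from datasets import load_dataset

    return {kind: load_dataset(REPO_ID, f"{kind}_{lang}") for kind in KINDS}


if __name__ == "__main__":
    for name in config_names():
        print(name)
```

Calling `load_language("fas")` would return a dict keyed by kind (`"corpus"`, `"queries"`, and so on), which keeps the pieces needed for one evaluation run together.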

Provider:
maas
Created:
2025-09-10
Background overview
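This dataset is the Parquet version of the mFollowIR cross-lingual information retrieval benchmark. It covers Persian, Russian, and Chinese, and provides several configurations: original and modified relevance judgments, the document collection, the query set, and top-ranked documents. It can be loaded directly with `load_dataset()`.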
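Because the dataset ships both original and modified relevance judgments, a natural first check is to see which query-document pairs differ between the two. The sketch below works on plain dictionaries and makes several assumptions: judgments are represented as `(query_id, doc_id) -> score` mappings, a pair missing from one side counts as score 0, and the toy data is invented for illustration. The actual column names in the `qrels_*` configurations are not stated in the card, so converting the loaded rows into this shape is left to the reader.

```python
def changed_judgments(qrels_og, qrels_changed):
    """Return {(query_id, doc_id): (old_score, new_score)} for pairs whose score differs.

    Both inputs map (query_id, doc_id) -> integer relevance score. A pair
    missing from one side is treated as score 0 (an assumption of this sketch).
    """
    diff = {}
    for key in set(qrels_og) | set(qrels_changed):
        old = qrels_og.get(key, 0)
        new = qrels_changed.get(key, 0)
        if old != new:
            diff[key] = (old, new)
    return diff


# Toy data for illustration only, not taken from the dataset.
og = {("q1", "d1"): 1, ("q1", "d2"): 1, ("q2", "d3"): 0}
changed = {("q1", "d1"): 1, ("q1", "d2"): 0, ("q2", "d3"): 0}

print(changed_judgments(og, changed))  # {('q1', 'd2'): (1, 0)}
```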