mFollowIR-cross-lingual-parquet

Name: mFollowIR-cross-lingual-parquet
Creator: maas
Published: 2025-09-16 16:47:24
License: No description provided

ModelScope Community: updated 2025-09-16, indexed 2025-09-13

Download link:

https://modelscope.cn/datasets/jhu-clsp/mFollowIR-cross-lingual-parquet



Resource description:

# mFollowIR-cross-lingual-parquet

This is a Parquet version of the mFollowIR cross-lingual dataset that can be loaded directly with `load_dataset()`. The original dataset is available at [jhu-clsp/mFollowIR-cross-lingual](https://huggingface.co/datasets/jhu-clsp/mFollowIR-cross-lingual).

## Dataset Structure

The dataset provides the following configurations for each target language: Persian (`fas`), Russian (`rus`), and Chinese (`zho`).

### Configurations

- `qrels_og_[lang]`: Original relevance judgments (test split)
- `qrels_changed_[lang]`: Modified relevance judgments (test split)
- `corpus_[lang]`: Document collection
- `queries_[lang]`: Query set with instructions
- `top_ranked_[lang]`: Top-ranked documents

## Usage

```python
from datasets import load_dataset

# Load a specific configuration
dataset = load_dataset("jhu-clsp/mFollowIR-cross-lingual-parquet", "queries_fas")  # or any other config

# Load multiple configurations (load_dataset accepts one config name per call)
datasets_by_config = {
    name: load_dataset("jhu-clsp/mFollowIR-cross-lingual-parquet", name)
    for name in ["queries_fas", "corpus_fas"]
}
```

## Citation

```bibtex
@article{weller2024mfollowir,
  title="mFollowIR: a Multilingual Benchmark for Instruction Following in Information Retrieval",
  author="Weller, Orion and Chang, Benjamin and Yang, Eugene and Yarmohammadi, Mahsa and Barham, Sam and MacAvaney, Sean and Cohan, Arman and Soldaini, Luca and Van Durme, Benjamin and Lawrie, Dawn",
  journal="arXiv preprint TODO",
  year="2024"
}
```
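Every configuration name follows the `<prefix>_<lang>` pattern from the Configurations list. Because of that, you can generate the full set of names for a language instead of typing each one. The sketch below is a minimal illustration that assumes this naming scheme holds for all three languages. `REPO_ID`, `LANGUAGES`, `CONFIG_PREFIXES`, `config_names`, and `load_language` are hypothetical helpers. They are not part of the dataset or the `datasets` library.

```python
# Hypothetical helpers (not shipped with the dataset): build the
# "<prefix>_<lang>" config names from the card and optionally load them all.
from typing import Dict, List

REPO_ID = "jhu-clsp/mFollowIR-cross-lingual-parquet"
LANGUAGES = ("fas", "rus", "zho")
CONFIG_PREFIXES = ("qrels_og", "qrels_changed", "corpus", "queries", "top_ranked")


def config_names(lang: str) -> List[str]:
    """Return the five configuration names for one target language."""
    if lang not in LANGUAGES:
        raise ValueError(f"unknown language {lang!r}; expected one of {LANGUAGES}")
    return [f"{prefix}_{lang}" for prefix in CONFIG_PREFIXES]


def load_language(lang: str) -> Dict[str, object]:
    """Load every configuration for one language (needs `datasets` and network access)."""
    # Imported lazily so config_names() works even without the library installed
    from datasets import load_dataset

    return {name: load_dataset(REPO_ID, name) for name in config_names(lang)}
```

For example, `config_names("zho")` returns `['qrels_og_zho', 'qrels_changed_zho', 'corpus_zho', 'queries_zho', 'top_ranked_zho']`, and `load_language("zho")` would download all five configurations into a dict keyed by name.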


Provider:

maas

Created:

2025-09-10
