mFollowIR-cross-lingual-parquet
Source: ModelScope community. Updated 2025-09-16; indexed 2025-09-13.
Download link:
https://modelscope.cn/datasets/jhu-clsp/mFollowIR-cross-lingual-parquet
# mFollowIR-cross-lingual-parquet
This is a parquet version of the mFollowIR cross-lingual dataset that can be loaded directly with `load_dataset()`. The original dataset can be found at [jhu-clsp/mFollowIR-cross-lingual](https://huggingface.co/datasets/jhu-clsp/mFollowIR-cross-lingual).
## Dataset Structure
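The dataset provides the following configurations for each target language: Persian (`fas`), Russian (`rus`), and Chinese (`zho`). Replace `[lang]` in each configuration name with one of these codes.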
### Configurations
- `qrels_og_[lang]`: Original relevance judgments (test split)
- `qrels_changed_[lang]`: Modified relevance judgments (test split)
- `corpus_[lang]`: Document collection
- `queries_[lang]`: Query set with instructions
- `top_ranked_[lang]`: Top ranked documents
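The naming pattern above can be expanded programmatically. The following is a minimal sketch, assuming the configuration names follow exactly the `prefix_lang` pattern listed here; `config_names` is a hypothetical helper, not part of the dataset or of the `datasets` library.

```python
# Hypothetical helper: expands the documented `<prefix>_<lang>` naming pattern.
LANGS = ("fas", "rus", "zho")
PREFIXES = ("qrels_og", "qrels_changed", "corpus", "queries", "top_ranked")


def config_names(langs=LANGS, prefixes=PREFIXES):
    """Return every configuration name for the given language codes."""
    return [f"{prefix}_{lang}" for lang in langs for prefix in prefixes]


if __name__ == "__main__":
    # Print the five configuration names for Persian.
    for name in config_names(["fas"]):
        print(name)
```

Each resulting name can be passed as the second argument to `load_dataset()`, one name per call.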
## Usage
```python
from datasets import load_dataset
# Load a specific configuration
dataset = load_dataset("jhu-clsp/mFollowIR-cross-lingual-parquet", "queries_fas") # or any other config
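# Load multiple configurations: load_dataset() accepts one config name per call,
# so load each configuration separately and keep the results in a dict
configs = ["queries_fas", "corpus_fas"]
datasets_by_config = {
    name: load_dataset("jhu-clsp/mFollowIR-cross-lingual-parquet", name)
    for name in configs
}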
```
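One use of loading both relevance-judgment configurations is to see which judgments were modified. The sketch below is illustrative only. It assumes each qrels row is a mapping with `query-id`, `corpus-id`, and `score` keys, which this card does not confirm, so inspect the real column names (for example via `dataset["test"].column_names`) and pass them as arguments if they differ. `changed_judgments` is a hypothetical helper, and the rows are made-up toy data rather than values from the dataset.

```python
def changed_judgments(qrels_og, qrels_changed,
                      query_key="query-id", doc_key="corpus-id", score_key="score"):
    """Return {(query, doc): (old_score, new_score)} for pairs whose score differs.

    The key names are assumptions. A pair missing on one side gets None there.
    """
    def index(rows):
        return {(r[query_key], r[doc_key]): r[score_key] for r in rows}

    old, new = index(qrels_og), index(qrels_changed)
    # Keep the original order, then append pairs that only exist in the modified set.
    pairs = list(old) + [p for p in new if p not in old]
    return {p: (old.get(p), new.get(p)) for p in pairs if old.get(p) != new.get(p)}


# Toy rows, made up for illustration (not taken from the dataset)
og = [
    {"query-id": "q1", "corpus-id": "d1", "score": 1},
    {"query-id": "q1", "corpus-id": "d2", "score": 1},
]
changed = [
    {"query-id": "q1", "corpus-id": "d1", "score": 1},
    {"query-id": "q1", "corpus-id": "d2", "score": 0},
]
print(changed_judgments(og, changed))
```

With real data, the two arguments would be the `test` splits of a `qrels_og_[lang]` and a `qrels_changed_[lang]` configuration for the same language.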
## Citation
```bibtex
@article{weller2024mfollowir,
title="mFollowIR: a Multilingual Benchmark for Instruction Following in Information Retrieval",
author="Weller, Orion and Chang, Benjamin and Yang, Eugene and Yarmohammadi, Mahsa and Barham, Sam and MacAvaney, Sean and Cohan, Arman and Soldaini, Luca and Van Durme, Benjamin and Lawrie, Dawn",
journal="arXiv preprint TODO",
year="2024"
}
```
Provider:
maas
Created:
2025-09-10
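Dataset introduction

Background overview
This dataset is the Parquet version of the mFollowIR cross-lingual information retrieval benchmark. It covers Persian, Russian, and Chinese, and includes configurations for the original and modified relevance judgments, the document collection, the query set, and the top-ranked documents. It can be loaded directly with `load_dataset()`.
The summary above was collected and generated by 遇见数据集.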



