Hyukkyu/beir-nfcorpus
Source: Hugging Face | Updated 2025-12-08 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/Hyukkyu/beir-nfcorpus
Resource description:
---
license: apache-2.0
task_categories:
- information-retrieval
- text-retrieval
tags:
- beir
- nfcorpus
- information-retrieval
- retrieval
- search
configs:
- config_name: corpus
  data_files:
  - split: train
    path: corpus/train-*
- config_name: queries
  data_files:
  - split: train
    path: queries/train-*
dataset_info:
- config_name: corpus
  features:
  - name: title
    dtype: string
  - name: text
    dtype: string
  - name: _id
    dtype: string
  splits:
  - name: train
    num_bytes: 5856698
    num_examples: 3633
  download_size: 3178637
  dataset_size: 5856698
- config_name: queries
  features:
  - name: title
    dtype: string
  - name: text
    dtype: string
  - name: _id
    dtype: string
  splits:
  - name: train
    num_bytes: 141303
    num_examples: 3237
  download_size: 80445
  dataset_size: 141303
---
# BEIR NFCORPUS Dataset (Migrated)
This is a migrated version of BeIR/nfcorpus that is compatible with version 4.0.0 and later of the `datasets` library.
## Dataset Description
This dataset contains the nfcorpus dataset from the BEIR benchmark, converted from the old script-based format to Parquet format.
## Dataset Structure
### Queries
- **Config `queries`, split `train`**: 3,237 examples
- Features: ['_id', 'title', 'text']
- **Total examples**: 3,237
### Corpus
- **Config `corpus`, split `train`**: 3,633 examples
- Features: ['_id', 'title', 'text']
- **Total examples**: 3,633
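
Both configs share the same three string fields. The following is a minimal sketch of a schema check, assuming each row is a plain dict (which is what iterating a `datasets.Dataset` yields). The helper name `check_record` and the sample row are hypothetical and not part of the dataset or the library.

```python
from typing import Any, Mapping

# Field names as listed on this card; all three are strings.
EXPECTED_FIELDS = ("_id", "title", "text")


def check_record(row: Mapping[str, Any]) -> bool:
    """Return True if the row has exactly the expected fields, all strings."""
    return set(row) == set(EXPECTED_FIELDS) and all(
        isinstance(row[name], str) for name in EXPECTED_FIELDS
    )


# Made-up row for illustration only, not taken from the dataset.
sample_row = {"_id": "doc-0", "title": "Example title", "text": "Example body."}
is_valid = check_record(sample_row)
```

A check like this can be run over a few rows after loading to confirm that the migrated Parquet files expose the fields described above.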
## Usage
```python
from datasets import load_dataset
# Load queries (config "queries"; the card metadata declares a single "train" split)
queries = load_dataset("Hyukkyu/beir-nfcorpus", "queries", split="train")
# Load corpus (config "corpus"; also a single "train" split)
corpus = load_dataset("Hyukkyu/beir-nfcorpus", "corpus", split="train")
```
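
Once loaded, a common next step is to index rows by `_id` so that documents or queries can be looked up directly. The sketch below assumes rows are dicts with the `_id`, `title`, and `text` fields listed on this card. The function name `build_id_index` and the toy rows are hypothetical, used here in place of the real data so the example runs without a download.

```python
from typing import Dict, Iterable, Mapping


def build_id_index(rows: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Map each `_id` to its title and text joined into one searchable string."""
    index: Dict[str, str] = {}
    for row in rows:
        # If the title is empty, strip() removes the stray leading space.
        index[row["_id"]] = f"{row['title']} {row['text']}".strip()
    return index


# Made-up rows standing in for the loaded `corpus` dataset.
toy_corpus = [
    {"_id": "doc-0", "title": "Example title", "text": "Example body."},
    {"_id": "doc-1", "title": "", "text": "Body without a title."},
]
doc_index = build_id_index(toy_corpus)
```

With the real data, `build_id_index(corpus)` should behave the same way, since iterating a `datasets.Dataset` yields one dict per row.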
## Available Splits
### Queries
- `train` (config `queries`): 3,237 examples
### Corpus
- `train` (config `corpus`): 3,633 examples
## Original Dataset
This dataset is migrated from: BeIR/nfcorpus
## Citation
If you use this dataset, please cite the original BEIR paper:
```bibtex
@article{thakur2021beir,
title={BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models},
author={Thakur, Nandan and Reimers, Nils and R{\"u}ckl{\'e}, Andreas and Srivastava, Abhishek and Gurevych, Iryna},
journal={arXiv preprint arXiv:2104.08663},
year={2021}
}
```
Provider:
Hyukkyu



