
Hyukkyu/beir-trec-covid

Source: Hugging Face. Updated 2025-11-25; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Hyukkyu/beir-trec-covid
Resource description:
---
license: apache-2.0
task_categories:
- information-retrieval
- text-retrieval
tags:
- beir
- trec-covid
- information-retrieval
- retrieval
- search
dataset_info:
- config_name: corpus
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  - name: metadata
    struct:
    - name: pubmed_id
      dtype: string
    - name: url
      dtype: string
  splits:
  - name: train
    num_bytes: 210286181
    num_examples: 171332
  download_size: 115532524
  dataset_size: 210286181
- config_name: queries
  features:
  - name: _id
    dtype: string
  - name: text
    dtype: string
  - name: metadata
    struct:
    - name: narrative
      dtype: string
    - name: query
      dtype: string
  splits:
  - name: train
    num_bytes: 13952
    num_examples: 50
  download_size: 11365
  dataset_size: 13952
configs:
- config_name: corpus
  data_files:
  - split: train
    path: corpus/train-*
- config_name: queries
  data_files:
  - split: train
    path: queries/train-*
---

# BEIR TREC-COVID Dataset (Migrated)

This is a migrated version of BeIR/trec-covid that is compatible with the `datasets` library 4.0.0+.

## Dataset Description

This dataset contains the trec-covid dataset from the BEIR benchmark, converted from the old script-based format to Parquet format.
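The `dataset_info` metadata above declares the fields of each record, and a small check can make that layout concrete. The following is a minimal sketch, not part of the dataset's tooling: `CORPUS_SCHEMA`, `QUERIES_SCHEMA`, and `matches_schema` are hypothetical names introduced here, and the check assumes that no field is null.

```python
from typing import Any, Mapping

# Field layouts transcribed from the dataset_info metadata; every leaf is a string.
# These constant names are illustrative and do not come from the dataset.
CORPUS_SCHEMA = {
    "_id": str,
    "title": str,
    "text": str,
    "metadata": {"pubmed_id": str, "url": str},
}
QUERIES_SCHEMA = {
    "_id": str,
    "text": str,
    "metadata": {"narrative": str, "query": str},
}


def matches_schema(record: Mapping[str, Any], schema: Mapping[str, Any]) -> bool:
    """Return True if `record` has exactly the fields and types in `schema`."""
    if set(record) != set(schema):
        return False
    for name, expected in schema.items():
        value = record[name]
        if isinstance(expected, dict):
            # Nested struct: recurse into the sub-record.
            if not isinstance(value, Mapping) or not matches_schema(value, expected):
                return False
        elif not isinstance(value, expected):
            return False
    return True


# Hypothetical record, shaped like a corpus row (not real dataset content).
example_doc = {
    "_id": "doc-1",
    "title": "Example title",
    "text": "Example body text.",
    "metadata": {"pubmed_id": "", "url": ""},
}
print(matches_schema(example_doc, CORPUS_SCHEMA))
```

Rows yielded by iterating a loaded `datasets.Dataset` are plain dictionaries, so the same function could be applied to them directly, provided the no-null assumption holds.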
## Dataset Structure

### Queries

- **Config `queries`, split `train`**: 50 examples
- Features: ['_id', 'text', 'metadata']
- **Total examples**: 50

### Corpus

- **Config `corpus`, split `train`**: 171,332 examples
- Features: ['_id', 'title', 'text', 'metadata']
- **Total examples**: 171,332

## Usage

```python
from datasets import load_dataset

# Load queries (config: queries). The metadata declares a single split, train.
queries = load_dataset("Hyukkyu/beir-trec-covid", "queries", split="train")

# Load corpus (config: corpus). The metadata declares a single split, train.
corpus = load_dataset("Hyukkyu/beir-trec-covid", "corpus", split="train")
```

## Available Splits

### Queries

- `train`: 50 examples

### Corpus

- `train`: 171,332 examples

## Original Dataset

This dataset is migrated from: BeIR/trec-covid

## Citation

If you use this dataset, please cite the original BEIR paper:

```bibtex
@article{thakur2021beir,
  title={BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models},
  author={Thakur, Nandan and Reimers, Nils and R{\"u}ckl{\'e}, Andreas and Srivastava, Abhishek and Gurevych, Iryna},
  journal={arXiv preprint arXiv:2104.08663},
  year={2021}
}
```
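As a follow-up to the Usage section, the sketch below shows one way the `queries` and `corpus` records could be combined for a toy retrieval pass. It is an illustration under stated assumptions, not the BEIR evaluation procedure: `build_doc_index` and `rank_by_term_overlap` are hypothetical helpers, the scoring is naive whitespace-token overlap, and the sample rows are made up so the block runs without downloading anything.

```python
from typing import Dict, Iterable, List, Mapping, Tuple


def build_doc_index(corpus_rows: Iterable[Mapping]) -> Dict[str, str]:
    """Map each corpus `_id` to its title and text joined into one string."""
    return {
        row["_id"]: f'{row["title"]} {row["text"]}'.strip()
        for row in corpus_rows
    }


def rank_by_term_overlap(
    query_text: str, doc_index: Mapping[str, str], top_k: int = 3
) -> List[Tuple[str, int]]:
    """Rank documents by how many distinct query terms they contain."""
    query_terms = set(query_text.lower().split())
    scored = [
        (doc_id, len(query_terms & set(doc_text.lower().split())))
        for doc_id, doc_text in doc_index.items()
    ]
    # Highest overlap first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical rows that mimic the dataset's fields (not real dataset content).
sample_corpus = [
    {"_id": "d1", "title": "Coronavirus origin",
     "text": "Studies on the origin of the virus.",
     "metadata": {"pubmed_id": "", "url": ""}},
    {"_id": "d2", "title": "Weather and transmission",
     "text": "How weather affects coronavirus spread.",
     "metadata": {"pubmed_id": "", "url": ""}},
    {"_id": "d3", "title": "Unrelated",
     "text": "Notes on protein folding.",
     "metadata": {"pubmed_id": "", "url": ""}},
]
sample_query = {"_id": "q1", "text": "coronavirus origin",
                "metadata": {"narrative": "", "query": ""}}

doc_index = build_doc_index(sample_corpus)
ranking = rank_by_term_overlap(sample_query["text"], doc_index, top_k=2)
print(ranking)
```

With the real data, the rows returned by `load_dataset(..., split="train")` (the split name declared in the front matter) could be passed to `build_doc_index` in place of `sample_corpus`. For 171,332 documents, a proper retriever such as BM25 or a dense model would be a more realistic choice than term overlap.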
Provider:
Hyukkyu