PMC-Patients ReCDS Benchmark

Indexed by NIAID Data Ecosystem on 2026-05-01

Download link:

https://figshare.com/articles/dataset/PMC-Patients_ReCDS_Benchmark/24504121

Description:

## PMC-Patients ReCDS Benchmark

The PMC-Patients ReCDS benchmark is presented as a set of retrieval tasks, and its data format is the same as that of the [BEIR](https://github.com/beir-cellar/beir) benchmark. Specifically, it consists of queries, a corpus, and qrels (relevance annotations).

### Queries

The ReCDS-PAR and ReCDS-PPR tasks share the same set of query patients and the same dataset split. For each split (train, dev, and test), the queries are stored in a `jsonl` file in which each line is a JSON object with two fields:

- `_id`: unique query identifier, given by the patient's `patient_uid`.
- `text`: query text, given by the patient summary text.

### Corpus

The corpus is shared across splits. For ReCDS-PAR, the corpus contains 11.7M PubMed articles; for ReCDS-PPR, it contains 155.2k reference patients from PMC-Patients. The corpus is also provided as a `jsonl` file in which each line is a JSON object with three fields:

- `_id`: unique document identifier; the PMID of the PubMed article in ReCDS-PAR, and the `patient_uid` of the candidate patient in ReCDS-PPR.
- `title`: title of the article in ReCDS-PAR; an empty string in ReCDS-PPR.
- `text`: abstract of the article in ReCDS-PAR; patient summary text in ReCDS-PPR.

### Qrels

Qrels are TREC-style retrieval annotation files in `tsv` format. Each qrels file contains three tab-separated columns, in this order: the query identifier, the corpus identifier, and the score. The score (2 or 1) indicates the relevance level in ReCDS-PAR or the similarity level in ReCDS-PPR. Note that, because of the dataset split, the qrels may differ from the `relevant_articles` and `similar_patients` fields in `PMC-Patients.json` (see our manuscript for details).
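A minimal Python sketch for reading the queries, corpus, and qrels files described above, assuming exactly the field layout given there. The function names (`load_jsonl`, `load_queries`, `load_corpus`, `load_qrels`) are hypothetical helpers, not part of the benchmark release or of the BEIR library. Identifiers are normalized to strings so that PMIDs stored as JSON numbers still match the qrels columns. Whether the released qrels files carry a header row is not stated, so the qrels reader simply skips any row whose score is not an integer.

```python
import csv
import json
from collections import defaultdict

# Hypothetical helpers for illustration; not an official loader for this benchmark.


def load_jsonl(path):
    """Read a JSON Lines file (one JSON object per line) into a dict keyed by `_id`."""
    records = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            obj = json.loads(line)
            records[str(obj["_id"])] = obj  # normalize ids (e.g. numeric PMIDs) to str
    return records


def load_queries(path):
    """Map query `_id` (patient_uid) -> patient summary text."""
    return {qid: q["text"] for qid, q in load_jsonl(path).items()}


def load_corpus(path):
    """Map document `_id` -> {"title", "text"}; title is "" in ReCDS-PPR."""
    return {
        did: {"title": d.get("title", ""), "text": d["text"]}
        for did, d in load_jsonl(path).items()
    }


def load_qrels(path):
    """Map query id -> {corpus id: score} from a three-column TSV file."""
    qrels = defaultdict(dict)
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) < 3:
                continue  # ignore empty or malformed lines
            try:
                score = int(row[2])  # 2 or 1: relevance/similarity level
            except ValueError:
                continue  # tolerate a possible header row
            qrels[row[0]][row[1]] = score
    return dict(qrels)
```

For example, `load_qrels` on a split's qrels file (path not specified here) yields a nested `{query_id: {corpus_id: score}}` dict, the same shape BEIR's evaluation code works with. Loading the 11.7M-article ReCDS-PAR corpus this way keeps every abstract in memory, so streaming the file line by line may be preferable for that task.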

Created:

2023-11-06