PMC-Patients ReCDS Benchmark
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://figshare.com/articles/dataset/PMC-Patients_ReCDS_Benchmark/24504121
Description:
## PMC-Patients ReCDS Benchmark
The PMC-Patients ReCDS benchmark is presented as a set of retrieval tasks, and its data format is the same as that of the [BEIR](https://github.com/beir-cellar/beir) benchmark.
Specifically, each task consists of queries, a corpus, and qrels (annotations).
### Queries
ReCDS-PAR and ReCDS-PPR tasks share the same query patient set and dataset split.
For each split (train, dev, and test), queries are stored in a `jsonl` file with one dictionary per line, each with two fields:
- `_id`: unique query identifier represented by patient_uid.
- `text`: query text represented by patient summary text.
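As a minimal sketch of reading such a file, the function below maps each `_id` to its `text`. The function name `load_queries` and the commented-out file path are hypothetical, and the code assumes one JSON object per line, as is usual for `jsonl`.

```python
import json
from typing import Dict, Iterable


def load_queries(lines: Iterable[str]) -> Dict[str, str]:
    """Map each query `_id` (patient_uid) to its `text` (patient summary).

    `lines` can be an open file handle or any iterable of JSON strings.
    """
    queries: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines
            continue
        record = json.loads(line)
        queries[record["_id"]] = record["text"]
    return queries


# Hypothetical path; the actual file name depends on how the archive unpacks.
# with open("queries/test_queries.jsonl", encoding="utf-8") as f:
#     test_queries = load_queries(f)
```

Accepting an iterable of lines rather than a path keeps the function easy to try on a few hand-written records before pointing it at the real files.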
### Corpus
The corpus is shared across splits. For ReCDS-PAR, the corpus contains 11.7M PubMed articles, and for ReCDS-PPR, the corpus contains 155.2k reference patients from PMC-Patients. The corpus is also provided as a `jsonl` file with one dictionary per line, each with three fields:
- `_id`: unique document identifier represented by PMID of the PubMed article in ReCDS-PAR, and patient_uid of the candidate patient in ReCDS-PPR.
- `title`: title of the article in ReCDS-PAR, and empty string in ReCDS-PPR.
- `text`: abstract of the article in ReCDS-PAR, and patient summary text in ReCDS-PPR.
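Because the ReCDS-PAR corpus holds 11.7M articles, it may be preferable to stream it rather than load it all at once. The generator below is one possible sketch: the name `iter_corpus` and the choice to join title and text with a single space are assumptions made for illustration, not something the benchmark prescribes.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_corpus(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (doc_id, passage) pairs one at a time from corpus lines."""
    for line in lines:
        if not line.strip():  # tolerate blank lines
            continue
        doc = json.loads(line)
        title = doc.get("title", "")
        # ReCDS-PPR documents have an empty title, so join only when present.
        passage = f"{title} {doc['text']}" if title else doc["text"]
        yield doc["_id"], passage


# Hypothetical path; adjust to wherever the corpus file is stored.
# with open("PAR_corpus.jsonl", encoding="utf-8") as f:
#     for doc_id, passage in iter_corpus(f):
#         ...  # index or encode the passage here
```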
### Qrels
Qrels are TREC-style retrieval annotation files in `tsv` format.
A qrels file contains three tab-separated columns, i.e. the query identifier, corpus identifier, and score in this order. The scores (2 or 1) indicate the relevance level in ReCDS-PAR or similarity level in ReCDS-PPR.
Note that the qrels may not be the same as `relevant_articles` and `similar_patients` in `PMC-Patients.json` due to dataset split (see our manuscript for details).
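The three-column layout can be parsed with a few lines of standard-library code. This is a sketch under assumptions: the name `load_qrels` is hypothetical, and the check that skips a non-numeric score is there only in case the file starts with a header row, as BEIR qrels files typically do.

```python
from collections import defaultdict
from typing import Dict, Iterable


def load_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Parse tab-separated qrels into {query_id: {corpus_id: score}}."""
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:  # skip blank or malformed lines
            continue
        query_id, corpus_id, score = parts
        score = score.strip()
        if not score.isdigit():  # skip a header row, if one is present
            continue
        qrels[query_id][corpus_id] = int(score)
    return dict(qrels)


# Hypothetical path; adjust to the task and split being evaluated.
# with open("qrels_test.tsv", encoding="utf-8") as f:
#     test_qrels = load_qrels(f)
```

The nested dictionary mirrors the structure that BEIR-style evaluation code generally expects, with scores of 2 or 1 kept as integers.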
Created:
2023-11-06



