
PMC-Patients ReCDS Benchmark

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://figshare.com/articles/dataset/PMC-Patients_ReCDS_Benchmark/24504121
Description:
## PMC-Patients ReCDS Benchmark

The PMC-Patients ReCDS benchmark is presented as a set of retrieval tasks, and its data format is the same as that of the [BEIR](https://github.com/beir-cellar/beir) benchmark. Specifically, each task consists of queries, a corpus, and qrels (annotations).

### Queries

The ReCDS-PAR and ReCDS-PPR tasks share the same query patient set and dataset split. For each split (train, dev, and test), the queries are stored in a `jsonl` file that contains a list of dictionaries, each with two fields:

- `_id`: unique query identifier, given by the patient_uid.
- `text`: query text, given by the patient summary text.

### Corpus

The corpus is shared across splits. For ReCDS-PAR, the corpus contains 11.7M PubMed articles; for ReCDS-PPR, it contains 155.2k reference patients from PMC-Patients. The corpus is also stored in a `jsonl` file that contains a list of dictionaries, each with three fields:

- `_id`: unique document identifier, given by the PMID of the PubMed article in ReCDS-PAR and by the patient_uid of the candidate patient in ReCDS-PPR.
- `title`: title of the article in ReCDS-PAR, and an empty string in ReCDS-PPR.
- `text`: abstract of the article in ReCDS-PAR, and patient summary text in ReCDS-PPR.

### Qrels

Qrels are TREC-style retrieval annotation files in `tsv` format. A qrels file contains three tab-separated columns: the query identifier, the corpus identifier, and the score, in that order. The scores (2 or 1) indicate the relevance level in ReCDS-PAR or the similarity level in ReCDS-PPR. Note that the qrels may differ from the `relevant_articles` and `similar_patients` fields in `PMC-Patients.json` because of the dataset split (see our manuscript for details).
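The three file types described above can be read with the Python standard library alone. The following is a minimal sketch, not the authors' loader: the helper names `read_jsonl` and `read_qrels` are hypothetical, the sample records are invented for illustration, and the code assumes each `jsonl` line holds one JSON dictionary. It also assumes the qrels file may start with a BEIR-style header row, which is skipped when the third column is not an integer.

```python
import csv
import io
import json
from collections import defaultdict


def read_jsonl(lines):
    """Parse an iterable of JSON Lines strings into a dict keyed by `_id`."""
    records = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        records[record["_id"]] = record
    return records


def read_qrels(lines):
    """Parse tab-separated qrels rows into {query_id: {corpus_id: score}}."""
    qrels = defaultdict(dict)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) != 3:
            continue  # ignore malformed or empty rows
        query_id, corpus_id, score = row
        if not score.isdigit():
            continue  # assumption: a header row such as query-id/corpus-id/score
        qrels[query_id][corpus_id] = int(score)
    return dict(qrels)


# Tiny in-memory stand-ins for the real files (all values are made up).
sample_queries = io.StringIO(
    '{"_id": "q1", "text": "A 54-year-old man presented with chest pain."}\n'
)
sample_corpus = io.StringIO(
    '{"_id": "d1", "title": "", "text": "A 60-year-old woman with dyspnea."}\n'
    '{"_id": "d2", "title": "", "text": "A 7-year-old boy with fever."}\n'
)
sample_qrels = io.StringIO("query-id\tcorpus-id\tscore\nq1\td1\t2\nq1\td2\t1\n")

queries = read_jsonl(sample_queries)
corpus = read_jsonl(sample_corpus)
qrels = read_qrels(sample_qrels)

# Join each annotation back to the query and corpus entries it refers to.
for query_id, judged in qrels.items():
    for corpus_id, score in judged.items():
        print(query_id, corpus_id, score, corpus[corpus_id]["text"])
```

With the downloaded data, the same helpers accept an open file handle in place of the `io.StringIO` objects. Note that the ReCDS-PAR corpus (11.7M articles) may be too large to hold comfortably in a single dictionary, in which case streaming the file line by line is the safer choice.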
Created:
2023-11-06