PMC-Patients ReCDS Benchmark
Source: DataCite Commons. Updated 2025-06-01; indexed 2024-08-18.
Download link:
https://figshare.com/articles/dataset/PMC-Patients_ReCDS_Benchmark/24504121/1
Description:
## PMC-Patients ReCDS Benchmark

The PMC-Patients ReCDS benchmark is presented as a set of retrieval tasks, and its data format is the same as that of the [BEIR](https://github.com/beir-cellar/beir) benchmark. Specifically, each task consists of queries, a corpus, and qrels (annotations).

### Queries

The ReCDS-PAR and ReCDS-PPR tasks share the same query patient set and dataset split.
For each split (train, dev, and test), queries are stored in a `jsonl` file that contains a list of dictionaries, each with two fields:
- `_id`: unique query identifier, given by the patient_uid.
- `text`: query text, given by the patient summary text.

### Corpus

The corpus is shared across splits. For ReCDS-PAR, the corpus contains 11.7M PubMed articles; for ReCDS-PPR, it contains 155.2k reference patients from PMC-Patients. The corpus is also stored in a `jsonl` file that contains a list of dictionaries, each with three fields:
- `_id`: unique document identifier: the PMID of the PubMed article in ReCDS-PAR, or the patient_uid of the candidate patient in ReCDS-PPR.
- `title`: title of the article in ReCDS-PAR, or an empty string in ReCDS-PPR.
- `text`: abstract of the article in ReCDS-PAR, or the patient summary text in ReCDS-PPR.

### Qrels

Qrels are TREC-style retrieval annotation files in `tsv` format.
A qrels file contains three tab-separated columns: the query identifier, the corpus identifier, and the score, in this order. The score (2 or 1) indicates the relevance level in ReCDS-PAR or the similarity level in ReCDS-PPR.

Note that the qrels may differ from `relevant_articles` and `similar_patients` in `PMC-Patients.json` because of the dataset split (see our manuscript for details).
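
The three file types described above can be read with the Python standard library alone. The following is a minimal sketch rather than an official loader: the sample records and identifiers are made up for illustration, the helper names `read_jsonl` and `read_qrels` are hypothetical, and it assumes that each `jsonl` line holds one JSON dictionary and that a qrels file may begin with a header row (as BEIR-style files commonly do). Real files would be opened with `open(...)` in place of the in-memory streams used here.

```python
import csv
import io
import json


def read_jsonl(stream):
    """Parse one JSON dictionary per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]


def read_qrels(stream):
    """Parse tab-separated (query id, corpus id, score) rows into a nested dict."""
    qrels = {}
    for row in csv.reader(stream, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        query_id, doc_id, score = row
        if not score.strip().isdigit():
            continue  # assumption: a non-numeric score marks a header row
        qrels.setdefault(query_id, {})[doc_id] = int(score)
    return qrels


# Made-up sample records (hypothetical identifiers, not taken from the dataset).
queries_file = io.StringIO(
    '{"_id": "q-1", "text": "Patient summary used as the query."}\n'
)
corpus_file = io.StringIO(
    '{"_id": "d-1", "title": "An article title", "text": "An abstract."}\n'
    '{"_id": "d-2", "title": "", "text": "A candidate patient summary."}\n'
)
qrels_file = io.StringIO(
    "query-id\tcorpus-id\tscore\nq-1\td-1\t2\nq-1\td-2\t1\n"
)

# Index queries and corpus entries by their `_id` field.
queries = {q["_id"]: q["text"] for q in read_jsonl(queries_file)}
corpus = {
    d["_id"]: {"title": d["title"], "text": d["text"]}
    for d in read_jsonl(corpus_file)
}
qrels = read_qrels(qrels_file)

# Walk the annotations and look up each judged document in the corpus.
for query_id, judged in qrels.items():
    for doc_id, score in judged.items():
        print(query_id, doc_id, score, corpus[doc_id]["text"])
```

The nested `qrels[query_id][doc_id] = score` layout mirrors the dictionary shape that BEIR-style evaluation code typically expects, so the result can be passed on with little reshaping. For the 11.7M-article ReCDS-PAR corpus, reading line by line into a streaming pipeline may be preferable to building the full `corpus` dictionary in memory.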
Provider:
figshare
Created:
2023-11-06



