
RadNLI: A natural language inference dataset for the radiology domain

Source: DataCite Commons. Updated: 2021-12-16. Indexed: 2025-04-16.
Download link:
https://physionet.org/content/radnli-report-inference/
Description:
Natural language inference (NLI) is the task of determining whether a natural language hypothesis can be justifiably inferred from a natural language premise. NLI has been benchmarked in a number of settings, including medical ones. While NLI datasets such as MedNLI exist for the clinical domain, systems trained on them do not generalize well to applications that require understanding of radiology reports. We therefore introduce an NLI dataset for the radiology domain, in which NLI labels are annotated on sentences drawn from radiology reports. Sentence pairs are sampled from MIMIC-CXR and annotated with NLI labels by two experts: one medical expert and one computer science expert. Each pair is annotated twice, with its premise and hypothesis swapped, resulting in 960 pairs. The set is then split in half, yielding 480 pairs for a validation set and 480 pairs for a test set. We confirmed that a BERT-based NLI model trained with a distant supervision approach achieves an accuracy of 77.8% on this test set.
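The pair counts in the description can be illustrated with a minimal sketch. It assumes 480 sampled sentence pairs (inferred from 960 / 2, not stated outright), uses toy placeholder sentences rather than MIMIC-CXR text, and splits with a seeded random shuffle, which is an assumption: the description does not say how the halves were chosen. The function names and the dictionary keys are hypothetical and do not reflect the released file format. NLI labels are left out because each direction is annotated separately by the experts.

```python
import random


def make_directed_pairs(sentence_pairs):
    """Return both (premise, hypothesis) orderings of every sentence pair."""
    directed = []
    for first, second in sentence_pairs:
        directed.append({"premise": first, "hypothesis": second})
        directed.append({"premise": second, "hypothesis": first})
    return directed


def split_in_half(items, seed=0):
    """Shuffle a copy of the items and cut it into two equal halves.

    The random shuffle is an assumption made for illustration only.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Toy placeholders standing in for the sampled report sentences.
sampled = [(f"sentence A{i}", f"sentence B{i}") for i in range(480)]

directed = make_directed_pairs(sampled)
validation, test = split_in_half(directed)

print(len(directed), len(validation), len(test))  # 960 480 480
```

Swapping doubles the number of items to annotate, and halving the result gives the 480-pair validation and test sets.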
Provider:
PhysioNet
Created:
2021-06-22