RadNLI: A natural language inference dataset for the radiology domain
Source: DataCite Commons | Updated 2021-12-16 | Indexed 2025-04-16
Download link:
https://physionet.org/content/radnli-report-inference/
Description:
The natural language inference (NLI) task is to determine whether a natural
language hypothesis can justifiably be inferred from a natural language
premise. NLI has been benchmarked in a number of settings, including medical
ones. Although NLI datasets such as MedNLI exist for the clinical domain,
systems trained on them do not generalize well to applications that require
understanding of radiology reports. We therefore introduce an NLI dataset for
the radiology domain, in which NLI labels are annotated on sentences drawn
from radiology reports. Sentence pairs in the dataset are sampled from
MIMIC-CXR and were annotated with NLI labels by two experts: one medical
expert and one computer science expert. Each pair is annotated twice, once as
sampled and once with its premise and hypothesis swapped, resulting in 960
pairs. The set is then split in half, giving 480 pairs for a validation set
and 480 pairs for a test set. We confirmed that a BERT-based NLI model trained
with a distant supervision approach can achieve an accuracy of 77.8% on this
test set.
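The construction described above (annotating each pair in both directions, splitting the result in half, and scoring by accuracy) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the common three-way label set (entailment, neutral, contradiction), and every name in it (`NLIPair`, `pair_id`, `reverse`, `split_in_half`, `accuracy`, the `_rev` suffix) is hypothetical and does not describe RadNLI's actual file format. Keeping each pair in the same half as its swapped copy is also an assumption made here to avoid leakage between validation and test; the description does not say how the split was drawn.

```python
import random
from dataclasses import dataclass

# Three-way NLI label set (assumed here; common to NLI benchmarks).
LABELS = ("entailment", "neutral", "contradiction")


@dataclass(frozen=True)
class NLIPair:
    pair_id: str      # hypothetical identifier, not a documented RadNLI field
    premise: str
    hypothesis: str
    label: str        # one of LABELS


def reverse(pair: NLIPair, reverse_label: str) -> NLIPair:
    """Swap premise and hypothesis; the reversed pair carries its own annotation."""
    if reverse_label not in LABELS:
        raise ValueError(f"unknown label: {reverse_label}")
    # The "_rev" suffix is a hypothetical convention for linking the two directions.
    return NLIPair(pair.pair_id + "_rev", pair.hypothesis, pair.premise, reverse_label)


def split_in_half(pairs: list[NLIPair], seed: int = 0) -> tuple[list[NLIPair], list[NLIPair]]:
    """Split into two halves, keeping each pair together with its reversed copy (assumption)."""
    groups: dict[str, list[NLIPair]] = {}
    for p in pairs:
        groups.setdefault(p.pair_id.removesuffix("_rev"), []).append(p)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)  # seeded shuffle for a reproducible split
    half = len(keys) // 2
    first = [p for k in keys[:half] for p in groups[k]]
    second = [p for k in keys[half:] for p in groups[k]]
    return first, second


def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Fraction of pairs whose predicted label equals the gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need equally sized, non-empty label lists")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


if __name__ == "__main__":
    # Invented sentences, not drawn from RadNLI.
    fwd = NLIPair("p0", "There is a small left pleural effusion.",
                  "There is a pleural effusion.", "entailment")
    rev = reverse(fwd, "neutral")
    print(rev.premise, "->", rev.hypothesis, ":", rev.label)
```

The reversed label is passed in rather than derived because entailment is not symmetric: in the invented example, the specific finding entails the general one, but the general statement leaves the specific finding undetermined (neutral). This is why annotating both directions yields genuinely new labeled pairs rather than duplicates.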
Provided by:
PhysioNet
Created:
2021-06-22



