RadNLI: A natural language inference dataset for the radiology domain

Name: RadNLI: A natural language inference dataset for the radiology domain
Creator: PhysioNet
Published: 2021-12-16 13:15:54
License: No description available

DataCite Commons, updated 2021-12-16, indexed 2025-04-16

Download link:

https://physionet.org/content/radnli-report-inference/

Description:

Natural language inference (NLI) is the task of determining whether a natural language hypothesis can be justifiably inferred from a natural language premise. NLI has been benchmarked in a number of settings, including medical ones. While NLI datasets such as MedNLI exist for the clinical domain, systems trained on them do not generalize well to applications that require understanding of radiology reports. We therefore introduce an NLI dataset for the radiology domain, in which NLI labels are annotated on sentences drawn from radiology reports. Sentence pairs in our dataset are sampled from MIMIC-CXR and annotated with NLI labels by two experts: one medical expert and one computer science expert. Each pair is annotated twice, swapping its premise and hypothesis, resulting in 960 pairs. The set is then split in half, yielding 480 pairs for a validation set and 480 pairs for a test set. We confirmed that a BERT-based NLI model trained with a distant supervision approach can achieve an accuracy of 77.8% on this test set.
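
The pair counts in the description can be illustrated with a minimal sketch. It assumes 480 underlying sentence pairs (inferred from 960 / 2) and a seeded random split. The description does not say how the split was made or whether a pair and its swapped counterpart stay in the same half, so both are assumptions. The sentences are placeholders rather than dataset content, and the function names are hypothetical.

```python
import random


def build_directed_pairs(base_pairs):
    """Turn each sentence pair into two items, one per premise/hypothesis order."""
    directed = []
    for first, second in base_pairs:
        directed.append({"premise": first, "hypothesis": second})
        directed.append({"premise": second, "hypothesis": first})
    return directed


def split_in_half(items, seed=0):
    """Shuffle a copy of the items and cut it into two equal halves.

    Assumption: a random split. The source only says the set is split in half.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Placeholder sentences, not taken from RadNLI or MIMIC-CXR.
base_pairs = [(f"sentence A{i}", f"sentence B{i}") for i in range(480)]

directed = build_directed_pairs(base_pairs)
validation, test = split_in_half(directed)

print(len(directed), len(validation), len(test))
```

Each item produced this way still needs its own NLI label, because inference is directional: a label for one order does not determine the label for the swapped order.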

Provider:

PhysioNet

Created:

2021-06-22