NOISEBENCH

Name: NOISEBENCH
Creator: Humboldt-Universität zu Berlin
Published: 2024-05-13 18:20:31
License: No description available

Source: arXiv | Updated: 2024-05-13 | Indexed: 2024-06-21

Download link:

https://github.com/elenamer/NoiseBench

Description:

NOISEBENCH is a benchmark dataset for evaluating the impact of real label noise on named entity recognition (NER), created by Humboldt-Universität zu Berlin. It contains 7 noise variants. Every variant consists of the same sentences, each affected by one type of real error, such as errors made by experts or crowd workers, or errors introduced by distant supervision, weak supervision, or a teacher LLM. NOISEBENCH targets the difficulty existing models have with real noise, particularly label noise that stems from human error or semi-automatic annotation. Its intended uses include evaluating and improving the robustness of NER models to different kinds of real noise, and studying how each noise type affects model performance.
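Because every variant labels the same sentences, two variants can be compared tag by tag. The following is a minimal sketch of that idea, not the benchmark's own tooling or metric: the function name `token_noise_rate`, the toy tag sequences, and the choice of token-level disagreement as the measure are all assumptions made for illustration. Loading the actual files from the repository is left out, since their layout is not described here.

```python
from typing import List


def token_noise_rate(clean: List[List[str]], noisy: List[List[str]]) -> float:
    """Return the fraction of tokens whose noisy tag differs from the clean tag.

    Hypothetical helper: both inputs are lists of per-sentence tag sequences
    and must be aligned sentence by sentence and token by token.
    """
    if len(clean) != len(noisy):
        raise ValueError("variants must contain the same number of sentences")

    total = 0
    mismatched = 0
    for clean_tags, noisy_tags in zip(clean, noisy):
        if len(clean_tags) != len(noisy_tags):
            raise ValueError("sentences must be token-aligned")
        total += len(clean_tags)
        mismatched += sum(c != n for c, n in zip(clean_tags, noisy_tags))

    # Guard against an empty corpus.
    return mismatched / total if total else 0.0


# Toy data (made up for illustration, not taken from NOISEBENCH).
clean = [["B-PER", "I-PER", "O", "B-LOC"], ["O", "B-ORG", "O"]]
noisy = [["B-PER", "O", "O", "B-LOC"], ["O", "B-LOC", "O"]]

rate = token_noise_rate(clean, noisy)
print(f"token-level noise rate: {rate:.3f}")
```

Token-level disagreement is a deliberately simple measure. An entity-level comparison would treat a partially mislabeled span as a single error and may be a better fit for NER.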

Provider:

Humboldt-Universität zu Berlin

Created:

2024-05-13
