
NOISEBENCH

arXiv. Updated 2024-05-13. Indexed 2024-06-21.
Download link:
https://github.com/elenamer/NoiseBench
Resource overview:
NOISEBENCH is a benchmark dataset for evaluating the impact of real label noise on named entity recognition (NER), created at Humboldt-Universität zu Berlin. It contains 7 noise variants. Every variant consists of the same sentences, but each is affected by one type of real error, such as errors introduced by experts, crowd workers, distant supervision, weak supervision, or a teacher LLM. NOISEBENCH targets the difficulty that existing models have with real noise, particularly label noise caused by human error or semi-automatic annotation. Its uses include evaluating and improving the robustness of NER models to different kinds of real noise, and studying how each noise type affects model performance.
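
Because every variant consists of the same sentences, a noisy variant can be compared with a reference labeling tag by tag. The sketch below is only an illustration of that idea: the function name `token_noise_rate`, the variant names, and the toy BIO tags are all invented here, and the token-level mismatch rate is a simple stand-in that is not necessarily the noise measure used by the dataset's authors. It does not read the actual NoiseBench files.

```python
from typing import Dict, List

Labels = List[List[str]]  # one list of BIO tags per sentence


def token_noise_rate(clean: Labels, noisy: Labels) -> float:
    """Fraction of tokens whose noisy tag differs from the reference tag."""
    if len(clean) != len(noisy):
        raise ValueError("variants must contain the same sentences")
    total = 0
    mismatched = 0
    for clean_tags, noisy_tags in zip(clean, noisy):
        if len(clean_tags) != len(noisy_tags):
            raise ValueError("sentences must have the same number of tokens")
        total += len(clean_tags)
        mismatched += sum(c != n for c, n in zip(clean_tags, noisy_tags))
    return mismatched / total if total else 0.0


# Toy, invented data (not taken from NoiseBench): two sentences, 3 + 2 tokens.
clean = [["B-PER", "I-PER", "O"], ["B-LOC", "O"]]
variants: Dict[str, Labels] = {
    # hypothetical variant where one entity span is cut short
    "variant_a": [["B-PER", "O", "O"], ["B-LOC", "O"]],
    # hypothetical variant with a wrong entity type and a missed entity
    "variant_b": [["B-ORG", "I-ORG", "O"], ["O", "O"]],
}

rates = {name: token_noise_rate(clean, tags) for name, tags in variants.items()}
print(rates)
```

A per-variant rate like this gives a rough way to rank variants by how far they drift from the reference labels before training any model on them.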
Provider:
Humboldt-Universität zu Berlin
Created:
2024-05-13