MSNER

Source: arXiv. Updated 2024-05-19. Indexed 2024-06-21.
Download link:
https://github.com/qmeeus/MSNER
Resource overview:
The MSNER dataset was developed jointly by the LIIR laboratory and the Department of Electrical Engineering at KU Leuven to advance research on multilingual spoken named entity recognition. It is built on the VoxPopuli dataset and covers four languages: Dutch, French, German, and Spanish. It contains approximately 590 hours of combined training and validation data and 17 hours of evaluation data. Annotations were first generated with automatic speech recognition (ASR) systems and then corrected manually to improve their quality. The dataset is intended for speech processing and natural language processing (NLP) work, in particular for evaluating and improving named entity recognition in multilingual settings.
Provider:
Department of Computer Science, KU Leuven
Created:
2024-05-19