Ugiat/ner-cat
Source: Hugging Face | Updated: 2025-03-19 | Indexed: 2025-04-26
Download link:
https://hf-mirror.com/datasets/Ugiat/ner-cat
Resource summary:
The NERCat dataset is a manually annotated collection of Catalan-language television transcriptions designed to improve Named Entity Recognition (NER) performance for the Catalan language. The dataset includes 9,242 sentences and 13,732 named entities annotated across eight categories: Person, Facility, Organization, Location, Product, Event, Date, and Law. It aims to address the lack of high-quality annotated data for Catalan and supports the development of NLP applications in Catalan media, governance, and cultural domains. The dataset structure is compatible with the GLiNER framework, and JSON-formatted instance examples are provided.
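To make the GLiNER-compatible structure more concrete, here is a minimal sketch. It assumes the span layout used in GLiNER training data: a `tokenized_text` token list plus `ner` entries of the form `[start, end, label]` with inclusive token indices. The field names, the label strings, and the sample sentence are assumptions made for illustration. The sentence is invented and is not taken from NERCat. The sketch converts such a record into BIO tags and counts entities per category.

```python
import json
from collections import Counter

# Hypothetical record (invented sentence, NOT from NERCat), assuming the
# GLiNER-style layout: tokens plus [start, end, label] spans whose end
# index is inclusive. Field names are assumptions.
RECORD_JSON = """
{
  "tokenized_text": ["El", "Parlament", "de", "Catalunya", "es", "reuneix", "a", "Barcelona", "."],
  "ner": [[1, 3, "Organization"], [7, 7, "Location"]]
}
"""

# The eight categories named in the dataset description.
CATEGORIES = {"Person", "Facility", "Organization", "Location",
              "Product", "Event", "Date", "Law"}


def spans_to_bio(tokens, spans):
    """Convert inclusive [start, end, label] token spans into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if label not in CATEGORIES:
            raise ValueError(f"unexpected label: {label}")
        tags[start] = f"B-{label}"
        # Tokens after the first one inside the span get I- tags.
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"
    return tags


record = json.loads(RECORD_JSON)
tags = spans_to_bio(record["tokenized_text"], record["ner"])
counts = Counter(label for _, _, label in record["ner"])

for token, tag in zip(record["tokenized_text"], tags):
    print(f"{token}\t{tag}")
print(dict(counts))
```

If the published files use different field names or exclusive end indices, only the key lookups and the `range(start + 1, end + 1)` bound would need to change.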
Provider:
Ugiat



