Ugiat/ner-cat

Source: Hugging Face. Updated 2025-03-19; indexed 2025-04-26.
Download link:
https://hf-mirror.com/datasets/Ugiat/ner-cat
Description:
The NERCat dataset is a manually annotated collection of Catalan-language television transcriptions designed to improve Named Entity Recognition (NER) performance for the Catalan language. The dataset includes 9,242 sentences and 13,732 named entities annotated across eight categories: Person, Facility, Organization, Location, Product, Event, Date, and Law. It aims to address the lack of high-quality annotated data for Catalan and supports the development of NLP applications in Catalan media, governance, and cultural domains. The dataset structure is compatible with the GLiNER framework, and JSON-formatted instance examples are provided.
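Since the description says the structure is compatible with GLiNER, the following is a minimal sketch of how one such record might be read. It assumes the layout commonly used for GLiNER training data: a `tokenized_text` list plus a `ner` list of `[start, end, label]` token spans with an inclusive end index. The field names, the label strings, and the sample sentence are all assumptions made for illustration; the sentence is invented and not taken from NERCat, so check the actual dataset files before relying on this layout.

```python
import json
from collections import Counter

# Hypothetical record in a GLiNER-style layout. The field names
# ("tokenized_text", "ner") and the label strings are assumptions;
# the sentence is invented and does not come from the NERCat dataset.
SAMPLE_JSON = """
{
  "tokenized_text": ["La", "Maria", "Puig", "va", "visitar", "Girona", "dilluns", "."],
  "ner": [[1, 2, "Person"], [5, 5, "Location"], [6, 6, "Date"]]
}
"""


def extract_entities(record):
    """Return (surface text, label) pairs from one GLiNER-style record.

    Span indices are assumed to be token offsets with an inclusive end.
    """
    tokens = record["tokenized_text"]
    entities = []
    for start, end, label in record["ner"]:
        # end is inclusive, so the slice goes one past it
        entities.append((" ".join(tokens[start:end + 1]), label))
    return entities


record = json.loads(SAMPLE_JSON)
entities = extract_entities(record)

# Count how many spans carry each label in this record.
label_counts = Counter(label for _, label in entities)

print(entities)
print(label_counts)
```

Applying the same counting over every record would give a per-category tally that could be compared against the 13,732 entities reported above, assuming the files follow this layout.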
Provider:
Ugiat