PPORTAL_ner: An Annotated Corpus of Portuguese Literary Entities

Indexed from NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/10855186
Description:
PPORTAL_ner: An Annotated Dataset of Portuguese Literary Entities

The corpus is tailored to Brazilian and Portuguese literary texts and contains annotations for five entity categories: PER, LOC, GPE, ORG, and DATE. Across a diverse collection of 25 literary works, it offers a total of 125,059 tokens and 5,266 annotated entities. The dataset is intended to support the development of potentially more accurate and context-aware NER models and to encourage further exploration of Portuguese literature.

Corpus Statistics

Our corpus is sourced from PPORTAL, an extensive repository of metadata covering over 80,000 public domain literary works in the Portuguese language, predominantly from Brazil and Portugal. PPORTAL aggregates data from three digital libraries: Domínio Público, Projecto Adamastor, and Biblioteca Digital de Literatura dos Países Lusófonos (BLPL). For ease of reference, the new dataset is called PPORTAL_ner. The selection comprises 25 individual literary works spanning different authors and literary styles. All of these texts were published prior to 1953, in line with the current criteria for public domain status in Brazil, and the majority date from 1554 to 1938.
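As a rough illustration of scale, the headline figures quoted in the description (25 works, 125,059 tokens, 5,266 entities) can be turned into per-work averages and an entity density. The sketch below uses only those three numbers; the function name `corpus_ratios` and the choice of ratios are illustrative assumptions, not statistics published with the dataset, and the averages say nothing about how tokens or entities are actually distributed across individual works.

```python
# Headline figures taken from the dataset description.
TOKENS = 125_059
ENTITIES = 5_266
WORKS = 25


def corpus_ratios(tokens: int, entities: int, works: int) -> dict:
    """Derive simple averages from corpus-level totals (hypothetical helper)."""
    return {
        "tokens_per_work": tokens / works,
        "entities_per_work": entities / works,
        # Entities may span several tokens, so this is a density of
        # annotated entities, not of annotated tokens.
        "entities_per_1k_tokens": 1000 * entities / tokens,
    }


ratios = corpus_ratios(TOKENS, ENTITIES, WORKS)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

On these totals, a work averages roughly 5,000 tokens and about 210 annotated entities, or around 42 entities per 1,000 tokens.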
Created:
2024-03-22