
Poesi.as dataset

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/5718260
Description:
A collection of poems, mostly in Spanish, from the 21st century and earlier.

Some stats:
Number of poems: 25,187
Number of words: 7,918,679

Two JSON files are provided:
poesias_corpora.json: the JSON used to generate the txt files.
poesias_corpora_old_spanish.json: still a work in progress. It contains Old Spanish poems, mostly by Alfonso X, which are not included in the corpora folder.

An additional CSV file, authors.csv, provides reconciled information for authors of the 20th century and earlier. Identifiers (VIAF, BnF, BNE, LoC, ISNI), dates of birth and death, and gender are also included as they appear in Wikidata.

This repo is a dump of the website www.poesi.as; we do not own the rights to any of the works published here. For any violation or infringement of copyright, take proper action within the scope of the original website.

Public Domain
The script extract.py generates a public domain corpus in JSON, extracted from the poesi.as corpus. The number of years since an author's death required for a work to be considered in the public domain can be specified using -y YEARS (--years YEARS). It defaults to 80, as per Spanish copyright law.

$ python extract.py > public_domain.json
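
The public domain rule described above (an author must have died at least a given number of years ago, 80 by default) can be sketched as follows. This is only an illustration and not the actual code of extract.py: the record layout, the `death_year` field name, the helper names, and the choice of an inclusive comparison are all assumptions, since the listing does not document the JSON schema.

```python
from datetime import date


def is_public_domain(death_year, years=80, current_year=None):
    """Return True if at least `years` years have passed since `death_year`.

    `death_year` is a hypothetical field; authors with no known death
    year are treated conservatively as not in the public domain.
    """
    if death_year is None:
        return False
    if current_year is None:
        current_year = date.today().year
    # Assumption: the threshold is inclusive (exactly `years` years counts).
    return current_year - death_year >= years


def filter_public_domain(poems, years=80, current_year=None):
    """Keep only the poem records whose author passes the threshold."""
    return [
        poem
        for poem in poems
        if is_public_domain(poem.get("death_year"), years, current_year)
    ]


# Made-up records for illustration only; not taken from the dataset.
sample = [
    {"title": "Poem A", "death_year": 1936},
    {"title": "Poem B", "death_year": 1990},
    {"title": "Poem C", "death_year": None},
]
kept = filter_public_domain(sample, years=80, current_year=2021)
```

Passing a smaller `years` value plays the role of the -y YEARS option: a lower threshold admits more authors into the public domain subset.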
Created:
2021-11-23