Project Gutenberg Self-Publishing Press: Portuguese Books
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/7849562
Description:
A collection of 40 books written in Portuguese and published at Project Gutenberg Self-Publishing.
The gutenberg_selfpub_tagged.vrt file contains all the books. The texts were tagged using the large spaCy trained model for Portuguese (https://spacy.io/models/pt).
The corpus metadata is listed in gutenberg_selfpub_metadata.tsv.
gutenberg_selfpub_xml_untagged.zip contains all the untagged texts.
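As a rough illustration of how the tagged file might be read, the sketch below assumes that gutenberg_selfpub_tagged.vrt follows the usual VRT ("vertical") convention: structural tags on their own lines and one token per line with tab-separated annotation columns. The record does not document the column order, so the code returns the raw fields without naming them. The function name iter_vrt and the sample data are hypothetical.

```python
import io

def iter_vrt(lines):
    """Yield ("tag", line) for structural lines and ("token", fields) for token lines.

    Assumption: the file uses the common VRT layout, with one token per line
    and tab-separated columns. The meaning of each column is not documented
    in the dataset record, so the fields are returned as a plain list.
    """
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("<") and line.endswith(">"):
            yield ("tag", line)
        else:
            yield ("token", line.split("\t"))

# Hypothetical sample; the real column order may differ.
sample = '<text id="t1">\nO\tDET\to\nlivro\tNOUN\tlivro\n</text>\n'
events = list(iter_vrt(io.StringIO(sample)))
```

For the real file, an open file handle (for example `open("gutenberg_selfpub_tagged.vrt", encoding="utf-8")`) can be passed in place of the `io.StringIO` object; the UTF-8 encoding is also an assumption.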
The XML files contain the following tags and attributes:
chapter: n
dedication
text: id
title
author
subtitle
part: n
acknowledge
summary: lang
publisher
short: n
biography
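The tag list above could be used along these lines to pull basic information out of one untagged XML file. This is a minimal sketch using Python's standard xml.etree.ElementTree. The record names the tags and attributes but not how they nest, so the code searches the whole tree instead of assuming a fixed structure. The helper name summarize_book and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def summarize_book(xml_string):
    """Return the text id, title, author, and chapter numbers of one book.

    Assumption: tag and attribute names match the list in the dataset
    description (text with id, chapter with n, title, author). Nesting is
    not documented, so iter() is used to search the entire tree.
    """
    root = ET.fromstring(xml_string)

    def first_text(tag):
        # First element with this tag anywhere in the tree, or None.
        el = next(root.iter(tag), None)
        return el.text.strip() if el is not None and el.text else None

    # iter() also matches the root element itself.
    text_el = next(root.iter("text"), None)
    return {
        "id": text_el.get("id") if text_el is not None else None,
        "title": first_text("title"),
        "author": first_text("author"),
        "chapters": [ch.get("n") for ch in root.iter("chapter")],
    }

# Hypothetical sample document, not taken from the corpus.
sample_xml = (
    '<text id="book01"><title>Exemplo</title>'
    '<author>Autor Desconhecido</author>'
    '<chapter n="1">Primeiro.</chapter>'
    '<chapter n="2">Segundo.</chapter></text>'
)
info = summarize_book(sample_xml)
```

Files extracted from gutenberg_selfpub_xml_untagged.zip can be read into a string and passed to the same function, provided they follow the assumed naming.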
Created:
2023-04-26



