verse-wikisource
ModelScope Community | Updated: 2026-01-07 | Added: 2025-06-21
Download link:
https://modelscope.cn/datasets/PleIAs/verse-wikisource
Description:
"Verse-Wikisource" is a collection of 200,000 verses extracted from 9,000 works digitized by the Wikisource project.
Verses were selected through the following process:
* All works categorized as poems (or in a subcategory of poems) were collected using PetScan.
* Only the parts of each text labelled as "poem" in Wikisource's internal markup system were kept.
* Only verses shorter than 21 words were kept, to remove remaining artifacts.
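The last two steps can be sketched in Python as follows. This is a hypothetical illustration, not the authors' pipeline: it assumes the "poem" label corresponds to MediaWiki `<poem>...</poem>` tags in the page wikitext, that a verse is one non-blank line inside such a block, and that words are counted by splitting on whitespace. The card specifies none of these details.

```python
import re

# Assumption: "labelled as poem" is read as the MediaWiki <poem>...</poem> tag.
POEM_BLOCK = re.compile(r"<poem(?:\s[^>]*)?>(.*?)</poem>", re.DOTALL | re.IGNORECASE)

# "Shorter than 21 words" means at most 20 words.
MAX_WORDS = 20


def extract_poem_lines(wikitext: str) -> list[str]:
    """Return the non-blank lines found inside <poem> blocks.

    Blank lines (stanza breaks) are skipped; treating each remaining line
    as one verse is an assumption of this sketch.
    """
    lines = []
    for block in POEM_BLOCK.findall(wikitext):
        for line in block.splitlines():
            line = line.strip()
            if line:
                lines.append(line)
    return lines


def keep_verse(verse: str) -> bool:
    """Length filter: keep verses with fewer than 21 whitespace-separated words."""
    return len(verse.split()) <= MAX_WORDS


def select_verses(wikitext: str) -> list[str]:
    """Extract poem-marked lines, then drop overly long ones."""
    return [line for line in extract_poem_lines(wikitext) if keep_verse(line)]
```

For example, `select_verses("Intro\n<poem>\nShall I compare thee to a summer's day?\n</poem>")` keeps the single verse and ignores the text outside the `<poem>` block.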
The dataset includes the following features:
* The individual verse.
* Its size.
* Its position in the original poem, which makes it possible to reconstruct the poem sequentially if needed.
* The page name in Wikisource.
* The link to the original document in Wikisource.
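Because each row stores its page name and position, poems can be rebuilt by grouping rows by page and sorting by position. The sketch below assumes hypothetical field names `page`, `position`, and `verse`; check the actual column names in the downloaded dataset. Since verses of 21 words or more were filtered out, a reconstructed poem may be missing some lines.

```python
from collections import defaultdict


def reconstruct_poems(rows: list[dict]) -> dict[str, str]:
    """Group verses by page name and join them in position order.

    Field names "page", "position" and "verse" are hypothetical placeholders.
    """
    grouped: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for row in rows:
        grouped[row["page"]].append((row["position"], row["verse"]))
    # Sort on position only, so verse text is never used as a tiebreaker.
    return {
        page: "\n".join(verse for _, verse in sorted(items, key=lambda item: item[0]))
        for page, items in grouped.items()
    }
```

Rows may arrive in any order; for instance, two rows of the same page with positions 2 and 1 are joined with position 1 first.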
Provider:
maas
Created:
2025-06-19



