Luc Bat dataset
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/fsoft-ailab/Poem-Generator
Description:
This dataset contains 87,609 traditional Vietnamese lục bát (six-eight) poems, about 2.6 million lines in total, spanning a wide range of themes and styles. All poems were collected from reliable sources and rigorously filtered so that they conform to the lục bát form, which is defined by specific rhyme and tonal rules. The dataset is intended for Vietnamese poetry generation.
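As a rough illustration of what a form check on lục bát verse might look like, the sketch below tests only the most basic constraint: lines alternate between six and eight syllables. It is not the dataset authors' filtering pipeline, and it does not check the rhyme or tonal rules. The function name `follows_six_eight` is hypothetical, and the code assumes one verse line per string with syllables separated by spaces, as in standard written Vietnamese.

```python
def follows_six_eight(lines):
    """Return True if the non-empty lines alternate 6, 8, 6, 8, ... syllables.

    Assumption: written Vietnamese separates syllables with spaces, so
    whitespace splitting gives the syllable count. Punctuation attached to
    a syllable does not change the count.
    """
    counts = [len(line.split()) for line in lines if line.strip()]
    expected = [6 if i % 2 == 0 else 8 for i in range(len(counts))]
    # An empty poem is rejected; otherwise every line must match the pattern.
    return bool(counts) and counts == expected


# Opening couplet of Truyện Kiều, a well-known lục bát poem.
couplet = [
    "Trăm năm trong cõi người ta,",
    "Chữ tài chữ mệnh khéo là ghét nhau.",
]
is_valid = follows_six_eight(couplet)
```

A check like this can only reject poems with the wrong line lengths. Verifying that a poem truly follows the form would also require inspecting the tone of each syllable and the rhyme positions.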
Provided by:
Authors of the paper



