
Dataset of imagery and sentiment in frontier poetry throughout history

Science Data Bank, updated 2025-07-12, indexed 2026-04-23
Download link:
https://www.scidb.cn/detail?dataSetId=60c9d183add1483399ed9bd4fa18ed9f
Resource description:
Frontier poetry is one of the most important themes in classical Chinese poetry, focusing on life and scenery in the border regions. Imagery is a semantic composite arising from the interaction of the subjective and the objective: it is the objective thing onto which the poet's subjective emotions are projected. The imagery system of frontier poetry shows marked regional convergence and cultural symbolism. This paper constructs a dataset of imagery and sentiment in frontier poetry, covering more than 40,000 frontier poems from the pre-Qin period to the present. It combines theories and methods from textual criticism and computational linguistics to annotate and proofread the imagery and sentiment expressed in the poems. The dataset provides rich research data for the study of frontier poetry and offers a macro perspective for in-depth exploration of how imagery and sentiment in poetry have evolved.
To build the dataset, 42,836 frontier poems were crawled from the Internet, ranging from the war poems of the Book of Songs in the pre-Qin period to contemporary new poetry, with the aim of being complete, accurate, and reliable. The crawled data was cleaned and standardized: non-text symbols and redundant format tags were removed, a table of variant characters was established, and garbled characters were restored through exegesis against ancient texts. Incorrectly identified poems were deleted, and finally sentence segmentation and error correction were performed, with sentences delimited by commas and periods. In the end, 42,807 high-quality frontier poems were obtained. Based on the collected texts, we constructed an annotation scheme that records each poem's code, author, name, imagery, and sentiment.
Each poem has a unique eight-digit code. The first two digits are the dynasty number (for example, "01" for the pre-Qin period); the middle four digits are the author number, with poets ordered by their birth and death years; and the last two digits are the serial number of the work, ordered by the first letter of the title. The imagery of the poems and lyrics is annotated with a pre-trained model followed by manual review, while sentiment is annotated manually. The final dataset consists of 11 CSV tables, one per dynasty, with each file named after its dynasty. Each record has six fields: code, author, name, text, imagery, and sentiment.
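The numbering scheme described above (two dynasty digits, four author digits, two work digits) can be illustrated with a short Python sketch. It assumes the code is stored as an eight-character string of ASCII digits; the names `PoemCode` and `parse_poem_code` and the sample code "01000302" are hypothetical and are not taken from the dataset.

```python
from typing import NamedTuple


class PoemCode(NamedTuple):
    """Hypothetical container for the three parts of a poem code."""
    dynasty: str  # first two digits, e.g. "01" for the pre-Qin period
    author: str   # middle four digits, the author number
    work: str     # last two digits, the serial number of the work


def parse_poem_code(code: str) -> PoemCode:
    """Split a poem code into dynasty, author, and work parts.

    Assumes an eight-digit layout (2 + 4 + 2), as implied by the description.
    """
    code = code.strip()
    if len(code) != 8 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected an 8-digit code, got {code!r}")
    return PoemCode(dynasty=code[:2], author=code[2:6], work=code[6:])


# Illustrative input only; this code is made up, not read from the dataset.
parts = parse_poem_code("01000302")
print(parts.dynasty, parts.author, parts.work)
```

The parts are kept as strings rather than integers so that leading zeros such as "01" survive. For the same reason, when loading the CSV tables it is probably safer to read the code column as text (for example with `dtype=str` in pandas `read_csv`), since a numeric parse would silently drop the leading zero.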
Provider:
Qinghai Normal University
Created:
2025-05-21