
Relationships in Multiple Diaries from the Perspective of Digital Humanities: A Case Study of Diaries Related to Southwest Associated University

Source: Science Data Bank. Updated: 2023-05-15. Indexed: 2026-04-23.
Download link:
https://www.scidb.cn/detail?dataSetId=69511416500246029783eb6b79c718c2
Description:
This study uses PaddleNLP, a Python-based NLP toolkit, for text segmentation. To improve the accuracy of person-name segmentation, all person names that appear in the diaries are stored in a custom dictionary file. Using the part-of-speech labels produced by word segmentation, irrelevant vocabulary is removed and the person names with substantive meaning are extracted from each sentence. A Python program then enumerates the person-name pairs that co-occur in each sentence, and identical pairs are merged and counted across all sentences of the four original diary texts. To focus the analysis on important high-frequency figures, the names included in the co-occurrence analysis are limited by a threshold: only names that appear in the 200 most frequent name pairs for each diary and year are retained.
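The counting and thresholding steps described above can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes segmentation and part-of-speech filtering have already reduced each sentence to a list of person names, it treats pairs as unordered, and it counts a pair at most once per sentence (the description does not state either choice). The function names and the placeholder names "A", "B", "C" are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(sentences):
    """Count how often each unordered pair of names shares a sentence.

    `sentences` is an iterable of lists of person names, one list per
    sentence (assumed to be the output of an earlier segmentation step).
    """
    pair_counts = Counter()
    for names in sentences:
        # Assumption: deduplicate within a sentence and sort the names so
        # that (A, B) and (B, A) are counted as the same pair.
        unique = sorted(set(names))
        pair_counts.update(combinations(unique, 2))
    return pair_counts


def merge_counts(per_diary_counts):
    """Merge the pair counts of several diaries into one Counter."""
    merged = Counter()
    for counts in per_diary_counts:
        merged.update(counts)
    return merged


def names_in_top_pairs(pair_counts, limit=200):
    """Return the names that occur in the `limit` most frequent pairs."""
    top = pair_counts.most_common(limit)
    return {name for pair, _ in top for name in pair}


# Placeholder data: two tiny "diaries", each a list of per-sentence name lists.
diary_1 = [["A", "B"], ["B", "A", "C"], ["C"]]
diary_2 = [["A", "B", "B"]]

merged = merge_counts(count_cooccurrences(d) for d in (diary_1, diary_2))
kept_names = names_in_top_pairs(merged, limit=1)
```

With the placeholder data, the pair ("A", "B") is counted three times across the two diaries, so a limit of 1 keeps only the names "A" and "B". To follow the per-diary, per-year threshold in the description, `names_in_top_pairs` would be applied to each diary-year Counter separately before merging.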
Provided by:
School of Sociology and History, Fujian Normal University (福建师范大学社会历史学院)
Created:
2023-04-10