Dataset of the linguistic analysis of the Eastern German Crisis Discourse from 1976 to 1986

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/7693556
Description:
The dataset consists of a ZIP file containing the speeches from the five volumes of party congress protocols of the Socialist Unity Party of Germany (SED) from 1976 to 1986. The texts were digitized as PDF files, converted into machine-readable text files using OCR software, and then processed with TagAnt (v. 2.0.4, Windows 10 64-bit), an annotation tool. Based on the counts returned by AntConc (v. 4.0.5, Windows 10 64-bit), four corpora were created: a main corpus with 184,750 tokens and 16,143 types, a 'corpus A' with 70,533 tokens and 11,964 types, a 'corpus B' with 65,757 tokens and 11,967 types, and a 'corpus C' with 48,460 tokens and 8,145 types. The main corpus includes all speeches from the five volumes, while the three additional corpora were created based on specific criteria or topics. Corpus A and corpus B have similar token and type counts and likely differ in the subset of speeches or themes they cover. Corpus C is the smallest and focuses on a specific aspect of the discourse. The dataset is suitable for exploring and analyzing the Eastern German Crisis Discourse from 1976 to 1986, particularly for scholars interested in political and historical analysis.
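The reported counts can be sanity-checked with a few lines of Python. The sketch below only uses the token and type figures quoted in the description. The type-token ratio (TTR) is an illustrative measure chosen here, not something the dataset description reports, and the dictionary layout and function name are hypothetical.

```python
# Token and type counts exactly as quoted in the dataset description.
# The dictionary layout is an assumption made for this illustration.
corpora = {
    "main": (184_750, 16_143),
    "A": (70_533, 11_964),
    "B": (65_757, 11_967),
    "C": (48_460, 8_145),
}


def type_token_ratio(tokens: int, types: int) -> float:
    """Return types divided by tokens (a simple lexical-diversity measure)."""
    return types / tokens


# Sum the token counts of the three subcorpora and compare with the main corpus.
sub_total = sum(tokens for name, (tokens, _) in corpora.items() if name != "main")
partitions_main = sub_total == corpora["main"][0]

for name, (tokens, types) in corpora.items():
    print(f"corpus {name}: TTR = {type_token_ratio(tokens, types):.4f}")
print("subcorpora token counts sum to main corpus:", partitions_main)
```

The token counts of corpora A, B, and C add up to the 184,750 tokens of the main corpus, which is consistent with the three subcorpora together covering the main corpus. TTR values depend on corpus size, so they should not be compared directly across corpora of different lengths.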
Created:
2023-10-04