dataset_eastern_german_crisis_discourse_1976-86
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7690020
Description:
This dataset is suited to exploring and analysing the Eastern German crisis discourse from 1976 to 1986. It consists of important speeches drawn from five volumes of the party congress protocols of the Socialist Unity Party of Germany (SED). The speeches were digitized by scanning them into PDF files and then converting those files into machine-readable text files with OCR software. The text files were then processed with TagAnt (v.2.0.4, Windows 10 64-bit), an annotation tool. According to the counts reported by AntConc (v.4.0.5, Windows 10 64-bit), the dataset contains four corpora: a main corpus with 184,750 tokens and 16,143 types, a 'corpus A' with 70,533 tokens and 11,964 types, a 'corpus B' with 65,757 tokens and 11,967 types, and a 'corpus C' with 48,460 tokens and 8,145 types.
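The token and type figures above can be illustrated with a minimal Python sketch. It rests on assumptions that are not part of the dataset description: a token is taken to be a lowercased run of letters, and the helper name `count_tokens_and_types` is hypothetical. AntConc's own token definition and settings may differ, so this sketch should not be expected to reproduce the reported counts exactly. The second part only checks the arithmetic of the reported numbers: the token counts of corpora A, B, and C add up to the main corpus total, which suggests (but does not prove) that the three corpora partition the main corpus. Type counts do not add up in the same way, because the same word form can occur in more than one corpus.

```python
import re


def count_tokens_and_types(text: str) -> tuple[int, int]:
    """Return (tokens, types) for a text.

    Assumption: a token is a maximal run of Unicode letters, lowercased.
    This is an illustrative definition, not AntConc's documented one.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return len(tokens), len(set(tokens))


# Token counts as reported in the dataset description.
reported_tokens = {"main": 184_750, "A": 70_533, "B": 65_757, "C": 48_460}

# Sum of the three sub-corpora, for comparison with the main corpus.
subcorpus_total = (
    reported_tokens["A"] + reported_tokens["B"] + reported_tokens["C"]
)

if __name__ == "__main__":
    sample = "Die Partei und die Arbeiterklasse"
    print(count_tokens_and_types(sample))
    print(subcorpus_total == reported_tokens["main"])
```

To apply the helper to one of the dataset's text files, read the file with the encoding it was saved in and pass the resulting string to `count_tokens_and_types`.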
Created:
2023-03-02



