
dataset_eastern_german_crisis_discourse_1976-86

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7690020
Description:
This dataset is considered suitable for exploring and analysing the Eastern German crisis discourse from 1976 to 1986. It consists of important speeches contained in five volumes of the protocol of the party congress of the Socialist Unity Party of Germany (SED). The speeches were digitized by scanning them into PDF files and then converting the scans into machine-readable text files using OCR software. These text files were then processed with TagAnt (v.2.0.4, Windows 10 64-bit), an annotation tool. According to the data returned by AntConc (v.4.0.5, Windows 10 64-bit), the dataset contains four corpora: a main corpus with a total of 184,750 tokens and 16,143 types, a 'corpus A' with 70,533 tokens and 11,964 types, a 'corpus B' with 65,757 tokens and 11,967 types, and a 'corpus C' with 48,460 tokens and 8,145 types.
Created:
2023-03-02
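
The corpus sizes reported in the description can be cross-checked with a few lines of Python. The sketch below is only an illustration: the dictionary layout and the `type_token_ratio` helper are hypothetical names, and the type-token ratio is not reported in the dataset description. It is included here as a common, simple measure of lexical variety. The check that corpora A, B and C together account for the main corpus's tokens follows from the figures themselves, not from any statement in the source.

```python
# Token and type counts exactly as reported in the description (AntConc output).
corpora = {
    "main": {"tokens": 184_750, "types": 16_143},
    "A": {"tokens": 70_533, "types": 11_964},
    "B": {"tokens": 65_757, "types": 11_967},
    "C": {"tokens": 48_460, "types": 8_145},
}


def type_token_ratio(tokens: int, types: int) -> float:
    """Hypothetical helper: distinct word forms divided by running words."""
    return types / tokens


# Sum the tokens of the three sub-corpora and compare with the main corpus.
sub_token_total = sum(corpora[name]["tokens"] for name in ("A", "B", "C"))
tokens_match = sub_token_total == corpora["main"]["tokens"]

# Type-token ratio per corpus, rounded for display.
ratios = {
    name: round(type_token_ratio(**counts), 4)
    for name, counts in corpora.items()
}

print("Sub-corpus tokens:", sub_token_total, "| matches main corpus:", tokens_match)
for name, ratio in ratios.items():
    print(f"corpus {name}: TTR = {ratio}")
```

The token counts of corpora A, B and C add up to the 184,750 tokens of the main corpus. The type counts do not add up in the same way, which is expected: a word form that occurs in several sub-corpora is counted once in each of them but only once in the main corpus.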