Manually Annotated Instances of Ich ('I') from the German KoLaS Corpus
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/3999304
Description:
Dataset used in Andresen/Knorr (2020). The dataset comprises 360 instances of ich ('I') taken from the German learner corpus KoLaS (Andresen/Knorr 2017, see http://hdl.handle.net/11022/0000-0001-B732-8 for full corpus access) and manually annotated with categories taken from Steinhoff (2007).
Column descriptions:
document: name of the document by which it can be found in the KoLaS corpus
code_annotator1 - code_annotator4: Annotations by four annotators. Possible values: Verfasser-Ich (author I), Forscher-Ich (researcher I), Erzähler-Ich (narrator I)
max_agreement_freq: Highest number of annotators that agreed on one label
max_agreement_label: Label on which the highest number of annotators agreed
context_before: 150 characters of context before the match
match: the match itself (either ich or Ich)
context_after: 150 characters of context after the match
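The relationship between the four annotator columns and the two agreement columns can be sketched in Python. This is a minimal illustration, not the authors' procedure: the function name `max_agreement` and the example row are hypothetical, and how the dataset resolves ties between labels is not documented here, so the tie behaviour below (first label seen wins) is an assumption.

```python
from collections import Counter

# Column names as given in the column descriptions above.
ANNOTATOR_COLUMNS = [f"code_annotator{i}" for i in range(1, 5)]


def max_agreement(row):
    """Return (frequency, label) of the most common annotation in one row.

    Empty or missing annotations are ignored. On a tie, the label that
    appears first among the annotator columns is returned (an assumption;
    the dataset may resolve ties differently).
    """
    labels = [row[col] for col in ANNOTATOR_COLUMNS if row.get(col)]
    if not labels:
        return 0, None
    label, freq = Counter(labels).most_common(1)[0]
    return freq, label


# Hypothetical row, invented for illustration only (not taken from the dataset).
example_row = {
    "document": "example_doc",
    "code_annotator1": "Verfasser-Ich",
    "code_annotator2": "Verfasser-Ich",
    "code_annotator3": "Forscher-Ich",
    "code_annotator4": "Verfasser-Ich",
}

freq, label = max_agreement(example_row)
print(freq, label)  # 3 Verfasser-Ich
```

Applied to rows read with `csv.DictReader`, the returned pair could be compared against max_agreement_freq and max_agreement_label as a consistency check, provided the file's delimiter and encoding are set to match the actual download.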
References
Andresen M, Knorr D. KoLaS – Ein Lernendenkorpus in der Schreibberatungsausbildung einsetzen. Zeitschrift Schreiben. Published online July 5, 2017:10-17.
Andresen M, Knorr D. Exploring the Use of the Pronoun I in German Academic Texts with Machine Learning. In: Burghardt M, Müller-Birn C, eds. Methoden und Anwendungen der Computational Humanities. Lecture Notes in Informatics (LNI). Gesellschaft für Informatik; 2020.
Steinhoff T. Zum ich-Gebrauch in Wissenschaftstexten. Zeitschrift für germanistische Linguistik. 2007;35(1-2):1-26.
Created:
2020-08-25



