Metadata and annotation data for the XSample corpus on German academic language
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6553109
Description:
The XSample corpus was created in the project XSample (https://www.izus.uni-stuttgart.de/fokus/fdm-projekte/xsample/) at Universität Stuttgart in 2021 by Melanie Andresen and Axel Pichler. It contains 135 German academic journal articles, 45 each from the disciplines of linguistics, literary studies, and philosophy. The texts themselves cannot be made public for copyright reasons. However, the metadata and some annotation data are published here.
xsample-metadata.csv
This file contains metadata on the texts in the corpus, such as journal, title, authors, text length, and the URL of the original paper. It also contains two analytical metrics, 'past-ratio' and 'temp-expr-ratio', that are based on the annotations in the other two files. The variable 'past-ratio' expresses the proportion of past-tense verbs relative to all finite verbs in the text. The variable 'temp-expr-ratio' gives the number of temporal expressions per 1,000 tokens.
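The two metrics can be illustrated with a minimal Python sketch. It follows only the definitions given above; the function names, the parameter names, the handling of empty texts, and the example counts are assumptions made for illustration and are not taken from the corpus or from the project's own code.

```python
def past_ratio(n_past_verbs: int, n_finite_verbs: int) -> float:
    """Proportion of past-tense verbs among all finite verbs in one text."""
    if n_finite_verbs == 0:
        # Assumption: a text without finite verbs gets a ratio of 0.0.
        return 0.0
    return n_past_verbs / n_finite_verbs


def temp_expr_ratio(n_temporal_expressions: int, n_tokens: int) -> float:
    """Number of temporal expressions per 1,000 tokens in one text."""
    if n_tokens == 0:
        # Assumption: an empty text gets a ratio of 0.0.
        return 0.0
    return 1000 * n_temporal_expressions / n_tokens


# Hypothetical counts for a single article (not real corpus values).
example_past = past_ratio(n_past_verbs=120, n_finite_verbs=800)
example_temp = temp_expr_ratio(n_temporal_expressions=45, n_tokens=9000)
print(example_past, example_temp)
```

With these hypothetical counts, 120 of 800 finite verbs are in past tense and 45 temporal expressions occur in 9,000 tokens. How tokens were counted for the published values is not specified here, so values recomputed this way may differ from those in the file.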
xsample-heidel.csv
This file contains all temporal expressions found and classified by the annotation tool HeidelTime (https://github.com/HeidelTime/heideltime, V. 2.2.1, Strötgen & Gertz 2013). The variable 'position' gives the character offset of the first character of the temporal expression in the text.
xsample-sticker2.csv
This file contains all finite verbs found and classified by the annotation tool sticker2 (https://github.com/stickeritis/sticker2). The variable 'position' gives the character offset of the first character of the finite verb in the text.
References
Strötgen, Jannik & Michael Gertz. 2013. Multilingual and cross-domain temporal tagging. Language Resources and Evaluation. Springer 47(2). 269–298. https://doi.org/10.1007/s10579-012-9179-y.
Created:
2022-05-16



