Metadata and annotation data for the XSample corpus on German academic language

NIAID Data Ecosystem, indexed 2026-03-13

Download link:

https://zenodo.org/record/6553109


Resource description:

The XSample corpus was created in the project XSample (https://www.izus.uni-stuttgart.de/fokus/fdm-projekte/xsample/) at Universität Stuttgart in 2021 by Melanie Andresen and Axel Pichler. It contains 135 German academic journal articles, 45 each from the disciplines of linguistics, literary studies, and philosophy. The texts themselves cannot be made public for copyright reasons. However, metadata and some annotation data are published here.

xsample-metadata.csv
This file contains metadata on the texts in the corpus, such as journal, title, authors, text length, and the URL of the original paper. It also contains two analytical metrics, 'past-ratio' and 'temp-expr-ratio', that are based on the annotations in the other two files. The variable 'past-ratio' expresses the proportion of verbs in past tense relative to all finite verbs in the text. The variable 'temp-expr-ratio' gives the number of temporal expressions per 1,000 tokens.

xsample-heidel.csv
This file contains all temporal expressions found and classified by the annotation tool HeidelTime (https://github.com/HeidelTime/heideltime, V. 2.2.1, Strötgen & Gertz 2013). The variable 'position' gives the character offset of the first character of the temporal expression in the text.

xsample-sticker2.csv
This file contains all finite verbs found and classified by the annotation tool sticker2 (https://github.com/stickeritis/sticker2). The variable 'position' gives the character offset of the first character of the finite verb in the text.

References
Strötgen, Jannik & Michael Gertz. 2013. Multilingual and cross-domain temporal tagging. Language Resources and Evaluation. Springer 47(2). 269–298. https://doi.org/10.1007/s10579-012-9179-y.
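
The definitions of 'past-ratio' and 'temp-expr-ratio' given for xsample-metadata.csv can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the input shapes, and the tense label "Past" are assumptions made for illustration, and the actual column names and tense labels in the annotation files are not stated in the description and may differ.

```python
def past_ratio(verb_tenses, past_label="Past"):
    """Proportion of past-tense verbs among all finite verbs of one text.

    verb_tenses: one tense label per finite verb in the text.
    past_label:  ASSUMED label marking past tense; check the actual file.
    """
    if not verb_tenses:
        # No finite verbs annotated: define the ratio as 0.0 (an assumption).
        return 0.0
    past = sum(1 for tense in verb_tenses if tense == past_label)
    return past / len(verb_tenses)


def temp_expr_ratio(n_expressions, n_tokens):
    """Number of temporal expressions per 1,000 tokens of one text."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return 1000 * n_expressions / n_tokens


# Toy values, not taken from the corpus.
tenses = ["Past", "Pres", "Past", "Pres"]
print(past_ratio(tenses))
print(temp_expr_ratio(12, 4000))
```

In practice the inputs would presumably be gathered per text from xsample-sticker2.csv (one row per finite verb) and xsample-heidel.csv (one row per temporal expression), with the token count taken from the text length recorded in the metadata.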

Created:

2022-05-16
