HourGlass corpus
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/3415632
Resource description:
The HourGlass corpus is a collection of 348 documents (short texts) in Spanish tagged with temporal expressions following the TimeML standard. Since it was conceived as a test bed for temporal taggers, each document is classified with an attached tag and a register.
The corpus is divided into two parts, depending on the source of the texts.
- The SYNTHETIC part is a collection of 285 documents specifically designed to test functionalities a temporal tagger should cover, such as detecting basic expressions like "It is five o'clock". Tags such as "Hour", "Dates" or "False" (that is, sentences where some confusing expression should not be tagged) were added to each document to facilitate their use (e.g., when the coverage of just one type of expression needs to be tested).
- The PEOPLE part is a collection of 67 documents (although only 63 were added to the final HourGlass corpus, the others being left out due to their ambiguity) proposed by people unfamiliar with the temporal annotation task. They were asked to write sentences containing what they considered to be temporal expressions, and these sentences were afterwards analyzed and annotated. Besides the tags, the register of each sentence (e.g. "normal", "Latin American" or "colloquial") was also added.
All the documents are normalized using "2019-12-20" as the anchor date.
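As a rough illustration of what a TimeML-annotated document normalized against this anchor date might look like, the sketch below extracts TIMEX3 annotations with Python's standard library. The sample sentence, its wrapper elements, and the helper name `extract_timex3` are assumptions made for illustration and are not taken from the corpus files, whose exact layout may differ. Only the `tid`, `type` and `value` attributes come from the TimeML TIMEX3 definition. Reading "five o'clock" as 05:00 is also an assumption.

```python
import xml.etree.ElementTree as ET

# Anchor date stated in the corpus description.
ANCHOR_DATE = "2019-12-20"

# Hypothetical sample, NOT taken from the corpus: a Spanish rendering of
# "It is five o'clock", with the time resolved against the anchor date.
SAMPLE = (
    "<TimeML><TEXT>Son las "
    '<TIMEX3 tid="t1" type="TIME" value="2019-12-20T05:00">cinco en punto</TIMEX3>'
    ".</TEXT></TimeML>"
)


def extract_timex3(xml_text):
    """Return (tid, type, value, text) for every TIMEX3 element found."""
    root = ET.fromstring(xml_text)
    return [
        (el.get("tid"), el.get("type"), el.get("value"), (el.text or "").strip())
        for el in root.iter("TIMEX3")
    ]


timexes = extract_timex3(SAMPLE)
for tid, timex_type, value, text in timexes:
    print(tid, timex_type, value, repr(text))
```

A tagger under test could be scored by comparing the tuples it produces against those extracted from the gold annotation of the same document.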
Created:
2020-01-21



