HourGlass corpus

Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/3415632
Description:
The HourGlass corpus is a collection of 348 documents (short texts) in Spanish, tagged with temporal expressions following the TimeML standard. Since it was conceived as a test bed for temporal taggers, each document has a tag and a register attached as classification. The corpus is divided into two parts, depending on the source of the texts.
- The SYNTHETIC part is a collection of 285 documents specifically designed to test functionalities a temporal tagger should cover, such as detecting basic expressions like "It is five o'clock". Tags such as "Hour", "Dates" or "False" (that is, sentences containing a confusing expression that should not be tagged) were added to each document to facilitate their use (e.g., when the coverage of just one type of expression needs to be tested).
- The PEOPLE part is a collection of 67 documents (although only 63 were included in the final HourGlass corpus; the others were left out because of their ambiguity) contributed by people unfamiliar with the temporal annotation task. They were asked to write sentences containing what they considered to be temporal expressions, and these sentences were afterwards analyzed and annotated. Besides the tags, the register of each sentence (e.g. "normal", "Latin American" or "colloquial") was also added.
All the documents are normalized using "2019-12-20" as the anchor date.
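As a rough illustration of what TimeML-style temporal annotation looks like in practice, the sketch below reads TIMEX3 elements from a small document and lists their attributes. The sample string, the helper name extract_timex3, and the attribute values are hypothetical and written only for this example; the actual file layout and normalized values in the HourGlass corpus may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical TimeML-style sample (not taken from the corpus).
# The DCT element carries the anchor date mentioned in the description.
SAMPLE = (
    '<TimeML>'
    '<DCT><TIMEX3 tid="t0" type="DATE" value="2019-12-20" '
    'functionInDocument="CREATION_TIME">2019-12-20</TIMEX3></DCT>'
    '<TEXT>Son <TIMEX3 tid="t1" type="TIME" value="2019-12-20T17:00">'
    'las cinco de la tarde</TIMEX3>.</TEXT>'
    '</TimeML>'
)


def extract_timex3(xml_text):
    """Return (tid, type, value, surface text) for each TIMEX3 element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (el.get("tid"), el.get("type"), el.get("value"), "".join(el.itertext()))
        for el in root.iter("TIMEX3")
    ]


for row in extract_timex3(SAMPLE):
    print(row)
```

Comparing tuples like these against a tagger's output is one simple way to use such a corpus as a test bed, assuming the gold annotations are available in a comparable form.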
Created:
2020-01-21