community-datasets/gutenberg_time
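Dataset Overview
Dataset Summary
The Gutenberg Time dataset contains all explicit time references found in 52,183 novels whose full text is available via Project Gutenberg. It is intended for time-of-day classification tasks.
Supported Tasks and Leaderboards
The dataset supports multi-class classification.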
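As a rough illustration of how the multi-class task could be framed, the sketch below treats each hour from 0 to 23 as one class. This framing is an assumption based on the range of the hour_reference field; the card does not prescribe a label scheme, and the names NUM_CLASSES, hour_to_label, and label_counts are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable

NUM_CLASSES = 24  # assumption: one class per hour, 0 through 23


def hour_to_label(hour_reference) -> int:
    """Convert an hour_reference value (string or int) to a class index."""
    label = int(hour_reference)
    if not 0 <= label < NUM_CLASSES:
        raise ValueError(f"hour_reference out of range: {hour_reference!r}")
    return label


def label_counts(hours: Iterable) -> Counter:
    """Count how many examples fall into each hour class."""
    return Counter(hour_to_label(h) for h in hours)


# Hypothetical hour values, used only to show the call.
counts = label_counts(["12", "0", "23", "12"])
print(counts.most_common(1))
```

Counting labels first is a cheap way to see how unevenly the hours are distributed before choosing a model or metric.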
Languages
The text in the dataset is in English.
Dataset Structure
Data Instances
{
  "guten_id": 28999,
  "hour_reference": 12,
  "time_phrase": "midday",
  "is_ambiguous": false,
  "time_pos_start": 133,
  "time_pos_end": 134,
  "tok_context": "Sorrows and trials she had had in plenty in her life , but these the sweetness of her nature had transformed , so that from being things difficult to bear , she had built up with them her own character . Sorrow had increased her own power of sympathy ; out of trials she had learnt patience ; and failure and the gradual sinking of one she had loved into the bottomless slough of evil habit had but left her with an added dower of pity and tolerance . So the past had no sting left , and if iron had ever entered into her soul it now but served to make it strong . She was still young , too ; it was not near sunset with her yet , nor even midday , and the future that , humanly speaking , she counted to be hers was almost dazzling in its brightness . For love had dawned for her again , and no uncertain love , wrapped in the mists of memory , but one that had ripened through liking and friendship and intimacy into the authentic glory . He was in England , too ; she was going back to him . And before very long she would never go away from him again ."
}
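Data Fields
guten_id: string, the Gutenberg ID number of the book.
hour_reference: string, the hour referred to, from 0 to 23.
time_phrase: string, the phrase corresponding to the referenced hour.
is_ambiguous: boolean, whether it is ambiguous if the time is AM or PM.
time_pos_start: integer, token position where time_phrase begins.
time_pos_end: integer, token position where time_phrase ends (exclusive).
tok_context: string, the context in which time_phrase appears, as space-separated tokens.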
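The position fields can be read as a half-open slice over the space-separated tokens of tok_context. The following is a minimal sketch of that reading; the function name extract_time_phrase is hypothetical, and the record is a shortened, made-up example rather than a row from the dataset.

```python
from typing import Dict, List


def extract_time_phrase(record: Dict) -> List[str]:
    """Return the tokens of the time phrase located by the position fields."""
    tokens = record["tok_context"].split(" ")
    # time_pos_end is exclusive, so a plain slice picks out the phrase.
    return tokens[record["time_pos_start"]:record["time_pos_end"]]


# Hypothetical, shortened record (not taken from the dataset).
example = {
    "time_phrase": "midday",
    "time_pos_start": 4,
    "time_pos_end": 5,
    "tok_context": "It was not yet midday , and the sun was high .",
}

phrase_tokens = extract_time_phrase(example)
print(" ".join(phrase_tokens))
```

Comparing the joined tokens against time_phrase is a simple sanity check that the offsets are being interpreted as token indices and not character offsets.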
Data Splits
The dataset is not divided into splits.
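Because no splits are provided, users need to create their own. One possible approach, sketched below, assigns each record to a split by hashing its guten_id so that all passages from one book land on the same side. Grouping by book and the 10% test fraction are assumptions made for illustration, not recommendations from the dataset authors; assign_split and split_records are hypothetical names.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def assign_split(guten_id, test_fraction: float = 0.1) -> str:
    """Deterministically map a Gutenberg ID to 'train' or 'test'."""
    digest = hashlib.md5(str(guten_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 1000  # stable bucket in [0, 999]
    return "test" if bucket < test_fraction * 1000 else "train"


def split_records(records: Iterable[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Partition records so that each book appears in exactly one split."""
    train, test = [], []
    for record in records:
        target = test if assign_split(record["guten_id"]) == "test" else train
        target.append(record)
    return train, test


# Hypothetical records; only guten_id matters for the split.
records = [{"guten_id": "28999"}, {"guten_id": "28999"}, {"guten_id": "4447"}]
train, test = split_records(records)
print(len(train), len(test))
```

Hashing the ID instead of shuffling keeps the assignment reproducible across runs and machines without storing a random seed.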
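Dataset Creation
Curation Rationale
The flow of time is an indispensable guide for our actions and provides a framework for the logical progression of events. In most works of fiction, the events of the story take place during recognizable periods of the day. Recognizing a story's flow through time is essential to understanding the text.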
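Source Data
Initial Data Collection and Normalization
The data comes from 52,183 novels in Project Gutenberg.
Who are the source language producers?
The authors of the novels.
Annotations
Annotation process
Annotated manually.
Who are the annotators?
Two of the authors.
Personal and Sensitive Information
The dataset contains no personal or sensitive information.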
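Considerations for Using the Data
Social Impact of Dataset
[More Information Needed]
Discussion of Biases
[More Information Needed]
Other Known Limitations
[More Information Needed]
Additional Information
Dataset Curators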
Allen Kim, Charuta Pethe, Steven Skiena, Stony Brook University
Licensing Information
[More Information Needed]
Citation Information
@misc{kim2020time,
  title={What time is it? Temporal Analysis of Novels},
  author={Allen Kim and Charuta Pethe and Steven Skiena},
  year={2020},
  eprint={2011.04124},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
Contributions
Thanks to @TevenLeScao for adding this dataset.



