Hippocorpus

Name: Hippocorpus
Creator: Kaggle
Published: 2021-01-24 00:00:00
License: No description available

Source: www.kaggle.com | Updated: 2021-01-24 | Indexed: 2025-01-16

Download link:

https://www.kaggle.com/saurabhshahane/hippocorpus


Resource description:

### Context

Models of human cognition hold that information processing occurs in a series of stages. Cognitive psychology, in particular, is concerned with the internal mental processes that begin with the appearance of an external stimulus and result in a behavioral response.

### Content

Hippocorpus is a dataset of 6854 English diary-like short stories about recalled and imagined events. Using a crowdsourcing framework, the dataset's owners collected recalled stories and summaries from workers, then gave those summaries to other workers, who wrote imagined stories. Months later, the dataset creators collected retold versions of the recalled stories from a subset of the original recalled-story authors. The dataset contains author demographics (age, gender, race), their openness to experience, and some variables describing the author's relationship to the event (e.g., how personal the event is, how often they tell its story).

Content source: https://msropendata.com/datasets/0a83fb6f-a759-4a17-aaa2-fbac84577318

### Acknowledgements

Thanks to the original authors: Maarten Sap, Eric Horvitz, Yejin Choi, Noah A. Smith, and James Pennebaker (2020). _Recollection versus Imagination: Exploring Human Memory and Cognition via Neural Language Models._ ACL.

Project link: https://www.microsoft.com/en-us/research/project/neural-models-for-studies-of-human-cognition/

Original dataset: https://msropendata.com/datasets/0a83fb6f-a759-4a17-aaa2-fbac84577318

### License

[Open Use of Data Agreement v1.0](https://msropendata-web-api.azurewebsites.net/licenses/f1f352a6-243f-4905-8e00-389edbca9e83/view)

### Dataset Description

The dataset contains short stories about recalled and imagined events. These are the columns in the data:

- `AssignmentId`: Unique ID of this story
- `WorkTimeInSeconds`: Time in seconds that it took the worker to do the entire HIT (reading instructions, story writing, questions)
- `WorkerId`: Unique ID of the worker (random string, not MTurk worker ID)
- `annotatorAge`: Lower limit of the age bucket of the worker. Buckets are: 18-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55+
- `annotatorGender`: Gender of the worker
- `annotatorRace`: Race/ethnicity of the worker
- `distracted`: How distracted were you while writing your story? (5-point Likert)
- `draining`: How taxing/draining was writing for you emotionally? (5-point Likert)
- `frequency`: How often do you think about or talk about this event? (5-point Likert)
- `importance`: How impactful, important, or personal is this story/event to you? (5-point Likert)
- `logTimeSinceEvent`: Log of time (days) since the recalled event happened
- `mainEvent`: Short phrase describing the main event of the story
- `memType`: Type of story (recalled, imagined, retold). This is the target variable.
- `mostSurprising`: Short phrase describing what the most surprising aspect of the story was
- `openness`: Continuous variable representing the openness to experience of the worker
- `recAgnPairId`: ID of the recalled story that corresponds to this retold story (null for imagined stories). Group on this variable to get the recalled-retold pairs.
- `recImgPairId`: ID of the recalled story that corresponds to this imagined story (null for retold stories). Group on this variable to get the recalled-imagined pairs.
- `similarity`: How similar to your life does this event/story feel to you? (5-point Likert)
- `similarityReason`: Free text annotation of similarity
- `story`: Story about the imagined or recalled event (15-25 sentences)
- `stressful`: How stressful was this writing task? (5-point Likert)
- `summary`: Summary of the events in the story (1-3 sentences)
- `timeSinceEvent`: Time (number of days) since the recalled event happened
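
The column notes suggest grouping on `recImgPairId` or `recAgnPairId` to recover story pairs. A minimal pandas sketch of that grouping follows. It runs on a small in-memory table whose values are made up for illustration; only the column names come from the description above. The helper name `pair_stories` is hypothetical, and loading the real file is left out because its file name is not given here.

```python
import pandas as pd

# Toy rows with invented values; only the column names are from the dataset description.
toy = pd.DataFrame(
    {
        "AssignmentId": ["a1", "a2", "a3", "a4", "a5"],
        "memType": ["recalled", "imagined", "retold", "recalled", "imagined"],
        "recImgPairId": ["p1", "p1", None, "p2", "p2"],
        "recAgnPairId": ["p1", None, "p1", "p2", None],
        "story": [
            "recalled story 1",
            "imagined story 1",
            "retold story 1",
            "recalled story 2",
            "imagined story 2",
        ],
    }
)


def pair_stories(df: pd.DataFrame, pair_col: str, other_type: str) -> pd.DataFrame:
    """Return one row per pair ID holding the recalled story and its counterpart.

    pair_col is "recImgPairId" or "recAgnPairId"; other_type is "imagined" or "retold".
    Pairs that lack either side are dropped.
    """
    # Keep only rows that carry a pair ID and belong to one of the two story types.
    subset = df[df[pair_col].notna() & df["memType"].isin(["recalled", other_type])]
    # Group on the pair ID, then spread the story types into columns.
    wide = subset.groupby([pair_col, "memType"])["story"].first().unstack("memType")
    return wide.dropna().reset_index()


recalled_imagined = pair_stories(toy, "recImgPairId", "imagined")
recalled_retold = pair_stories(toy, "recAgnPairId", "retold")
print(recalled_imagined)
print(recalled_retold)
```

Taking the first story per group is a simplifying assumption; if a pair ID maps to several stories of the same type in the real data, a different aggregation would be needed.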


Provider:

Kaggle
