A New Tidy Data Structure to Support Exploration and Modeling of Temporal Data
Taylor & Francis Group. Updated 2021-09-29. Indexed 2026-04-16.
Download link:
https://tandf.figshare.com/articles/dataset/A_new_tidy_data_structure_to_support_exploration_and_modeling_of_temporal_data/10770992
Description:
Mining temporal data for information is often inhibited by a multitude of formats: regular or irregular time intervals, point events that need aggregating, multiple observational units or repeated measurements on multiple individuals, and heterogeneous data types. This work presents a cohesive conceptual framework for organizing and manipulating temporal data, which in turn flows into visualization, modeling, and forecasting routines. Tidy data principles are extended to temporal data by: (1) mapping the semantics of a dataset into its physical layout; (2) including an explicitly declared "index" variable representing time; (3) incorporating a "key" comprising one or more variables that uniquely identify units over time. This tidy data representation naturally supports thinking of operations on the data as building blocks that form part of a "data pipeline" in time-based contexts. A sound data pipeline facilitates a fluent workflow for analyzing temporal data. The infrastructure for tidy temporal data has been implemented in the R package tsibble. Supplementary materials for this article are available online.
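The "index" and "key" ideas in the description can be illustrated with a minimal sketch. This is not the tsibble API (tsibble is an R package); it is a plain-Python illustration in which the function names `find_duplicates` and `as_tidy_temporal`, and the sample columns `sensor`, `day`, and `count`, are all hypothetical. It assumes only what the description states: one declared time index, one or more key variables, and the requirement that key plus index uniquely identify each row.

```python
from collections import Counter
from datetime import date


def find_duplicates(rows, key, index):
    """Return the (key values, index value) pairs that occur more than once."""
    counts = Counter(
        (tuple(row[k] for k in key), row[index]) for row in rows
    )
    return [pair for pair, n in counts.items() if n > 1]


def as_tidy_temporal(rows, key, index):
    """Validate the key/index contract, then order rows by key and time.

    Hypothetical helper: raises ValueError if key + index is not unique.
    """
    duplicates = find_duplicates(rows, key, index)
    if duplicates:
        raise ValueError(
            f"key and index do not uniquely identify rows: {duplicates}"
        )
    return sorted(
        rows, key=lambda row: (tuple(row[k] for k in key), row[index])
    )


# Made-up example: two observational units ("sensor") measured on two days.
rows = [
    {"sensor": "b", "day": date(2021, 1, 2), "count": 7},
    {"sensor": "a", "day": date(2021, 1, 2), "count": 5},
    {"sensor": "a", "day": date(2021, 1, 1), "count": 3},
    {"sensor": "b", "day": date(2021, 1, 1), "count": 4},
]

tidy = as_tidy_temporal(rows, key=["sensor"], index="day")
```

Declaring the key and index up front, rather than inferring them, lets every later step in a pipeline (aggregation, gap filling, modeling) rely on the same contract. A duplicated (key, index) pair is rejected early instead of silently distorting downstream results.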
Provided by:
Hyndman, Rob J.; Wang, Earo; Cook, Dianne
Created:
2021-09-29



