
Waterhackweek 2019 Cyberseminar: Data access and time-series statistics

Source: www.hydroshare.org | Updated: 2019-08-28 | Indexed: 2025-03-26
Download link:
https://www.hydroshare.org/resource/7fb35a9b23624a07b57ab0208039e311
Resource description:
Data about water are found in many types of formats distributed by many different sources and depicting different spatial representations such as points, polygons and grids. How do we find and explore the data we need for our specific research or application? This seminar will present common challenges and strategies for finding and accessing relevant datasets, focusing on time series data from sites commonly represented as fixed geographical points. This type of data may come from automated monitoring stations such as river gauges and weather stations, from repeated in-person field observations and samples, or from model output and processed data products. We will present and explore useful data catalogs, including the CUAHSI HIS catalog accessible via HydroClient, CUAHSI HydroShare, the EarthCube Data Discovery Studio, Google Dataset search, and agency-specific catalogs. We will also discuss programmatic data access approaches and tools in Python, particularly the ulmo data access package, touching on the role of community standards for data formats and data access protocols. Once we have accessed datasets we are interested in, the next steps are typically exploratory, focusing on visualization and statistical summaries. This seminar will illustrate useful approaches and Python libraries used for processing and exploring time series data, with an emphasis on the distinctive needs posed by temporal data. Core Python packages used include Pandas, GeoPandas, Matplotlib and the geospatial visualization tools introduced at the last seminar. Approaches presented can be applied to other data types that can be summarized as single time series, such as averages over a watershed or data extracts from a single cell in a gridded dataset – the topic for the next seminar.
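As a rough illustration of the exploratory step described above (statistical summaries of a time series from a single fixed site), here is a minimal Pandas sketch. It is not taken from the seminar materials: the series is synthetic, the variable names (`flow`, `monthly`, `rolling7`) are hypothetical, and the data gap is simulated by hand. Real data would instead come from one of the catalogs or access tools named in the description.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for one monitoring site.
# All values are made up for illustration only.
rng = np.random.default_rng(0)
index = pd.date_range("2018-01-01", "2018-12-31", freq="D")
day_of_year = index.dayofyear.to_numpy()
seasonal = 10 + 5 * np.sin(2 * np.pi * day_of_year / 365.0)
flow = pd.Series(seasonal + rng.normal(0, 1, len(index)), index=index, name="flow")

# Simulate a five-day sensor outage; gaps are common in station records.
flow.iloc[100:105] = np.nan

# Overall summary statistics (NaN values are skipped by default).
summary = flow.describe()

# Monthly aggregates; "count" shows how many valid days each month has.
monthly = flow.resample("MS").agg(["mean", "min", "max", "count"])

# Seven-day rolling mean that requires at least four valid days per window.
rolling7 = flow.rolling("7D", min_periods=4).mean()

print(summary)
print(monthly)
```

Reporting the per-month count alongside the mean is one simple way to keep missing observations visible, so that an average computed from a partly empty month is not mistaken for a complete one.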

Provided by:
www.hydroshare.org