
Cloud-based Jupyter Notebooks for Water Data Analysis

Source: DataONE. Updated: 2022-04-15. Indexed: 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:1268334d8b3da7a4b0f330908a4ef15783f1d1546a9a334309dcdb86a54740c1
Description:
The water science community's development and adoption of technologies for open collaboration and workflow sharing will transform how we address the challenges of collaborative, reproducible scientific research. Jupyter notebooks offer one solution: an open-source platform for creating metadata-rich toolchains for modeling and data analysis applications. Adopting this technology within the water sciences, together with publicly available datasets from agencies such as USGS, NASA, and EPA, enables researchers to easily prototype and execute data-intensive toolchains. Moreover, implementing this software stack in a cloud-based environment extends its native functionality, giving researchers a mechanism to build and execute toolchains that are too large or computationally demanding for typical desktop computers. This cloud-based solution also enables scientists to disseminate data processing routines alongside journal publications to support reproducibility. For example, these data collection and analysis toolchains can be shared, archived, and published using the HydroShare platform, or downloaded and executed locally to reproduce a scientific analysis. This work presents the design and implementation of a cloud-based Jupyter environment and its application to collecting, aggregating, and munging various datasets in a transparent, shareable, and self-documenting manner. The goals of this work are to establish a free and open-source platform for domain scientists to (1) conduct data-intensive and computationally intensive collaborative research, (2) use high-performance libraries, models, and routines within a pre-configured cloud environment, and (3) disseminate research products. This presentation discusses recent efforts toward these goals and describes the architectural design of the notebook server that supports collaborative and reproducible science.
This work was presented as an ePoster at the 2017 American Geophysical Union (AGU) Fall Meeting and can be found at: https://agu2017fallmeeting-agu.ipostersessions.com/default.aspx?s=2B-C4-70-3C-B8-A0-0D-77-35-04-7C-F2-A4-1B-36-10

Created:
2022-04-15