Supporting workflows for "Accumulating computational resource usage of genomic data analysis workflow to optimize cloud computing instance selection"
Source: DataCite Commons | Updated: 2025-05-26 | Indexed: 2025-04-15
Download link:
http://gigadb.org/dataset/100584
Description:
Container virtualization technologies such as Docker are popular in the bioinformatics domain because they improve the portability and reproducibility of software deployment. Together with software packaged in containers, the Common Workflow Language (CWL) workflow description standard makes it easy to run data analyses across multiple computing environments. These technologies accelerate the use of on-demand cloud computing platforms, which can scale out according to the amount of data. However, to optimize the time and budget spent on cloud usage, users need to select an instance type that matches the resource requirements of their workflows.

We developed CWL-metrics, a utility tool for cwltool, the reference implementation of CWL, that collects runtime metrics of Docker containers and workflow metadata in order to analyze the resource requirements of workflows. We demonstrate the analysis using seven transcriptome quantification workflows on six instance types. The results identified instance type options that offer lower financial cost and faster execution time while still providing the required amount of computational resources.

The summary of resource requirements of workflow executions provided by CWL-metrics can help users optimize their selection of cloud computing instances. The runtime metrics data also help users share workflows among different workflow management frameworks. A Jupyter notebook file reproducing all the figures in the manuscript is available here.
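The instance selection idea in the description can be illustrated with a minimal sketch. This is not the CWL-metrics API or the authors' method: the instance names, prices, memory sizes, and runtimes below are hypothetical placeholders, and the selection rule (cheapest total cost among instances with enough memory) is an assumption made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    """A cloud instance option (all values here are hypothetical)."""
    name: str
    vcpus: int
    memory_gib: float
    usd_per_hour: float


def cheapest_fitting(instances, peak_memory_gib, runtime_hours):
    """Return (instance, total_cost) for the cheapest instance whose memory
    covers the workflow's observed peak, or None if nothing fits.

    runtime_hours maps instance name -> measured workflow runtime in hours.
    Instances without a measured runtime are skipped.
    """
    candidates = []
    for inst in instances:
        if inst.memory_gib < peak_memory_gib:
            continue  # not enough memory for this workflow
        hours = runtime_hours.get(inst.name)
        if hours is None:
            continue  # no measurement available for this instance
        candidates.append((inst.usd_per_hour * hours, inst))
    if not candidates:
        return None
    cost, inst = min(candidates, key=lambda pair: pair[0])
    return inst, cost


# Hypothetical instance catalogue and measurements, for illustration only.
INSTANCES = [
    InstanceType("small", vcpus=2, memory_gib=4.0, usd_per_hour=0.10),
    InstanceType("medium", vcpus=4, memory_gib=16.0, usd_per_hour=0.20),
    InstanceType("large", vcpus=8, memory_gib=32.0, usd_per_hour=0.40),
]
RUNTIME_HOURS = {"small": 3.0, "medium": 2.0, "large": 0.9}

if __name__ == "__main__":
    choice = cheapest_fitting(INSTANCES, peak_memory_gib=10.0,
                              runtime_hours=RUNTIME_HOURS)
    if choice is not None:
        inst, cost = choice
        print(f"{inst.name}: ${cost:.2f}")
```

With these made-up numbers the larger instance finishes fast enough that its total cost falls below that of the mid-sized one, which is the kind of trade-off that measured runtime metrics make visible.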
Provider:
GigaScience Database
Created:
2019-04-05



