
Supporting data for "aws-s3-integrity-check: an open-source bash tool to verify the integrity of a dataset stored on Amazon S3"

Source: DataCite Commons. Updated 2025-05-26; indexed 2025-04-15.
Download link:
http://gigadb.org/dataset/102433
Summary:
Amazon Simple Storage Service (Amazon S3) has become a widely used and reliable platform for storing large biomedical datasets. However, unintended changes to the original data can occur during data writing and transmission, altering the contents of the transferred object and producing unexpected results when it is later accessed. Despite the interest in verifying end-to-end data integrity, no open-source, easy-to-use tool currently exists for this task.

To bridge this gap, we present aws-s3-integrity-check, a user-friendly, lightweight and reliable bash tool that verifies the integrity of a dataset stored in an Amazon S3 bucket. Using this tool, we verified the integrity of 1,045 records, ranging from 5 bytes to 10 gigabytes (GB) in size and occupying a total of ~935 GB of Amazon S3 storage, in ~114 minutes. The tool also reports the status of each individual integrity check, file by file, both on screen and in a log file.

To the best of our knowledge, aws-s3-integrity-check is the only open-source tool that verifies the integrity of a dataset uploaded to Amazon S3 quickly, reliably and efficiently. It can be used to test files of any type and size, and it is freely available for use and download at https://github.com/SoniaRuiz/aws-s3-integrity-check and https://hub.docker.com/r/soniaruiz/aws-s3-integrity-check.
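The summary does not describe how the check itself works. One common way to perform this kind of end-to-end check is to recompute, from the local copy of a file, the ETag that S3 should report for it and compare the two. The Python sketch below illustrates that general technique; it is not the tool's bash implementation (see the repository for its actual logic). It assumes unencrypted or SSE-S3 objects (ETags of SSE-KMS or SSE-C objects are not MD5 digests), that objects smaller than the part size were uploaded in a single PUT and all others as multipart uploads with fixed-size parts, and a hypothetical default part size of 8 MiB (the AWS CLI's default `multipart_chunksize`). The function names `expected_etag` and `etag_matches` are illustrative only.

```python
import hashlib
import io
from typing import BinaryIO

# Assumption: 8 MiB, the AWS CLI's default multipart threshold/chunk size.
# The part size actually used at upload time must be known for a match.
DEFAULT_PART_SIZE = 8 * 1024 * 1024


def expected_etag(stream: BinaryIO, part_size: int = DEFAULT_PART_SIZE) -> str:
    """Return the ETag S3 would report for `stream` uploaded in `part_size` parts.

    Assumes objects smaller than `part_size` were sent as a single PUT and
    everything else as a multipart upload with fixed-size parts.
    """
    part_digests = []
    total = 0
    while True:
        chunk = stream.read(part_size)
        if not chunk:
            break
        total += len(chunk)
        part_digests.append(hashlib.md5(chunk).digest())

    if total < part_size:
        # Single-part upload: the ETag is the hex MD5 of the whole object
        # (zero chunks means an empty object).
        return part_digests[0].hex() if part_digests else hashlib.md5(b"").hexdigest()

    # Multipart upload: MD5 of the concatenated binary part digests,
    # suffixed with "-<number of parts>".
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


def etag_matches(stream: BinaryIO, s3_etag: str, part_size: int = DEFAULT_PART_SIZE) -> bool:
    """Compare a locally computed ETag with one returned by S3 (S3 quotes it)."""
    return expected_etag(stream, part_size) == s3_etag.strip('"')


if __name__ == "__main__":
    # Usage: check a small local payload against a quoted ETag string.
    print(etag_matches(io.BytesIO(b"hello"), '"5d41402abc4b2a76b9719d911017c592"'))  # True
```

In practice the local stream would be a file opened with `open(path, "rb")`, and the remote value could be read, for example, from the `ETag` field returned by `aws s3api head-object`. Streaming the file in part-sized chunks keeps memory use bounded even for objects of around 10 GB.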

亚马逊简单存储服务(Amazon Simple Storage Service,Amazon S3)现已成为存储大型生物医学数据集的广泛使用且可靠的平台。然而,在数据写入与传输过程中,原始数据可能会发生非预期变更,最终导致传输对象的原始内容被篡改,并在后续访问时产生意外结果。尽管学界对端到端数据完整性验证存在迫切需求,但目前尚无开源且易用的工具可完成此项任务。 为填补这一空白,我们推出了aws-s3-integrity-check工具——一款易用、轻量且可靠的bash工具,用于验证存储于亚马逊S3存储桶内的数据集完整性。借助该工具,我们完成了1045条记录的完整性验证工作,这些记录的大小介于5字节(Bytes)至10吉字节(GB)之间,总占用约935吉字节的亚马逊S3云存储空间,耗时仅约114分钟。该aws-s3-integrity-check工具还支持逐文件输出屏幕实时提示与日志文件,以展示每一项完整性检查的具体状态。 据我们所知,这款aws-s3-integrity-check bash工具是目前唯一一款开源工具,可快速、可靠且高效地验证上传至亚马逊S3存储系统的数据集完整性。该工具支持测试任意类型与大小的文件,可免费使用与下载,获取地址为:https://github.com/SoniaRuiz/aws-s3-integrity-check 及 https://hub.docker.com/r/soniaruiz/aws-s3-integrity-check。
Provider:
GigaScience Database
Created:
2023-08-08